Large-Scale Machine Learning for Classification and Search
Ph.D., Columbia University.
With the rapid development of the Internet, tremendous amounts of data, including millions or even billions of images and videos, can now be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques that make classification and nearest neighbor search practical on gigantic databases.

Our first approach explores data graphs to aid classification and nearest neighbor search. A graph offers an attractive way of representing data and discovering essential information such as the neighborhood structure. However, both the graph construction process and graph-based learning techniques become computationally prohibitive at large scale. To this end, we present an efficient large graph construction approach and subsequently apply it to develop scalable semi-supervised learning and unsupervised hashing algorithms. Our unique contributions on the graph-related topics include:

1. Large Graph Construction: Conventional neighborhood graphs such as kNN graphs require quadratic construction time, which is prohibitive for the large-scale applications mentioned above. To overcome this bottleneck, we present a novel graph construction approach, called Anchor Graphs, which enjoys linear space and time complexities and can thus be built over gigantic databases efficiently. The central idea of the Anchor Graph is to introduce a small set of anchor points and convert the intensive data-to-data affinity computation into a drastically reduced data-to-anchor affinity computation, from which a low-rank data-to-data affinity matrix is derived. We also prove that the Anchor Graph admits an intuitive probabilistic interpretation: each entry of the derived affinity matrix can be viewed as a transition probability between two data points under Markov random walks.
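The Anchor Graph construction described above can be sketched in a few lines of numpy. This is a minimal illustration, not the thesis implementation: anchor selection (k-means in practice) is replaced here by random sampling, and the kernel bandwidth and sparsity level are arbitrary. It also checks the probabilistic interpretation: the rows of the derived affinity matrix sum to one, so its entries behave like transition probabilities.

```python
import numpy as np

def anchor_graph_affinity(X, anchors, s=3, sigma=1.0):
    """Low-rank Anchor Graph affinity A = Z @ diag(1/Lam) @ Z.T.

    Z holds data-to-anchor affinities: each point is linked to its
    s nearest anchors with Gaussian weights, normalized to sum to 1.
    """
    n, m = X.shape[0], anchors.shape[0]
    # squared Euclidean distances from every point to every anchor
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.zeros((n, m))
    for i in range(n):
        idx = np.argsort(d2[i])[:s]               # s closest anchors
        w = np.exp(-d2[i, idx] / (2 * sigma**2))  # Gaussian kernel weights
        Z[i, idx] = w / w.sum()                   # each row of Z sums to 1
    Lam = Z.sum(axis=0)                           # anchor "degrees"
    A = (Z / Lam) @ Z.T                           # Z @ diag(1/Lam) @ Z.T
    return Z, A

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# stand-in for k-means centers: a random subset of the data
anchors = X[rng.choice(200, size=20, replace=False)]
Z, A = anchor_graph_affinity(X, anchors)
print(np.allclose(A.sum(axis=1), 1.0))  # True: rows are transition probabilities
print(np.allclose(A, A.T))              # True: the affinity is symmetric
```

Note that A is never materialized at scale; the point of the construction is that Z is n-by-m with m much smaller than n, so all downstream computation stays linear in n.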
2. Large-Scale Semi-Supervised Learning: We employ Anchor Graphs to develop a scalable solution for semi-supervised learning, which capitalizes on both labeled and unlabeled data to learn graph-based classification models. We propose several key methods to build scalable semi-supervised kernel machines that can handle real-world linearly inseparable data. The proposed techniques exploit the Anchor Graph from a kernel point of view, generating a set of low-rank kernels that encompass the neighborhood structure unveiled by the Anchor Graph. By linearizing these low-rank kernels, training nonlinear kernel machines in semi-supervised settings reduces to training linear SVMs in supervised settings, so the computational cost of classifier training is substantially lowered. We achieve excellent classification performance with the proposed semi-supervised kernel machine: a linear SVM with a linearized Anchor Graph warped kernel.

3. Unsupervised Hashing: To achieve fast point-to-point search, compact hashing with short hash codes has been suggested, but learning codes that yield good search performance remains a challenge. Moreover, real-world data sets are often assumed to live on manifolds, which should be taken into account to capture meaningful nearest neighbors. To this end, we present a novel unsupervised hashing approach based on the Anchor Graph, which captures the underlying manifold structure. Anchor Graph Hashing allows constant-time hashing of a new data point by extrapolating graph Laplacian eigenvectors to eigenfunctions. Furthermore, a hierarchical threshold learning procedure is devised to produce multiple hash bits per eigenfunction, leading to higher search accuracy. As a result, Anchor Graph Hashing exhibits good search performance in finding semantically similar neighbors.
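The linearization step behind the semi-supervised kernel machines of topic 2 rests on a simple identity: a low-rank kernel of the form K = Z diag(1/Λ) Zᵀ factors as K = Ψ Ψᵀ with Ψ = Z diag(Λ)^{-1/2}, so a kernel machine with K is exactly a linear model on the explicit features Ψ. A minimal numpy check of that identity, with a hypothetical random Z standing in for real data-to-anchor affinities:

```python
import numpy as np

# Hypothetical data-to-anchor affinity matrix Z (rows sum to 1),
# standing in for one computed from real data and anchors.
rng = np.random.default_rng(1)
n, m = 100, 8
Z = rng.random((n, m))
Z /= Z.sum(axis=1, keepdims=True)

Lam = Z.sum(axis=0)              # anchor degrees
K = (Z / Lam) @ Z.T              # low-rank Anchor Graph kernel, n x n
Psi = Z / np.sqrt(Lam)           # explicit feature map, n x m

print(np.allclose(K, Psi @ Psi.T))  # True: K factors exactly
```

Because the identity is exact, a linear SVM (e.g. via LIBLINEAR) trained on the m-dimensional rows of Ψ behaves like the nonlinear kernel machine with K, at linear rather than quadratic cost; the thesis's warped-kernel variant refines K further before factoring.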
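The eigenvector computation behind Anchor Graph Hashing (topic 3) can likewise be sketched with numpy. This is a schematic reading of the approach, not the thesis code: it uses the small m-by-m surrogate matrix that shares the nonzero spectrum of the full n-by-n affinity, skips the trivial eigenvalue 1, and omits the hierarchical multi-bit thresholding, thresholding each eigenfunction once at zero.

```python
import numpy as np

def anchor_graph_hashing(Z, r):
    """Sketch of r-bit Anchor Graph Hashing from data-to-anchor affinities Z.

    Eigenvectors of the graph come from a small m x m eigenproblem, and the
    projection W hashes any new point's Z-row in constant time (eigenfunction
    extrapolation rather than recomputing eigenvectors).
    """
    Lam = Z.sum(axis=0)
    B = Z / np.sqrt(Lam)                 # n x m, so B.T @ B is only m x m
    M = B.T @ B                          # shares nonzero spectrum with Z diag(1/Lam) Z.T
    sig, V = np.linalg.eigh(M)           # ascending eigenvalues
    order = np.argsort(-sig)[1:r + 1]    # top r, skipping the trivial eigenvalue 1
    W = (V[:, order] / np.sqrt(Lam)[:, None]) / np.sqrt(sig[order])
    return np.sign(Z @ W), W             # +/-1 codes for the data; W for new points

rng = np.random.default_rng(2)
n, m, r = 120, 10, 4
Z = rng.random((n, m))
Z /= Z.sum(axis=1, keepdims=True)        # hypothetical affinities, rows sum to 1
codes, W = anchor_graph_hashing(Z, r)
print(codes.shape)                       # (120, 4)
```

A new point is hashed by computing its sparse data-to-anchor row z and taking sign(z @ W), which costs O(m) regardless of database size.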
To address other practical application scenarios, we further develop advanced hashing techniques that incorporate supervised information or leverage new formulations to cope with novel forms of queries such as hyperplanes.

4. Supervised Hashing: Recent research has shown that hashing quality can be boosted by leveraging supervised information in hash function learning. However, existing methods either lack adequate performance or incur cumbersome model training. To this end, we present a novel kernel-based supervised hashing model that requires only a limited amount of supervised information, in the form of similar and dissimilar data pairs, and achieves high hashing quality at a practically feasible training cost. The idea is to map the data to compact binary codes whose Hamming distances are simultaneously minimized on similar pairs and maximized on dissimilar pairs. Our approach is distinct from prior work in exploiting the equivalence between optimizing code inner products and optimizing Hamming distances, which enables us to train the hash functions sequentially and efficiently, one bit at a time, yielding very short yet discriminative codes. The presented supervised hashing approach is general, supporting search for both semantically similar neighbors and metric distance neighbors.

5. Hyperplane Hashing: Hyperplane hashing aims at rapidly retrieving database points near a given hyperplane query, and has shown practical impact on large-scale active learning with SVMs. Existing hyperplane hashing methods are randomized and need long hash codes to achieve reasonable search accuracy, resulting in reduced search speed and large memory overhead. We present a novel hyperplane hashing technique that yields high search accuracy with compact hash codes.
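The inner-product/Hamming equivalence that topic 4 exploits is worth making concrete: for length-r codes with entries in {-1, +1}, the Hamming distance is an affine function of the code inner product, D_H(c1, c2) = (r - c1·c2) / 2, so fitting inner products to ±r targets is the same as driving Hamming distances to 0 or r. A tiny numpy check of the identity:

```python
import numpy as np

rng = np.random.default_rng(3)
r = 16
c1 = rng.choice([-1, 1], size=r)          # two hypothetical r-bit codes
c2 = rng.choice([-1, 1], size=r)

hamming = int(np.sum(c1 != c2))           # bits that disagree
inner = int(c1 @ c2)                      # = (#agree) - (#disagree) = r - 2*hamming

print(hamming == (r - inner) // 2)        # True: the equivalence holds exactly
```

Inner products are smooth in the (relaxed) hash functions where Hamming distances are not, which is what makes the bit-by-bit sequential training tractable.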
The key idea is a novel bilinear form used in designing the hash functions, which leads to a higher collision probability than all existing hyperplane hash functions when random projections are used. To further improve performance, we develop a learning-based framework in which the bilinear functions are learned directly from the input data. This yields compact yet discriminative codes, as demonstrated by superior search performance over all random-projection-based solutions.

We divide the thesis into two parts: scalable classification with graphs (topics 1 and 2 above) and nearest neighbor search with hashing (topics 3, 4, and 5 above). The two parts are connected in that the Anchor Graph idea of Part I enables not only scalable classification but also unsupervised hashing, and the hyperplane hashing of Part II can directly benefit classification under an active learning framework. All of the machine learning techniques developed in this thesis pursue excellent performance in both speed and accuracy, which is verified through extensive experiments on various large-scale classification and search tasks. The addressed problems, classification and nearest neighbor search, are fundamental to many real-world applications, so we expect the proposed graph- and hashing-based solutions to benefit a great number of realistic large-scale applications.
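The bilinear hash of topic 5 can be sketched as follows, under assumed details: each bit of a database point x is sign((uᵀx)(vᵀx)), and a hyperplane query with normal w is hashed with the flipped sign, sign(-(uᵀw)(vᵀw)), so that points lying near the hyperplane (x nearly orthogonal to w) collide with the query on more bits than points far from it. Random Gaussian u, v stand in for the learned bilinear projections.

```python
import numpy as np

def bilinear_hash(x, U, V, is_query=False):
    """One bilinear bit per (u, v) row pair: sign((u.x)(v.x)) for a database
    point; sign flipped for a hyperplane query's normal vector."""
    s = (U @ x) * (V @ x)
    return np.sign(-s) if is_query else np.sign(s)

rng = np.random.default_rng(4)
d, bits = 8, 32
U = rng.normal(size=(bits, d))   # random projections; learned in the thesis
V = rng.normal(size=(bits, d))

w = rng.normal(size=d)                       # hyperplane normal = the query
x_on = rng.normal(size=d)
x_on -= (x_on @ w) / (w @ w) * w             # lies on the hyperplane: w.x = 0
x_off = 2.0 * w                              # parallel to w: far from it

q = bilinear_hash(w, U, V, is_query=True)
hd_on = int(np.sum(bilinear_hash(x_on, U, V) != q))
hd_off = int(np.sum(bilinear_hash(x_off, U, V) != q))
print(hd_on < hd_off)  # True: the near-hyperplane point collides on more bits
```

For the point parallel to w every bit disagrees with the query by construction, while a point on the hyperplane disagrees on roughly half the bits in expectation, which is exactly the gap that makes hash-table lookup of near-hyperplane points possible.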
- Suggested Citation:
Large-Scale Machine Learning for Classification and Search, Columbia University Academic Commons.