Large-Scale Machine Learning for Classification and Search

Title:

Large-Scale Machine Learning for Classification and Search

Author(s):

Liu, Wei

Date:

2012

Type:

Dissertations

Department:

Electrical Engineering

Permanent URL:

http://hdl.handle.net/10022/AC:P:14976

Notes:

Ph.D., Columbia University.

Abstract:

With the rapid development of the Internet, tremendous amounts of data, including millions or even billions of images and videos, can now be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques that make classification and nearest neighbor search practical on gigantic databases. Our first approach explores data graphs to aid classification and nearest neighbor search. A graph offers an attractive way of representing data and discovering essential information such as the neighborhood structure. However, both the graph construction process and graph-based learning techniques become computationally prohibitive at large scale. To this end, we present an efficient large graph construction approach and subsequently apply it to develop scalable semi-supervised learning and unsupervised hashing algorithms. Our unique contributions on the graph-related topics include:

1. Large Graph Construction: Conventional neighborhood graphs such as kNN graphs require quadratic time complexity, which is inadequate for the large-scale applications mentioned above. To overcome this bottleneck, we present a novel graph construction approach, called Anchor Graphs, which enjoys linear space and time complexities and can thus be constructed efficiently over gigantic databases. The central idea of the Anchor Graph is to introduce a few anchor points and convert intensive data-to-data affinity computation into drastically reduced data-to-anchor affinity computation. A low-rank data-to-data affinity matrix is then derived from the data-to-anchor affinity matrix. We also prove theoretically that the Anchor Graph admits an intuitive probabilistic interpretation: each entry of the derived affinity matrix can be regarded as a transition probability between two data points under Markov random walks.

2. Large-Scale Semi-Supervised Learning: We employ Anchor Graphs to develop a scalable solution for semi-supervised learning, which capitalizes on both labeled and unlabeled data to learn graph-based classification models. We propose several key methods to build scalable semi-supervised kernel machines so that real-world, linearly inseparable data can be tackled. The proposed techniques exploit the Anchor Graph from a kernel point of view, generating a set of low-rank kernels that encompass the neighborhood structure unveiled by the Anchor Graph. By linearizing these low-rank kernels, training nonlinear kernel machines in semi-supervised settings reduces to training linear SVMs in supervised settings, so the computational cost of classifier training drops substantially. We achieve excellent classification performance with the proposed semi-supervised kernel machine: a linear SVM with a linearized Anchor Graph warped kernel.

3. Unsupervised Hashing: To achieve fast point-to-point search, compact hashing with short hash codes has been suggested, but learning codes that yield good search performance remains a challenge. Moreover, real-world data sets are often assumed to live on manifolds, which should be taken into account in order to capture meaningful nearest neighbors. To this end, we present a novel unsupervised hashing approach based on the Anchor Graph, which captures the underlying manifold structure. Anchor Graph Hashing allows constant-time hashing of a new data point by extrapolating graph Laplacian eigenvectors to eigenfunctions. Furthermore, a hierarchical threshold learning procedure is devised to produce multiple hash bits for each eigenfunction, leading to higher search accuracy. As a result, Anchor Graph Hashing exhibits good search performance in finding semantically similar neighbors.

To address other practical application scenarios, we further develop advanced hashing techniques that incorporate supervised information or leverage unique formulations to cope with new forms of queries such as hyperplanes.

4. Supervised Hashing: Recent research has shown that hashing quality can be boosted by leveraging supervised information in hash function learning. However, existing methods either lack adequate performance or incur cumbersome model training. To this end, we present a novel kernel-based supervised hashing model which requires only a limited amount of supervised information, in the form of similar and dissimilar data pairs, and achieves high hashing quality at a practically feasible training cost. The idea is to map the data to compact binary codes whose Hamming distances are simultaneously minimized on similar pairs and maximized on dissimilar pairs. Our approach is distinct from prior work in utilizing the equivalence between optimizing the code inner products and the Hamming distances. This enables us to train the hash functions sequentially and efficiently, one bit at a time, yielding very short yet discriminative codes. The presented supervised hashing approach is general, allowing search for both semantically similar neighbors and metric distance neighbors.

5. Hyperplane Hashing: Hyperplane hashing aims at rapidly searching the database points near a given hyperplane query, and has shown practical impact on large-scale active learning with SVMs. Existing hyperplane hashing methods are randomized and need long hash codes to achieve reasonable search accuracy, resulting in reduced search speed and large memory overhead. We present a novel hyperplane hashing technique which yields high search accuracy with compact hash codes. The key idea is a novel bilinear form used in designing the hash functions, which leads to a higher collision probability than all existing hyperplane hash functions when using random projections. To further increase performance, we develop a learning-based framework in which the bilinear functions are learned directly from the input data. This results in compact yet discriminative codes, as demonstrated by superior search performance over all random-projection-based solutions.

We divide the thesis into two parts: scalable classification with graphs (topics 1 and 2 above) and nearest neighbor search with hashing (topics 3, 4, and 5 above). The two parts are connected in that the idea of Anchor Graphs in Part I enables not only scalable classification but also unsupervised hashing, while hyperplane hashing in Part II can directly benefit classification under an active learning framework. All of the machine learning techniques developed in this thesis pursue excellent performance in both speed and accuracy, verified through extensive experiments on various large-scale classification and search tasks. The addressed problems, classification and nearest neighbor search, are fundamental to many real-world applications. We therefore expect the proposed solutions based on graphs and hashing to have a tremendous impact on a great number of realistic large-scale applications.
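The low-rank Anchor Graph affinity of topic 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's implementation: the anchor count, the Gaussian bandwidth, and the use of randomly chosen points as stand-ins for k-means anchors are all assumptions here, and the thesis's s-nearest-anchor sparsification of Z is simplified to dense affinities.

```python
import numpy as np

def anchor_graph_affinity(X, anchors, bandwidth=1.0):
    """Low-rank Anchor Graph affinity A = Z @ inv(Lambda) @ Z.T (sketch).

    X:       (n, d) data points
    anchors: (m, d) anchor points (e.g., k-means centers), with m << n
    Z:       (n, m) row-stochastic data-to-anchor affinity matrix
    """
    # Squared Euclidean distance from every data point to every anchor.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    Z = np.exp(-d2 / (2.0 * bandwidth ** 2))
    Z /= Z.sum(axis=1, keepdims=True)      # each row of Z sums to 1
    Lam = Z.sum(axis=0)                    # column sums: diag of Z.T @ 1
    A = Z @ np.diag(1.0 / Lam) @ Z.T       # rank <= m data-to-data affinity
    return Z, A

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
anchors = X[rng.choice(200, size=10, replace=False)]  # stand-in for k-means
Z, A = anchor_graph_affinity(X, anchors)
```

Because each row of Z sums to 1 and Lambda holds the column sums of Z, every row of A also sums to 1, which is exactly the Markov random-walk interpretation stated in topic 1: A's entries behave as transition probabilities routed through the anchors.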
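The linearization step behind topic 2 rests on a simple identity: for any low-rank kernel K = Z Zᵀ built from data-to-anchor features, a kernel machine's decision values K·alpha equal a linear model applied to the rows of Z, so nonlinear kernel training collapses to linear training. A minimal numerical check (the thesis's specific Anchor Graph warped kernel is not reproduced; the matrices below are random placeholders):

```python
import numpy as np

# For a low-rank kernel K = Z @ Z.T, the kernel decision values K @ alpha
# are identical to a linear model on the rows of Z with weights Z.T @ alpha.
rng = np.random.default_rng(3)
Z = rng.normal(size=(50, 8))        # (n, m) data-to-anchor feature rows
alpha = rng.normal(size=50)         # dual coefficients of a kernel machine

K = Z @ Z.T                         # low-rank kernel matrix
f_kernel = K @ alpha                # kernel-space decision values
f_linear = Z @ (Z.T @ alpha)        # same values from a linear model on Z
```

The two computations agree by associativity, which is why training with such a kernel can be handed to an off-the-shelf linear SVM solver, as the thesis does.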
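The spectral embedding at the heart of Anchor Graph Hashing (topic 3) can also be sketched: eigenvectors of the large n×n affinity A = Z inv(Lambda) Zᵀ are recovered from a small m×m matrix. This simplified sketch assumes a row-stochastic Z like the one in topic 1, thresholds each eigenvector at zero for a single bit, and omits the thesis's hierarchical multi-bit threshold learning and out-of-sample eigenfunction extension.

```python
import numpy as np

def agh_codes(Z, num_bits):
    """Binary codes from the Anchor Graph spectral embedding (simplified).

    Z: (n, m) row-stochastic data-to-anchor matrix. The eigenvectors of
    A = Z inv(Lambda) Z.T come from the m x m matrix
    M = Lambda^{-1/2} Z.T Z Lambda^{-1/2}.
    """
    n, m = Z.shape
    lam_isqrt = 1.0 / np.sqrt(Z.sum(axis=0))     # diagonal of Lambda^{-1/2}
    B = Z * lam_isqrt                            # Z @ Lambda^{-1/2}
    sigma, V = np.linalg.eigh(B.T @ B)           # eigenvalues ascending
    # Skip the trivial top eigenvector (eigenvalue 1, constant embedding).
    idx = np.argsort(sigma)[::-1][1:num_bits + 1]
    W = (V[:, idx] / np.sqrt(sigma[idx])) * lam_isqrt[:, None]
    Y = np.sqrt(n) * Z @ W                       # approximate eigenvectors
    return (Y > 0).astype(np.uint8)              # one bit per eigenvector

rng = np.random.default_rng(2)
Z = rng.random((100, 10))
Z /= Z.sum(axis=1, keepdims=True)                # make rows stochastic
codes = agh_codes(Z, num_bits=4)
```

Because a new point only needs its 1×m data-to-anchor row to be projected through W, hashing an unseen point is constant time in n, which is the property topic 3 highlights.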
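The equivalence exploited by the supervised hashing model in topic 4 is easy to verify directly: for r-bit codes with entries in {-1, +1}, agreeing bits contribute +1 and disagreeing bits -1 to the inner product, so the Hamming distance is (r - a·b) / 2. Optimizing code inner products against a similarity target is therefore the same as optimizing Hamming distances. A small check (variable names are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)
r = 16                                   # code length in bits
a = rng.choice([-1, 1], size=r)          # two random +/-1 hash codes
b = rng.choice([-1, 1], size=r)

hamming = int((a != b).sum())            # number of disagreeing bits
inner = int(a @ b)                       # code inner product

# inner = (agreeing bits) - (disagreeing bits) = r - 2 * hamming,
# hence hamming = (r - inner) / 2.
```

This identity is what lets the thesis train hash functions one bit at a time on the inner-product objective while still controlling Hamming distances on similar and dissimilar pairs.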

Subject(s):

Computer science
Information technology
Computer engineering