Academic Commons Search Results
https://academiccommons.columbia.edu/catalog?action=index&controller=catalog&f%5Bdepartment_facet%5D%5B%5D=Electrical+Engineering&f%5Bsubject_facet%5D%5B%5D=Computer+science&format=rss&fq%5B%5D=has_model_ssim%3A%22info%3Afedora%2Fldpd%3AContentAggregator%22&q=&rows=500&sort=record_creation_date+desc
Multiple-Symbol Decision-Feedback Space-Time Differential Decoding in Fading Channels
https://academiccommons.columbia.edu/catalog/ac:194767
Liu, Yan; Wang, Xiaodong10.7916/D84M931KWed, 28 Jun 2017 20:57:51 +0000Space-time differential coding (STDC) is an effective technique for exploiting transmitter diversity, since it does not require channel state information at the receiver. However, like conventional differential modulation schemes, it exhibits an error floor in fading channels. In this paper, we develop an STDC decoding technique based on multiple-symbol detection and decision-feedback, which makes use of the second-order statistics of the fading processes and has a very low computational complexity. This decoding method can significantly lower the error floor of the conventional STDC decoding algorithm, especially in fast fading channels. The application of the proposed multiple-symbol decision-feedback STDC decoding technique in orthogonal frequency-division multiplexing (OFDM) systems is also discussed.Space time codes, Wireless communication systems--Technological innovations, Orthogonal frequency division multiplexing, Communication of technical information, Electrical engineering, Computer sciencexw2008Electrical EngineeringArticlesPerformance Comparisons of MIMO Techniques with Application to WCDMA Systems
https://academiccommons.columbia.edu/catalog/ac:184444
Li, Chuxiang; Wang, Xiaodong10.7916/D8F769Z1Wed, 28 Jun 2017 20:09:25 +0000Multiple-input multiple-output (MIMO) communication techniques have received great attention and developed significantly in recent years. In this paper, we analyze and compare the performance of different MIMO techniques. In particular, we compare three MIMO methods, namely, BLAST, STBC, and linear precoding/decoding. We provide both an analytical performance analysis and simulation results in terms of the BER. Moreover, the applications of MIMO techniques in WCDMA systems are also considered in this study. Specifically, a subspace tracking algorithm and a quantized feedback scheme are introduced into the system to simplify implementation of the beamforming scheme. It is seen that the BLAST scheme achieves the best performance in the high-data-rate transmission scenario; the beamforming scheme performs better than the STBC strategies in the diversity transmission scenario; and the beamforming scheme can be effectively realized in WCDMA systems employing the subspace tracking and quantized feedback approaches.Electrical engineering, Computer sciencexw2008Electrical EngineeringArticlesEfficient Point-to-Subspace Query in ℓ1 with Application to Robust Face Recognition
https://academiccommons.columbia.edu/catalog/ac:153463
Sun, Ju; Zhang, Yuqian; Wright, John N.10.7916/D80291VRMon, 26 Jun 2017 20:41:28 +0000Motivated by vision tasks such as robust face and object recognition, we consider the following general problem: given a collection of low-dimensional linear subspaces in a high-dimensional ambient (image) space, and a query point (image), efficiently determine the nearest subspace to the query in ℓ1 distance. We show in theory this problem can be solved with a simple two-stage algorithm: (1) random Cauchy projection of query and subspaces into low-dimensional spaces followed by efficient distance evaluation (ℓ1 regression); (2) getting back to the high-dimensional space with very few candidates and performing exhaustive search. We present preliminary experiments on robust face recognition to corroborate our theory.Computer science, Artificial intelligencejs4038, yz2409, jw2966Electrical EngineeringArticlesDialect and Accent Recognition using Phonetic-Segmentation Supervectors
https://academiccommons.columbia.edu/catalog/ac:163778
Biadsy, Fadi; Hirschberg, Julia Bell; Ellis, Daniel P. W.10.7916/D8P84MCWMon, 19 Jun 2017 20:51:38 +0000We describe a new approach to automatic dialect and accent recognition which exceeds state-of-the-art performance in three recognition tasks. This approach improves the accuracy and substantially lowers the time complexity of our earlier phonetic-based kernel approach for dialect recognition. In contrast to state-of-the-art acoustic-based systems, our approach employs phone labels and segmentation to constrain the acoustic models. Given a speaker’s utterance, we first obtain phone hypotheses using a phone recognizer and then extract GMM-supervectors for each phone type, effectively summarizing the speaker’s phonetic characteristics in a single vector of phone-type supervectors. Using these vectors, we design a kernel function that computes the phonetic similarities between pairs of utterances, and use it to train SVM classifiers to identify dialects. Comparing this approach to the state-of-the-art, we obtain a 12.9% relative improvement in EER on Arabic dialects, and a 17.9% relative improvement for American vs. Indian English dialects. We also see a 53.5% relative improvement over a GMM-UBM on American Southern vs. Non-Southern English.Computer science, Information technology, Linguisticsfb2175, jbh2019, de171Computer Science, Electrical EngineeringPresentations (Communicative Events)On Communicating Computational Research
https://academiccommons.columbia.edu/catalog/ac:159304
Ellis, Daniel P. W.10.7916/D8R218QNMon, 19 Jun 2017 17:43:47 +0000Prof. Ellis's presentation focuses on the challenges, and the benefits, of sharing the results of computational research through various methods, including: traditional publications, public talks, interactive online demos, APIs and libraries, and code sharing. He particularly emphasizes the potential of code sharing in a world where commodity machines can make reproducibility increasingly affordable and attainable.Communication of technical information, Computer sciencede171Electrical EngineeringPresentations (Communicative Events)Noise Robust Pitch Tracking by Subband Autocorrelation Classification
https://academiccommons.columbia.edu/catalog/ac:159382
Ellis, Daniel P. W.; Lee, Byung Suk10.7916/D8RX9MDDMon, 19 Jun 2017 17:43:40 +0000A neural net classifier is trained to identify the pitch of a frame of subband autocorrelation principal components. Accuracy is greatly improved for noisy, bandlimited speech, matched to the training data.Audiology, Computer sciencede171, bl2012Electrical EngineeringPresentations (Communicative Events)Inharmonic Speech: A Tool for the Study of Speech Perception and Separation
https://academiccommons.columbia.edu/catalog/ac:159364
Ellis, Daniel P. W.; McDermott, Josh; Kawahara, Hideki10.7916/D8N87K45Mon, 19 Jun 2017 17:43:40 +0000This was the talk I gave for the paper I did with Josh McDermott and Hideki Kawahara on using the STRAIGHT analysis-synthesis framework to create "realistic" speech tokens where the voiced speech was composed of inharmonically-arranged components.Audiology, Computer sciencede171Electrical EngineeringPresentations (Communicative Events)Recognizing and Classifying Environmental Sounds
https://academiccommons.columbia.edu/catalog/ac:159322
Ellis, Daniel P. W.10.7916/D80G3TG6Mon, 19 Jun 2017 17:43:40 +0000Prof. Ellis presents a summary of LabROSA's new work, with a focus on recognizing environmental sounds, particularly for video classification by soundtrack.Audiology, Computer sciencede171Electrical EngineeringPresentations (Communicative Events)Handling Speech in the Wild
https://academiccommons.columbia.edu/catalog/ac:159334
Ellis, Daniel P. W.10.7916/D8VQ3B1RMon, 19 Jun 2017 17:43:40 +0000This is a broad overview of recent work related to processing speech embedded in noisy environments, delivered to Steve Colburn's group at Boston University.Audiology, Computer sciencede171Electrical EngineeringPresentations (Communicative Events)Music Information Retrieval for Jazz
https://academiccommons.columbia.edu/catalog/ac:159316
Ellis, Daniel P. W.10.7916/D8474K6XMon, 19 Jun 2017 17:43:39 +0000Despite its promising title, this talk was a first introduction aimed at jazz musicologists explaining what MIR techniques exist and how they might be useful for jazz, as a starting point for a recently-started project where we try applying them to jazz -- i.e., we don't know yet.Prayer, Computer sciencede171Electrical EngineeringPresentations (Communicative Events)Mining Audio
https://academiccommons.columbia.edu/catalog/ac:159388
Ellis, Daniel P. W.10.7916/D8HH6TFWMon, 19 Jun 2017 17:43:37 +0000We have a new program in "Data to Solutions", and this was my introduction to the way that "big data" problems appear in audio - looking first at managing large music collections, then discussing the issues of video classification and retrieval by soundtrack features.Computer sciencede171Electrical EngineeringPresentations (Communicative Events)Capacity Region and Degrees of Freedom of Bidirectional Networks
https://academiccommons.columbia.edu/catalog/ac:206224
Ashraphijuo, Mehdi10.7916/D88W3DTBThu, 15 Jun 2017 16:10:51 +0000The increasing complexity of communication networks in size and density provides us enormous opportunities to exploit interaction among multiple nodes, thus enabling higher rate of data streams. On the flip side, however, this complexity comes with challenges in managing interference that multiple source-destination pairs in the network may cause to each other. In this dissertation, we make progress on how to exploit the opportunities, as well as how to overcome the challenges for various communication networks.
In the first part, we focus on developing fundamental principles for communication network design, especially networks with multiple-antenna transceivers, with an emphasis on (1) understanding the role of feedback and cooperation, and (2) developing interference management methods. In this part, we find that feedback and cooperation have promising roles in improving the capacity performance of several interference networks. We show that, in stark contrast to the point-to-point case, even limited feedback can improve the capacity of interference-limited networks. In fact, the improvement can be unbounded. This result shows that feedback can have a potentially significant role to play in mitigating interference.
Then, in part two we study several bidirectional networks. We study the bidirectional diamond network and show that for deterministic and some Gaussian models the capacity is doubled for the full-duplex channel in comparison with one-way networks. In addition, we study the degrees of freedom of two-way four-unicast MIMO networks, and provide upper and lower bounds that are tight in several cases. We also study the impact of caching in relay nodes for these models. We find a number of cases in which bidirectional links can double the degrees of freedom with the help of relay caching and/or multiple relay antennas.Computer science, Engineering, Mathematicsma3189Electrical EngineeringThesesImage analytic tools for tissue characterization using optical coherence tomography
https://academiccommons.columbia.edu/catalog/ac:206897
Gan, Yu10.7916/D8VM4HXTThu, 15 Jun 2017 16:07:43 +0000Optical coherence tomography (OCT) has been emerging as a promising imaging technique, with a strong capability for non-invasive, in vivo, high-resolution, depth-resolved imaging. There is great potential to use OCT to guide the treatment of arrhythmias, to prevent preterm birth, and to detect breast cancer. To facilitate these clinical applications, this thesis presents three image analytic tools to characterize biological tissue: 1) automated fiber direction analysis; 2) automated volumetric stitching; 3) automated tissue classification. The fiber direction analysis consists of a particle-filter-based 3D tractography scheme and a pixel-wise fiber analysis scheme. The stitching algorithm enlarges the field of view of the current OCT system from the millimeter to the centimeter level by volumetric stitching using the scale-invariant feature transform. Based on the relevance vector machine, a region-based classification scheme and a grid-based classification scheme are developed to automatically identify tissue composition in human cardiac tissue and human breast tissue. These tools are used in combination to study OCT images from cardiac, cervical, and breast tissue.
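The stitching step above registers overlapping OCT volumes using SIFT features. As a rough illustration of how overlapping fields of view can be registered, here is a minimal, translation-only 2D sketch using FFT phase correlation; this is a simplification for intuition, not the thesis's SIFT-based volumetric method:

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the integer (row, col) translation mapping image `b` onto `a`
    via FFT phase correlation: the normalized cross-power spectrum has an
    inverse FFT that peaks at the translation offset."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12          # keep only the phase
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts larger than half the image size to negative offsets
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

# toy example: circularly shift an image by (5, 12) and recover the offset
rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, 12), axis=(0, 1))
print(phase_correlation_shift(shifted, img))  # (5, 12)
```

Phase correlation only recovers a rigid translation; feature-based methods such as SIFT additionally handle rotation and scale, which is why they suit volumetric stitching.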
In cardiac tissue, we apply the fiber orientation analysis to reconstruct 3D cardiac myofiber tractography and perform pixel-wise fiber analysis on the collagen regions within the human heart. In addition, we apply the region-based algorithm to segment and classify tissue compositions, such as collagen, adipose tissue, fibrotic myocardium, and normal myocardium, over a single or a stitched OCT volume. Using our algorithm, we observe fiber directionality changes over depth and find that the fiber orientation changes more dramatically in the atria than in the ventricles. We also observe different dispersion patterns within the collagen layer.
In cervical tissue, our stitching algorithm enables a panoramic 3D view of entire axial slices. Together with the pixel-wise fiber orientation scheme, we analyze differences in dispersion within the inner/outer regions of four quadrants. We observe two dispersion patterns in pregnant and non-pregnant cervical tissue at the location close to the upper cervix. In addition, we discover an increasing trend in both dispersion and penetration depth from the internal orifice (os) to the external os.
In breast tissue, we visualize various features in both benign and malignant tissues, such as invasive ductal carcinoma (IDC), ductal carcinoma in situ, cysts, and terminal duct lobule units, in stitched OCT images. Focusing on the automated detection of IDC, we propose a hierarchical classification framework and apply our classifier in two OCT systems, achieving both reasonable sensitivity and specificity in identifying cancerous regions.Imaging systems in medicine, Diagnostic imaging, Tissues--Imaging, Optical coherence tomography, Electrical engineering, Biomedical engineering, Computer scienceyg2327Electrical EngineeringThesesUnderstanding Music Semantics and User Behavior with Probabilistic Latent Variable Models
https://academiccommons.columbia.edu/catalog/ac:202383
Liang, Dawen10.7916/D8TH8MZPThu, 15 Jun 2017 16:07:25 +0000Bayesian probabilistic modeling provides a powerful framework for building flexible models that incorporate latent structures through the likelihood model and the prior. When we specify a model, we make certain assumptions about the underlying data-generating process with respect to these latent structures. For example, the latent Dirichlet allocation (LDA) model assumes that when generating a document, we first select a latent topic and then select a word that often appears in the selected topic. We can uncover the latent structures conditioned on the observed data via posterior inference. In this dissertation, we apply the tools of probabilistic latent variable models to understand complex real-world data about music semantics and user behavior.
We first look into the problem of automatic music tagging -- inferring the semantic tags (e.g., "jazz", "piano", "happy", etc.) from the audio features. We treat music tagging as a matrix completion problem and apply the Poisson matrix factorization model jointly on the vector-quantized audio features and a "bag-of-tags" representation. This approach exploits the shared latent structure between semantic tags and acoustic codewords. We present experimental results on the Million Song Dataset for both annotation and retrieval tasks, illustrating the steady improvement in performance as more data is used.
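The Poisson factorization idea above can be illustrated with a maximum-likelihood sketch: multiplicative updates that minimize the generalized KL divergence, which is equivalent to maximizing a Poisson likelihood for V ≈ WH. This is a point-estimate toy, not the Bayesian inference used in the thesis:

```python
import numpy as np

def poisson_mf(V, k=2, iters=300, seed=0):
    """Factor a non-negative matrix V ≈ W @ H with multiplicative updates
    for the generalized KL divergence (the Poisson maximum-likelihood
    objective). Returns non-negative factors W (n x k) and H (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        R = V / (W @ H + 1e-10)                      # ratio V / reconstruction
        W *= (R @ H.T) / (H.sum(axis=1) + 1e-10)     # standard KL-NMF update
        R = V / (W @ H + 1e-10)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + 1e-10)
    return W, H

# toy usage: an exactly low-rank non-negative matrix is recovered closely
V = np.outer([1.0, 2.0, 3.0], [2.0, 1.0, 1.0, 3.0])
W, H = poisson_mf(V, k=2)
print(np.abs(V - W @ H).max())
```

In the thesis's setting, V would hold vector-quantized audio counts alongside the bag-of-tags columns, so tags and codewords share the latent factors.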
We then move to the intersection between music semantics and user behavior: music recommendation. The leading performance in music recommendation is achieved by collaborative filtering methods which exploit the similarity patterns in users' listening histories. We address the fundamental cold-start problem of collaborative filtering: it cannot recommend new songs that no one has listened to. We train a neural network on semantic tagging information as a content model and use it as a prior in a collaborative filtering model. The proposed system is evaluated on the Million Song Dataset and achieves results comparable to or better than the collaborative filtering approaches, in addition to favorable performance in the cold-start case.
Finally, we focus on general recommender systems. We examine two different types of data: implicit and explicit feedback, and introduce the notion of user exposure (whether or not a user is exposed to an item) as part of the data-generating process, which is latent for implicit data and observed for explicit data. For implicit data, we propose a probabilistic matrix factorization model and infer the user exposure from data. In the language of causal analysis (Imbens and Rubin, 2015), user exposure has a close connection to the assignment mechanism. We leverage this connection more directly for explicit data and develop a causal inference approach to recommender systems. We demonstrate that causal inference for recommender systems leads to improved generalization to new data.
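For the implicit-feedback case above, the posterior probability that a user was exposed to an item they did not click has a simple closed form under a Gaussian matrix-factorization likelihood. The sketch below follows that idea; the names (`mu`, `pred`, `lam`) and the exact parameterization are illustrative assumptions, not the thesis's notation:

```python
import math

def exposure_posterior(mu, pred, lam=1.0):
    """Posterior P(exposed | no click). `mu` is the prior exposure
    probability, `pred` the model's predicted affinity (theta . beta), and
    `lam` the Gaussian precision. If unexposed, a zero observation is
    certain; if exposed, its likelihood is N(0 | pred, 1/lam)."""
    like0 = math.sqrt(lam / (2 * math.pi)) * math.exp(-0.5 * lam * pred ** 2)
    return (mu * like0) / (mu * like0 + (1 - mu))

# the higher the predicted affinity, the less plausible "exposed but ignored"
print(exposure_posterior(0.5, 0.1), exposure_posterior(0.5, 2.0))
```

Down-weighting unclicked items by this posterior is what lets such a model treat "never saw it" differently from "saw it and passed".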
Exact posterior inference is generally intractable for latent variable models. Throughout this thesis, we design specific inference procedures to tractably analyze the large-scale data encountered under each scenario.Bayesian statistical decision theory--Industrial applications, Music and technology, Semantics, Information technology, Recommender systems (Information filtering), Bayesian statistical decision theory, Artificial intelligence, Computer sciencedl2771Electrical EngineeringThesesLearning on Graphs with Partially Absorbing Random Walks: Theory and Practice
https://academiccommons.columbia.edu/catalog/ac:208722
Wu, Xiaoming10.7916/D8JW8F0CThu, 15 Jun 2017 15:06:52 +0000Learning on graphs has been studied for decades with abundant models proposed, yet many of their behaviors and relations remain unclear. This thesis fills this gap by introducing a novel second-order Markov chain, called partially absorbing random walks (ParWalk). Unlike an ordinary random walk, a ParWalk is absorbed at the current state $i$ with probability $p_i$, and follows a random edge out with probability $1-p_i$. The partial absorption results in an absorption probability between any two vertices, which turns out to encompass various popular models including PageRank, hitting times, label propagation, and regularized Laplacian kernels. The unified treatment reveals the distinguishing characteristics of these models arising from different contexts, and allows us to compare them and transfer findings from one paradigm to another.
The key to learning on graphs is capitalizing on the cluster structure of the underlying graph. The absorption probabilities of ParWalk turn out to be highly effective in capturing this cluster structure. Given a query vertex $q$ in a cluster $\mathcal{S}$, we show that when the absorbing capacity ($p_i$) of each vertex on the graph is small, the probabilities of ParWalk to be absorbed at $q$ have small variations in regions of high conductance (within clusters), but have large gaps in regions of low conductance (between clusters). And the less absorbent the vertices of $\mathcal{S}$ are, the better the absorption probabilities can represent the local cluster $\mathcal{S}$. Our theory yields principles for designing reliable similarity measures and provides justification for a number of popular ones such as hitting times and the pseudo-inverse of the graph Laplacian. Furthermore, it reveals new important properties of these measures. For example, we are the first to show that hitting times are better at retrieving sparse clusters, while the pseudo-inverse of the graph Laplacian is better for dense ones.
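The cluster-capturing behavior above can be sketched numerically, assuming the closed form A = (Λ + L)⁻¹Λ for the absorption-probability matrix, where L = D − W is the graph Laplacian and Λ = diag of the absorption rates (one published ParWalk formulation; treat it here as an illustrative assumption). On a toy two-cluster graph with small absorption rates, within-cluster absorption dominates:

```python
import numpy as np

def parwalk_absorption(W, lam):
    """Absorption-probability matrix A = (Lam + L)^(-1) Lam of a partially
    absorbing random walk on a weighted graph W, with per-vertex absorption
    rates `lam`. Since L @ 1 = 0, every row of A sums to 1."""
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    Lam = np.diag(lam)
    return np.linalg.solve(Lam + L, Lam)

# two 3-cliques joined by a single weak edge
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
A = parwalk_absorption(W, lam=np.full(6, 1e-3))
# starting from vertex 0, absorption is likelier inside its own cluster
print(A[0, 1] > A[0, 4])  # True
```

Small uniform rates make the probabilities nearly flat within each clique while keeping a gap across the weak bridge, which is exactly the property the theory exploits.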
The theoretical insights instilled from ParWalk guide us in developing robust algorithms for various applications including local clustering, semi-supervised learning, and ranking. For local clustering, we propose a new method for salient object segmentation. By taking a noisy saliency map as the probability distribution of query vertices, we compute the absorption probabilities of ParWalk to the queries, producing a high-quality refined saliency map from which the objects can be easily segmented. For semi-supervised learning, we propose a new algorithm for label propagation. The algorithm is justified by our theoretical analysis and guaranteed to be superior to many existing ones. For ranking, we design a new similarity measure using ParWalk, which combines the strengths of both hitting times and the pseudo-inverse of the graph Laplacian. The hybrid similarity measure adapts well to complex data of diverse density, and thus performs strongly overall. For all these learning tasks, our methods achieve substantial improvements over the state-of-the-art on extensive benchmark datasets.Markov processes, Graph theory, Computer sciencexw2223Electrical EngineeringThesesWhen Are Nonconvex Optimization Problems Not Scary?
https://academiccommons.columbia.edu/catalog/ac:199718
Sun, Ju10.7916/D8251J7HThu, 15 Jun 2017 15:04:10 +0000Nonconvex optimization is NP-hard, even when the goal is to compute a local minimizer. In applied disciplines, however, nonconvex problems abound, and simple algorithms, such as gradient descent and alternating direction, are often surprisingly effective. The ability of simple algorithms to find high-quality solutions for practical nonconvex problems remains largely mysterious.
This thesis focuses on a class of nonconvex optimization problems which CAN be solved to global optimality with polynomial-time algorithms. This class covers natural nonconvex formulations of central problems in signal processing, machine learning, and statistical estimation, such as sparse dictionary learning (DL), generalized phase retrieval (GPR), and orthogonal tensor decomposition. For each of the listed problems, the nonconvex formulation and optimization lead to novel and often improved computational guarantees.
This class of nonconvex problems has two distinctive features: (i) all local minimizers are also global, so obtaining any local minimizer solves the optimization problem; (ii) around each saddle point or local maximizer, the function has a negative directional curvature; in other words, around these points, the Hessian matrices have negative eigenvalues. We call smooth functions with these two properties (qualitative) X functions, and derive concrete quantities and strategies to help verify the properties, particularly for functions with random inputs or parameters. As practical examples, we establish that certain natural nonconvex formulations for complete DL and GPR are X functions with concrete parameters.
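The two defining properties can be checked numerically on a toy function. Below, f(x, y) = (x² − 1)²/4 + y²/2 is an illustrative stand-in (not one of the DL/GPR objectives from the thesis): its only local minimizers (±1, 0) are both global, and its sole saddle at (0, 0) has a Hessian with a negative eigenvalue:

```python
import numpy as np

def hessian(f, x, eps=1e-5):
    """Numerical Hessian of scalar function f at point x, via central
    second differences."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

f = lambda p: (p[0] ** 2 - 1) ** 2 / 4 + p[1] ** 2 / 2
saddle_eigs = np.linalg.eigvalsh(hessian(f, np.array([0.0, 0.0])))
min_eigs = np.linalg.eigvalsh(hessian(f, np.array([1.0, 0.0])))
print(saddle_eigs[0] < 0 < min_eigs[0])  # True: negative curvature at the saddle
```

The negative eigenvalue at the saddle is precisely the escape direction that second-order methods such as the trust-region method can exploit.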
Optimizing X functions amounts to finding any local minimizer. With generic initializations, typical iterative methods at best only guarantee convergence to a critical point that might be a saddle point or local maximizer. Interestingly, the X structure allows a number of iterative methods to escape from saddle points and local maximizers and efficiently find a local minimizer, without special initializations. We choose to describe and analyze the second-order trust-region method (TRM), which seems to yield the strongest computational guarantees. Intuitively, second-order methods can exploit the Hessian to extract negative curvature directions around saddle points and local maximizers, and hence are able to successfully escape from the saddles and local maximizers of X functions. We state the TRM in a Riemannian optimization framework to cater to practical manifold-constrained problems. For DL and GPR, we show that under technical conditions, the TRM algorithm finds a global minimizer in a polynomial number of steps, from arbitrary initializations.
https://academiccommons.columbia.edu/catalog/ac:200635
Raffel, Colin10.7916/D8N58MHVThu, 15 Jun 2017 15:02:04 +0000Sequences of feature vectors are a natural way of representing temporal data. Given a database of sequences, a fundamental task is to find the database entry which is the most similar to a query. In this thesis, we present learning-based methods for efficiently and accurately comparing sequences in order to facilitate large-scale sequence search. Throughout, we will focus on the problem of matching MIDI files (a digital score format) to a large collection of audio recordings of music. The combination of our proposed approaches enables us to create the largest corpus of paired MIDI files and audio recordings ever assembled.
Dynamic time warping (DTW) has proven to be an extremely effective method for both aligning and matching sequences. However, its performance is heavily affected by factors such as the feature representation used and its adjustable parameters. We therefore investigate automatically optimizing DTW-based alignment and matching of MIDI and audio data. Our approach uses Bayesian optimization to tune system design and parameters over a synthetically-created dataset of audio and MIDI pairs. We then perform an exhaustive search over DTW score normalization techniques to find the optimal method for reporting a reliable alignment confidence score, as required in matching tasks. This results in a DTW-based system which is conceptually simple and highly accurate at both alignment and matching. We also verify that this system achieves high performance in a large-scale qualitative evaluation of real-world alignments.
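A baseline DTW sketch for comparing two feature-vector sequences, using Euclidean frame distance and the standard match/insert/delete recursion; the tuned penalties and score normalization explored in the thesis are omitted:

```python
import numpy as np

def dtw_distance(X, Y):
    """Dynamic time warping cost between sequences X (n x d) and Y (m x d).
    D[i, j] is the minimal accumulated cost of aligning the first i frames
    of X with the first j frames of Y."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# identical sequences align with zero cost
a = np.array([[0.0], [1.0], [2.0]])
print(dtw_distance(a, a))  # 0.0
```

The O(nm) table is what makes plain DTW quadratic, motivating the pruning and embedding methods described next in the abstract.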
Unfortunately, DTW can be far too inefficient for large-scale search when sequences are very long and consist of high-dimensional feature vectors. We therefore propose a method for mapping sequences of continuously-valued feature vectors to downsampled sequences of binary vectors. Our approach involves training a pair of convolutional networks to map paired groups of subsequent feature vectors to a Hamming space where similarity is preserved. Evaluated on the task of matching MIDI files to a large database of audio recordings, we show that this technique enables 99.99% of the database to be discarded with a modest false reject rate while only requiring 0.2% of the time to compute.
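The database-pruning step that binary codes enable can be sketched with bit-packed vectors and a popcount table; this illustrates only the cheap Hamming-distance scan, not the convolutional hashing model itself:

```python
import numpy as np

def hamming_prune(query_code, db_codes, threshold):
    """Return indices of database entries whose bit-packed binary codes
    (uint8 rows from np.packbits) are within `threshold` Hamming distance
    of the query: XOR the bytes, then count set bits via a lookup table."""
    popcount = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)
    dists = popcount[np.bitwise_xor(db_codes, query_code)].sum(axis=1)
    return np.flatnonzero(dists <= threshold)

# toy database: an exact match, the bitwise inverse, and a one-bit neighbor
q_bits = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=np.uint8)
db = np.stack([np.packbits(q_bits),
               np.packbits(1 - q_bits),
               np.packbits(np.array([1, 1, 1, 0, 1, 0, 1, 0], dtype=np.uint8))])
print(hamming_prune(np.packbits(q_bits), db, threshold=1))  # [0 2]
```

Because the scan is a byte-wise XOR plus table lookups, most of the database can be rejected before any DTW alignment is attempted.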
Even when sped up with a more efficient representation, the quadratic complexity of DTW greatly hinders its feasibility for very large-scale search. This cost can be avoided by mapping entire sequences to fixed-length vectors in an embedded space where sequence similarity is approximated by Euclidean distance. To achieve this embedding, we propose a feed-forward attention-based neural network model which can integrate arbitrarily long sequences. We show that this approach can extremely efficiently prune 90% of our audio recording database with high confidence.
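The feed-forward attention idea above can be sketched in a few lines: score each frame, softmax over time, and take the weighted average, yielding one fixed-length vector per sequence. `W` and `v` stand in for learned parameters here:

```python
import numpy as np

def attention_embed(X, W, v):
    """Collapse a variable-length sequence X (T x d) to a single d-vector:
    per-frame scores v . tanh(W x_t), softmax over time, weighted average."""
    scores = np.tanh(X @ W.T) @ v          # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over time steps
    return weights @ X                     # (d,) convex combination of frames

rng = np.random.default_rng(0)
X = rng.random((7, 3))                     # 7 frames of 3-dim features
W, v = rng.random((4, 3)), rng.random(4)   # illustrative "learned" parameters
emb = attention_embed(X, W, v)
print(emb.shape)  # (3,)
```

Because the attention weights sum to one, the embedding is a convex combination of the frames, and sequences of any length map to the same fixed dimensionality.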
After developing these approaches, we applied them together to the practical task of matching 178,561 unique MIDI files to the Million Song Dataset. The resulting "Lakh MIDI Dataset" provides a potential bounty of ground truth information for audio content-based music information retrieval. This can include transcription, meter, lyrics, and high-level musicological features. The reliability of the resulting annotations depends both on the quality of the transcription and the accuracy of the score-to-audio alignment. We therefore establish a baseline of reliability for score-derived information for different content-based MIR tasks. Finally, we discuss potential future uses of our dataset and the learning-based sequence comparison methods we developed.Signal processing, Neural networks (Computer science), Machine learning, Computer science, Prayer, Electrical engineeringcar2221Electrical EngineeringThesesResource Allocation in Wireless Networks: Theory and Applications
https://academiccommons.columbia.edu/catalog/ac:202119
Marasevic, Jelena Rajko10.7916/D85T3KP0Thu, 15 Jun 2017 15:01:50 +0000Limited wireless resources, such as spectrum and maximum power, give rise to various resource allocation problems that are interesting both from theoretical and application viewpoints. While the problems in some of the wireless networking applications are amenable to general resource allocation methods, others require a more specialized approach suited to their unique structural characteristics. We study both types of the problems in this thesis.
We start with the general problem of alpha-fair packing, namely, maximizing Σ_j w_j f_α(x_j), where w_j > 0 for all j, with (i) f_α(x_j) = ln(x_j) if α = 1, and (ii) f_α(x_j) = x_j^(1−α)/(1−α) if α ≠ 1, α > 0, subject to positive linear constraints of the form Ax ≤ b, x ≥ 0, where A and b are non-negative. This problem has broad applications within and outside wireless networking. We present a distributed algorithm for general alpha that converges to an epsilon-approximate solution in time (number of distributed iterations) that has an inverse-polynomial dependence on the approximation parameter epsilon and poly-logarithmic dependence on the problem size. This is the first distributed algorithm for weighted alpha-fair packing with poly-logarithmic convergence in the input size. We also obtain structural results that characterize alpha-fair allocations as the value of alpha is varied. These results deepen our understanding of fairness guarantees in alpha-fair packing allocations, and also provide insights into the behavior of alpha-fair allocations in the asymptotic cases when alpha tends to zero, one, and infinity.
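For intuition about how the allocation varies with alpha, the single-constraint special case Σ_j x_j = b has a closed form (a toy case, not the distributed packing algorithm): stationarity makes w_j · x_j^(−α) equal across users, so x_j ∝ w_j^(1/α):

```python
import numpy as np

def alpha_fair_split(w, b, alpha):
    """Alpha-fair allocation of a single resource of size b among users with
    weights w, for the one-constraint problem sum_j x_j = b. The KKT
    condition w_j * x_j**(-alpha) = const gives x_j proportional to
    w_j**(1/alpha); alpha = 1 recovers weighted proportional fairness."""
    w = np.asarray(w, dtype=float)
    shares = w ** (1.0 / alpha)
    return b * shares / shares.sum()

print(alpha_fair_split([1.0, 4.0], b=10.0, alpha=1.0))   # [2. 8.]
print(alpha_fair_split([1.0, 4.0], b=10.0, alpha=1e6))   # -> roughly [5. 5.]
```

As alpha grows, the exponent 1/alpha flattens the weights, so the allocation slides from proportional to the weights (α = 1) toward the equal, max-min split (α → ∞), matching the asymptotic behavior characterized in the abstract.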
With these general tools in hand, we consider an application in wireless networks where fairness is of paramount importance: rate allocation and routing in energy-harvesting networks. We discuss the importance of fairness in such networks and cases where our results on alpha-fair packing apply. We then turn our focus to rate allocation in energy-harvesting networks that have highly variable energy sources and are used for applications such as monitoring and tracking. In such networks, it is essential to guarantee fairness over both the network nodes and the time slots and to be as fair as possible -- in particular, to require max-min fairness. We first develop an algorithm that obtains a max-min fair rate assignment for any routing that is specified at the input. Then, we consider the problem of determining a "good" routing. We consider various routing types and either provide polynomial-time algorithms for finding such routings or prove that the problems are NP-hard. Our results reveal an interesting trade-off between the complexities of computation and implementation. The results can also be applied to other related fairness problems.
The second part of the thesis is devoted to the study of resource allocation problems that require a specialized approach. The problems we focus on arise in wireless networks employing full-duplex communication -- the simultaneous transmission and reception on the same frequency channel. Our primary goal is to understand the benefits and complexities tied to using this novel wireless technology through the study of resource (power, time, and channel) allocation problems. Towards that goal, we introduce a new realistic model of a compact (e.g., smartphone) full-duplex receiver and demonstrate its accuracy via measurements. First, we focus on the resource allocation problems with the objective of maximizing the sum of uplink and downlink rates, possibly over multiple orthogonal channels. For the single-channel case, we quantify the rate improvement as a function of the remaining self-interference and signal-to-noise ratios and provide structural results that characterize the sum of uplink and downlink rates on a full-duplex channel. Building on these results, we consider the multi-channel case and develop a polynomial time algorithm which is nearly optimal in practice under very mild restrictions. To reduce the running time, we develop an efficient nearly-optimal algorithm under the high SINR approximation.
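The full-duplex rate trade-off discussed above can be illustrated with a textbook-style Shannon-rate model in which residual self-interference is folded into the noise term; this is an illustrative sketch, not the thesis's compact-receiver model or its algorithms:

```python
import numpy as np

def sum_rates(snr_up, snr_down, residual_si):
    """Compare full-duplex (simultaneous up/down, downlink degraded by
    residual self-interference) against half-duplex time sharing, in
    bits/s/Hz, under a simple Shannon-rate model."""
    fd = np.log2(1 + snr_down / (1 + residual_si)) + np.log2(1 + snr_up)
    hd = 0.5 * np.log2(1 + snr_down) + 0.5 * np.log2(1 + snr_up)
    return fd, hd

fd, hd = sum_rates(snr_up=100.0, snr_down=100.0, residual_si=1.0)
print(fd > hd)  # with well-suppressed self-interference, full-duplex wins
```

The gain approaches the ideal doubling only as the residual self-interference vanishes, which is why characterizing the rate as a function of the remaining self-interference is central to the analysis.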
Then, we study the achievable capacity regions of full-duplex links in the single- and multi-channel cases. We present analytical results that characterize the uplink and downlink capacity region and efficient algorithms for computing rate pairs at the region's boundary. We also provide near-optimal and heuristic algorithms that "convexify" the capacity region when it is not convex. The convexified region corresponds to a combination of a few full-duplex rates (i.e., to time sharing between different operation modes). The analytical results provide insights into the properties of the full-duplex capacity region and are essential for future development of fair resource allocation and scheduling algorithms in Wi-Fi and cellular networks incorporating full-duplex.Resource allocation, Electrical engineering, Energy harvesting, Wireless sensor networks, Operations research, Computer sciencejrm2207Electrical EngineeringThesesAn Open Pipeline for Generating Executable Neural Circuits from Fruit Fly Brain Data
https://academiccommons.columbia.edu/catalog/ac:197713
Givon, Lev E.10.7916/D8P26Z34Thu, 15 Jun 2017 15:01:49 +0000Despite considerable progress in mapping the fly’s connectome and elucidating the patterns of information flow in its brain, the complexity of the fly brain’s structure and the still-incomplete state of knowledge regarding its neural circuitry pose significant challenges -- beyond satisfying the computational resource requirements of current fly brain models -- that must be addressed to successfully reverse engineer the information processing capabilities of the fly brain. These include the need to explicitly facilitate collaborative development of brain models by combining the efforts of multiple researchers, and the need to enable programmatic generation of brain models that effectively utilize the burgeoning amount of increasingly detailed publicly available fly connectome data.
This thesis presents an open pipeline for modular construction of executable models of the fruit fly brain from incomplete biological brain data that addresses both of the above requirements. This pipeline consists of two major open-source components respectively called Neurokernel and NeuroArch.
Neurokernel is a framework for collaborative construction of executable connectome-based fly brain models by integration of independently developed models of different functional units in the brain into a single emulation that can be executed upon multiple Graphics Processing Units (GPUs). Neurokernel enforces a programming model that enables functional unit models that comply with its interface requirements to communicate during execution regardless of their internal design. We demonstrate the power of this programming model by using it to integrate independently developed models of the fly retina and lamina into a single vision processing system. We also show how Neurokernel’s communication performance scales with the number of GPUs, the number of functional units in a brain emulation, and the number of communication ports exposed by a functional unit model.
Although the increasing amount of experimentally obtained biological data regarding the fruit fly brain affords brain modelers a potentially valuable resource for model development, actually using this data to construct executable neural circuit models is currently challenging: the disparate nature of different data sources, the range of storage formats they use, and the limited query features of those formats all complicate the process of inferring executable circuit designs from biological data. To overcome these limitations, we created a software package called NeuroArch that defines a data model for concurrent representation of both biological data and model structure and the relationships between them within a single graph database. Coupled with a powerful interface for querying both types of data within the database in a uniform high-level manner, this representation enables construction and dispatching of executable neural circuits to Neurokernel for execution and evaluation.
We demonstrate the utility of the NeuroArch/Neurokernel pipeline by using the packages to generate an executable model of the central complex of the fruit fly brain from both published and hypothetical data regarding overlapping neuron arborizations in different regions of the central complex neuropils. We also show how the pipeline empowers circuit model designers to devise computational analogues to biological experiments such as parallel concurrent recording from multiple neurons and emulation of genetic mutations that alter the fly’s neural circuitry.Neural circuitry--Computer simulation, Neural circuitry, Neural networks (Computer science), Fruit-flies, Neurosciences, Computer scienceleg22Electrical EngineeringThesesLarge-scale Affective Computing for Visual Multimedia
https://academiccommons.columbia.edu/catalog/ac:200641
Jou, Brendan Wesley10.7916/D8474B0BThu, 15 Jun 2017 15:01:49 +0000In recent years, Affective Computing has arisen as a prolific interdisciplinary field for engineering systems that integrate human affections. While human-computer relationships have long revolved around cognitive interactions, it is becoming increasingly important to account for human affect -- feelings and emotions -- to avert user experience frustration, provide disability services, predict virality of social media content, etc. In this thesis, we specifically focus on Affective Computing as it applies to large-scale visual multimedia, and in particular, still images, animated image sequences and video streams, above and beyond the traditional approaches of face expression and gesture recognition. By taking a principled psychology-grounded approach, we seek to paint a more holistic and colorful view of computational affect in the context of visual multimedia. For example, should emotions like 'surprise' and 'fear' be assumed to be orthogonal output dimensions? Or does a 'positive' image in one culture's view elicit the same feelings of positivity in another culture? We study affect frameworks and ontologies to define, organize and develop machine learning models with such questions in mind to automatically detect affective visual concepts.
In the push for what we call "Big Affective Computing," we focus on two dimensions of scale for affect -- scaling up and scaling out -- which we propose are both imperative if we are to scale the Affective Computing problem successfully. Intuitively, simply increasing the number of data points corresponds to "scaling up." Less intuitively, problems like Affective Computing can also "scale out," or diversify. We show that this latter dimension of introducing data variety, alongside the former of introducing data volume, can yield particular insights, since human affections naturally depart from traditional Machine Learning and Computer Vision problems where there is an objectively truthful target. While no one would debate that a picture of a 'dog' should be tagged as a 'dog,' not all may agree that it looks 'ugly.' We present extensive discussions on why scaling out is critical and how it can be accomplished in the context of large-volume visual data.
At a high-level, the main contributions of this thesis include:
Multiplicity of Affect Oracles:
Prior to the work in this thesis, little consideration had been paid to the affective label-generating mechanism when learning functional mappings between inputs and labels. Throughout this thesis, but first in Chapter 2 (starting in Section 2.1.2), we make a case for a conceptual partitioning of the affect oracle governing the label generation process in Affective Computing problems, resulting in a multiplicity of oracles, whereas prior works assumed a single universal oracle. In Chapter 3, the differences between intended, expressed, induced, and perceived emotion are discussed; we argue that perceived emotion is particularly well-suited for scaling up because its more objective nature reduces label variance compared to other affect states. In Chapters 4 and 5, a division of the affect oracle along cultural lines, with manifestations in both language and geography, is explored. We accomplish all this without sacrificing the 'scale up' dimension, and we tackle significantly larger-volume problems than prior comparable visual affective computing research.
Content-driven Visual Affect Detection:
Traditionally, in most Affective Computing work, prediction tasks use psycho-physiological signals from subjects viewing the stimuli of interest, e.g., a video advertisement, as the system inputs. In essence, this means that the machine learns to label a proxy signal rather than the stimuli itself. In this thesis, with the rise of strong Computer Vision and Multimedia techniques, we focus on learning to label the stimuli directly, without a human-subject-provided biometric proxy signal (except in the unique circumstances of Chapter 7). This shift toward learning from the stimuli directly is important because it allows us to scale up with much greater ease, given that biometric measurement acquisition is both low-throughput and somewhat invasive while stimuli are often readily available. In addition, moving toward learning directly from the stimuli will allow researchers to precisely determine which low-level features in the stimuli are actually coupled with affect states, e.g., which set of frames caused viewer discomfort, rather than a broad sense that a video was discomforting. In Part I of this thesis, we illustrate an emotion prediction task with a psychology-grounded affect representation. In particular, in Chapter 3, we develop a prediction task over semantic emotional classes, e.g., 'sad,' 'happy' and 'angry,' using animated image sequences given annotations from over 2.5 million users. Subsequently, in Part II, we develop visual sentiment and adjective-based semantics models from million-scale digital imagery mined from a social multimedia platform.
Mid-level Representations for Visual Affect:
While discrete semantic emotions and sentiment are classical representations of affect with decades of psychology grounding, the interdisciplinary nature of Affective Computing, now only about two decades old, allows for new avenues of representation. Mid-level representations have been proposed in numerous Computer Vision and Multimedia problems as an intermediary, and often more computable, step toward bridging the semantic gap between low-level system inputs and high-level label semantic abstractions. In Part II, inspired by this work, we adapt it for vision-based Affective Computing and adopt a semantic construct called adjective-noun pairs. Specifically, in Chapter 4, we explore the use of such adjective-noun pairs in the context of a social multimedia platform and develop a multilingual visual sentiment ontology with over 15,000 affective mid-level visual concepts across 12 languages associated with over 7.3 million images and representations from over 235 countries, resulting in the largest affective digital image corpus in both depth and breadth to date. In Chapter 5, we develop computational methods to predict such adjective-noun pairs and also explore their usefulness in traditional sentiment analysis but with a previously unexplored cross-lingual perspective. 
And in Chapter 6, we propose a new learning setting called 'cross-residual learning,' building off recent successes in deep neural networks, and specifically, in residual learning; we show that cross-residual learning can be used effectively to jointly learn across multiple related tasks -- object detection (nouns), more traditional affect modeling (adjectives), and affective mid-level representations (adjective-noun pairs) -- giving us a framework for better grounding the adjective-noun pair bridge in both vision and affect simultaneously.Computer vision, Affect (Psychology)--Computer simulation, Large scale systems, Machine learning, Electrical engineering, Computer sciencebwj2105Electrical EngineeringThesesMeasuring and Improving the Quality of Experience of Adaptive Rate Video
https://academiccommons.columbia.edu/catalog/ac:202347
Nam, Hyunwoo10.7916/D82B8Z7VWed, 14 Jun 2017 21:09:37 +0000Today's popular over-the-top (OTT) video streaming services such as YouTube, Netflix and Hulu deliver video contents to viewers using adaptive bitrate (ABR) technologies. In ABR streaming, a video player running on a viewer's device adaptively changes bitrates to match given network conditions. However, providing reliable streaming is challenging. First, an ABR player may select an inappropriate bitrate during playback due to the lack of direct knowledge of access networks, frequent user mobility and rapidly changing channel conditions. Second, OTT content is delivered to viewers without any cooperation with Internet service providers (ISPs). Last, there are no appropriate tools that evaluate the performance of ABR streaming along with video quality of experience (QoE).
This thesis describes how to improve the video QoE of OTT video streaming services using ABR technologies. Our analysis starts from understanding ABR heuristics. How does ABR streaming work? What factors does an ABR player consider when switching bitrates during a download? Then, we propose our solutions to improve existing ABR streaming from the perspective of network operators who deliver video content through their networks and video service providers who build ABR players running on viewers' devices.
From the network operators' point of view, we propose to find a better video content server based on round trip times (RTTs) between an edge node of a wireless network and available video content servers when a viewer requests a video. The edge node can be an ISP router in a Wi-Fi network and a packet data network gateway (P-GW) in a 4G network. During the experiments, our solution showed better TCP performance (e.g., higher TCP throughput during playback) 146 times out of 200 experiments (73%) over Wi-Fi networks and 162 times out of 200 experiments (81%) over 3G networks. In addition, we claim that the wireless edge nodes can assist an ABR video player in selecting the best available bitrate by controlling the available bandwidth in the radio access network between a base station and a viewer's device. In our Wi-Fi testbed, the proposed solution saved up to 21% of radio bandwidth on mobile devices and enhanced the viewing experience by reducing rebufferings during playback. Last, we assert that software-defined networking (SDN) can improve video QoE by dynamically controlling routing paths of video streaming flows based on the provisioned networking information collected from SDN-enabled networking devices. Using an off-the-shelf SDN platform, we showed that our proposed solution can reduce rebufferings by 50% and provide higher bitrates during a download.
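The RTT-based server selection step can be sketched as follows; the probing method (timing a single TCP handshake) and the hostnames are illustrative assumptions, not the measurement methodology used in the experiments:

```python
import socket
import time

def probe_rtt(host, port=80, timeout=1.0):
    """Coarse RTT estimate: time one TCP connection handshake."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return float('inf')  # unreachable servers sort last

def pick_server(candidates, rtt=probe_rtt):
    """Return the candidate video content server with the smallest RTT
    as measured from the edge node running this code."""
    return min(candidates, key=rtt)
```

Injecting the `rtt` function makes the policy testable without network access, e.g. `pick_server(servers, rtt=cached_rtts.get)`.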
From the perspective of video service providers, higher video QoE can be achieved by improving ABR heuristics implemented in an ABR player. To support this idea, we investigated the role of playout buffer size in ABR streaming and its impact on video QoE. Through our video QoE survey, we proved that a large buffer does not always outperform a small buffer, especially under rapidly varying network conditions. Based on this finding, we suggest to dynamically change the maximum buffer size in an ABR player depending on the current capacity of its playout buffer for improving the QoE of viewers. During the experiments, our proposed solution improved the viewing experience by offering 15% higher average played bitrate, 70% fewer bitrate changes and 50% shorter rebuffering duration.
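The idea of adapting the maximum playout buffer to current conditions can be sketched with a deliberately simple rule; the thresholds and scaling factors below are illustrative assumptions, not the heuristic evaluated in the experiments:

```python
def adapt_max_buffer(max_buf, occupancy, lo=0.3, hi=0.8,
                     floor=10.0, ceil=60.0):
    """Adjust the maximum playout buffer size (seconds) based on its
    current fill ratio `occupancy` in [0, 1].  A persistently low fill
    suggests rapidly varying bandwidth, so a smaller target buffer lets
    the bitrate controller react sooner; a full buffer can safely grow."""
    if occupancy < lo:
        max_buf *= 0.5
    elif occupancy > hi:
        max_buf *= 1.25
    return max(floor, min(ceil, max_buf))
```

Calling this once per segment download gives a buffer target that shrinks under churn and recovers under stable conditions, in the spirit of the dynamic-buffer finding above.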
Our experimental results show that even small changes of ABR heuristics and new features of network systems can greatly affect video QoE. However, it is still difficult for video service providers or network operators to evaluate new ABR heuristics or network system changes due to the lack of accurate QoE monitoring systems. In order to solve this issue, we have developed YouSlow ("YouTube Too Slow!?") as a new approach to monitoring video QoE for the analysis of ABR performance. The lightweight web browser plug-in and mobile application are designed to monitor various playback events (e.g., rebuffering duration and frequency of bitrate changes) directly from within ABR video players and calculate statistics along with video QoE. Using YouSlow, we investigate the impact of the above playback events on video abandonment: about 10% of viewers abandoned YouTube videos when the pre-roll ads lasted for 15 seconds. Even increasing the bitrate can annoy viewers; they prefer a high starting bitrate with no bitrate changes during playback. Our regression analysis shows that bitrate changes do not affect video abandonment significantly and that the abandonment rate can be estimated accurately using the rebuffering ratio and the number of rebufferings.
The thesis includes four main contributions. First, we investigate today's popular OTT video streaming services (e.g., YouTube and Netflix) that use ABR streaming technologies. Second, we propose to build QoS and QoE aware video streaming that can be implemented in existing wireless networks (e.g., Wi-Fi, 3G and 4G) and in SDN-enabled networks. Third, we propose to improve current ABR heuristics by dynamically changing the playout buffer size under varying network conditions. Last, we designed and implemented a new monitoring system for measuring video QoE.Internet service providers, Streaming video, Computer science, Electrical engineeringhn2203Electrical EngineeringThesesLarge-Scale Video Event Detection
https://academiccommons.columbia.edu/catalog/ac:189685
Ye, Guangnan10.7916/D8BG2NGVWed, 14 Jun 2017 19:55:42 +0000Because of the rapid growth of large scale video recording and sharing, there is a growing need for robust and scalable solutions for analyzing video content. The ability to detect and recognize video events that capture real-world activities is one of the key and complex problems. This thesis aims at the development of robust and efficient solutions for large scale video event detection systems. In particular, we investigate the problem in two areas: first, event detection with automatically discovered event specific concepts with organized ontology, and second, event detection with multi-modality representations and multi-source fusion.
Existing event detection works use various low-level features with statistical learning models, and achieve promising performance. However, such approaches lack the capability of interpreting the abundant semantic content associated with complex video events. Therefore, mid-level semantic concept representation of complex events has emerged as a promising method for understanding video events. In this area, existing works can be categorized into two groups: those that manually define a specialized concept set for a specific event, and those that apply a general concept lexicon directly borrowed from existing object, scene and action concept libraries. The first approach requires tremendous manual effort, whereas the second approach is often insufficient in capturing the rich semantics contained in video events. In this work, we propose an automatic event-driven concept discovery method, and build a large-scale event and concept library with well-organized ontology, called EventNet. Unlike past work, this method neither applies a generic concept library independent of the target event nor requires tedious manual annotations. Extensive experiments on the zero-shot event retrieval task, where no training samples are available, show that the proposed EventNet library consistently and significantly outperforms state-of-the-art methods.
Although concept-based event representation can interpret the semantic content of video events, in order to achieve high accuracy in event detection, we also need to consider and combine various features of different modalities and/or across different levels. On one hand, we observe that joint cross-modality patterns (e.g., audio-visual patterns) often exist in videos and provide strong multi-modal cues for detecting video events. We propose a joint audio-visual bi-modal codeword representation, called bi-modal words, to discover cross-modality correlations. On the other hand, combining features from multiple sources often produces performance gains, especially when the features complement each other. Existing multi-source late fusion methods usually apply direct combination of confidence scores from different sources. This is limiting because heterogeneous results from various sources often produce incomparable confidence scores at different scales, which makes direct late fusion inappropriate and poses a great challenge. Based upon the above considerations, we propose a robust late fusion method with rank minimization that not only achieves isotonicity among the various scores from different sources, but also recovers a robust prediction score for individual test samples. We experimentally show that the proposed multi-modality representation and multi-source fusion methods achieve promising results compared with other benchmark baselines.
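The incomparability of raw confidence scores can be made concrete with a device far simpler than the rank-minimization method proposed here: mapping each source's scores to their ranks is one elementary way to obtain isotonicity before fusing. The sketch below is that baseline idea only, not the proposed robust late fusion:

```python
def rank_normalize(scores):
    """Map raw confidence scores to ranks scaled into [0, 1]; the
    mapping is order-preserving (isotonic), so heterogeneous sources
    become comparable regardless of their original scales."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    denom = max(len(scores) - 1, 1)
    ranks = [0.0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r / denom
    return ranks

def late_fuse(score_lists):
    """Fuse several sources by averaging their rank-normalized scores."""
    normed = [rank_normalize(s) for s in score_lists]
    return [sum(col) / len(normed) for col in zip(*normed)]
```

Two sources that agree on the ordering of test samples fuse to identical values even when their raw scores live on wildly different scales.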
The main contributions of the thesis include the following.
1. Large scale event and concept ontology: a) propose an automatic framework for discovering event-driven concepts; b) build the largest video event ontology, EventNet, which includes 500 complex events and 4,490 event-specific concepts; c) build the first interactive system that allows users to explore high-level events and associated concepts in videos with event browsing, search, and tagging functions.
2. Event detection with multi-modality representations and multi-source fusion: a) propose novel bi-modal codeword construction for discovering multi-modality correlations; b) propose novel robust late fusion with rank minimization method for combining information from multiple sources.
The two parts of the thesis are complementary. Concept-based event representation provides rich semantic information for video events. Cross-modality features also provide complementary information from multiple sources. The combination of these two parts in a unified framework offers great potential for advancing the state of the art in large-scale event detection.Computer sciencegy2179Electrical EngineeringThesesThree-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors
https://academiccommons.columbia.edu/catalog/ac:188352
Wang, Yan10.7916/D8RX9B7VWed, 14 Jun 2017 19:55:13 +0000With the recent development of low-cost depth sensors, an entirely new type of 3D data is being generated rapidly by regular consumers. Traditionally, 3D data is produced by a small number of professional designers (i.e., the Computer Aided Design (CAD) model); however, 3D data from massive consumer-level sensors has the potential of introducing many new applications, such as user-captured 3D warehouse and search engines, robots with 3D sensing capability, and customized 3D printing. Nevertheless, the low-cost sensors used by general consumers also pose new technological challenges. First, they have relatively high levels of sensor noise. Second, the use of such consumer devices is often in uncontrolled settings, resulting in challenging conditions, such as poor lighting, cluttered scenes, and object occlusion. To address such emerging opportunities and associated challenges, this dissertation is dedicated to the development of novel algorithms and systems for 3D data understanding and processing, using input from a consumer-level 3D sensor.
In particular, the key problems of 3D shape retrieval, scene understanding, and pose recognition are explored in order to present a comprehensive coverage of the key aspects of content-based 3D shape analysis. To resolve the aforementioned challenges, we propose a flexible Markov Random Field (MRF) framework that uses local information to allow partial matching, and thus address the model incompleteness problem; the framework also uses higher-order correlation to provide additional robustness against sensor noise. With the MRF framework, these 3D analysis problems can be transformed into a unified potential energy minimization problem, while preserving the flexibility to adapt to different settings and resolve the unique challenges of each problem. The contributions of the dissertation include:
a. Cross-Domain 3D Retrieval: First, we tackle the problem of retrieving noise-free 3D models using noisy data captured by low-cost 3D sensors – a unique cross-domain setting. To manage the challenges of sensor noise and model incompleteness from consumer-level sensors, we propose a novel MRF formulation for the retrieval problem. The potential function of the random field is designed to capture both the local shape and global spatial consistency in order to preserve the local matching capability, while offering robustness against the sensor noise. The specific form of the potential functions is determined efficiently by a series of weak classifiers, thus forming a variant of the Regression Tree Field (RTF). We achieve better retrieval precision and recall in the cross-domain settings with a consumer-level depth sensor compared with state-of-the-art approaches.
b. 3D Scene Understanding: We develop a scene understanding system based on input from consumer-level depth sensors. To resolve the key challenge of the lack of annotated 3D training data, we construct an MRF that connects the input 3D point cloud and the associated 2D reference images, based on which the 3D point cloud is stitched. A series of weak classifiers are trained to obtain an approximate semantic segmentation result from the reference images. The potential function of the field is designed to integrate the results from the classifiers, while taking advantage of the 3D spatial consistency in order to output a comprehensive scene understanding result. We achieve comparable accuracy and much faster speed compared with state-of-the-art 3D scene understanding systems, with the difference that we do not require annotated 3D training data.
c. Pose Recognition of Deformable Objects: We develop a method for supporting a robotics system to recognize pose and manipulate deformable objects. More specifically, garment pose is recognized with the help of an offline simulated database and the proposed retrieval approach. We use a novel binary feature representation extracted from the reconstructed 3D surfaces in order to allow efficient matching, thus achieving real-time performance. A spatial weight is further learned in order to integrate the local matching result. The system shows superior recognition accuracy and faster speed than the state-of-the-art approaches.
d. Application with 2D Data: In addition to the traditional 3D applications, we explore the possibility of extending the MRF formulation to 2D data, especially data used in classical low-level 2D vision problems, such as image deblurring and denoising. One well-known technique that uses an image prior, the probabilistic patch-based prior, is known to have a bottleneck in finding the most similar model from a model set, which can be posed as a retrieval problem. Therefore, we apply the MRF formulation originally developed for 3D shape retrieval, and extend it to this 2D problem by introducing a grid-like random field structure. We achieve 40x acceleration compared with the state-of-the-art algorithm, while preserving quality.
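The unified potential-energy-minimization view that ties these tasks together can be illustrated with a minimal discrete MRF and a greedy solver (Iterated Conditional Modes). The toy unary and pairwise potentials below are illustrative stand-ins, not the learned Regression Tree Field potentials described above:

```python
def icm(unary, pairwise, edges, labels, iters=10):
    """Greedy minimization (Iterated Conditional Modes) of a discrete
    MRF energy  E(x) = sum_i unary[i][x_i] + sum_{(i,j) in edges}
    pairwise(x_i, x_j).  Finds a local, not global, minimum."""
    n = len(unary)
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Initialize each node from its unary term alone.
    x = [min(range(labels), key=lambda l: unary[i][l]) for i in range(n)]
    for _ in range(iters):
        changed = False
        for i in range(n):
            def local_energy(l):
                return unary[i][l] + sum(pairwise(l, x[j]) for j in nbrs[i])
            best = min(range(labels), key=local_energy)
            if best != x[i]:
                x[i] = best
                changed = True
        if not changed:
            break
    return x
```

On a 3-node chain with a Potts-style smoothness term, ICM flips an outlier label to agree with its neighbors -- the same "local evidence plus spatial consistency" trade-off the potential functions above encode.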
We organize the dissertation as follows. First, the core problems of 3D shape retrieval, scene understanding, and pose recognition, together with the proposed solutions based on MRFs and RTFs, are explored in Part I. In Part II, the extension to 2D data is discussed. Extensive evaluation is performed for each specific task in order to compare the proposed approaches with state-of-the-art algorithms and systems, and also to justify the components of the proposed methods. Finally, in Part III, we present concluding remarks and a discussion of open issues and future work.Computer science, Roboticsyw2383Electrical EngineeringThesesScalable Machine Learning for Visual Data
https://academiccommons.columbia.edu/catalog/ac:189406
Yu, Xinnan10.7916/D8F47NDBWed, 14 Jun 2017 19:53:36 +0000Recent years have seen a rapid growth of visual data produced by social media, large-scale surveillance cameras, biometrics sensors, and mass media content providers. The unprecedented availability of visual data calls for machine learning methods that are effective and efficient for such large-scale settings.
The input of any machine learning algorithm consists of data and supervision. In a large-scale setting, on the one hand, the data often comes with a large number of samples, each of high dimensionality. On the other hand, unconstrained visual data requires a large amount of supervision for machine learning methods to be effective, yet supervised information is often limited and expensive to acquire. Together, these factors hinder the applicability of machine learning methods to large-scale visual data. In this thesis, we propose innovative approaches to scale up machine learning to address challenges arising from both the scale of the data and the limitation of the supervision. The methods are developed with a special focus on visual data, yet they are also widely applicable to other domains that require scalable machine learning methods.
Learning with high-dimensionality:
The "large-scale" nature of visual data comes not only from the number of samples but also from the dimensionality of the features. While a considerable amount of effort has been spent on making machine learning scalable to more samples, few approaches address learning with high-dimensional data. In Part I, we propose an innovative solution for learning with very high-dimensional data. Specifically, we use a special structure, the circulant structure, to speed up linear projection, the most widely used operation in machine learning. The special structure dramatically improves the space complexity from quadratic to linear, and the computational complexity from quadratic to linearithmic in terms of the feature dimension. The proposed approach is successfully applied in various frameworks of large-scale visual data analysis, including binary embedding, deep neural networks, and kernel approximation. The significantly improved efficiency is achieved with minimal loss of performance. For all the applications, we further propose to optimize the projection parameters with training data to further improve the performance.
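The saving from the circulant structure can be made concrete: a circulant matrix is diagonalized by the discrete Fourier transform, so applying it to a d-dimensional vector reduces to element-wise products in the Fourier domain. The sketch below uses a textbook radix-2 FFT (d must be a power of two here) and illustrates the complexity argument only, not the optimized implementation used in this work:

```python
import cmath

def fft(a, inv=False):
    """Radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], inv)
    odd = fft(a[1::2], inv)
    sign = 1 if inv else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def circulant_apply(c, x):
    """Multiply circ(c) (the d x d circulant matrix whose first column
    is c) by x via the Fourier domain: O(d log d) time and O(d) memory,
    versus O(d^2) for an explicit dense projection matrix."""
    d = len(c)
    fc, fx = fft(c), fft(x)
    y = fft([u * v for u, v in zip(fc, fx)], inv=True)
    return [v.real / d for v in y]  # inverse FFT needs the 1/d scale
```

Storing only the first column `c` gives the linear space complexity, and the three FFTs give the linearithmic time.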
The scalability of learning algorithms is often fundamentally limited by the amount of supervision available. The massive visual data comes unstructured, with diverse distribution and high-dimensionality -- it is required to have a large amount of supervised information for the learning methods to work. Unfortunately, it is difficult, and sometimes even impossible to collect a sufficient amount of high-quality supervision, such as instance-by-instance labels, or frame-by-frame annotations of the videos.
Learning from label proportions:
To address the challenge, we need to design algorithms utilizing new types of supervision, often presented in weak forms, such as relatedness between classes, and label statistics over groups. In Part II, we study a learning setting called Learning from Label Proportions (LLP), where the training data is provided in groups, and only the proportion of each class in each group is known. The task is to learn a model to predict the class labels of the individuals. Besides computer vision, this learning setting has broad applications in social science, marketing, and healthcare, where individual-level labels cannot be obtained due to privacy concerns. We provide theoretical analysis under an intuitive framework called Empirical Proportion Risk Minimization (EPRM), which learns an instance-level classifier to match the given label proportions on the training data. The analysis answers the fundamental question of when and why LLP is possible. Under EPRM, we propose the proportion-SVM (∝SVM) algorithm, which jointly optimizes the latent instance labels and the classification model in a large-margin framework. The approach avoids making restrictive assumptions on the data, leading to state-of-the-art results. We have successfully applied the developed tools to challenging problems in computer vision, including instance-based event recognition and attribute modeling.
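The EPRM principle -- fit an instance-level classifier so that its predicted per-bag positive proportions match the given ones -- can be sketched on a toy one-dimensional problem. The threshold-classifier family below is an illustrative assumption; the actual ∝SVM jointly optimizes latent labels and a large-margin model:

```python
def proportion_risk(classifier, bags, proportions):
    """Empirical proportion risk: mean |predicted positive proportion -
    given proportion| over the bags.  No instance labels are used."""
    risks = [abs(sum(1 for v in bag if classifier(v)) / len(bag) - p)
             for bag, p in zip(bags, proportions)]
    return sum(risks) / len(bags)

def fit_threshold(bags, proportions):
    """Toy 1-D EPRM: choose the threshold classifier (v >= t) whose
    per-bag positive proportions best match the given proportions."""
    candidates = sorted(v for bag in bags for v in bag)
    return min(candidates,
               key=lambda t: proportion_risk(lambda v: v >= t,
                                             bags, proportions))
```

Note that no instance labels appear anywhere: under this toy model, the decision threshold is recovered from bag-level proportions alone, which is the essence of the setting.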
Scaling up mid-level visual attributes:
Besides learning with weak supervision, the limitation on supervision can also be alleviated by leveraging knowledge from different, yet related tasks. Specifically, "visual attributes" have been extensively studied in computer vision. The idea is that attributes, which can be understood as models trained to recognize visual properties, can be leveraged in recognizing novel categories (being able to recognize green and orange is helpful for recognizing an apple). In a large-scale setting, the unconstrained visual data requires a high-dimensional attribute space that is sufficiently expressive for the visual world. Ironically, though designed to improve the scalability of visual recognition, conventional attribute modeling requires expensive human effort for labeling the detailed attributes and is inadequate for designing and learning a large set of attributes. To address such challenges, in Part III, we propose methods that can be used to automatically design a large set of attribute models, without user labeling burdens. We propose the weak attribute, which combines various types of existing recognition models to form an expressive space for visual recognition and retrieval. In addition, we develop the category-level attribute to characterize distinct properties separating multiple categories. The attributes are optimized to be discriminative for the visual recognition task over known categories, providing both better efficiency and a higher recognition rate over novel categories with a limited number of training samples.Artificial intelligence, Computer science, Electrical engineeringxy2154Electrical EngineeringThesesComputational Methods for Nonlinear Optimization Problems: Theory and Applications
https://academiccommons.columbia.edu/catalog/ac:189943
Madani, Ramtin10.7916/D88S4PDMWed, 14 Jun 2017 19:53:19 +0000This dissertation is motivated by the lack of efficient global optimization techniques for polynomial optimization problems. The objective is twofold. First, a new mathematical foundation for obtaining a global or near-global solution will be developed. Second, several case studies will be conducted on a variety of real-world problems. Global optimization, convex relaxation and distributed computation are at the heart of this PhD dissertation. Some of the specific problems addressed in this thesis, spanning both the theory and the applications of nonlinear optimization, are explained below:
Graph theoretic algorithms for low-rank optimization problems: There is a rapidly growing interest in the recovery of an unknown low-rank matrix from limited information and measurements. This problem occurs in many areas of engineering and applied science such as machine learning, control, and computer vision. We develop a graph-theoretic technique in Part I that is able to generate a low-rank solution for a sparse Linear Matrix Inequality (LMI), which is directly applicable to a large set of problems such as low-rank matrix completion with many unknown entries. Our approach finds a solution with a guarantee on its rank, using the recent advances in graph theory.
Resource allocation for energy systems: The flows in an electrical grid are described by nonlinear AC power flow equations. Due to the nonlinear interrelation among physical parameters of the network, the feasibility region represented by the power flow equations may be nonconvex and disconnected. Since 1962, the nonlinearity of the network constraints has been studied, and various heuristic and local-search algorithms have been proposed in order to perform optimization over an electrical grid [Baldick, 2006; Pandya and Joshi, 2008]. Part II is concerned with finding convex formulations of the power flow equations using semidefinite programming (SDP). The potential of SDP relaxation for problems in power systems has been manifested in [Lavaei and Low, 2012], with further studies conducted in [Lavaei, 2011; Sojoudi and Lavaei, 2012]. A variety of graph-theoretic and algebraic methods are developed in Part II in order to facilitate performing fundamental, yet challenging tasks such as the optimal power flow (OPF) problem, security-constrained OPF, and the classical power flow problem.
Synthesis of distributed control systems: Real-world systems mostly consist of many interconnected subsystems, and designing an optimal controller for them poses several challenges to the field of control theory. The area of distributed control was created to address the challenges arising in the control of these systems. The objective is to design a constrained controller whose structure is specified by a set of permissible interactions between the local controllers, with the aim of reducing the computation or communication complexity of the overall controller. It has long been known that the design of an optimal distributed (decentralized) controller is a daunting task because it amounts to an NP-hard optimization problem in general [Witsenhausen, 1968; Tsitsiklis and Athans, 1984]. Part III is devoted to studying the potential of the SDP relaxation for the optimal distributed control (ODC) problem. Our approach rests on formulating each of several variations of the ODC problem as a rank-constrained optimization problem from which SDP relaxations can be derived. As the first contribution, we show that the ODC problem admits a sparse SDP relaxation with solutions of rank at most 3. Since a rank-1 SDP matrix can be mapped back into a globally-optimal controller, the low-rank SDP solution may be deployed to retrieve a near-global controller.
Parallel computation for sparse semidefinite programs: While small- to medium-sized semidefinite programs are efficiently solvable by second-order interior point methods in polynomial time up to any arbitrary precision [Vandenberghe and Boyd, 1996a], these methods are impractical for solving large-scale SDPs due to computation time and memory issues. In Part IV of this dissertation, a parallel algorithm for solving an arbitrary SDP is introduced based on the alternating direction method of multipliers. The proposed algorithm has guaranteed convergence under very mild assumptions. Each iteration of this algorithm has a simple closed-form solution, and consists of scalar multiplications and eigenvalue decompositions over matrices whose sizes are not greater than the treewidth of the sparsity graph of the SDP problem. The cheap iterations of the proposed algorithm enable solving real-world large-scale conic optimization problems.Engineering, Mathematics, Computer sciencerm3122Electrical EngineeringThesesDistributed and Large-Scale Optimization
https://academiccommons.columbia.edu/catalog/ac:193921
Ali Younis Kalbat, Abdulrahman Younis10.7916/D8D79B7VWed, 14 Jun 2017 19:48:33 +0000This dissertation is motivated by the pressing need for solving real-world large-scale optimization problems with the main objective of developing scalable algorithms that are capable of solving such problems efficiently. Large-scale optimization problems naturally appear in complex systems such as power networks and distributed control systems, which are the main systems of interest in this work. This dissertation aims to address four problems with regard to the theory and application of large-scale optimization problems, which are explained below:
Chapter 2: In this chapter, a fast and parallelizable algorithm is developed for an arbitrary decomposable semidefinite program (SDP). Based on the alternating direction method of multipliers, we design a numerical algorithm that has guaranteed convergence under very mild assumptions. We show that each iteration of this algorithm has a simple closed-form solution, consisting of matrix multiplications and eigenvalue decompositions performed by individual agents, as well as information exchanges between neighboring agents. The cheap iterations of the proposed algorithm enable solving a wide spectrum of real-world large-scale conic optimization problems that can be reformulated as SDPs.
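The defining feature of ADMM as described in Chapter 2 -- iterations that are each a simple closed-form step -- can be illustrated on a toy problem. The sketch below (our own, hypothetical names; it solves a small l1-regularized problem, not an SDP) shows the characteristic three-step loop: a quadratic solve, a closed-form proximal step, and a dual update.

```python
def soft(v, t):
    """Soft-thresholding: the closed-form proximal step for an l1 term."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def admm_lasso(a, lam, rho=1.0, iters=200):
    """ADMM for min_x 0.5*||x - a||^2 + lam*||x||_1, split as
    f(x) + g(z) subject to x = z.  Every iteration is closed-form:
      x-update: minimize the quadratic plus the augmented-Lagrangian term
      z-update: soft-threshold (prox of the l1 norm)
      u-update: gradient ascent on the scaled dual variable."""
    n = len(a)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(iters):
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        z = [soft(x[i] + u[i], lam / rho) for i in range(n)]
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

For this separable problem the true minimizer is the soft-thresholding of `a` at `lam`, so convergence of the cheap iterations is easy to check; the SDP algorithms of Chapters 2-3 have the same alternating structure but replace the prox step with eigenvalue decompositions.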
Chapter 3: Motivated by the application of sparse SDPs to power networks, the objective of this chapter is to design a fast and parallelizable algorithm for solving the SDP relaxation of a large-scale optimal power flow (OPF) problem. OPF is a fundamental problem used for the operation and planning of power networks, and it is non-convex and NP-hard in the worst case. The proposed algorithm enables real-time power network management and improves the system's reliability. In particular, this algorithm helps with the realization of the Smart Grid by allowing optimal decisions to be made very quickly in response to the stochastic nature of renewable energy. The proposed algorithm is evaluated on IEEE benchmark systems.
Chapter 4: The design of an optimal distributed controller using an efficient computational method is one of the most fundamental problems in the area of control systems, and it remains an open problem due to its NP-hardness in the worst case. In this chapter, we first study the infinite-horizon optimal distributed control (ODC) problem (for deterministic systems) and then generalize the results to a stochastic ODC problem (for stochastic systems). Our approach rests on formulating each of these problems as a rank-constrained optimization from which an SDP relaxation can be derived. We show that both problems admit sparse SDP relaxations with solutions of rank at most 3. Since a rank-1 SDP matrix can be mapped back into a globally-optimal controller, the rank-3 solution may be deployed to retrieve a near-global controller. We also propose a computationally cheap SDP relaxation for each problem and then develop effective heuristic methods to recover a near-optimal controller from the low-rank SDP solution. The design of several near-optimal structured controllers with global optimality degrees above 99% will be demonstrated.
Chapter 5: The frequency control problem in power networks aims to keep the global frequency of the system within a tight range by adjusting the output of generators in response to uncertain and stochastic demand. The intermittent nature of distributed power generation in the smart grid makes traditional decentralized frequency controllers less efficient and demands distributed controllers that are able to deal with the uncertainty in the system introduced by non-dispatchable supplies (such as renewable energy), fluctuating loads, and measurement noise. Motivated by this need, we study the frequency control problem using the results developed in Chapter 4. In particular, we formulate the problem and then conduct a case study on the IEEE 39-Bus New England system. The objective is to design a near-global optimal distributed frequency controller for the New England test system by optimally adjusting the mechanical power input to each generator based on real-time measurements received from neighboring generators through a user-defined communication topology.Mathematical optimization, Mathematical optimization--Methodology, Computer algorithms, Semidefinite programming, Electrical engineering, Mathematics, Computer scienceak3369Electrical EngineeringThesesFacilitating Formal Verification of Cooperative Driving Applications: Techniques and Case Study
https://academiccommons.columbia.edu/catalog/ac:193257
Lin, Shou-pon10.7916/D8X63MQGWed, 14 Jun 2017 19:47:16 +0000The next generation of intelligent vehicles will evolve from being able to drive autonomously to ones that communicate with other vehicles and execute joint behaviors. Before allowing these vehicles on public roads, we must guarantee that they will not cause accidents. We will apply formal methods to ensure the degree of safety that cannot be assured with simulation or closed-track testing. However, there are challenges that need to be addressed when applying formal verification techniques to cooperative driving systems.
This thesis focuses on techniques that address the following challenges: 1. Automotive applications interact with the physical world in different ways; 2. Cooperative driving systems are time-critical; 3. State explosion arises when we apply formal verification to systems with more participants.
First, we describe the multiple stack architecture. It combines several stacks, each of which addresses a particular way of interaction with the physical world. The layered structure in each stack makes it possible for engineers to implement cooperative driving applications without being bogged down by the details of low-level devices. Having functions arranged in a layered fashion helps us divide the verification of the whole system into smaller subproblems of independent module verification.
Secondly, we present a framework for modeling protocol systems that use GPS clocks for synchronization. We introduce the timing stack, which separates a process into two parts: the part modeled as a finite-state machine that controls state transitions and message exchanges, and the part that determines the exact moment that a timed event should occur. The availability of accurate clocks at different locations allows processes to execute actions simultaneously, reducing the interleaving that often arises in systems that use multiple timers to control timed events. With accurate clocks, we create a lock protocol that resolves conflicting merge requests for driver-assisted merging.
Thirdly, we introduce stratified probabilistic verification that mitigates state explosion. It greatly improves the probability bound obtained in the original probabilistic verification algorithm. Unlike most techniques that aim at reducing state space, it is a directed state traversal, prioritizing the states that are more likely to be encountered during system execution. When state traversal stops upon depleting the memory, the unexplored states are the ones that are less likely to be reached. We construct a linear program whose solution is the upper bound for the probability of reaching those unexplored states. The stratified algorithm is particularly useful when considering a protocol system that depends on several imperfect components that may fail with small but hard-to-quantify probabilities. In that case, we adopt a compositional approach to verify a collection of components, assuming that the components have inexact probability guarantees.
Finally, we present our design of driver-assisted merging. Its design is considerably simplified by using the multiple stack architecture and GPS clocks. We use the stratified algorithm to show that the merging system fails less than once every 5 × 10¹³ merge attempts.Automobile driving, Automobile driving--Steering--Automatic control, Automobiles--Automatic control, Motor vehicles--Automatic control, Automatic control--Computer programs, Artificial intelligence, Electrical engineering, Computer sciencesl3357Electrical EngineeringThesesCross-layer resource allocation in wireless and optical networks
https://academiccommons.columbia.edu/catalog/ac:186473
Birand, Berk10.7916/D8VQ31K6Mon, 12 Jun 2017 17:44:18 +0000The success of the Internet can be largely attributed to its modular architecture and its high level of abstraction. As a result, the Internet is an extremely heterogeneous network in which a multitude of wireless, electronic, and optical devices coexist. Yet, wireless and optical technologies are approaching their capacity limits. In this thesis, we study cross-layer and cross-domain optimizations in wireless and optical networks to improve the scalability of heterogeneous networks. Specifically, we investigate the benefits in capacity improvement and energy efficiency of improved interaction between different layers, as well as different domains.
First, we use the Local Pooling (LoP) conditions to identify all the network graphs under primary interference constraints in which Greedy Maximal Scheduling (GMS) achieves 100% throughput. In addition, we show that in all bipartite graphs of size up to 7 × n, GMS is guaranteed to achieve 66% throughput. Finally, we study the performance of GMS in interference graphs and show that it can perform arbitrarily badly.
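As a minimal illustration of GMS under primary interference (a sketch with our own names, not the thesis's code): the scheduler activates links greedily in decreasing order of queue length, subject to the constraint that activated links share no endpoint, i.e. it builds a matching greedily. The example in the test also hints at why GMS can fall short of the maximum-weight schedule.

```python
def greedy_maximal_schedule(links):
    """Greedy Maximal Scheduling under primary interference: repeatedly
    activate the link with the longest queue whose two endpoints are not
    already used by an activated link.  The result is a maximal matching,
    but not necessarily a maximum-weight one.
    `links` is a list of ((u, v), queue_length) tuples."""
    used = set()
    schedule = []
    for (u, v), q in sorted(links, key=lambda e: -e[1]):
        if u not in used and v not in used:
            schedule.append((u, v))
            used.update((u, v))
    return schedule
```

On the path a-b-c-d with queues 5, 7, 4, GMS picks only the middle link (weight 7), whereas the maximum-weight matching takes the two outer links (weight 9): exactly the kind of gap the LoP conditions characterize.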
We study the properties of evolving graphs of networks whose structure changes due to node mobility. We present several graph metrics that quantify change in an evolving graph sequence and apply these metrics to several sources of mobility. We relate our results on the effect of the rate of graph change to the performance of higher-layer network algorithms in dynamic networks.
We then consider optical networks, and formulate a global optimization problem that captures the QoT constraints in future dynamic optical networks. We design a power control algorithm for solving this problem by using feedback from Optical Performance Monitors (OPMs). We evaluate this algorithm via extensive simulations on a network-scale optical network simulator, as well as experiments with commercial optical network equipment.
Finally, we consider a cellular network with Coordinated Multi-Point (CoMP) Joint Transmission (JT) capabilities that allow multiple BSs to transmit simultaneously to a single user. We formulate the OFDMA Joint Scheduling (OJS) problem of determining a subframe schedule and deciding whether to use JT, and we prove hardness results for this problem. Based on a decomposition framework, we develop efficient scheduling algorithms for bipartite and series-parallel planar graphs, and approximation algorithms for general graphs. We then consider a queueing model that evolves over time, and prove that solving the OJS problem with a specific queue-based utility function (in every subframe) achieves maximum throughput in CoMP-enabled networks.Computer science, Information technology, Electrical engineeringbb2408Electrical EngineeringThesesContinuous-Time and Companding Digital Signal Processors Using Adaptivity and Asynchronous Techniques
https://academiccommons.columbia.edu/catalog/ac:165165
Vezyrtzis, Christos10.7916/D8RN3G61Thu, 08 Jun 2017 16:11:14 +0000The fully synchronous approach has been the norm for digital signal processors (DSPs) for many decades. Due to its simplicity, the classical DSP structure has been used in many applications. However, due to its rigid discrete-time operation, a classical DSP has limited efficiency or inadequate resolution for some emerging applications, such as processing of multimedia and biological signals. This thesis proposes fundamentally new approaches to designing DSPs, which are different from the classical scheme. The defining characteristic of all the new DSPs examined in this thesis is the notion of "adaptivity" or "adaptability." Adaptive DSPs dynamically change their behavior to adjust to some property of their input stream, for example the rate of change of the input. This thesis presents both enhancements to existing adaptive DSPs and new adaptive DSPs. The main class of DSPs examined throughout the thesis are continuous-time (CT) DSPs. CT DSPs are clock-less and event-driven; they naturally adapt their activity and power consumption to the rate of their inputs. The absence of a clock also provides a complete avoidance of aliasing in the frequency domain, hence improved signal fidelity. The core of this thesis deals with the complete and systematic design of a truly general-purpose CT DSP. A scalable design methodology for CT DSPs is presented. This leads to the main contribution of this thesis, namely a new CT DSP chip. This chip is the first general-purpose CT DSP chip, able to process many different classes of CT and synchronous signals. The chip can handle various types of signals, i.e., various digital modulations, both synchronous and asynchronous, without requiring any reconfiguration; such a property is presented for the first time for CT DSPs and is impossible for classical DSPs.
As opposed to previous CT DSPs, which were limited to using only one type of digital format and whose design was hard to scale for different bandwidths and bit-widths, this chip has a formal, robust and scalable design, due to the systematic use of asynchronous design techniques. The second contribution of this thesis is a complete methodology for designing adaptive delay lines. In particular, it is shown how to make the granularity, i.e. the number of stages, adaptive in a real-time delay line. Adaptive granularity brings about a significant improvement in the line's power consumption, up to 70% as reported by simulations on two design examples. This enhancement can have a large direct impact on the power of any CT DSP, since the delay line consumes the majority of a CT DSP's power. The robust methodology presented in this thesis allows safe dynamic reconfiguration of the line's granularity, on the fly and according to the input traffic. As a final contribution, the thesis also examines two additional DSPs: one operating in the CT domain and one using the companding technique. The former operates only on level-crossing samples; the proposed methodology shows a potential for high-quality outputs by using a complex interpolation function. Finally, a companding DSP is presented for MPEG audio. Companding DSPs adapt their dynamic range to the amplitude of their input; the resulting systems can offer high-quality outputs even for small inputs. By applying companding to MPEG DSPs, it is shown how the DSP distortion can be made almost inaudible, without requiring complex arithmetic hardware.Electrical engineering, Computer science, Computer engineeringcv2176Electrical EngineeringThesesLarge Scale Nearest Neighbor Search - Theories, Algorithms, and Applications
https://academiccommons.columbia.edu/catalog/ac:168490
He, Junfeng10.7916/D83776PGThu, 08 Jun 2017 16:10:58 +0000We are witnessing a data explosion era, in which huge data sets of billions or more samples, represented by high-dimensional feature vectors, can be easily found on the Web, in enterprise data centers, in surveillance sensor systems, and so on. On these large scale data sets, nearest neighbor search is fundamental for many applications, including content-based search/retrieval, recommendation, clustering, graph and social network research, as well as many other machine learning and data mining problems.
Exhaustive search is the simplest and most straightforward way to perform nearest neighbor search, but it cannot scale up to huge data sets of the sizes mentioned above. To make large scale nearest neighbor search practical, we need the online search step to be sublinear in terms of the database size, which means offline indexing is necessary. Moreover, to achieve sublinear search time, we usually need to sacrifice some search accuracy, and hence we can often only obtain approximate nearest neighbors instead of exact nearest neighbors. In other words, by large scale nearest neighbor search, we mean approximate nearest neighbor search methods with sublinear online search time via offline indexing.
To some extent, indexing a vector dataset for (sublinear time) approximate search can be achieved by partitioning the feature space into different regions, and mapping each point to its closest region. There are different kinds of partition structures, for example, tree-based partitions, hashing-based partitions, and clustering/quantization-based partitions. From the viewpoint of how the data partition function is generated, partition methods can be grouped into two main categories: 1. data-independent (random) partitions, such as locality sensitive hashing and randomized tree/forest methods; 2. data-dependent (optimized) partitions, such as compact hashing, quantization-based indexing methods, and some tree-based methods like the kd-tree and PCA tree.
With offline indexing/partitioning, online approximate nearest neighbor search usually consists of three steps: locate the region that the query point falls in, obtain candidates -- the database points in the regions near the query region -- and rerank/return the candidates. For large scale nearest neighbor search, the key question is: how do we design the optimal offline indexing such that the online search performance is the best -- or, more specifically, such that the online search is as fast as possible while meeting a required accuracy?
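The offline/online split above can be sketched with random-hyperplane hashing, one of the data-independent (LSH-style) partitions mentioned earlier (all names below are our own illustrations, not the thesis's code): each hash bit is the sign of a projection onto a random direction, so nearby points tend to land in the same bucket.

```python
import random

def make_hash(dim, n_bits, seed=0):
    """Random-hyperplane LSH: each bit is the sign of a dot product with
    a random Gaussian direction, so points with a small angle between
    them tend to share hash codes.  A data-independent partition."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    def h(x):
        return tuple(int(sum(p[i] * x[i] for i in range(dim)) >= 0)
                     for p in planes)
    return h

def build_index(points, h):
    """Offline step: map every database point into its hash bucket."""
    buckets = {}
    for idx, x in enumerate(points):
        buckets.setdefault(h(x), []).append(idx)
    return buckets

def query(q, points, buckets, h):
    """Online step: fetch candidates from the query's bucket, then
    rerank them by exact squared Euclidean distance."""
    cand = buckets.get(h(q), [])
    return min(cand,
               key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], q)),
               default=None)
```

The online cost is the bucket size rather than the database size, which is exactly the sublinear-time/accuracy trade-off the optimal-partition formulation quantifies.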
In this thesis, we have studied theories, algorithms, systems and applications for (approximate) nearest neighbor search on large scale data sets, for both indexing with random partition and indexing with learning based partition.
Our specific main contributions are:
1. We unify various nearest neighbor search methods into the data partition framework, and provide a general formulation of optimal data partition, which supports fastest search speed while satisfying a required search accuracy. The formulation is general, and can be used to explain most existing (sublinear) large scale approximate nearest neighbor search methods.
2. For indexing with data-independent partitions, we have developed theories on their lower and upper bounds of time and space complexity, based on the optimal data partition formulation. The bounds are applicable to a general group of methods called Nearest Neighbor Preferred Hashing and Nearest Neighbor Preferred Partition, including locality sensitive hashing, random forests, and many other random hashing methods. Moreover, we also extend the theory to study how to choose the parameters for indexing methods with random partitions.
3. For indexing with data-dependent partitions, we have applied the same formulation to develop a joint optimization approach with two important criteria: nearest neighbor preserving and region size balancing. We have applied the joint optimization to different partition structures such as hashing and clustering, and obtained several new nearest neighbor search methods that outperform (or are at least comparable to) state-of-the-art solutions for large scale nearest neighbor search.
4. We have further studied fundamental problems of nearest neighbor search beyond search methods, for example: what is the difficulty of nearest neighbor search on a given data set (independent of search methods)? What data properties affect the difficulty, and how? How are the theoretical analysis and algorithm design of the large scale nearest neighbor search problem affected by the data set difficulty?
5. Finally, we have applied our nearest neighbor search methods to practical applications. We focus on the development of large visual search engines using the new indexing methods developed in this thesis. The techniques can be applied to other domains with data-intensive applications, and moreover, be extended to applications beyond visual search engines, such as large scale machine learning, data mining, and social network analysis.Computer science, Electrical engineeringjh2700Electrical EngineeringThesesSelected machine learning reductions
https://academiccommons.columbia.edu/catalog/ac:172841
Choromanska, Anna Ewa10.7916/D8QF8QZ9Thu, 08 Jun 2017 16:10:57 +0000Machine learning is a field of science aiming to extract knowledge from data. Optimization lies at the core of machine learning, as many learning problems are formulated as optimization problems, where the goal is to minimize/maximize an objective function. More complex machine learning problems are then often solved by reducing them to simpler sub-problems solvable by known optimization techniques. This dissertation addresses two elements of the machine learning system 'pipeline': designing efficient basic optimization tools tailored to solve specific learning problems, or in other words to optimize a specific objective function, and creating more elaborate learning tools whose sub-blocks are essentially optimization solvers equipped with such basic optimization tools. In the first part of this thesis we focus on a very specific learning problem where the objective function, either convex or non-convex, involves the minimization of the partition function, the normalizer of a distribution, as is the case in conditional random fields (CRFs) or log-linear models. Our work proposes a tight quadratic bound on the partition function whose parameters are easily recovered by a simple algorithm that we propose. The bound gives rise to a family of new optimization learning algorithms, based on bound majorization (we developed batch, both full-rank and low-rank, and semi-stochastic variants), with linear convergence rates that successfully compete with state-of-the-art techniques (among them gradient descent methods, Newton and quasi-Newton methods like L-BFGS, etc.). The only constraint we introduce is on the number of classes, which is assumed to be finite and enumerable.
The bound majorization method we develop is simultaneously the first reduction scheme discussed in this thesis, where throughout this thesis by 'reduction' we mean a learning approach or algorithmic technique converting a complex machine learning problem into a set of simpler problems (which can be as small as a single problem). Secondly, we focus on developing two more sophisticated machine learning tools for solving harder learning problems. The tools that we develop are built from basic optimization sub-blocks tailored to solve simpler optimization sub-problems. We first focus on the multi-class classification problem where the number of classes is very large. We reduce this problem to a set of simpler sub-problems that we solve using basic optimization methods performing additive updates on the parameter vector. Secondly, we address the problem of learning a data representation when the data is unlabeled, for any classification task. We reduce this problem to a set of simpler sub-problems that we solve using basic optimization methods, however this time the parameter vector is updated multiplicatively. In both problems we assume that the data come in a stream that can even be infinite. We will now provide a more specific description of each of these problems and describe our approach for solving them. In the multi-class classification problem it is desirable to achieve train and test running times which are logarithmic in the label complexity. The existing approaches to this problem are either intractable or do not adapt well to the data. We propose a reduction of this problem to a set of binary regression problems organized in a tree structure and introduce a new splitting criterion (objective function) allowing gradient descent style optimization (bound optimization methods can also be used). The decision tree algorithm that we obtain differs from traditional decision trees in the objective optimized, and in how that optimization is done.
The different objective has useful properties, such as guaranteeing balanced and small-error splits, while the optimization uses an online learning algorithm that is queried and trained simultaneously as we pass over the data. Furthermore, we prove an upper bound on the number of splits required to reduce the entropy of the tree leaves below a small threshold. We empirically show that the trees we obtain have logarithmic depth, which implies logarithmic training and testing running times, and significantly smaller error than random trees. Finally, we consider the problem of unsupervised (clustering) learning of data representation, where the quality of the obtained clustering is measured using a very simple, intuitive and widely cited clustering objective, the k-means clustering objective. We introduce a family of online clustering algorithms by extending algorithms for online supervised learning, with access to expert predictors (which are basic sub-blocks of our learning system), to the unsupervised learning setting. The parameter vector corresponds to the probability distribution over the experts. Different update rules for the parameter vector depend on an approximation to the current value of the k-means clustering objective obtained by each expert, and model different levels of non-stationarity in the data. We show that when the experts are batch clustering algorithms with approximation guarantees with respect to the k-means clustering objective, applied to a sliding window of the data stream, our algorithms obtain approximation guarantees with respect to the k-means clustering objective. Thus we simultaneously address an open problem posed by Dasgupta on approximating the k-means clustering objective on data streams.
We show experimentally that our algorithms' empirical performance tracks that of the best clustering algorithm in their expert sets, and that they outperform widely used online algorithms.Computer scienceaec2163Electrical EngineeringThesesDesign of Scalable On-Demand Video Streaming Systems Leveraging Video Viewing Patterns
https://academiccommons.columbia.edu/catalog/ac:166939
Hwang, Kyung-Wook10.7916/D84T6RQBThu, 08 Jun 2017 16:10:57 +0000The explosive growth in on-demand access to video across all forms of delivery (Internet, traditional cable, IPTV, wireless) has renewed interest in scalable delivery methods. Content Delivery Networks (CDNs), Peer-to-Peer (P2P) approaches, and their combinations have been proposed as viable options to ease the load on servers and network links. However, there has been little focus on how to take advantage of user viewing patterns to understand their impact on existing mechanisms and to design new solutions that improve streaming service quality.
In this dissertation, we leverage the observation that users watch only a small portion of videos to understand the limits of existing designs and to optimize two scalable approaches: content placement and P2P Video-on-Demand (VoD) streaming. We then present our novel scalable system, Joint-Family, which enables adaptive bitrate (ABR) streaming in P2P VoD while accounting for user viewing patterns.
We first provide evidence of such user viewing behavior from data collected from a nationally deployed VoD service. In contrast to simplistic popularity-based placement and traditionally proposed caching strategies (such as CDNs), we model the placement problem with a Mixed Integer Programming (MIP) formulation and employ an innovative approach that scales well. We have performed detailed simulations using actual traces of user viewing sessions (including stream control operations such as pause, fast-forward, and rewind). Our results show that a segment-based placement strategy yields substantial savings in both disk storage requirements at origin servers/VHOs and network bandwidth use. For example, compared to a simple caching scheme using full videos, our MIP-based placement using segments can achieve up to 71% reduction in peak link bandwidth usage.
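The segment-placement optimization above can be illustrated on a toy instance. The segment names, sizes, and request counts below are invented, and exhaustive enumeration stands in for the MIP solver, which real instances would require:

```python
from itertools import combinations

def best_placement(segments, budget):
    """Exhaustive stand-in for the MIP: pick the subset of video segments
    (size, request_count) that fits the cache budget and serves the most
    requests locally."""
    best_set, best_hits = (), 0
    items = list(segments.items())
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            size = sum(s for _, (s, _) in combo)
            hits = sum(h for _, (_, h) in combo)
            if size <= budget and hits > best_hits:
                best_set, best_hits = tuple(n for n, _ in combo), hits
    return set(best_set), best_hits

# Hypothetical segments: name -> (size in GB, request count). Opening
# segments are requested far more often than the tails of the same videos.
segs = {"v1_start": (2, 90), "v1_tail": (8, 10),
        "v2_start": (3, 70), "v2_tail": (7, 5)}
placed, hits = best_placement(segs, budget=6)
```

Caching the popular opening segments of both videos beats caching either full video, which is the intuition behind segment-level placement.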
Secondly, we note that the policies adopted in existing P2P VoD systems have not taken user viewing behavior -- that users abandon videos -- into account. We show that abandonment can result in increased interruptions and wasted resources. As a result, we reconsider the set of policies to use in the presence of abandonment. Our goal is to balance the conflicting needs of delivering videos without interruptions while minimizing wastage. We find that an Earliest-First chunk selection policy in conjunction with the Earliest-Deadline peer selection policy allows us to achieve high download rates. We take advantage of abandonment by converting peers to "partial seeds"; this increases capacity. We minimize wastage by using a playback lookahead window. We use analysis and simulation experiments using real-world traces to show the effectiveness of our approach.
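The chunk- and peer-selection policies named above can be sketched in a few lines; the chunk indices, deadlines, and peer names are hypothetical, and a real client would combine these rules with rate estimation and swarm state:

```python
def pick_chunk(missing_chunks, playback_pos, lookahead):
    """Earliest-First within a lookahead window: fetch the earliest
    missing chunk, but not so far ahead that an abandoning viewer
    would waste the download."""
    window = [c for c in missing_chunks
              if playback_pos <= c < playback_pos + lookahead]
    return min(window) if window else None

def pick_peer(requests):
    """Earliest-Deadline: serve the request whose chunk is due soonest."""
    return min(requests, key=lambda r: r["deadline"])["peer"]

chunk = pick_chunk(missing_chunks={3, 7, 12, 40}, playback_pos=5, lookahead=10)
peer = pick_peer([{"peer": "A", "deadline": 9.0},
                  {"peer": "B", "deadline": 4.5}])
```

Chunk 40 lies beyond the lookahead window and is skipped, limiting wastage if the viewer abandons the video early.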
Finally, we propose Joint-Family, a protocol that combines P2P and adaptive bitrate (ABR) streaming for VoD. While P2P for VoD and ABR have been proposed previously, they have not been studied together because they attempt to tackle problems with seemingly orthogonal goals. We motivate our approach through analysis that overcomes a misconception resulting from prior analytical work, and show that the popularity of a P2P swarm and the seed staying time have a significant bearing on the achievable per-receiver download rate. Specifically, our analysis shows that popularity affects swarm efficiency when seeds stay "long enough". We also show that ABR in a P2P setting helps viewers achieve higher playback rates and/or fewer interruptions.
We develop the Joint-Family protocol based on the observations from our analysis. Peers in Joint-Family simultaneously participate in multiple swarms to exchange chunks of different bitrates. We adopt chunk, bitrate, and peer selection policies that minimize occurrence of interruptions while delivering high quality video and improving the efficiency of the system. Using traces from a large-scale commercial VoD service, we compare Joint-Family with existing approaches for P2P VoD and show that viewers in Joint-Family enjoy higher playback rates with minimal interruption, irrespective of video popularity.Computer science, Computer engineering, Electrical engineeringElectrical EngineeringThesesEnergy Harvesting Networked Nodes: Measurements, Algorithms, and Prototyping
https://academiccommons.columbia.edu/catalog/ac:161643
Gorlatova, Maria10.7916/D84J0NB4Thu, 08 Jun 2017 13:59:27 +0000Recent advances in ultra-low-power wireless communications and in energy harvesting will soon enable energetically self-sustainable wireless devices. Networks of such devices will serve as building blocks for different Internet of Things (IoT) applications, such as searching for an object on a network of objects and continuous monitoring of object configurations. Yet, numerous challenges need to be addressed for the IoT vision to be fully realized. This thesis considers several challenges related to ultra-low-power energy harvesting networked nodes: energy source characterization, algorithm design, and node design and prototyping. Additionally, the thesis contributes to engineering education, specifically to project-based learning. We summarize our contributions to light and kinetic (motion) energy characterization for energy harvesting nodes. To characterize light energy, we conducted a first-of-its-kind 16-month-long indoor light energy measurement campaign. To characterize motion energy, we collected over 200 hours of human and object motion traces. We also analyzed traces previously collected in a study with over 40 participants. We summarize our insights, including light and motion energy budgets, variability, and influencing factors. These insights are useful for designing energy harvesting nodes and energy harvesting adaptive algorithms. We shared with the community our light energy traces, which can be used as energy inputs to system and algorithm simulators and emulators. We also discuss resource allocation problems we considered for energy harvesting nodes. Inspired by the needs of tracking and monitoring IoT applications, we formulated and studied resource allocation problems aimed at allocating the nodes' time-varying resources in a uniform way with respect to time. 
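For a deterministic energy profile, the "uniform with respect to time" allocation above can be sketched as a simple prefix-feasibility computation. The per-slot harvest values and initial battery level are invented; the thesis's actual formulations (utility and lexicographic maximization) are more general:

```python
def uniform_rate(harvest, battery0=0.0):
    """Largest constant spending rate that never drains the battery
    below zero: the binding constraint is the worst energy prefix,
    r = min over t of (B0 + E_1 + ... + E_t) / t."""
    total, best = battery0, float("inf")
    for t, e in enumerate(harvest, start=1):
        total += e
        best = min(best, total / t)
    return best

# Hypothetical per-slot harvested-energy predictions (e.g. joules/hour).
r = uniform_rate([4.0, 1.0, 7.0], battery0=2.0)
```

Here the lean second slot caps the sustainable rate at 3.5 even though the average harvest is higher, which is why uniform policies must look at every prefix of the profile.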
We mainly considered deterministic energy profile and stochastic environmental energy models, and focused on single node and link scenarios. We formulated optimization problems using utility maximization and lexicographic maximization frameworks, and introduced algorithms for solving the formulated problems. For several settings, we provided low-complexity solution algorithms. We also examined many simple policies. We demonstrated, analytically and via simulations, that in many settings simple policies perform well. We also summarize our design and prototyping efforts for a new class of ultra-low-power nodes - Energy Harvesting Active Networked Tags (EnHANTs). Future EnHANTs will be wireless nodes that can be attached to commonplace objects (books, furniture, clothing). We describe the EnHANTs prototypes and the EnHANTs testbed that we developed, in collaboration with other research groups, over the last 4 years in 6 integration phases. The prototypes harvest energy of the indoor light, communicate with each other via ultra-low-power transceivers, form small multihop networks, and adapt their communications and networking to their energy harvesting states. The EnHANTs testbed can expose the prototypes to light conditions based on real-world light energy traces. Using the testbed and our light energy traces, we evaluated some of our energy harvesting adaptive policies. Our insights into node design and performance evaluations may apply beyond EnHANTs to networks of various energy harvesting nodes. Finally, we present our contributions to engineering education. Over the last 4 years, we engaged high school, undergraduate, and M.S. students in more than 100 research projects within the EnHANTs project. 
We summarize our approaches to facilitating student learning, and discuss the results of evaluation surveys that demonstrate the effectiveness of our approaches.Electrical engineering, Computer sciencemag2206Electrical EngineeringThesesCharacterizing Audio Events for Video Soundtrack Analysis
https://academiccommons.columbia.edu/catalog/ac:156982
Cotton, Courtenay10.7916/D8HM5GN3Thu, 08 Jun 2017 13:54:41 +0000There is an entire emerging ecosystem of amateur video recordings on the internet today, in addition to the abundance of more professionally produced content. The ability to automatically scan and evaluate the content of these recordings would be very useful for search and indexing, especially as amateur content tends to be more poorly labeled and tagged than professional content. Although the visual content is often considered to be of primary importance, the audio modality contains rich information which may be very helpful in the context of video search and understanding. Any technology that could help to interpret video soundtrack data would also be applicable in a number of other scenarios, such as mobile device audio awareness, surveillance, and robotics. In this thesis we approach the problem of extracting information from these kinds of unconstrained audio recordings. Specifically we focus on techniques for characterizing discrete audio events within the soundtrack (e.g. a dog bark or door slam), since we expect events to be particularly informative about content. Our task is made more complicated by the extremely variable recording quality and noise present in this type of audio. Initially we explore the idea of using the matching pursuit algorithm to decompose and isolate components of audio events. Using these components we develop an approach for non-exact (approximate) fingerprinting as a way to search audio data for similar recurring events. We demonstrate a proof of concept for this idea. Subsequently we extend the use of matching pursuit to build an actual audio fingerprinting system, with the goal of identifying simultaneously recorded amateur videos (i.e. videos taken in the same place at the same time by different people, which contain overlapping audio). Automatic discovery of these simultaneous recordings is one particularly interesting facet of general video indexing. 
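The matching pursuit decomposition at the heart of the approach above can be sketched as a greedy projection loop. The tiny orthonormal dictionary below is hypothetical; the actual system decomposes soundtracks over Gabor atoms:

```python
def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy decomposition: repeatedly pick the unit-norm atom with the
    largest inner product against the residual, record its coefficient,
    and subtract its projection from the residual."""
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        scores = [sum(r * a for r, a in zip(residual, atom))
                  for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        picks.append((k, scores[k]))
        residual = [r - scores[k] * a for r, a in zip(residual, dictionary[k])]
    return picks, residual

# Toy orthonormal dictionary standing in for a Gabor atom dictionary.
d = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
picks, res = matching_pursuit([0.0, 3.0, 0.0, 0.0], d, n_atoms=1)
```

The selected atoms' indices and coefficients are the kind of compact event description that can feed an approximate fingerprint.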
We evaluate this fingerprinting system on a database of 733 internet videos. Next we return to searching for features to directly characterize soundtrack events. We develop a system to detect transient sounds and represent audio clips as histograms of the transients they contain. We use this representation for video classification over a database of 1873 internet videos. When we combine these features with a spectral feature baseline system we achieve a relative improvement of 7.5% in mean average precision over the baseline. In another attempt to devise features to better describe and compare events, we investigate decomposing audio using a convolutional form of non-negative matrix factorization, resulting in event-like spectro-temporal patches. We use the resulting representation to build an event detection system that is more robust to additive noise than a comparative baseline system. Lastly we investigate a promising feature representation that has been used by others previously to describe event-like sound effect clips. These features derive from an auditory model and are meant to capture fine time structure in sound events. We compare these features and a related but simpler feature set on the task of video classification over 9317 internet videos. We find that combinations of these features with baseline spectral features produce a significant improvement in mean average precision over the baseline.Electrical engineering, Computer sciencecvc2106Electrical EngineeringThesesLarge-Scale Pattern Discovery in Music
https://academiccommons.columbia.edu/catalog/ac:156137
Bertin-Mahieux, Thierry10.7916/D8NC67CTThu, 08 Jun 2017 13:54:37 +0000This work focuses on extracting patterns in musical data from very large collections. The problem is split into two parts. First, we build such a large collection, the Million Song Dataset, to provide researchers access to commercial-size datasets. Second, we use this collection to study cover song recognition which involves finding harmonic patterns from audio features. Regarding the Million Song Dataset, we detail how we built the original collection from an online API, and how we encouraged other organizations to participate in the project. The result is the largest research dataset with heterogeneous sources of data available to music technology researchers. We demonstrate some of its potential and discuss the impact it already has on the field. On cover song recognition, we must revisit the existing literature since there are no publicly available results on a dataset of more than a few thousand entries. We present two solutions to tackle the problem, one using a hashing method, and one using a higher-level feature computed from the chromagram (dubbed the 2DFTM). We further investigate the 2DFTM since it has potential to be a relevant representation for any task involving audio harmonic content. Finally, we discuss the future of the dataset and the hope of seeing more work making use of the different sources of data that are linked in the Million Song Dataset. Regarding cover songs, we explain how this might be a first step towards defining a harmonic manifold of music, a space where harmonic similarities between songs would be more apparent.Electrical engineering, Computer sciencetb2332Electrical EngineeringThesesOn Optimal Quantization and its Effect on Anomaly Detection and Image Classification
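A 2DFTM-style feature, as in the cover-song work above, takes the magnitude of the 2-D Fourier transform of a chromagram patch; the magnitude is invariant to circular shifts, which is why transposing a song in key leaves the feature unchanged. The 2x2 patch below is a toy stand-in for a real chroma-by-beats patch:

```python
import cmath

def dft2_mag(patch):
    """Magnitude of the 2-D DFT of a small real-valued patch.
    Circularly shifting the input only rotates the phase of each
    coefficient, so the magnitude is shift-invariant."""
    R, C = len(patch), len(patch[0])
    out = []
    for u in range(R):
        row = []
        for v in range(C):
            s = sum(patch[r][c] *
                    cmath.exp(-2j * cmath.pi * (u * r / R + v * c / C))
                    for r in range(R) for c in range(C))
            row.append(abs(s))
        out.append(row)
    return out

patch = [[1.0, 0.0], [0.0, 2.0]]
shifted = [[0.0, 2.0], [1.0, 0.0]]  # circular shift along the chroma axis
m1, m2 = dft2_mag(patch), dft2_mag(shifted)
```

The two magnitude matrices are identical, so a key-shifted cover maps to (nearly) the same point in 2DFTM space.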
https://academiccommons.columbia.edu/catalog/ac:156976
Beigi, Mandis10.7916/D8891D1DThu, 08 Jun 2017 13:53:49 +0000This thesis presents the use of density estimation for performing data classification in applications such as stream processing and image classification. The first half of this thesis presents a system that can process and analyze streaming data and extract the time frames that contain potential events of interest or anomalies without requiring any prior domain knowledge. The proposed method performs real time monitoring and mining of streaming data at multiple temporal scales simultaneously to maximize the probability of detection of anomalous events that span different lengths of time. The method does not assume the data segments containing anomalies belong to any particular distribution and therefore does not require prior domain knowledge. The system learns the evolution of normal behavior in streaming data and builds a model over time and uses it to determine whether the new incoming data fits that model. When analyzing streaming data, it is important for the algorithm to be fast with low computational complexity, so these aspects, together with detection accuracy, are studied and the results are presented. The algorithm is general and can be used for any type of streaming data. In the second half of this thesis, the feasibility of using density estimation in higher dimensions and in particular for visual descriptors is presented. A method for classifying images is proposed which uses density estimation to optimally quantize the feature space to generate a codebook used by a bag-of-features (BoF) image classifier. This thesis shows that the optimal smoothing calculation in density estimation can be used to systematically quantize the feature space to generate codebooks that can be used in image classification.Electrical engineering, Computer scienceElectrical EngineeringThesesLarge-Scale Machine Learning for Classification and Search
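The "optimal smoothing" idea above can be illustrated in one dimension with a Gaussian kernel density estimate and the classic Silverman rule-of-thumb bandwidth; the sample values are invented, and the thesis's own smoothing calculation for high-dimensional visual descriptors is more involved:

```python
import math

def silverman_bandwidth(samples):
    """Rule-of-thumb smoothing for a 1-D Gaussian KDE:
    h = 1.06 * sigma * n**(-1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * sigma * n ** (-0.2)

def kde(x, samples, h):
    """Gaussian kernel density estimate at point x."""
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) /
               (h * math.sqrt(2 * math.pi))
               for s in samples) / len(samples)

# Two tight clusters; a density-aware quantizer would place codewords
# near the modes rather than uniformly over the range.
samples = [0.9, 1.0, 1.1, 2.9, 3.0, 3.1]
h = silverman_bandwidth(samples)
```

The estimated density is higher at a cluster center than in the gap between clusters, which is the signal a density-based codebook construction exploits.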
https://academiccommons.columbia.edu/catalog/ac:153513
Liu, Wei10.7916/D8F195S6Wed, 07 Jun 2017 16:59:17 +0000With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest neighbor search practical on gigantic databases. Our first approach is to explore data graphs to aid classification and nearest neighbor search. A graph offers an attractive way of representing data and discovering essential information such as the neighborhood structure. However, both graph construction and graph-based learning become computationally prohibitive at a large scale. To this end, we present an efficient large graph construction approach and subsequently apply it to develop scalable semi-supervised learning and unsupervised hashing algorithms. Our unique contributions on the graph-related topics include: 1. Large Graph Construction: Conventional neighborhood graphs such as kNN graphs require quadratic time complexity, which is inadequate for the large-scale applications mentioned above. To overcome this bottleneck, we present a novel graph construction approach, called Anchor Graphs, which enjoys linear space and time complexities and can thus be constructed over gigantic databases efficiently. The central idea of the Anchor Graph is introducing a few anchor points and converting intensive data-to-data affinity computation to drastically reduced data-to-anchor affinity computation. A low-rank data-to-data affinity matrix is derived using the data-to-anchor affinity matrix. 
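The Anchor Graph construction above can be sketched directly: build a row-normalized data-to-anchor matrix Z, then form the low-rank affinity A = Z diag(Z^T 1)^-1 Z^T. The data points, anchors, and Gaussian weights below are toy choices; the real method uses sparse Z (a few nearest anchors per point) over millions of samples:

```python
import math

def anchor_graph_affinity(data, anchors, bandwidth=1.0):
    """Data-to-anchor matrix Z with row-normalized Gaussian weights,
    then A = Z diag(Z^T 1)^-1 Z^T. Each row of A sums to 1, so entries
    read as transition probabilities between data points through the
    anchors (the Markov random walk interpretation)."""
    def w(x, a):
        return math.exp(-sum((xi - ai) ** 2 for xi, ai in zip(x, a)) / bandwidth)
    Z = []
    for x in data:
        row = [w(x, a) for a in anchors]
        s = sum(row)
        Z.append([v / s for v in row])
    n, m = len(data), len(anchors)
    lam = [sum(Z[i][k] for i in range(n)) for k in range(m)]
    A = [[sum(Z[i][k] * Z[j][k] / lam[k] for k in range(m))
          for j in range(n)] for i in range(n)]
    return Z, A

data = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
anchors = [(0.0, 0.0), (3.0, 3.0)]
Z, A = anchor_graph_affinity(data, anchors)
```

Only the small n-by-m matrix Z is ever stored at scale; the n-by-n matrix A is materialized here purely to check its probabilistic structure.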
We also theoretically prove that the Anchor Graph lends itself to an intuitive probabilistic interpretation by showing that each entry of the derived affinity matrix can be considered as a transition probability between two data points through Markov random walks. 2. Large-Scale Semi-Supervised Learning: We employ Anchor Graphs to develop a scalable solution for semi-supervised learning, which capitalizes on both labeled and unlabeled data to learn graph-based classification models. We propose several key methods to build scalable semi-supervised kernel machines such that real-world linearly inseparable data can be tackled. The proposed techniques take advantage of the Anchor Graph from a kernel point of view, generating a set of low-rank kernels which are made to encompass the neighborhood structure unveiled by the Anchor Graph. By linearizing these low-rank kernels, training nonlinear kernel machines in semi-supervised settings can be simplified to training linear SVMs in supervised settings, so the computational cost for classifier training is substantially reduced. We accomplish excellent classification performance by applying the proposed semi-supervised kernel machine - a linear SVM with a linearized Anchor Graph warped kernel. 3. Unsupervised Hashing: To achieve fast point-to-point search, compact hashing with short hash codes has been suggested, but how to learn codes such that good search performance is achieved remains a challenge. Moreover, in many cases real-world data sets are assumed to live on manifolds, which should be taken into account in order to capture meaningful nearest neighbors. To this end, we present a novel unsupervised hashing approach based on the Anchor Graph which captures the underlying manifold structure. The Anchor Graph Hashing approach allows constant time hashing of a new data point by extrapolating graph Laplacian eigenvectors to eigenfunctions. 
Furthermore, a hierarchical threshold learning procedure is devised to produce multiple hash bits for each eigenfunction, thus leading to higher search accuracy. As a result, Anchor Graph Hashing exhibits good search performance in finding semantically similar neighbors. To address other practical application scenarios, we further develop advanced hashing techniques that incorporate supervised information or leverage unique formulations to cope with new forms of queries such as hyperplanes. 4. Supervised Hashing: Recent research has shown that the hashing quality could be boosted by leveraging supervised information into hash function learning. However, the existing methods either lack adequate performance or often incur cumbersome model training. To this end, we present a novel kernel-based supervised hashing model which requires a limited amount of supervised information in the form of similar and dissimilar data pairs, and is able to achieve high hashing quality at a practically feasible training cost. The idea is to map the data to compact binary codes whose Hamming distances are simultaneously minimized on similar pairs and maximized on dissimilar pairs. Our approach is distinct from prior work in utilizing the equivalence between optimizing the code inner products and the Hamming distances. This enables us to sequentially and efficiently train the hash functions one bit at a time, yielding very short yet discriminative codes. The presented supervised hashing approach is general, allowing search of both semantically similar neighbors and metric distance neighbors. 5. Hyperplane Hashing: Hyperplane hashing aims at rapidly searching the database points near a given hyperplane query, and has shown practical impact on large-scale active learning with SVMs. The existing hyperplane hashing methods are randomized and need long hash codes to achieve reasonable search accuracy, thus resulting in reduced search speed and large memory overhead. 
We present a novel hyperplane hashing technique which yields high search accuracy with compact hash codes. The key idea is a novel bilinear form used in designing the hash functions, leading to a higher collision probability than all of the existing hyperplane hash functions when using random projections. To further increase the performance, we develop a learning based framework in which the bilinear functions are directly learned from the input data. This results in compact yet discriminative codes, as demonstrated by the superior search performance over all random projection based solutions. We divide the thesis into two parts: scalable classification with graphs (topics 1 and 2 mentioned above), and nearest neighbor search with hashing (topics 3, 4 and 5 mentioned above). The two parts are connected in the sense that the idea of Anchor Graphs in Part I enables not only scalable classification but also unsupervised hashing, and hyperplane hashing in Part II can directly benefit classification under an active learning framework. All of the machine learning techniques developed in this thesis emphasize and pursue excellent performance in both speed and accuracy, which are verified through extensive experiments carried out on various large-scale tasks of classification and search. The addressed problems, classification and nearest neighbor search, are fundamental for many real-world applications. Therefore, we expect that the proposed solutions based on graphs and hashing will have a tremendous impact on a great number of realistic large-scale applications.Computer science, Information technology, Computer engineeringwl2223Electrical EngineeringThesesNoise Robust Pitch Tracking by Subband Autocorrelation Classification
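The code inner-product equivalence that the supervised hashing work above exploits is easy to verify: for r-bit codes with entries in {-1, +1}, the Hamming distance equals (r - inner product) / 2. The example codes here are arbitrary:

```python
def hamming(c1, c2):
    """Number of positions where two codes disagree."""
    return sum(b1 != b2 for b1, b2 in zip(c1, c2))

def inner(c1, c2):
    """Inner product of two +/-1 codes."""
    return sum(b1 * b2 for b1, b2 in zip(c1, c2))

# For r-bit codes in {-1, +1}: hamming = (r - inner) / 2, so minimizing
# Hamming distance on similar pairs is exactly maximizing the code
# inner product, and maximizing it on dissimilar pairs is minimizing it.
r = 4
a, b = [1, -1, 1, 1], [1, 1, -1, 1]
```

This identity is what lets the hash functions be trained on code inner products (which are differentiable in the relaxed codes) one bit at a time.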
https://academiccommons.columbia.edu/catalog/ac:153352
Lee, Byung Suk10.7916/D8SJ1SPJWed, 07 Jun 2017 16:58:53 +0000Speech pitch tracking is one of the elementary tasks of Computational Auditory Scene Analysis (CASA). While a human can easily hear the voiced pitch in highly noisy recordings, the performance of automatic speech pitch tracking degrades in unknown noisy audio conditions. Traditional pitch trackers use either autocorrelation or the Fourier transform to calculate periodicity, which works well for clean recordings. For noisy recordings, however, the accuracy of these pitch trackers degrades in general. For example, the information in parts of the frequency spectrum may be lost due to analog radio band transmission and/or contain additive noise of various kinds. Instead of explicitly using the most obvious features of autocorrelation, we propose a trained classifier-based approach, which we call Subband Autocorrelation Classification (SAcC). A multi-layer perceptron (MLP) classifier is trained on the principal components of the autocorrelations of subbands from an auditory filterbank. The output of the MLP classifier is temporally smoothed to produce the pitch track by finding the Viterbi path of a Hidden Markov Model (HMM). Training on various types of noisy speech recordings leads to a great increase in performance over state-of-the-art algorithms, according to both the traditional Gross Pitch Error (GPE) measure, and a proposed novel Pitch Tracking Error (PTE) which more fully reflects the accuracy of both pitch estimation/extraction and voicing detection in a single measure. To verify the generalization and specificity of SAcC, we test SAcC on a real world problem that has a large-scale noisy speech corpus. The data is from the DARPA Robust Automatic Transcription of Speech (RATS) program. The experiments on the performance evaluation of SAcC pitch tracking confirm the generalization power of SAcC across various unknown noise conditions and distinct speech corpora. 
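The HMM temporal smoothing step above can be sketched with a standard Viterbi decoder over framewise posteriors. The two-state posteriors and the self-transition probability below are invented; in SAcC the states are pitch candidates and the posteriors come from the MLP:

```python
import math

def viterbi(posteriors, stay=0.9):
    """Smooth framewise state posteriors with a simple HMM whose
    transitions favor staying in the same state; returns the most
    likely state sequence (the Viterbi path)."""
    n = len(posteriors[0])
    switch = (1.0 - stay) / (n - 1)
    logp = [math.log(p) for p in posteriors[0]]
    back = []
    for obs in posteriors[1:]:
        new, ptr = [], []
        for s in range(n):
            prev = max(range(n),
                       key=lambda q: logp[q] + math.log(stay if q == s else switch))
            new.append(logp[prev] + math.log(stay if prev == s else switch)
                       + math.log(obs[s]))
            ptr.append(prev)
        logp = new
        back.append(ptr)
    path = [max(range(n), key=lambda s: logp[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Frame 3 alone would be misread as the other state; the HMM smooths it.
post = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.9, 0.1]]
path = viterbi(post)
```

The framewise argmax flips on the noisy frame, but the decoded path stays in the consistent state throughout, which is exactly the behavior that removes spurious pitch jumps.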
We also report that using the SAcC output adds a significant improvement to a Speaker Identification (SID) system for RATS, suggesting the potential contribution of SAcC pitch tracking to higher-level tasks.Electrical engineering, Computer sciencebl2012Electrical EngineeringThesesMitigating Network Service Disruptions in High-bandwidth, Intermittently Connected, and Peer-to-Peer Networks
https://academiccommons.columbia.edu/catalog/ac:153329
Hong, Se Gi10.7916/D8ZC8908Wed, 07 Jun 2017 16:58:03 +0000Users demand high-bandwidth, ubiquitous and low-cost network services. This demand has pushed ISPs and application providers to offer more bandwidth, allow users to access the Internet almost everywhere, and provide cheap or free network services using peer-to-peer networks. These three trends underlie the growing success of today's Internet. However, (1) high bandwidth can empower more effective denial-of-service attacks; (2) Internet access is widespread, but still not ubiquitous; and (3) peer-to-peer network services need to solve the service discovery problem. This thesis addresses these three challenges. First, we tackle denial-of-service attacks. The high bandwidth available in many parts of the Internet allows denial-of-service attacks to be effective, and the large scale of the Internet makes detecting and preventing these attacks difficult. The anonymity and openness of the Internet worsen this problem because anyone can send anything to anybody. To prevent these denial-of-service attacks, we propose Permission-Based Sending (PBS), a signaling architecture for network traffic authorization. PBS uses explicit permissions to give legitimate users the authority to send packets. Signaling is used to configure this permission in the data path. This signaling approach enables easy installation for granting authorization to flows, and allows PBS to be deployed in existing networks. In addition, a monitoring mechanism provides a second line of defense against attacks. Next, we strive to make Internet access more ubiquitous. When public transportation stations have access points to provide Internet access to passengers, public transportation becomes a more attractive travel and commute option. However, Internet connectivity is intermittent because passengers can access the Internet only when a bus or train is within the networking coverage of an AP at a stop. 
To efficiently handle this intermittent network for the public transit system, we develop Internet Cache on Wheels (ICOW), a system that provides a low-cost way for bus and train operators to offer access to Internet content. Each bus and train car is equipped with a smart cache that serves popular content to passengers. The cache updates its content based on passenger requests when it is within range of Internet access points placed at bus stops, train stations or depots. This aggregated Internet access is significantly more efficient than having passengers contact Internet access points individually and ensures continuous availability of content throughout the journey. Finally, we consider peer-to-peer services. Typical service discovery mechanisms in peer-to-peer networks cause significant overhead, consuming energy and bandwidth: (1) in highly mobile networks, service discovery consumes the energy of mobile devices to discover services that newly joined members provide; and (2) peer-to-peer network systems consume bandwidth during service discovery. To resolve and analyze these service discovery problems, (1) we design an efficient service discovery mechanism that reduces energy consumption of mobile devices; and (2) we evaluate the bandwidth consumption caused by service discovery in real-world peer-to-peer networks.Electrical engineering, Computer sciencesh2208Electrical EngineeringThesesSemi-Supervised Learning for Scalable and Robust Visual Search
https://academiccommons.columbia.edu/catalog/ac:130780
Wang, Jun10.7916/D80K2GH5Fri, 02 Jun 2017 15:28:04 +0000Unlike textual document retrieval, searching of visual data is still far from satisfactory. There exist major gaps between the available solutions and practical needs in both accuracy and computational cost. This thesis aims at the development of robust and scalable solutions for visual search and retrieval. Specifically, we investigate two classes of approaches: graph-based semi-supervised learning and hashing techniques. The graph-based approaches are used to improve accuracy, while hashing approaches are used to improve efficiency and cope with large-scale applications. A common theme shared between these two subareas of our work is the focus on the semi-supervised learning paradigm, in which a small set of labeled data is complemented with large unlabeled datasets. Graph-based approaches have emerged as methods of choice for general semi-supervised tasks when no parametric information is available about the data distribution. These methods treat both labeled and unlabeled samples as vertices in a graph and instantiate pairwise edges between these vertices to capture the affinity between the corresponding samples. A quadratic regularization framework has been widely used for label prediction over such graphs. However, most of the existing graph-based semi-supervised learning methods are sensitive to the graph construction process and the initial labels. We propose a new bivariate graph transduction formulation and an efficient solution via an alternating minimization procedure. Based on this bivariate framework, we also develop new methods to filter unreliable and noisy labels. Extensive experiments over diverse benchmark datasets demonstrate the superior performance of our proposed methods. However, graph-based approaches suffer from a critical scalability bottleneck, since graph construction requires quadratic complexity and the inference procedure costs even more. 
The widely used graph construction method relies on nearest neighbor search, which is prohibitive for large-scale applications. In addition, most large-scale visual search problems involve handling high-dimensional visual descriptors, thereby causing another challenge in excessive storage requirement. To handle the scalability issue of both computation and storage, the second part of the thesis focuses on efficient techniques for conducting approximate nearest neighbor (ANN) search, which is key to many machine learning algorithms, including graph-based semi-supervised learning and clustering. Specifically, we propose Semi-Supervised Hashing (SSH) methods that leverage semantic similarity over a small set of labeled data while preventing overfitting. We derive a rigorous formulation in which a supervised term minimizes the empirical errors on the labeled data and an unsupervised term provides effective regularization by maximizing variance and independence of individual bits. Experiments on several large datasets demonstrate the clear performance gain over several state-of-the-art methods without significant increase of the computational cost. The main contributions of the thesis include the following. Bivariate graph transduction: a) a bivariate formulation for graph-based semi-supervised learning with an efficient solution by alternating optimization; b) theoretic analysis from the view of graph cut for the bivariate optimization procedure; c) novel applications of the proposed techniques, such as interactive image retrieval, automatic re-ranking for text based image search, and a brain computer interface (BCI) for image retrieval. 
Semi-supervised hashing: a) a rigorous semi-supervised paradigm for hash function learning that trades off empirical fitness to pairwise label consistency against an information-theoretic regularizer; b) several efficient solutions for deriving semi-supervised hash functions, including an orthogonal solution using eigen-decomposition, a revised strategy for learning non-orthogonal hash functions, a sequential learning algorithm that derives boosted hash functions, and an extension to unsupervised settings using pseudo labels. The two parts of the thesis - bivariate graph transduction and semi-supervised hashing - are complementary and can be combined for significant improvements in both speed and accuracy. Hashing methods can help build sparse graphs in linear time and greatly reduce the data size, but they lack sufficient accuracy. Graph-based methods provide unique capabilities for handling non-linear data structures with noisy labels but suffer from high computational complexity. The synergistic combination of the two offers great potential for advancing the state of the art in large-scale visual search and many other applications.Computer science, Information sciencejw2494Electrical EngineeringThesesSingle Channel auditory source separation with neural network
https://academiccommons.columbia.edu/catalog/ac:02v6wwpzgs
Chen, Zhuo10.7916/D8W09C8NFri, 05 May 2017 22:45:17 +0000Although distinguishing different sounds in a noisy environment is a relatively easy task for humans, source separation has long been extremely difficult in audio signal processing. The problem is challenging for three reasons: the large variety of sound types, the abundance of mixing conditions, and the unclear mechanism for distinguishing sources, especially similar sounds.
In recent years, neural network based methods have achieved impressive success on a variety of problems, including speech enhancement, where the task is to separate clean speech out of a noisy mixture. However, current deep learning based source separators do not perform well on real recorded noisy speech and, more importantly, are not applicable in more general source separation scenarios such as overlapped speech.
In this thesis, we first propose extensions to the current mask learning network for speech enhancement, fixing the scale mismatch problem that often occurs in real recorded audio. We solve this problem by adding two restoration layers to the existing mask learning network. We also propose a residual learning architecture for speech enhancement, further improving the network's generalization under different recording conditions. We evaluate the proposed speech enhancement models on CHiME-3 data. Without retraining the acoustic model, the best bidirectional LSTM with residual connections yields a 25.13% relative WER reduction on real data and 34.03% on simulated data.
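The mask learning setup referred to above can be illustrated independently of any particular network architecture: the model predicts a time-frequency mask that is applied pointwise to the noisy spectrogram magnitude. One common training target for such networks is the ideal ratio mask; the sketch below uses that target and assumes additive magnitudes, a simplification, and does not reproduce the thesis's restoration layers or residual architecture:

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag):
    """One common training target for mask-learning networks: the
    fraction of each time-frequency bin's magnitude due to clean speech."""
    return clean_mag / np.maximum(clean_mag + noise_mag, 1e-12)

def enhance(noisy_mag, mask):
    """Apply a predicted mask pointwise to the noisy magnitude spectrogram."""
    return mask * noisy_mag

# Toy usage with synthetic magnitude spectrograms.
rng = np.random.default_rng(1)
clean = np.abs(rng.standard_normal((64, 32)))
noise = np.abs(rng.standard_normal((64, 32)))
mask = ideal_ratio_mask(clean, noise)
enhanced = enhance(clean + noise, mask)
```

Under the additive-magnitude assumption the oracle mask recovers the clean magnitude exactly; a trained network approximates this mask from the noisy input alone.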
Then we propose a novel neural network based model called “deep clustering” for more general source separation tasks. We train a deep network to assign contrastive embedding vectors to each time-frequency region of the spectrogram in order to implicitly predict the segmentation labels of the target spectrogram from the input mixture. This yields a deep network-based analogue to spectral clustering, in that the embeddings form a low-rank pairwise affinity matrix that approximates the ideal affinity matrix while enabling much faster computation. At test time, the clustering step “decodes” the segmentation implicit in the embeddings by optimizing K-means with respect to the unknown assignments. Experiments on single-channel mixtures from multiple speakers show that a speaker-independent model trained on two-speaker and three-speaker mixtures can improve signal quality for mixtures of held-out speakers by an average of over 10 dB.
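The test-time decoding step described above can be sketched as follows: run K-means over the per-bin embeddings and read the cluster assignments off as one binary time-frequency mask per source. The K-means implementation here (farthest-point initialization plus Lloyd iterations) is a plain stand-in, not the thesis's code, and the synthetic embeddings are an assumption for illustration:

```python
import numpy as np

def kmeans(V, k, n_iter=50):
    """Plain K-means on row vectors V (n, d): farthest-point init + Lloyd."""
    centers = [V[0]]
    for _ in range(k - 1):
        d = np.min([((V - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(V[d.argmax()])          # farthest point from chosen centers
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = V[assign == j].mean(axis=0)
    return assign

def decode_masks(embeddings, n_sources):
    """Deep-clustering decoding: cluster per-bin embeddings into
    n_sources groups and return one binary T-F mask per source."""
    T, F, D = embeddings.shape
    assign = kmeans(embeddings.reshape(T * F, D), n_sources)
    return np.stack([(assign == j).reshape(T, F) for j in range(n_sources)])

# Toy usage: bins in the left half of the spectrogram embed near e1,
# the right half near e2, mimicking two well-separated sources.
rng = np.random.default_rng(0)
emb = np.zeros((4, 6, 2))
emb[:, :3, 0] = 1.0
emb[:, 3:, 1] = 1.0
emb += 0.01 * rng.standard_normal(emb.shape)
masks = decode_masks(emb, 2)
```

Because K-means assigns each bin to exactly one cluster, the resulting masks partition the spectrogram, which is why the abstract calls the segmentation "implicit" in the embeddings.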
We then propose an extension of deep clustering named the “deep attractor” network, which allows efficient end-to-end training. In the proposed model, attractor points for each source are first created by finding the centroids of the sources in the embedding space; these attractors pull together the time-frequency bins corresponding to each source and are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. We show that this framework achieves even better results.
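The attractor computation described above reduces, at training time when oracle bin-to-source assignments are available, to a weighted centroid in embedding space followed by a softmax over bin-attractor similarities. A minimal sketch with hypothetical shapes, omitting the embedding network and reconstruction loss:

```python
import numpy as np

def attractor_masks(V, Y):
    """Deep-attractor-style mask estimation (a sketch).

    V : (n_bins, D) embedding per time-frequency bin
    Y : (n_bins, S) one-hot dominant-source assignment (oracle at training time)
    Attractors are the centroids of each source's bins in embedding space;
    masks are the softmax over bin-attractor similarities.
    """
    A = (Y.T @ V) / np.maximum(Y.sum(axis=0)[:, None], 1e-12)  # (S, D) attractors
    logits = V @ A.T                                           # (n_bins, S)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))     # stable softmax
    return e / e.sum(axis=1, keepdims=True), A

# Toy usage: five bins belong to each of two well-separated sources.
rng = np.random.default_rng(2)
V = np.concatenate([np.tile([2.0, 0.0], (5, 1)), np.tile([0.0, 2.0], (5, 1))])
V += 0.05 * rng.standard_normal(V.shape)
Y = np.zeros((10, 2))
Y[:5, 0] = 1.0
Y[5:, 1] = 1.0
masks, A = attractor_masks(V, Y)
```

At test time the oracle Y is unavailable, so the attractors must be estimated instead (for example by clustering, as in deep clustering), which is the gap the end-to-end training addresses.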
Lastly, we introduce two applications of the proposed models: singing voice separation and a smart hearing aid device. For the former, a multi-task architecture is proposed that combines deep clustering with a classification-based network, achieving a new state-of-the-art separation result: the signal-to-noise ratio was improved by 11.1 dB on music and 7.9 dB on singing voice. For the smart hearing aid device, we combine neural attention decoding with the separation network. The system first decodes the user's attention, which is then used to guide the separator toward the target source. Both objective and subjective studies show that the proposed system can accurately decode attention and significantly improve the user experience.Computer science, Statistics, Neural networks (Computer science), Source separation (Signal processing)zc2204Electrical EngineeringTheses