Theses Doctoral

Kernel Approximation Methods for Speech Recognition

May, Avner

Over the past five years or so, deep learning methods have dramatically improved state-of-the-art performance in a variety of domains, including speech recognition, computer vision, and natural language processing. Importantly, however, they suffer from a number of drawbacks:
1. Training these models is a non-convex optimization problem, and thus it is difficult to guarantee that a trained model minimizes the desired loss function.
2. These models are difficult to interpret. In particular, it is difficult to explain, for a given model, why the computations it performs make accurate predictions.
In contrast, kernel methods are straightforward to interpret, and training them is a convex optimization problem. Unfortunately, solving these optimization problems exactly is typically prohibitively expensive on large datasets, because the kernel matrix alone grows quadratically with the number of training examples. Approximation methods can be used to circumvent this problem. In this thesis, we explore to what extent kernel approximation methods can compete with deep learning in the context of large-scale prediction tasks. Our contributions are as follows:
1. We perform the most extensive set of experiments to date using kernel approximation methods in the context of large-scale speech recognition tasks, and compare their performance with that of deep neural networks.
2. We propose a feature selection algorithm that significantly improves the performance of the kernel models, making them competitive with fully-connected feedforward neural networks.
3. We perform an in-depth comparison between two leading kernel approximation strategies — random Fourier features [Rahimi and Recht, 2007] and the Nyström method [Williams and Seeger, 2001] — showing that although the Nyström method is better at approximating the kernel, it performs worse than random Fourier features when used for learning.
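The two kernel approximation strategies compared in contribution 3 can be illustrated in a few lines. The following is a minimal pure-Python sketch, not the thesis's implementation; it assumes a Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), toy standard-normal data, and uniformly sampled landmarks, and all function names are hypothetical. Random Fourier features map x to z(x) = sqrt(2/D) cos(Wx + b), with the rows of W drawn from N(0, sigma^-2 I) and b uniform on [0, 2π], so that z(x)·z(y) is an unbiased estimate of k(x, y). The Nyström method instead uses m landmark points and the feature map φ(x) = L^-1 k_m(x), where L is the Cholesky factor of the landmark kernel matrix K_mm, so that φ(x)·φ(y) = k_m(x)^T K_mm^-1 k_m(y). The small diagonal jitter is an added assumption for numerical stability.

```python
import math
import random


def rbf_kernel(x, y, sigma):
    # Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 sigma^2)).
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def random_fourier_features(X, num_features, sigma, rng):
    # z(x) = sqrt(2/D) * cos(W x + b); rows of W ~ N(0, sigma^-2 I), b ~ U[0, 2*pi].
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [[scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + bi)
             for w, bi in zip(W, b)] for x in X]


def cholesky(A, jitter=1e-8):
    # Lower-triangular L with L L^T = A + jitter * I (jitter is an assumption).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] + jitter - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_substitute(L, v):
    # Solve L y = v for y, where L is lower-triangular.
    y = []
    for i in range(len(v)):
        y.append((v[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def nystrom_features(X, landmarks, sigma):
    # phi(x) = L^-1 k_m(x), so phi(x).phi(y) = k_m(x)^T K_mm^-1 k_m(y).
    K_mm = [[rbf_kernel(a, c, sigma) for c in landmarks] for a in landmarks]
    L = cholesky(K_mm)
    return [forward_substitute(L, [rbf_kernel(x, c, sigma) for c in landmarks])
            for x in X]


def max_abs_error(X, feats, sigma):
    # Largest entrywise gap between the approximate and exact kernel matrices.
    err = 0.0
    for i, x in enumerate(X):
        for j, y in enumerate(X):
            approx = sum(a * b for a, b in zip(feats[i], feats[j]))
            err = max(err, abs(approx - rbf_kernel(x, y, sigma)))
    return err


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]
    sigma = 1.5
    rff = random_fourier_features(X, num_features=200, sigma=sigma, rng=rng)
    nys = nystrom_features(X, landmarks=rng.sample(X, 20), sigma=sigma)
    print("RFF max error:", max_abs_error(X, rff, sigma))
    print("Nystrom max error:", max_abs_error(X, nys, sigma))
```

The script reports only how closely each feature map reproduces the exact kernel matrix on toy data. As the abstract notes, better kernel approximation does not by itself imply better performance when the features are used for learning.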
We believe this work opens the door for future research to continue to push the boundary of what is possible with kernel methods. This research direction will also shed light on the question of when, if ever, deep models are needed for attaining strong performance.

Files

  • May_columbia_0054D_14355.pdf (application/pdf, 2.22 MB)

More About This Work

Academic Units
Computer Science
Thesis Advisors
Collins, Michael J.
Degree
Ph.D., Columbia University
Published Here
January 19, 2018