2019 Theses Doctoral
Dynamic Machine Learning with Least Square Objectives
As of the writing of this thesis, machine learning has become one of the most active research fields. The interest comes from a variety of disciplines which include computer science, statistics, engineering, and medicine. The main idea behind learning from data is that, when an analytical model explaining the observations is hard to find ---often in contrast to the models in physics such as Newton's laws--- a statistical approach can be taken where one or more candidate models are tuned using data.
Since the early 2000's this challenge has grown in two ways: (i) The amount of collected data has seen a massive growth due to the proliferation of digital media, and (ii) the data has become more complex. One example for the latter is the high dimensional datasets, which can for example correspond to dyadic interactions between two large groups (such as customer and product information a retailer collects), or to high resolution image/video recordings.
Another important issue is the study of dynamic data, which exhibits dependence on time. Virtually all datasets fall into this category as all data collection is performed over time, however I use the term dynamic to hint at a system with an explicit temporal dependence. A traditional example is target tracking from signal processing literature. Here the position of a target is modeled using Newton's laws of motion, which relates it to time via the target's velocity and acceleration.
Dynamic data, as I defined above, poses two important challenges. Firstly, the learning setup is different from the standard theoretical learning setup, also known as Probably Approximately Correct (PAC) learning. To derive PAC learning bounds one assumes a collection of data points sampled independently and identically from a distribution which generates the data. On the other hand, dynamic systems produce correlated outputs. The learning systems we use should accordingly take this difference into consideration. Secondly, as the system is dynamic, it might be necessary to perform the learning online. In this case the learning has to be done in a single pass. Typical applications include target tracking and electricity usage forecasting.
In this thesis I investigate several important dynamic and online learning problems, where I develop novel tools to address the shortcomings of the previous solutions in the literature. The work is divided into three parts for convenience. The first part is about matrix factorization for time series analysis which is further divided into two chapters. In the first chapter, matrix factorization is used within a Bayesian framework to model time-varying dyadic interactions, with examples in predicting user-movie ratings and stock prices. In the next chapter, a matrix factorization which uses autoregressive models to forecast future values of multivariate time series is proposed, with applications in predicting electricity usage and traffic conditions. Inspired by the machinery we use in the first part, the second part is about nonlinear Kalman filtering, where a hidden state is estimated over time given observations. The nonlinearity of the system generating the observations is the main challenge here, where a divergence minimization approach is used to unify the seemingly unrelated methods in the literature, and propose new ones. This has applications in target tracking and options pricing. The third and last part is about cost sensitive learning, where a novel method for maximizing area under receiver operating characteristics curve is proposed. Our method has theoretical guarantees and favorable sample complexity. The method is tested on a variety of benchmark datasets, and also has applications in online advertising.
- Gultekin_columbia_0054D_15172.pdf application/pdf 11.3 MB Download File
More About This Work
- Academic Units
- Electrical Engineering
- Thesis Advisors
- Paisley, John W
- Ph.D., Columbia University
- Published Here
- April 24, 2019