Theses Doctoral

Essays on the Applications of Machine Learning in Financial Markets

Wang, Muye

We consider problems commonly encountered in asset management, such as optimal execution, portfolio construction, and trading strategy implementation. These problems are generally difficult in practice, in large part due to the uncertainties in financial markets. In this thesis, we develop data-driven approaches via machine learning to better address these problems and improve decision making in financial markets. Machine learning refers to a class of statistical methods that capture patterns in data. Conventional methods, such as regression, have been widely used in finance for many decades. In some cases, these methods have become important building blocks for many fundamental theories in empirical financial studies. However, newer methods such as tree-based models and neural networks remain underexplored in the financial literature, and their usefulness in finance is still poorly understood. The objective of this thesis is to understand the various tradeoffs these newer machine learning methods bring, and to what extent they can improve a market participant’s utility.

In the first part of this thesis, we consider the decision between the use of market orders and limit orders. This is an important question in practical optimal trading problems. A key ingredient in making this decision is understanding the uncertainty of the execution of a limit order, that is, the fill probability: the probability that an order will be executed within a certain time horizon. Equivalently, one can estimate the distribution of the time-to-fill. We propose a data-driven approach based on a recurrent neural network to estimate the distribution of time-to-fill for a limit order conditional on the current market conditions. Using a historical data set, we demonstrate that this approach outperforms several benchmark techniques. This approach also leads to a significant cost reduction when implementing a trading strategy in a prototypical trading problem.
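The equivalence between fill probabilities and the time-to-fill distribution can be illustrated with a small sketch. It assumes a discrete-time hazard representation, in which `hazards[k]` is the probability that the order fills in interval `k` given that it has not filled earlier. This parametrization and the function name `fill_probabilities` are assumptions made for illustration; the abstract does not state how the recurrent network represents the distribution.

```python
def fill_probabilities(hazards):
    """Cumulative fill probabilities from per-interval hazard rates.

    hazards[k] is the (assumed) probability that the order fills in
    interval k, given that it is still unfilled at the start of k.
    Returns a list whose k-th entry is P(time-to-fill <= k + 1 intervals).
    """
    probs = []
    survival = 1.0  # probability that the order is still unfilled
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        survival *= 1.0 - h
        probs.append(1.0 - survival)
    return probs


# Hypothetical hazards for three consecutive intervals.
print(fill_probabilities([0.5, 0.5, 0.0]))
```

Reading off the entry at a given horizon gives the fill probability for that horizon, so a model of the full time-to-fill distribution answers the fill-probability question for every horizon at once.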

In the second part of the thesis, we formulate a high-frequency optimal execution problem as an optimal stopping problem. Through reinforcement learning, we develop a data-driven approach that incorporates price predictability and limit order book dynamics. A deep neural network is used to represent continuation values. Our approach outperforms benchmark methods, including a supervised learning method based on price prediction. With a historical NASDAQ ITCH data set, we empirically demonstrate a significant cost reduction. Various tradeoffs between temporal difference learning and Monte Carlo methods are also discussed. Another interesting insight is the existence of a certain universality across stocks: the patterns learned from trading one stock can be generalized to another stock.
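To make the contrast between temporal difference and Monte Carlo learning concrete, the sketch below computes both kinds of regression target for continuation values along a single path of a generic optimal stopping problem. It is a toy setup under stated assumptions, not the thesis's formulation: `payoffs[t]` is a hypothetical reward for stopping at step `t`, `cont[t]` is a hypothetical current estimate of the continuation value at step `t`, stopping is forced at the last step, and there is no discounting.

```python
def stopping_targets(payoffs, cont):
    """Return (td_targets, mc_targets) for continuation values on one path.

    payoffs[t]: reward received if we stop at step t (assumed input).
    cont[t]:    current estimate of the value of NOT stopping at step t.
    Stopping is forced at the final step, so cont[-1] is never used.
    """
    last = len(payoffs) - 1
    td_targets, mc_targets = [], []
    for t in range(last):
        nxt = t + 1
        # TD target: bootstrap one step ahead from the current estimate.
        if nxt == last:
            td_targets.append(payoffs[last])
        else:
            td_targets.append(max(payoffs[nxt], cont[nxt]))
        # MC target: follow the greedy stopping rule to the end of the
        # path and use the payoff actually realized.
        s = nxt
        while s < last and payoffs[s] < cont[s]:
            s += 1
        mc_targets.append(payoffs[s])
    return td_targets, mc_targets


td, mc = stopping_targets([1.0, 3.0, 2.0, 5.0], [2.5, 2.0, 4.0, 0.0])
print(td, mc)
```

The TD target depends on the current estimates `cont`, which makes it lower in variance but biased while the estimates are poor; the MC target uses only realized payoffs under the current stopping rule, which removes the bootstrapping bias at the cost of higher variance.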

In the last part of the thesis, we consider the problem of estimating the covariance matrix of high-dimensional asset returns. One of the conventional methods is through the use of linear factor models and their principal component analysis estimation. In this chapter, we generalize linear factor models to a general framework of nonlinear factor models using variational autoencoders. We show that linear factor models are equivalent to a class of linear variational autoencoders. Furthermore, nonlinear variational autoencoders can be viewed as an extension of linear factor models that relaxes the linearity assumption. One application of covariance estimation is the construction of a minimum variance portfolio. Through numerical experiments, we demonstrate that variational autoencoders improve upon linear factor models and lead to a superior minimum variance portfolio.
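The conventional baseline mentioned above can be sketched as follows: a principal component factor estimate of the covariance matrix, followed by the fully invested minimum variance weights w = Σ⁻¹1 / (1ᵀΣ⁻¹1). This is a minimal illustration of the linear factor approach only, not of the variational autoencoder model; the function names, the number of factors, and the simulated returns are assumptions made for the example.

```python
import numpy as np


def pca_factor_covariance(returns, n_factors):
    """Covariance estimate = low-rank factor part + diagonal residual part.

    returns: array of shape (n_periods, n_assets).
    """
    centered = returns - returns.mean(axis=0)
    sample_cov = centered.T @ centered / (centered.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(sample_cov)  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_factors]
    # Loadings: leading eigenvectors scaled by the square root of their
    # eigenvalues (clipped at zero to guard against round-off).
    loadings = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
    common = loadings @ loadings.T
    # Idiosyncratic variances: what the factors leave unexplained.
    idio = np.clip(np.diag(sample_cov) - np.diag(common), 1e-12, None)
    return common + np.diag(idio)


def min_variance_weights(cov):
    """Weights that minimize w' cov w subject to sum(w) = 1."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()


# Simulated returns, used purely for illustration.
rng = np.random.default_rng(0)
simulated = rng.normal(scale=0.01, size=(500, 8))
cov_hat = pca_factor_covariance(simulated, n_factors=2)
weights = min_variance_weights(cov_hat)
print(weights.round(3), weights.sum())
```

Short positions are allowed in this closed-form solution; adding constraints such as non-negative weights would require a quadratic programming solver instead.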



More About This Work

Academic Units
Thesis Advisors
Moallemi, Ciamac Cyrus
Maglaras, Costis CM
Ph.D., Columbia University
Published Here
June 16, 2021