Doctoral Thesis

New Algorithms and Analysis Techniques for Reinforcement Learning

Jia, Randy

With the advent of Big Data and Machine Learning, there is a growing demand for improved decision making in unknown, complex environments. Decision making under uncertainty underlies many important decisions made by individuals, businesses, and society as a whole. These problems are typically modeled as multi-armed bandits (MAB) or, more generally, as reinforcement learning (RL). In the MAB problem, an agent is faced with many options, or arms, each with its own unknown reward distribution, and must determine a sequence of arms to pull, taking into account the history of rewards from past pulls. The agent must balance exploration (pulling less-explored arms to learn the model) with exploitation (pulling the arm that currently appears reward-maximizing). RL is a more general, more complex extension of the MAB problem in which the current state, in addition to the chosen arm or action, affects the reward obtained. In this thesis, we focus on designing new algorithms to better address RL problems. In particular, we design an algorithm inspired by Thompson sampling for finite communicating MDPs and an algorithm inspired by stochastic convex optimization for some fundamental problems in operations management, including a problem in inventory control. We develop intuitive algorithms and prove theoretical bounds on their regret; in doing so, we derive analytical results that may be of independent interest.
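To illustrate the exploration–exploitation trade-off and the Thompson sampling idea the abstract mentions, here is a minimal sketch of Thompson sampling for the Bernoulli bandit setting. This is a standard textbook variant, not the thesis's algorithm (which extends these ideas to finite communicating MDPs); the function name, arm probabilities, and Beta(1,1) priors are illustrative choices.

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Thompson sampling for Bernoulli arms with Beta(1,1) priors.

    At each step, sample a mean-reward estimate for every arm from its
    Beta posterior, pull the arm with the largest sample, and update
    that arm's posterior with the observed 0/1 reward.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    successes = [1] * k  # Beta posterior alpha parameters
    failures = [1] * k   # Beta posterior beta parameters
    total_reward = 0
    for _ in range(horizon):
        # Posterior sampling: uncertain arms get a chance to look best,
        # which drives exploration without an explicit bonus term.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

As the posteriors concentrate, the sampled estimates for clearly inferior arms rarely exceed the best arm's, so pulls shift toward exploitation; this self-correcting behavior is what makes posterior sampling a natural design principle to extend to the MDP setting.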


More About This Work

Academic Units
Industrial Engineering and Operations Research
Thesis Advisors
Agrawal, Shipra
Ph.D., Columbia University
Published Here
September 21, 2020