Doctoral Thesis

Reinforcement Learning: New Algorithms and An Application for Integer Programming

Tang, Yunhao

Reinforcement learning (RL) is a generic paradigm for modeling and optimizing sequential decision making. In the past decade, progress in RL research has brought about breakthroughs in several applications, ranging from playing video games and mastering board games to controlling simulated robots. To bring the potential benefits of RL to other domains, two elements are critical: (1) efficient and general-purpose RL algorithms; (2) formulations of the original applications as RL problems. These two points are the focus of this thesis.

We start by developing more efficient RL algorithms. In Chapter 2, we propose Taylor Expansion Policy Optimization, a model-free algorithmic framework that unifies a number of important prior methods as special cases. This unifying framework also allows us to develop a natural algorithmic extension to prior work, with empirical performance gains. In Chapter 3, we propose Monte-Carlo Tree Search as Regularized Policy Optimization, a model-based framework that draws close connections between policy optimization and Monte-Carlo tree search. Building on this insight, we propose Policy Optimization Zero (POZero), a novel algorithm that leverages the strengths of regularized policy search to achieve significant performance gains over MuZero.
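To give a flavor of the regularized policy optimization problem that Chapter 3 connects to tree search, the sketch below solves, for a single state, max over the simplex of q·pi − lam·KL(prior, pi), whose maximizer takes the form pi(a) ∝ lam·prior(a)/(alpha − q(a)) with a normalizing constant alpha found by bisection. This is a minimal illustration, not the thesis algorithm: the function name, the bisection solver, and the assumption that all prior probabilities are strictly positive are ours.

```python
import numpy as np

def regularized_policy(prior, q, lam, iters=100):
    """Solve max_pi <q, pi> - lam * KL(prior || pi) over the probability simplex.

    The maximizer has the closed form pi(a) = lam * prior(a) / (alpha - q(a)),
    where alpha > max(q) is the unique value making pi sum to 1. We bracket
    alpha in [max(q + lam * prior), max(q) + lam] and bisect. Assumes prior > 0.
    """
    prior = np.asarray(prior, dtype=float)
    q = np.asarray(q, dtype=float)
    lo = np.max(q + lam * prior)  # here the unnormalized mass is >= 1
    hi = np.max(q) + lam          # here the unnormalized mass is <= 1
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        mass = np.sum(lam * prior / (alpha - q))
        if mass > 1.0:
            lo = alpha  # alpha too small: distribution overshoots
        else:
            hi = alpha  # alpha too large: distribution undershoots
    pi = lam * prior / (0.5 * (lo + hi) - q)
    return pi / pi.sum()
```

As lam grows, the solution stays close to the prior; as lam shrinks, it concentrates on the highest-value action, mirroring the exploration/exploitation trade-off that visit-count-based tree search policies approximate.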

To showcase how RL can be applied to other domains where the original applications could benefit from learning systems, we study the acceleration of integer programming (IP) solvers with RL. Due to the ubiquity of IP solvers in industrial applications, such research holds the promise of significant real-life impact and practical value. In Chapter 4, we focus on a particular formulation of reinforcement learning for integer programming: Learning to Cut. By combining cutting plane methods with selection rules learned by RL, we observe that the RL-augmented cutting plane solver achieves significant performance gains over traditional heuristics. This serves as a proof of concept of how RL can be combined with general IP solvers, and of how learning-augmented optimization systems might achieve significant acceleration in general.
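In a learning-to-cut formulation, the action at each step is the choice of a cutting plane, with candidates typically read off the optimal simplex tableau as Gomory fractional cuts, and the reward measuring the resulting improvement in the LP bound. As a small, self-contained illustration (the helper name is ours), the function below derives one such candidate cut from a tableau row: given an equation sum_j a_j x_j = b with fractional b and integer-constrained x, the inequality sum_j frac(a_j) x_j >= frac(b) is valid for all integer solutions but violated by the current fractional LP optimum.

```python
from math import floor

def gomory_cut(row, rhs):
    """Derive a Gomory fractional cut from one simplex tableau row.

    Given sum_j row[j] * x[j] = rhs with fractional rhs, returns the
    coefficients and right-hand side of the valid inequality
    sum_j frac(row[j]) * x[j] >= frac(rhs), where frac(v) = v - floor(v).
    """
    frac = lambda v: v - floor(v)
    cut_coeffs = [frac(a) for a in row]
    cut_rhs = frac(rhs)
    return cut_coeffs, cut_rhs
```

An RL agent in this setting would score the candidate cuts produced this way (one per fractional tableau row), add the selected cut to the LP relaxation, re-solve, and repeat, with the learned selection rule replacing hand-designed heuristics such as maximum violation.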



More About This Work

Academic Units
Industrial Engineering and Operations Research
Thesis Advisors
Agrawal, Shipra
Ph.D., Columbia University
Published Here
October 20, 2021