2024 Theses Doctoral
Advances in Non-Stationary Sequential Decision-Making
We study the problem of sequential decision-making (e.g. multi-armed bandits, contextual bandits, reinforcement learning) under changing environments, or distribution shifts. Ideally, one aims to automatically adapt/self-tune to unknown changes in distribution, and restart exploration as needed. While recent theoretical breakthroughs show this is possible in a broad sense, such works contend that the learner should restart procedures upon experiencing any change leading to worst-case (regret) rates. This leaves open whether faster rates are possible, adaptively, if few changes in distribution are actually severe, e.g., involve no change in best action.
This thesis initiates a broad research program giving positive answers to these open questions across several instances. In particular, we begin at non-stationary bandits and show a much weaker notion of change can be adapted to, which can yield significantly faster rates than previously known, whether as expressed in terms of number of best action switches--for which no adaptive procedure was known, or in terms of previously studied variation or smoothness measures. We then generalize these results to non-parametric contextual bandits and dueling bandits. As a result, we substantially improve the theoretical state-of-the-art performance guarantees for these problems and, in many cases, tightly characterize the statistical limits of sequential decision-making under changing environments.
Files
- Suk_columbia_0054D_18692.pdf application/pdf 2.66 MB Download File
More About This Work
- Academic Units
- Statistics
- Thesis Advisors
- Kpotufe, Samory K.
- Degree
- Ph.D., Columbia University
- Published Here
- July 24, 2024