Theses Doctoral

Advances in Non-Stationary Sequential Decision-Making

Suk, Joseph

We study the problem of sequential decision-making (e.g. multi-armed bandits, contextual bandits, reinforcement learning) under changing environments, or distribution shifts. Ideally, one aims to automatically adapt/self-tune to unknown changes in distribution, and restart exploration as needed. While recent theoretical breakthroughs show this is possible in a broad sense, such works contend that the learner should restart procedures upon experiencing any change leading to worst-case (regret) rates. This leaves open whether faster rates are possible, adaptively, if few changes in distribution are actually severe, e.g., involve no change in best action.

This thesis initiates a broad research program giving positive answers to these open questions across several instances. In particular, we begin at non-stationary bandits and show a much weaker notion of change can be adapted to, which can yield significantly faster rates than previously known, whether as expressed in terms of number of best action switches--for which no adaptive procedure was known, or in terms of previously studied variation or smoothness measures. We then generalize these results to non-parametric contextual bandits and dueling bandits. As a result, we substantially improve the theoretical state-of-the-art performance guarantees for these problems and, in many cases, tightly characterize the statistical limits of sequential decision-making under changing environments.

Files

  • thumnail for Suk_columbia_0054D_18692.pdf Suk_columbia_0054D_18692.pdf application/pdf 2.66 MB Download File

More About This Work

Academic Units
Statistics
Thesis Advisors
Kpotufe, Samory K.
Degree
Ph.D., Columbia University
Published Here
July 24, 2024