Academic Commons

Theses Doctoral

Sequential Optimization in Changing Environments: Theory and Application to Online Content Recommendation Services

Gur, Yonatan

Recent technological developments allow the online collection of valuable information that can be efficiently used to optimize decisions "on the fly" and at a low cost. These advances have greatly influenced the decision-making process in various areas of operations management, including pricing, inventory, and retail management. In this thesis we study methodological as well as practical aspects arising in online sequential optimization in the presence of such real-time information streams. On the methodological front, we study aspects of sequential optimization in the presence of temporal changes, such as designing decision-making policies that adapt to temporal changes in the underlying environment (that drives performance) when only partial information about this changing environment is available, and quantifying the added complexity in sequential decision-making problems when temporal changes are introduced. On the applied front, we study practical aspects associated with a class of online services that focus on creating customized recommendations (e.g., Amazon, Netflix). In particular, we focus on online content recommendations, a new class of online services that allows publishers to direct readers from articles they are currently reading to other web-based content they may be interested in, by means of links attached to said article.
In the first part of the thesis we consider a non-stationary variant of a sequential stochastic optimization problem, where the underlying cost functions may change along the horizon. We propose a measure, termed the "variation budget," that controls the extent of said change, and study how restrictions on this budget impact achievable performance. As a yardstick to quantify performance in non-stationary settings we propose a regret measure relative to a dynamic oracle benchmark. We identify sharp conditions under which it is possible to achieve long-run-average optimality and more refined performance measures such as rate optimality that fully characterize the complexity of such problems. In doing so, we also establish a strong connection between two rather disparate strands of literature: adversarial online convex optimization; and the more traditional stochastic approximation paradigm (couched in a non-stationary setting). This connection is the key to deriving well-performing policies in the latter, by leveraging the structure of optimal policies in the former. Finally, tight bounds on the minimax regret allow us to quantify the "price of non-stationarity," which mathematically captures the added complexity embedded in a temporally changing environment versus a stationary one.
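As a rough illustration of the two notions introduced above, the variation budget and the regret relative to a dynamic oracle might be written as follows. This is a sketch only: the symbols $f_t$ (cost function in period $t$), $\mathcal{X}$ (feasible set), $X_t$ (action chosen in period $t$), $\pi$ (policy), $T$ (horizon), and $V_T$ (budget) are notation assumed here and do not appear in the abstract itself.

```latex
% Variation budget (assumed form): total period-to-period change of the
% cost functions over the horizon is capped by V_T.
\[
  \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \bigl| f_t(x) - f_{t-1}(x) \bigr| \;\le\; V_T
\]

% Regret of a policy \pi relative to a dynamic oracle that picks the
% minimizer of each period's cost function separately.
\[
  R^{\pi}(T) \;=\; \mathbb{E}^{\pi}\!\left[ \sum_{t=1}^{T} f_t(X_t) \right]
  \;-\; \sum_{t=1}^{T} \min_{x \in \mathcal{X}} f_t(x)
\]
```

Under this reading, a static benchmark would instead use $\min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x)$, which is never smaller than the dynamic one, so the dynamic oracle is the more demanding yardstick.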
In the second part of the thesis we consider another core stochastic optimization problem couched in a multi-armed bandit (MAB) setting. We develop a MAB formulation that allows for a broad range of temporal uncertainties in the rewards, characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable worst-case regret, and provide an optimal policy that achieves that performance. Similarly to the first part of the thesis, our analysis draws concrete connections between two strands of literature: the adversarial and the stochastic MAB frameworks.
The third part of the thesis studies applied optimization aspects arising in online content recommendations, which allow web-based publishers to direct readers from articles they are currently reading to other web-based content. We study the content recommendation problem and its unique dynamic features from both theoretical and practical perspectives. Using a large data set of browsing history at major media sites, we develop a representation of content along two key dimensions: clickability, the likelihood of a click to an article when it is recommended; and engageability, the likelihood of a click from an article when it hosts a recommendation. Based on this representation, we propose a class of user path-focused heuristics, whose purpose is to simultaneously ensure a high instantaneous probability of clicking recommended articles, while also optimizing engagement along the future path. We rigorously quantify the performance of these heuristics and validate their impact through a live experiment. The third part of the thesis is based on a collaboration with a leading provider of content recommendations to online publishers.
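To make the distinction between clickability and engageability concrete, the following minimal sketch computes naive empirical click frequencies from a toy impression log. It is not the estimation procedure used in the thesis, which the abstract does not describe. The log format (one record per recommended link, as a tuple of host article, recommended article, and click flag) and the function name `estimate_click_rates` are hypothetical.

```python
from collections import defaultdict


def estimate_click_rates(impressions):
    """Naive empirical estimates of clickability and engageability.

    `impressions` is a hypothetical log: an iterable of
    (host_article, recommended_article, clicked) tuples, one per link shown.
    """
    shown_to = defaultdict(int)      # times an article was recommended
    clicks_to = defaultdict(int)     # clicks TO an article when recommended
    hosted_from = defaultdict(int)   # links shown ON an article (as host)
    clicks_from = defaultdict(int)   # clicks FROM an article when hosting

    for host, recommended, clicked in impressions:
        shown_to[recommended] += 1
        hosted_from[host] += 1
        if clicked:
            clicks_to[recommended] += 1
            clicks_from[host] += 1

    # Clickability: share of recommendations of an article that were clicked.
    clickability = {a: clicks_to[a] / n for a, n in shown_to.items()}
    # Engageability: share of links hosted on an article that were clicked.
    engageability = {a: clicks_from[a] / n for a, n in hosted_from.items()}
    return clickability, engageability


# Toy data, invented purely for illustration.
log = [
    ("A", "B", True),
    ("A", "C", False),
    ("B", "C", True),
    ("B", "A", False),
]
clickability, engageability = estimate_click_rates(log)
```

The same article appears in both dictionaries with generally different values, which is the point of the two-dimensional representation: an article can attract clicks when recommended yet lead few readers onward when it hosts recommendations, or the reverse.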



More About This Work

Academic Units
Thesis Advisors
Besbes, Omar
Zeevi, Assaf J.
Ph.D., Columbia University
Published Here
July 7, 2014