2024 Theses Doctoral
Data-driven Decision-making: New Insights on Algorithm Performance and Data Value
With the rise of data-driven algorithms, both industrial practitioners and academicians have aimed at understanding how one can use past information to make better future decisions. This question is particularly challenging, as any answer necessarily depends on several parameters, such as the features of the data used (e.g., the quantity and relevance of data), the downstream problem being solved, and the type of algorithms deployed to leverage the data. Most of the current literature analyzes the value of data by anchoring their methods in the large data regime, making the implicit assumption that data is widely available in practice.
In this work, we depart from this implicit assumption and posit that, in fact, relevant data is a scarce resource in many practical settings. For instance, data is usually aggregated across different times, product categories, and geographies, and therefore the effective size of datasets is orders of magnitude lower than it may appear to be. The goal of this thesis is to bridge the gap between the theoretical understanding of data-driven decisions and practical performance by developing a problem-centric theory of data-driven decision-making in which we assess the value of data by quantifying its impact on our downstream decisions. In particular, we design methodological tools tailored to the problem at hand and derive fine-grained and problem-specific guarantees for algorithms.
In the first chapter, we study the data-driven newsvendor problem under the modeling assumption that data is identically and independently distributed. We are interested in analyzing central policies in the literature, such as Sample Average Approximation (SAA), along with optimal ones, and in characterizing the performance achievable across data sizes, both small and large. Specifically, we characterize exactly the performance of SAA and uncover novel fundamental insights on the value of data. Indeed, our analysis reveals that tens of samples are sufficient to perform very efficiently, but also that more data can lead to worse out-of-sample performance for SAA. In turn, we derive an optimal algorithm in the minimax sense, enhancing decision quality with limited data.
The second chapter explores the impact of data relevance on decision quality, addressing the challenge of using historical data from varying sources that may not be fully indicative of the future. We quantify the performance of SAA in these heterogeneous environments and design rate-optimal policies in settings where SAA falters. We illustrate the versatility of our framework by analyzing several prototypical problems across various fields: the newsvendor, pricing, and ski rental problems. Our analysis shows that the type of achievable asymptotic performance varies significantly across different problem classes and heterogeneity notions.
Finally, the third chapter develops a framework for contextual decision-making, examining how past data relevance and quantity affect policy performance. Focusing on the contextual newsvendor problem, we analyze the wide class of Weighted Empirical Risk Minimization (WERM) policies, which weigh past data according to their relevance. This class of policies includes the SAA policy (also referred to as ERM), k-Nearest Neighbors, and kernel-based methods. While past literature focuses on upper bounds via concentration inequalities, we instead take an optimization approach and isolate a structure in the newsvendor loss function that allows us to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. In addition to this methodological contribution, our exact analysis offers new granular insights into the learning curve of algorithms in contextual settings. Through these contributions, the thesis advances our understanding of data-driven decision-making, offering both theoretical foundations and practical insights for diverse operational applications.
Subjects
Files
- Mouchtaki_columbia_0054D_18628.pdf application/pdf 1.38 MB Download File
More About This Work
- Academic Units
- Industrial Engineering and Operations Research
- Thesis Advisors
- Besbes, Omar
- Ma, Will
- Degree
- Ph.D., Columbia University
- Published Here
- August 28, 2024