2021 Theses Doctoral
Variable Clustering Methods and Applications in Portfolio Selection
This thesis introduces three variable clustering methods designed in the context of diversified portfolio selection. The motivation is to cluster financial assets so as to identify a small set of assets that approximates the level of diversification of the whole stock universe.
First, we develop a data-driven approach to variable clustering based on a correlation blockmodel, in which assets in the same cluster have the same correlations with all other assets. Under the correlation blockmodel, assets in the same cluster are controlled by the same latent factor. In addition, each cluster forms an equivalence class of assets, in the sense that any portfolio consisting of one stock from each cluster has the same correlation matrix, regardless of which specific stocks are chosen. We devise an algorithm named ACC (Asset Clustering through Correlation) to detect the clusters, and we provide theoretical analysis and practical guidance for tuning its parameter.
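To make the blockmodel concrete, here is a minimal Python sketch under stated assumptions. It simulates a hypothetical one-factor-per-cluster model (asset i in cluster k has return `loading * z_k + noise`, with the cluster factors `z_k` correlated through `factor_corr`), checks the equivalence-class property on the implied correlation matrix, and then recovers clusters with a simple stand-in rule: two assets are grouped when their correlation rows differ by less than a threshold `tau`. This is not the thesis's ACC algorithm, whose details are not given here; the distance, the greedy grouping, and the value of `tau` are illustrative assumptions.

```python
import numpy as np


def population_corr(labels, factor_corr, loading):
    """Correlation matrix implied by the hypothetical model x_i = loading * z_k + noise."""
    R = loading**2 * factor_corr[np.ix_(labels, labels)]
    np.fill_diagonal(R, 1.0)
    return R


def simulate_returns(labels, factor_corr, loading, n_obs, rng):
    """Unit-variance returns: asset i in cluster k is loading * z_k + sqrt(1 - loading^2) * e_i."""
    chol = np.linalg.cholesky(factor_corr)
    Z = rng.standard_normal((n_obs, factor_corr.shape[0])) @ chol.T  # correlated cluster factors
    E = rng.standard_normal((n_obs, labels.size))                    # idiosyncratic noise
    return loading * Z[:, labels] + np.sqrt(1.0 - loading**2) * E


def corr_row_distance(R):
    """d(a, b) = max over c outside {a, b} of |R[a, c] - R[b, c]| (zero for same-cluster pairs in the model)."""
    p = R.shape[0]
    D = np.zeros((p, p))
    for a in range(p):
        for b in range(a + 1, p):
            mask = np.ones(p, dtype=bool)
            mask[[a, b]] = False
            D[a, b] = D[b, a] = np.abs(R[a, mask] - R[b, mask]).max()
    return D


def threshold_clusters(D, tau):
    """Greedy stand-in rule: seed a cluster at the first unassigned asset, add all unassigned assets within tau."""
    labels = -np.ones(D.shape[0], dtype=int)
    k = 0
    for a in range(D.shape[0]):
        if labels[a] < 0:
            labels[(labels < 0) & (D[a] < tau)] = k  # D[a, a] = 0, so a joins its own cluster
            k += 1
    return labels


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 5)  # 3 clusters of 5 assets
factor_corr = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]])

# Equivalence classes: any one-stock-per-cluster portfolio has the same correlation matrix.
R_pop = population_corr(labels, factor_corr, loading=0.8)
print(np.allclose(R_pop[np.ix_([0, 5, 10], [0, 5, 10])], R_pop[np.ix_([4, 9, 14], [4, 9, 14])]))  # True

X = simulate_returns(labels, factor_corr, loading=0.8, n_obs=5000, rng=rng)
labels_hat = threshold_clusters(corr_row_distance(np.corrcoef(X, rowvar=False)), tau=0.2)
print(labels_hat)
```

The threshold `tau` acts as the tuning parameter in this sketch: if it is too small, sampling noise in the estimated correlations splits true clusters; if it is too large, clusters whose factors are highly correlated merge.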
Our second method studies a multi-factor block model, which generalizes the correlation blockmodel. Under this model, assets in the same cluster are governed by a set of multiple latent factors rather than by a single factor as in the correlation blockmodel, so the observed asset returns lie near a union of low-dimensional subspaces. We propose a subspace clustering method that uses square-root LASSO nodewise regression to identify these subspaces and recover the corresponding clusters. Through theoretical analysis, we provide practical and straightforward guidance for choosing the regularization parameters.
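The following sketch illustrates nodewise square-root LASSO regression followed by spectral clustering on simulated returns from a hypothetical two-factor-per-cluster model. Several choices are assumptions, not taken from the thesis: the square-root LASSO is solved through its equivalent scaled-LASSO alternation with coordinate descent, the affinity is `|B| + |B|^T`, the penalty uses the pivotal choice `1.1 * z_{1 - 0.05/(2p)} / sqrt(n)` common in the square-root LASSO literature, and the spectral step uses a small hand-written k-means. The thesis's own guidance for the regularization parameters may differ.

```python
import numpy as np
from statistics import NormalDist


def sqrt_lasso(X, y, lam, n_outer=20, n_inner=100, tol=1e-6):
    """min_b ||y - Xb||_2 / sqrt(n) + lam * ||b||_1, via alternating sigma and LASSO coordinate descent."""
    X = np.asfortranarray(X, dtype=float)
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    b, r = np.zeros(p), np.array(y, dtype=float)  # r holds the residual y - Xb
    sigma = np.linalg.norm(r) / np.sqrt(n)
    for _ in range(n_outer):
        alpha = lam * sigma  # LASSO penalty implied by the current noise-level estimate
        for _ in range(n_inner):
            max_step = 0.0
            for j in range(p):
                rho = X[:, j] @ r / n + col_sq[j] * b[j]
                step = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j] - b[j]
                if step != 0.0:
                    r -= step * X[:, j]
                    b[j] += step
                    max_step = max(max_step, abs(step))
            if max_step < tol:
                break
        new_sigma = max(np.linalg.norm(r) / np.sqrt(n), 1e-12)
        if abs(new_sigma - sigma) < tol:
            break
        sigma = new_sigma
    return b


def nodewise_affinity(X, lam):
    """Regress each asset's return series on all others; symmetrize absolute coefficients."""
    p = X.shape[1]
    B = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        B[j, others] = sqrt_lasso(X[:, others], X[:, j], lam)
    return np.abs(B) + np.abs(B).T


def spectral_clusters(W, k, n_iter=50):
    """Normalized spectral embedding plus k-means with farthest-point initialization."""
    d = W.sum(axis=1) + 1e-12
    _, vecs = np.linalg.eigh(W / np.sqrt(np.outer(d, d)))  # D^{-1/2} W D^{-1/2}
    U = vecs[:, -k:]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):  # Lloyd iterations
        labels = ((U[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels


rng = np.random.default_rng(1)
n, k, m, dim = 500, 3, 6, 2  # observations, clusters, assets per cluster, factors per cluster
labels = np.repeat(np.arange(k), m)
F = rng.standard_normal((n, k * dim))  # independent latent factors
A = rng.standard_normal((dim, k * m))
A /= np.linalg.norm(A, axis=0)         # unit-norm factor loadings
X = np.empty((n, k * m))
for i, c in enumerate(labels):
    X[:, i] = F[:, c * dim:(c + 1) * dim] @ A[:, i]
X += 0.1 * rng.standard_normal(X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)

p = X.shape[1]
lam = 1.1 * NormalDist().inv_cdf(1 - 0.05 / (2 * p)) / np.sqrt(n)  # assumed pivotal penalty
W = nodewise_affinity(X, lam)
labels_hat = spectral_clusters(W, k)
print(labels_hat)
```

A practical attraction of the square-root LASSO in this setting is that its penalty level does not depend on the unknown noise scale, which is what allows a fixed choice of `lam` computed from `n` and `p` alone.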
Existing subspace clustering methods based on regularized nodewise regression often choose the form of the regularizer arbitrarily, and the parameter that controls the regularization is often set exogenously or by cross-validation. Our third method unifies the choice of the regularizer and its parameter on theoretical grounds by formulating a distributionally robust version of nodewise regression. In this formulation, we minimize the worst-case square loss over a region of distributional uncertainty around the empirical distribution. We show that this formulation naturally leads to a spectral-norm regularized optimization problem. Moreover, the regularization parameter is simply the radius of the uncertainty region and can be set easily according to the degree of uncertainty in the data. We also propose an alternating direction method of multipliers (ADMM) algorithm for efficient implementation.
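The thesis's exact distributionally robust objective is not reproduced here. As a hedged illustration of the computational side only, the sketch below assumes a spectral-norm regularized self-expressive problem, min over C of `0.5 * ||X - XC||_F^2 + lam * ||C||_2` subject to `diag(C) = 0`, where `lam` stands in for the quantity determined by the uncertainty radius. ADMM splits `C = Z`: the C-step solves small ridge-type systems column by column, and the Z-step applies the proximal operator of the spectral norm, which (by the Moreau decomposition) caps the singular values at a level found by projecting them onto an l1 ball. The function names and the choice `rho = 1.0` are hypothetical.

```python
import numpy as np


def project_l1_ball_nonneg(s, radius):
    """Euclidean projection of a nonnegative, descending-sorted vector onto {v : ||v||_1 <= radius}."""
    if s.sum() <= radius:
        return s.copy()
    css = np.cumsum(s)
    k = np.arange(1, s.size + 1)
    r = k[s - (css - radius) / k > 0][-1]
    theta = (css[r - 1] - radius) / r
    return np.maximum(s - theta, 0.0)


def prox_spectral(V, t):
    """prox of t * ||.||_2 (largest singular value): subtract the l1-ball projection of the singular values."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)  # s is sorted in descending order
    return (U * (s - project_l1_ball_nonneg(s, t))) @ Vt


def objective(X, C, lam):
    return 0.5 * np.linalg.norm(X - X @ C, "fro") ** 2 + lam * np.linalg.norm(C, 2)


def admm_spectral_nodewise(X, lam, rho=1.0, n_iter=1000, tol=1e-6):
    """ADMM for the assumed problem: min_C 0.5||X - XC||_F^2 + lam ||C||_2 s.t. diag(C) = 0."""
    p = X.shape[1]
    G = X.T @ X
    C, Z, U = np.zeros((p, p)), np.zeros((p, p)), np.zeros((p, p))
    idx = [np.delete(np.arange(p), j) for j in range(p)]
    systems = [G[np.ix_(i, i)] + rho * np.eye(p - 1) for i in idx]
    for _ in range(n_iter):
        V = Z - U
        for j in range(p):  # C-step: column j regresses x_j on the other columns, c_jj fixed at 0
            C[:, j] = 0.0
            C[idx[j], j] = np.linalg.solve(systems[j], G[idx[j], j] + rho * V[idx[j], j])
        Z_old = Z
        Z = prox_spectral(C + U, lam / rho)  # Z-step: spectral-norm proximal operator
        U = U + C - Z                        # scaled dual update
        if np.linalg.norm(C - Z) < tol and rho * np.linalg.norm(Z - Z_old) < tol:
            break
    return C


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 8)) + 0.05 * rng.standard_normal((100, 8))
C = admm_spectral_nodewise(X, lam=1.0)
print(objective(X, C, 1.0), objective(X, np.zeros((8, 8)), 1.0))
```

For example, `prox_spectral(np.diag([3.0, 1.0]), 1.0)` returns `diag(2, 1)`: only the largest singular value is pulled down, which is how a spectral-norm penalty differs from the entrywise shrinkage of an l1 penalty.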
Finally, we design and implement an empirical analysis framework to verify the performance of the three proposed clustering methods. This framework consists of four main steps: clustering, stock selection, asset allocation, and portfolio backtesting. The main idea is to select stocks from each cluster to construct a portfolio and then assess the clustering method by analyzing the portfolio's performance. Using this framework, we can easily compare new clustering methods with existing ones by creating portfolios with the same selection and allocation strategies. We apply this framework to the daily returns of the S&P 500 stock universe. Specifically, we compare portfolios constructed using different clustering methods and asset allocation strategies with the S&P 500 Index benchmark. Portfolios from our proposed clustering methods outperform the benchmark significantly. They also perform favorably compared to other existing clustering algorithms in terms of the risk-adjusted return.
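The four-step framework can be sketched as a rolling backtest in Python. Everything below is a hypothetical stand-in: synthetic returns instead of S&P 500 data, an oracle `cluster_fn` in place of a real clustering method, lowest-volatility selection within each cluster, inverse-volatility allocation, daily rebalancing to constant weights within each holding period, a zero risk-free rate in the Sharpe ratio, and an equal-weight average of all assets as a proxy benchmark. The sketch shows the structure of the comparison, not the thesis's selection or allocation rules.

```python
import numpy as np


def select_one_per_cluster(train, labels):
    """Hypothetical selection rule: keep the lowest-volatility stock in each cluster."""
    vol = train.std(axis=0, ddof=1)
    chosen = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        chosen.append(members[np.argmin(vol[members])])
    return np.array(chosen)


def inverse_vol_weights(train, chosen):
    """Hypothetical allocation rule: weights proportional to 1 / in-sample volatility."""
    w = 1.0 / train[:, chosen].std(axis=0, ddof=1)
    return w / w.sum()


def rolling_backtest(returns, cluster_fn, train_len=250, test_len=21):
    """Every test_len days: cluster, select and allocate on the trailing window, then hold out of sample."""
    out = []
    for start in range(train_len, returns.shape[0], test_len):
        train, test = returns[start - train_len:start], returns[start:start + test_len]
        labels = cluster_fn(train)                      # 1. clustering
        chosen = select_one_per_cluster(train, labels)  # 2. stock selection
        weights = inverse_vol_weights(train, chosen)    # 3. asset allocation
        out.append(test[:, chosen] @ weights)           # 4. out-of-sample portfolio returns
    return np.concatenate(out)


def annualized_sharpe(daily, periods_per_year=252):
    """Annualized Sharpe ratio assuming a zero risk-free rate."""
    return np.sqrt(periods_per_year) * daily.mean() / daily.std(ddof=1)


rng = np.random.default_rng(0)
T, k, m = 750, 4, 5  # trading days, clusters, stocks per cluster (synthetic)
true_labels = np.repeat(np.arange(k), m)
factors = 0.01 * rng.standard_normal((T, k))
returns = factors[:, true_labels] + 0.01 * rng.standard_normal((T, k * m))

portfolio = rolling_backtest(returns, cluster_fn=lambda train: true_labels)  # oracle stand-in
benchmark = returns.mean(axis=1)  # equal-weight proxy for an index benchmark
print(annualized_sharpe(portfolio), annualized_sharpe(benchmark))
```

Because `cluster_fn` is the only step that changes between methods, different clustering algorithms can be compared under identical selection and allocation rules, which is the purpose of the framework.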
This item is currently under embargo. It will be available starting 2026-10-12.
More About This Work
- Academic Units: Industrial Engineering and Operations Research
- Thesis Advisors: Zhou, Xunyu
- Degree: Ph.D., Columbia University
- Published Here: October 13, 2021