2024 Theses Doctoral

# Nonparametric Methods for Measuring Conditional Dependence, Multi-Sample Dissimilarity, and Testing for Symmetry

We describe new nonparametric methods for (i) quantifying conditional dependence, (ii) quantifying multi-sample dissimilarity, and (iii) testing multivariate symmetry. In the first part of the thesis, we propose a kernel partial correlation (KPC) to quantify conditional dependence, and a kernel measure of dissimilarity between multiple distributions (KMD) to quantify the difference between multiple distributions.

Both measures are deterministic numbers between 0 and 1, with the endpoints corresponding to the two extreme cases: KPC is 0 if and only if conditional independence holds, and 1 if and only if there is a conditional functional relationship; KMD is 0 if and only if all the distributions being compared are equal, and 1 if and only if these distributions are mutually singular. Both KPC and KMD can be estimated consistently using computationally efficient graph-based methods (for example, k-nearest neighbor graphs and minimum spanning trees). For applications, KPC can be used to develop a model-free variable selection algorithm. This algorithm is provably consistent under sparsity assumptions and shows superior performance in practice compared to existing procedures. KMD can be used to design an easily implementable test for the equality of multiple distributions, which is consistent against all alternatives where at least two distributions are not equal.
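To make the graph-based idea concrete, here is a minimal sketch of a nearest-neighbor dissimilarity score in the spirit of classical nearest-neighbor multi-sample statistics. It is an illustration only and is not the thesis's KMD estimator: the function names, the brute-force distance computation, and the rescaling by the chance-level baseline are all assumptions made for this example. The score pools the samples, builds a k-nearest neighbor graph, and measures how often a point's neighbors come from its own sample.

```python
import numpy as np


def same_sample_neighbor_fraction(samples, k=1):
    """Fraction of k-NN edges in the pooled data that join points of the same sample.

    `samples` is a list of 2-D arrays, one (n_i, d) array per distribution.
    Hypothetical helper for illustration; not taken from the thesis.
    """
    pooled = np.vstack(samples)
    labels = np.concatenate([np.full(len(s), i) for i, s in enumerate(samples)])
    # Brute-force pairwise squared Euclidean distances (fine for small n).
    diffs = pooled[:, None, :] - pooled[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist2, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(dist2, axis=1)[:, :k]
    return float(np.mean(labels[neighbors] == labels[:, None]))


def normalized_dissimilarity(samples, k=1):
    """Rescale the same-sample fraction so that chance level maps to about 0
    and complete separation maps to 1 (an assumed normalization)."""
    n = sum(len(s) for s in samples)
    # Approximate chance that a neighbor shares the label when all
    # distributions are equal: sum of squared sample proportions.
    baseline = sum((len(s) / n) ** 2 for s in samples)
    frac = same_sample_neighbor_fraction(samples, k)
    return (frac - baseline) / (1.0 - baseline)
```

For samples drawn from the same distribution the score should be close to 0, and for well-separated samples it reaches 1, mirroring the two extreme cases described above.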

A problem closely related to multi-sample testing is testing for symmetry. In the second part of the thesis, we develop distribution-free tests for multivariate symmetry (including central symmetry, sign symmetry, spherical symmetry, etc.) based on multivariate signs, ranks and signed-ranks defined via optimal transport (OT). One test we propose can be thought of as a multivariate generalization of the Wilcoxon signed-rank (GWSR) test and shares many of the appealing properties of its one-dimensional counterpart. In particular, when testing against location shift alternatives, the GWSR test suffers no loss in (asymptotic) efficiency compared to Hotelling's T² test, despite being nonparametric and exactly distribution-free.
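For reference, the one-dimensional counterpart mentioned above can be sketched as follows. This is the classical Wilcoxon signed-rank statistic for symmetry about zero, not the OT-based multivariate test from the thesis. The function name is hypothetical, and the sketch assumes there are no ties among the absolute values.

```python
import numpy as np


def wilcoxon_signed_rank_statistic(x):
    """Classical one-dimensional signed-rank statistic W+ and its normal score.

    Assumes no ties among |x_i|; zeros are dropped, as is conventional.
    """
    x = np.asarray(x, dtype=float)
    x = x[x != 0]
    n = len(x)
    # Ranks 1..n of the absolute values (double argsort; valid without ties).
    ranks = np.argsort(np.argsort(np.abs(x))) + 1
    # W+ is the sum of the ranks attached to positive observations.
    w_plus = float(ranks[x > 0].sum())
    # Null mean and variance of W+ under symmetry about zero.
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0
    z = (w_plus - mean) / np.sqrt(var)
    return w_plus, z
```

Under the null hypothesis the distribution of W+ depends only on the sample size, not on the underlying symmetric distribution, which is the distribution-free property that the multivariate generalization is designed to retain.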

Another test we propose is based on a combination of kernel methods and the multivariate signs and ranks defined via OT. This test is universally consistent against all alternatives, while still maintaining the distribution-free property. Furthermore, it is capable of testing a broader class of multivariate symmetry, including exchangeability, extending beyond the class of symmetry testable by GWSR.

## Files

- Huang_columbia_0054D_18377.pdf (application/pdf, 2.44 MB)

## More About This Work

- Academic Units: Statistics
- Thesis Advisors: Sen, Bodhisattva
- Degree: Ph.D., Columbia University
- Published Here: July 3, 2024