2017 Theses Doctoral

# Essays on Matching and Weighting for Causal Inference in Observational Studies

This thesis consists of three papers on matching and weighting methods for causal inference. The first paper conducts a Monte Carlo simulation study to evaluate the performance of multivariate matching methods that select a subset of treatment and control observations. The matching methods studied are the widely used nearest neighbor matching with propensity score calipers and two more recently proposed methods: optimal matching of an optimally chosen subset and optimal cardinality matching. The main findings are: (i) covariate balance, as measured by differences in means, variance ratios, Kolmogorov-Smirnov distances, and cross-match test statistics, is better with cardinality matching, since by construction it satisfies the balance requirements; (ii) for given levels of covariate balance, the matched samples are larger with cardinality matching than with the other methods; (iii) in terms of covariate distances, optimal subset matching performs best; (iv) treatment effect estimates from cardinality matching have lower RMSEs, provided that strong balance requirements are imposed, specifically fine balance or strength-k balance, plus close mean balance. In standard practice, a matched sample is considered balanced if the absolute differences in means of the covariates across treatment groups are smaller than 0.1 standard deviations. However, the simulation results suggest that stronger forms of balance should be pursued in order to remove systematic biases due to observed covariates when a difference-in-means treatment effect estimator is used. In particular, if the true outcome model is additive, then the marginal distributions should be balanced, and if the true outcome model is additive with interactions, then the low-dimensional joint distributions should be balanced.
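
The 0.1 standard deviation rule of thumb mentioned above can be illustrated with a minimal sketch. It assumes one common convention, dividing the absolute difference in means by the square root of the average of the two group sample variances; the abstract does not say which standardization the thesis uses, and the function name and toy data below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def abs_std_mean_diff(treated, control):
    """Absolute standardized difference in means for one covariate.

    Assumption: standardize by the square root of the average of the two
    group sample variances (one common convention, not necessarily the
    one used in the thesis).
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


# Hypothetical toy covariate values for matched treated and control units.
age_treated = [34, 41, 29, 50, 38]
age_control = [35, 40, 30, 49, 39]

smd = abs_std_mean_diff(age_treated, age_control)
print(f"absolute standardized difference: {smd:.3f}")
print("within 0.1 SD rule of thumb:", smd < 0.1)
```

A check like this looks only at first moments, which is why the findings above point to stronger requirements (balance on marginal distributions or low-dimensional joint distributions) when the outcome model is not simply linear in the covariates.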

The second paper focuses on longitudinal studies, where marginal structural models (MSMs) are widely used to estimate the effect of time-dependent treatments in the presence of time-dependent confounders. Under a sequential ignorability assumption, MSMs yield unbiased treatment effect estimates by weighting each observation by the inverse of the probability of its observed treatment sequence given its history of observed covariates. However, these probabilities are typically estimated by fitting a propensity score model, and the resulting weights can fail to adjust for observed covariates due to model misspecification. These weights also tend to yield very unstable estimates when the predicted probabilities of treatment are very close to zero, which is often the case in practice. To address both problems, this paper takes a design-based approach: instead of modeling the probabilities of treatment, it directly finds the weights of minimum variance that adjust for the covariates across all possible treatment histories. To this end, the role of weighting in longitudinal studies of treatment effects is analyzed, and a convex optimization problem that can be solved efficiently is defined. Unlike standard methods, this approach makes evident to the investigator the limitations imposed by the data when estimating causal effects without extrapolating. A simulation study shows that this approach outperforms standard methods, providing less biased and more precise estimates of time-varying treatment effects in a variety of settings. The proposed method is applied to Chilean educational data to estimate the cumulative effect of attending a private subsidized school, as opposed to a public school, on students’ university admission test scores.
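
To make the instability of the standard weights concrete, here is a minimal sketch of the inverse probability weight for a single unit's treatment sequence. It assumes the per-period probabilities have already been estimated by some propensity score model; the function name and the numbers are hypothetical and are not taken from the thesis.

```python
from math import prod


def ip_weight(probs_of_observed_treatment):
    """Inverse probability weight for one unit's observed treatment sequence.

    Each entry is the (hypothetical, assumed already estimated) probability
    of the treatment the unit actually received at that time point, given
    its observed history up to then.
    """
    if any(p <= 0 or p > 1 for p in probs_of_observed_treatment):
        raise ValueError("probabilities must lie in (0, 1]")
    return 1.0 / prod(probs_of_observed_treatment)


# Hypothetical units followed over three time points.
typical_unit = [0.6, 0.5, 0.7]
rare_history = [0.6, 0.05, 0.02]

print(ip_weight(typical_unit))
print(ip_weight(rare_history))
```

Because the weight is the reciprocal of a product, a few small probabilities along the sequence are enough for one unit to dominate the weighted sample, which is the instability the paragraph above describes.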

The third paper is centered on observational studies with multi-valued treatments. Generalizing methods for matching and stratifying to accommodate multi-valued treatments has proven to be a complex task. A natural way to address confounding in this case is to weight the observations, typically with inverse probability of treatment weights (IPTW). As in the MSM case, these weights can be highly variable, and extreme weights can produce unstable estimates. In addition, model misspecification, small sample sizes, and truncation of extreme weights can cause the weights to fail to adjust appropriately for observed confounders. The paper determines the conditions the weights must satisfy in order to provide nearly unbiased treatment effect estimates with reduced variability, and defines a convex optimization problem, solvable in polynomial time, to obtain them. A simulation study with different settings is conducted to compare the proposed weighting scheme to IPTW, including generalized propensity score estimation methods that explicitly consider the covariate balance problem in the probability estimation process. The applicability of the methods to continuous treatments is also tested. The results show that directly targeting balance with the weights, instead of focusing on estimating treatment assignment probabilities, provides the best results in terms of bias and root mean square error of the treatment effect estimator. The proposed methodology is used to estimate the effects of the intensity level of the 2010 Chilean earthquake on posttraumatic stress disorder.
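
The idea of targeting balance directly with low-variability weights can be sketched in a heavily simplified form. The example below is not the thesis's optimization problem: it assumes a single covariate, exact mean balance, and no nonnegativity constraint, which gives a closed-form minimum-norm solution. All names and numbers are hypothetical.

```python
def min_norm_balancing_weights(x, target_mean):
    """Weights minimizing sum(w_i**2) subject to
    sum(w_i) == 1 and sum(w_i * x_i) == target_mean.

    Simplifying assumptions (not from the thesis): a single covariate,
    exact mean balance, and no nonnegativity constraint, so the solution
    has the closed form w_i = a + b * x_i from the Lagrangian conditions.
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("covariate has no variation; balance constraint is degenerate")
    a = (sxx - target_mean * sx) / det
    b = (n * target_mean - sx) / det
    return [a + b * v for v in x]


# Hypothetical covariate values in one treatment group and a target mean
# (for example, the covariate mean in the full sample).
x_group = [0.0, 1.0, 2.0, 3.0]
weights = min_norm_balancing_weights(x_group, target_mean=2.0)

print(weights)
print(sum(weights), sum(w * v for w, v in zip(weights, x_group)))
```

With weights constrained to sum to one, minimizing the sum of squares is equivalent to minimizing their variance. In this simplified version the weights can turn negative when the target lies far from the group's covariate values; a fuller treatment would typically add nonnegativity and approximate balance constraints on several covariates and rely on a quadratic programming solver.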

## Files

- ResaJuxE1rez_columbia_0054D_14260.pdf (application/pdf, 856 KB)

## More About This Work

- Academic Units: Statistics
- Thesis Advisors: Madigan, David; Zubizarreta, José R.
- Degree: Ph.D., Columbia University
- Published Here: October 24, 2017