Essays in Cluster Sampling and Causal Inference

Susanna Makela

Thesis Advisor(s):
Gelman, Andrew
Ph.D., Columbia University
This thesis consists of three papers in applied statistics, specifically in cluster sampling, causal inference, and measurement error. The first paper studies the problem of estimating, in a Bayesian framework, the finite population mean from a two-stage sample with unequal selection probabilities. Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. In a two-stage cluster sampling design, clusters are first selected with probability proportional to cluster size, and units are then randomly sampled within selected clusters. Methodological challenges arise when the sizes of nonsampled clusters are unknown. We propose both nonparametric and parametric Bayesian approaches for predicting the cluster sizes, and we implement inference for the unknown cluster sizes simultaneously with inference for the survey outcome. We implement this method in Stan and use simulation studies to compare the frequentist properties of the integrated Bayesian approach with those of classical methods. We then apply our proposed method to the Fragile Families and Child Wellbeing study as an illustration of complex survey inference.
The second paper focuses on the problem of weak instrumental variables, motivated by estimating the causal effect of incarceration on recidivism. An instrument is weak when it is only weakly predictive of the treatment of interest. Given the well-known pitfalls of weak instrumental variables, we propose a method for strengthening a weak instrument. We use a matching strategy that pairs observations to be close on observed covariates but far apart on the instrument. This strategy strengthens the instrument, but at the cost of a reduced sample size. To help guide the applied researcher in selecting a match, we propose simulating the power of a sensitivity analysis and the design sensitivity, and using graphical methods to examine the results. We also demonstrate the use of recently developed methods for identifying effect modification, which is an interaction between a pretreatment covariate and the treatment. Larger and less variable treatment effects are less sensitive to unobserved bias, so it is important to identify when effect modification is present and which covariates may be its source. We conduct our study in the context of estimating the causal effect of incarceration on recidivism via a natural experiment in the state of Pennsylvania, a motivating example that illustrates each component of our analysis.
The third paper considers the issue of measurement error in the context of survey sampling and hierarchical models. Researchers are often interested in studying the relationship between community-level variables and individual outcomes. This often requires estimating the neighborhood-level variable of interest from the sampled households, which induces measurement error in the neighborhood-level covariate since not all households are sampled. Other times, neighborhood-level variables are not observed directly, and only a noisy proxy is available. In both cases, the observed variables may contain measurement error. Measurement error is known to attenuate the coefficient of the mismeasured variable, but it can also affect other coefficients in the model, and ignoring measurement error can lead to misleading inference. We propose a Bayesian hierarchical model that integrates an explicit model for the measurement error process with a model for the outcome of interest, covering both sampling-induced measurement error and classical measurement error. Advances in Bayesian computation, specifically the development of the Stan probabilistic programming language, make the implementation of such models straightforward.
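The attenuation effect mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's Bayesian hierarchical model; it assumes the simplest textbook case of classical measurement error in a one-covariate linear regression, where the least-squares slope on the noisy covariate shrinks toward zero by the factor var(x) / (var(x) + var(u)). All function names, parameter names, and numerical values are hypothetical choices made for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0,
                         sd_eps=0.5, seed=1):
    """Compare slopes fitted on the true and the mismeasured covariate.

    Assumed (illustrative) data-generating process:
      x_true ~ Normal(0, sd_x)              true covariate
      x_obs  = x_true + Normal(0, sd_u)     classical measurement error
      y      = beta * x_true + Normal(0, sd_eps)
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sd_x) for _ in range(n)]
    x_obs = [x + rng.gauss(0.0, sd_u) for x in x_true]
    y = [beta * x + rng.gauss(0.0, sd_eps) for x in x_true]

    slope_true = ols_slope(x_true, y)   # regression on the true covariate
    slope_obs = ols_slope(x_obs, y)     # regression on the noisy covariate

    # Theoretical attenuated slope under classical measurement error.
    reliability = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)
    return slope_true, slope_obs, beta * reliability


slope_true, slope_obs, expected_obs = simulate_attenuation()
print(f"slope using true covariate:  {slope_true:.3f}")
print(f"slope using noisy covariate: {slope_obs:.3f}")
print(f"theoretical attenuated slope: {expected_obs:.3f}")
```

With equal variances for the covariate and the error, the reliability factor is one half, so the slope fitted on the noisy covariate should sit near half the true coefficient. An explicit measurement error model of the kind the abstract describes aims to correct this bias rather than merely diagnose it.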
Mathematical statistics
Cluster analysis
Errors-in-variables models
Suggested Citation:
Susanna Makela, Essays in Cluster Sampling and Causal Inference, Columbia University Academic Commons.
