2023 Theses Doctoral
Correcting for Measurement Error and Misclassification using General Location Models
Measurement error is common in epidemiologic studies and can lead to biased statistical inference. It is well known, for example, that regression analyses with measurement error in the predictors often produce biased coefficient estimates. The work in this dissertation adds to the vast existing literature on measurement error by proposing a missing data treatment of measurement error through general location models.
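The bias mentioned above can be illustrated with a minimal simulation sketch. It assumes classical additive measurement error (the observed predictor equals the true predictor plus independent normal noise), which is an assumption made for illustration and not a setting taken from the dissertation; the function names and parameter values are hypothetical. Under that assumption the naive slope is attenuated by the reliability ratio, the variance of the true predictor divided by the variance of the observed predictor.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=50_000, beta=2.0, sigma_x=1.0, sigma_u=1.0, seed=0):
    """Compare the slope using the true predictor with the slope using a noisy one.

    Assumes classical additive error: w = x + u, with u independent of x and y.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma_x) for _ in range(n)]       # true predictor
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]        # error-prone measurement
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]     # outcome depends on x only
    reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)  # attenuation factor
    return ols_slope(x, y), ols_slope(w, y), reliability


if __name__ == "__main__":
    slope_true, slope_naive, reliability = simulate_attenuation()
    print(f"slope using true x:  {slope_true:.3f}")
    print(f"slope using noisy w: {slope_naive:.3f}")
    print(f"beta * reliability:  {2.0 * reliability:.3f}")
```

With equal variances for the true predictor and the noise, the reliability ratio is 0.5, so the naive slope is expected to be close to half of the true coefficient.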
The focus is on the case in which information about the measurement error model is obtained not from a subsample of the main study data but from separate, external information, a setting known as external calibration. Methods for handling measurement error under external calibration are increasingly needed as external data sources become more available and data integration grows more popular in epidemiologic studies. General location models are well suited for the joint analysis of continuous and discrete variables. They offer direct relationships with linear and logistic regression models and can be readily implemented using frequentist and Bayesian approaches. We use general location models to correct for measurement error and misclassification in the context of three practical problems.
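For reference, the general location model in its standard textbook form can be written as below. The notation is assumed here for illustration and is not taken from the dissertation: W is the cell index formed by cross-classifying the categorical variables, C is the number of cells, and Z is the vector of p continuous variables.

```latex
\begin{aligned}
\Pr(W = c) &= \pi_c, \qquad c = 1, \dots, C, \qquad \sum_{c=1}^{C} \pi_c = 1, \\
Z \mid W = c &\sim N_p(\mu_c,\ \Sigma), \qquad c = 1, \dots, C.
\end{aligned}
```

Because the covariance matrix is shared across cells, the conditional distribution of one continuous variable given the others and the cell is a normal linear regression, and for a binary W the log odds of W = 1 given Z = z is linear in z, which gives the link to logistic regression.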
The first problem concerns measurement error in a continuous variable from a dataset containing both continuous and categorical variables. In the second problem, measurement error in the continuous variable is further complicated by the limit of detection (LOD) of the measurement instrument, so that values of the error-prone continuous variable below the LOD go undetected. The third problem deals with misclassification in a binary treatment variable. We implement the proposed methods using Bayesian approaches for the first two problems and the expectation-maximization (EM) algorithm for the third problem.
For the first problem, we propose a Bayesian approach, based on the general location model, to correct for measurement error in a continuous variable in a dataset with both continuous and categorical variables. We consider the external calibration setting where, in addition to the main study data of interest, calibration data are available and provide information on the measurement error but not on the error-free variables.
The proposed method uses observed data from both the calibration and main study samples and incorporates relationships among all variables in the measurement error adjustment, unlike existing methods that use only the calibration data for model estimation. We assume strong nondifferential measurement error (sNDME), that is, that the measurement error is independent of all the error-free variables given the true value of the error-prone variable. The sNDME assumption allows us to identify our model parameters. We show through simulations that the proposed method yields reduced bias, smaller mean squared error, and interval coverage closer to the nominal level compared to existing methods in regression settings. Furthermore, this improvement is more pronounced with increased measurement error, higher correlation between covariates, and stronger covariate effects. We apply the new method to the New York City Neighborhood Asthma and Allergy Study to examine the association between indoor allergen concentrations and asthma morbidity among urban asthmatic children.
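One way to write the sNDME assumption and its consequence for the main study likelihood is sketched below. The symbols are assumptions for illustration only: X is the true value of the error-prone variable, W is its measured version, Z collects the error-free variables, θ denotes the parameters of the joint model for (X, Z), and f denotes a density or probability mass function as appropriate.

```latex
\begin{aligned}
f(w \mid x, z) &= f(w \mid x) && \text{(sNDME)} \\
f(w, z \mid \theta) &= \int f(w \mid x)\, f(x, z \mid \theta)\, dx && \text{(main study, } X \text{ unobserved)}
\end{aligned}
```

In this sketch the measurement error density f(w | x) carries its own parameters, about which the calibration data are informative, while the main study contributes through the marginal density of (W, Z).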
The simultaneous occurrence of measurement error and an LOD is common, particularly for environmental exposures such as the indoor allergen concentrations mentioned in the first problem. Statistical analyses that do not address these two problems simultaneously could lead to wrong scientific conclusions. To address this second problem, we extend the Bayesian general location models for measurement error adjustment to handle both measurement error and values below the LOD in a continuous environmental exposure in a regression setting with mixed continuous and discrete variables. We treat values below the LOD as censored. Simulations show that, compared with alternative methods, our method yields smaller bias and root mean squared error, and its posterior credible intervals have coverage closer to the nominal level, even when the proportion of data below the LOD is moderate. We revisit data from the New York City Neighborhood Asthma and Allergy Study and quantify the effect of indoor allergen concentrations on childhood asthma when over 50% of the measured concentrations are below the LOD.
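Treating values below the LOD as censored can be sketched as follows. This is a simplified illustration, not the dissertation's model: it assumes a single (for example, log-transformed) exposure that is normally distributed and ignores measurement error and covariates. The function names are hypothetical, and `None` is used here as an assumed marker for a non-detect. A detected value contributes its normal density to the likelihood, while a non-detect contributes the probability of falling below the LOD.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_loglik(values, lod, mu, sigma):
    """Log-likelihood for left-censored normal data.

    values: list of floats; None marks a measurement below the LOD.
    """
    total = 0.0
    for v in values:
        if v is None:
            # Non-detect: only the event {value < LOD} is observed.
            total += math.log(normal_cdf(lod, mu, sigma))
        else:
            # Detected value: ordinary normal density contribution.
            total += normal_logpdf(v, mu, sigma)
    return total


if __name__ == "__main__":
    data = [0.4, None, 1.2, None, 0.9]  # hypothetical log concentrations
    print(censored_loglik(data, lod=0.0, mu=0.5, sigma=1.0))
```

In a Bayesian implementation, the same censoring idea is often handled by imputing the non-detects from a normal distribution truncated above at the LOD within each sampling iteration.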
Finally, we look at the third problem of group mean comparison when treatment groups are misclassified. Our motivation comes from the Frequent User Services Engagement (FUSE) study. Researchers wish to compare quantitative health and social outcome measures for frequent jail-and-shelter users who were assigned housing and those who were not housed, and misclassification occurs as a result of noncompliance. The recommended intent-to-treat analysis, which is based on initial group assignment, is known to underestimate group mean differences. We use the general location model to estimate differences in group means after adjusting for misclassification in the binary grouping variable. Information on the misclassification is available through the sensitivity and specificity. We assume nondifferential misclassification, so that misclassification does not depend on the outcome. We use the expectation-maximization algorithm to obtain estimates of the general location model parameters and the difference in group means. Simulations show that the method reduces bias in the estimated difference in group means.
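The idea of adjusting a group mean comparison for misclassification with EM can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: a single normal outcome with a common variance in both groups, a binary true group T that is never observed, an observed label G, known sensitivity P(G = 1 | T = 1) and specificity P(G = 0 | T = 0), and nondifferential misclassification. All function names and simulation settings are hypothetical.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def naive_difference(y, g):
    """Difference in outcome means between the observed (misclassified) groups."""
    y1 = [yi for yi, gi in zip(y, g) if gi == 1]
    y0 = [yi for yi, gi in zip(y, g) if gi == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def em_group_difference(y, g, sens, spec, n_iter=100):
    """EM estimate of mu1 - mu0 when the true group is latent and sens/spec are known."""
    n = len(y)
    # Initialize from the naive (observed-label) summaries.
    pi = sum(g) / n
    y1 = [yi for yi, gi in zip(y, g) if gi == 1]
    y0 = [yi for yi, gi in zip(y, g) if gi == 0]
    mu1, mu0 = sum(y1) / len(y1), sum(y0) / len(y0)
    mean_y = sum(y) / n
    sigma = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    for _ in range(n_iter):
        # E-step: posterior probability that the true group is 1, given y and g.
        w = []
        for yi, gi in zip(y, g):
            p_g_t1 = sens if gi == 1 else 1.0 - sens  # P(G = gi | T = 1)
            p_g_t0 = 1.0 - spec if gi == 1 else spec  # P(G = gi | T = 0)
            a = pi * normal_pdf(yi, mu1, sigma) * p_g_t1
            b = (1.0 - pi) * normal_pdf(yi, mu0, sigma) * p_g_t0
            w.append(a / (a + b))
        # M-step: weighted updates of the prevalence, group means, and common SD.
        sw = sum(w)
        pi = sw / n
        mu1 = sum(wi * yi for wi, yi in zip(w, y)) / sw
        mu0 = sum((1.0 - wi) * yi for wi, yi in zip(w, y)) / (n - sw)
        ss = sum(wi * (yi - mu1) ** 2 + (1.0 - wi) * (yi - mu0) ** 2
                 for wi, yi in zip(w, y))
        sigma = math.sqrt(ss / n)
    return mu1 - mu0


def simulate_misclassified(n, mu0, mu1, sigma, pi, sens, spec, seed=0):
    """Simulate outcomes and misclassified labels under nondifferential error."""
    rng = random.Random(seed)
    y, g = [], []
    for _ in range(n):
        t = 1 if rng.random() < pi else 0
        y.append(rng.gauss(mu1 if t == 1 else mu0, sigma))
        p_label_1 = sens if t == 1 else 1.0 - spec  # P(G = 1 | T = t)
        g.append(1 if rng.random() < p_label_1 else 0)
    return y, g


if __name__ == "__main__":
    y, g = simulate_misclassified(5000, 0.0, 2.0, 1.0, 0.5, 0.8, 0.8, seed=1)
    print("naive difference:", round(naive_difference(y, g), 3))
    print("EM difference:   ", round(em_group_difference(y, g, 0.8, 0.8), 3))
```

In this simulated setting the true difference is 2, and the comparison based on observed labels is expected to be pulled toward zero, whereas the EM estimate uses the known sensitivity and specificity to reweight each observation by its posterior probability of true group membership.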
Files
- Kwizera_columbia_0054D_18022.pdf application/pdf 1.65 MB
More About This Work
- Academic Units: Biostatistics
- Thesis Advisors: Chen, Qixuan
- Degree: Ph.D., Columbia University
- Published Here: October 4, 2023