2018 Theses Doctoral
Leveraging patient-provided data to improve understanding of disease risk
Patient-provided data are crucial to achieving the goal of precision medicine. These data, which include family medical history, race and ethnicity, and social and behavioral determinants of health, are essential for disease risk assessment. Despite the well-established importance of patient-provided data, there are many data quality challenges that affect how this information can be used for biomedical research.
To determine how best to use patient-provided data to assess disease risk, the research reflected in this dissertation was divided into three overarching aims. In Aim 1, I focused on determining the quality of race and ethnicity, family history, and smoking status data in clinical databases. In Aim 2, I assessed the impact of various interventions on the quality of these data, including policy changes, such as the implementation of the requirements imposed by the Meaningful Use program, and patient-facing tools for collecting and sharing information with patients. In addition to these evaluations, I developed and evaluated a method, “Relationship Inference from the Electronic Health Record” (RIFTEHR), that infers familial relationships from clinical datasets. In Aim 3, I used patient-provided data to assess disease risk both at a population level, by estimating disease heritability, and at an individual level, by identifying high-risk individuals eligible for additional screening for a common disease (diabetes mellitus) and a rare disease (celiac disease).
My research uncovered several data quality concerns for patient-provided data in clinical databases. When assessing the impact of interventions on the quality of these data, I found that policy interventions led to more data collection, but not necessarily to better data quality. In contrast, patient-facing tools did increase the quality of the patient-provided data. In the absence of high-quality patient-provided data for family medical history, I developed and evaluated a method for inferring this information from large clinical databases. I demonstrated that electronic health record data can be used to infer familial relationships accurately. Moreover, in two studies I showed how clinical data, used in conjunction with the inferred familial relationships, could determine disease risk. In the first study, I successfully computed disease heritability estimates for 500 conditions, some of which had not been previously studied. In the second study, I found that screening rates among family members who are considered to be at high risk of developing disease were low for both diabetes mellitus and celiac disease.
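The abstract does not describe how RIFTEHR works internally, so the following is only a toy sketch of the general idea of deriving familial relationships that are not directly recorded from ones that are. The input format, the rule tables, and the function name `infer_relationships` are all hypothetical and are not taken from the dissertation.

```python
# Toy illustration only; this is NOT the RIFTEHR implementation.
# All names and rules below are hypothetical simplifications
# (for example, half-siblings are not distinguished from full siblings).

# Inverse of each relation: if B is A's parent, then A is B's child.
INVERSE = {
    "parent": "child", "child": "parent", "sibling": "sibling",
    "grandparent": "grandchild", "grandchild": "grandparent",
    "aunt/uncle": "niece/nephew", "niece/nephew": "aunt/uncle",
}

# Composition rules: if B is A's `first` and C is B's `second`,
# then C is A's COMPOSE[(first, second)].
COMPOSE = {
    ("parent", "parent"): "grandparent",
    ("parent", "child"): "sibling",
    ("parent", "sibling"): "aunt/uncle",
    ("sibling", "child"): "niece/nephew",
    ("child", "child"): "grandchild",
}


def infer_relationships(declared):
    """Expand declared relationships into a larger set of inferred ones.

    `declared` is an iterable of (person, relation, relative) triples,
    read as "relative is person's relation". Returns a dict mapping
    (person, relative) pairs to a relation label.
    """
    known = {}
    for person, relation, relative in declared:
        known[(person, relative)] = relation
        known[(relative, person)] = INVERSE[relation]

    # Repeat until a full pass adds nothing new (a fixed point).
    changed = True
    while changed:
        changed = False
        for (a, b), first in list(known.items()):
            for (b2, c), second in list(known.items()):
                if b2 != b or c == a or (a, c) in known:
                    continue
                inferred = COMPOSE.get((first, second))
                if inferred is None:
                    continue
                known[(a, c)] = inferred
                known[(c, a)] = INVERSE[inferred]
                changed = True
    return known
```

Given two patients who each list the same person as a parent, this sketch would label them siblings, and it would label that parent's own parent as their grandparent. A real system would also need to match records to individuals and resolve conflicting or erroneous entries, which this sketch ignores.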
In summary, the work represented in this dissertation contributes to the understanding of the quality of patient-provided data, how interventions affect the quality of these data, and how novel methods can be applied to troves of existing clinical data to generate new knowledge to support research and clinical care.
More About This Work
- Academic Units: Biomedical Informatics
- Thesis Advisors: Vawdrey, David K.
- Degree: Ph.D., Columbia University
- Published Here: October 5, 2018