Academic Commons

Theses Doctoral

Phenotyping Endometriosis from Observational Health Data

McKillop, Mollie

The signs and symptoms of many diseases remain poorly characterized. For these conditions, the constellation of symptoms experienced by patients is not adequately described, nor are the signs and symptoms specific to the condition well defined. These features define an enigmatic disease. One of the most prevalent yet enigmatic conditions today is endometriosis, in which endometrial-like cells grow outside the uterus. Diagnosis is significantly delayed, largely because of the wide, unexplained variation in patient symptoms beyond the surgical definition of the disease and the lack of noninvasive diagnostic biomarkers. Better characterization of enigmatic diseases like endometriosis should lead to earlier and more accurate diagnosis. In informatics, characterizing a condition is called phenotyping. For a prevalent condition whose symptomatic experience is highly heterogeneous, this process involves using data-driven methods to describe group-specific patterns that better explain the heterogeneity.
Traditional data sources for phenotyping include observational health data like electronic health records (EHR) and administrative claims. Collecting data longitudinally and designing data collection so it is relevant to the patient experience may provide a complementary characterization of the condition useful for phenotyping. Alternative data sources such as patient-generated health data from self-tracking devices may elucidate, over time, a wider range of signs and symptoms of the disease at a more granular level than traditional phenotyping data sources. Patient-generated health data, however, remains an unexplored data source for disease phenotyping of enigmatic conditions like endometriosis.
This thesis explores the following research questions: 1) To what extent are traditional data sources representative of endometriosis? 2) How should researchers design a self-tracking app for endometriosis that is engaging for the user and supports phenotyping at scale? 3) What computational methods can help phenotype endometriosis at scale from self-tracking data? 4) Can the disease be detected earlier with a validated EHR phenotype?
First, the disease dimensions relevant to endometriosis are elicited both from traditional observational health data sources and from patients directly. Second, using these dimensions, a self-tracking app for endometriosis is designed both to engage the user and to facilitate disease phenotyping across a patient population. The app is then developed using a standard software framework, and patients are recruited to use it. Third, using self-tracking data and traditional phenotyping data sources, such as claims and EHRs, computational methods for identifying subtypes of the disease and for early disease detection are explored.
This thesis contributes the following: 1) A validated, reproducible, and portable endometriosis cohort definition for selecting patients from both claims and EHR data is developed using manual chart review of over 1,400 patient records; it achieves a sensitivity (recall) of 70%, a specificity of 93%, and a positive predictive value (precision) of 85%. Using this definition, the disease is characterized across more than two million endometriosis patients from multiple institutions and settings to help with early disease detection. 2) A self-tracking app (Phendo) that supports further characterization of the disease at scale has been designed and developed and is currently used by over 6,000 endometriosis patients from over 70 countries. 3) Data from this app have been used to identify three novel subtypes of the disease that are clinically meaningful, interpretable, and correlated with what is known about the condition from a gold-standard clinical survey. 4) Building on the characterization derived from the cohort definition, a well-performing prediction model for early identification of endometriosis, with an area under the curve of 68.6%, has been trained and tested across a network of observational health databases.
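The validation metrics reported for the cohort definition (sensitivity, specificity, and positive predictive value) all derive from a confusion matrix that compares the definition's output against chart-review labels. The following is a minimal sketch of those three formulas only. The class name, function names, and counts are hypothetical and chosen for illustration; they are not the thesis's chart-review data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a cohort definition against chart review."""
    tp: int  # definition says case, chart review confirms case
    fp: int  # definition says case, chart review says non-case
    tn: int  # definition says non-case, chart review says non-case
    fn: int  # definition says non-case, chart review says case


def sensitivity(c: ConfusionCounts) -> float:
    """Recall: fraction of true cases that the definition selects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true non-cases that the definition excludes."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Precision: fraction of selected patients who are true cases."""
    return c.tp / (c.tp + c.fp)


# Illustrative counts only (hypothetical, not taken from the thesis).
counts = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)

print(f"sensitivity: {sensitivity(counts):.2f}")
print(f"specificity: {specificity(counts):.2f}")
print(f"PPV:         {positive_predictive_value(counts):.2f}")
```

Unlike sensitivity and specificity, positive predictive value depends on how common the condition is in the reviewed sample, so the same definition can yield a different PPV in a population with a different prevalence.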



More About This Work

Academic Units
Biomedical Informatics
Thesis Advisors
Elhadad, Noemie
Ph.D., Columbia University
Published Here
March 8, 2019