Theses Doctoral

Learning predictive models from menstrual cycle data

Li, Kathy Yinuo

Despite being a physiological phenomenon that impacts billions of womxn worldwide, menstruation has long been understudied. In this dissertation, we first explore the menstrual characteristics of nearly 380,000 womxn, as collected via a self-tracking mobile health (mHealth) app, Clue. We examine how variation in menstrual cycle length is related to volatility in other experienced symptoms, helping to debunk the idea that menstrual cycles should be 'regular.'

We then develop predictive models for menstruation using this dataset. We first demonstrate that a fully generative model that explicitly accounts for the possibility that self-tracked data may be unreliable can both outperform baselines and aid in the detection of self-tracking artifacts (i.e., instances where a user appears not to have experienced a period event, but in reality forgot or otherwise neglected to track it). Finally, we explore a hierarchical, deep generative model for symptom tracking, in which a deep neural network learns per-user tracking parameters while the model retains a mechanism for each user's likelihood of adherence.

We find that leveraging symptom data at the time series level allows us to predict the occurrence of the next bleeding and non-bleeding tracking events with high accuracy. This work demonstrates the great potential that large-scale mHealth data holds for better understanding menstruation as a whole, as well as the importance of treating such data carefully.



More About This Work

Academic Units
Applied Physics and Applied Mathematics
Thesis Advisors
Wiggins, Chris H.
Degree
Ph.D., Columbia University
Published Here
April 27, 2022