Data pre-processing for the preterm prediction study MFMU dataset

Vovsha, Ilia; Salleb-Aouissi, Ansaf; Radeva, Axinia; Raja, Anita; Diab, Hatim; Tomar, Ashish; Rajan, Ashwath

Preterm birth (PTB) is a major public health problem with profound implications for society. There would be great value in being able to identify women at risk of preterm birth during the course of their pregnancy. Previous research has largely focused on individual risk factors correlated with preterm birth (e.g. prior preterm birth, race, and infection) and less on combining these factors in a way that helps explain the complex etiologies of preterm birth. We attempt to address this gap by conducting a deeper analysis of the preterm prediction study data collected by the NICHD Maternal Fetal Medicine Units (MFMU) Network, a high-quality dataset covering over 3,000 singleton pregnancies with detailed study visits and biospecimen collection at 24, 26, 28 and 30 weeks gestation. Reports based on this dataset have used relatively straightforward biostatistical methodologies, such as relative risk assessments, to measure associations between risk factors and PTB (Maternal Fetal Medicine Units Network. Biostatistical Coordinating Center NICHD Networks, 1995). These methods include descriptive statistics, Pearson correlation, Fisher’s exact tests and linear/logistic regression, in which risk factors are studied independently of one another. To perform detailed experiments on this data using non-linear Support Vector Machines and other machine learning (ML) methodologies, several pre-processing steps must first be completed; we describe them in this report.


More About This Work

Academic Units
Center for Computational Learning Systems
Published Here
May 21, 2013