Articles

A comprehensive and bias-free machine learning approach for risk prediction of preeclampsia with severe features in a nulliparous study cohort

Lin, Yun Chao; Mallia, Daniel; Clark‑Sevilla, Andrea Ogilvie; Catto, Adam; Leshchenko, Alisa; Yan, Qi; Haas, David M.; Wapner, Ronald; Pe’er, Itsik; Raja, Anita; Salleb-Aouissi, Ansaf

Preeclampsia is one of the leading causes of maternal morbidity, with consequences during and after pregnancy. Because of its diverse clinical presentation, preeclampsia is an adverse pregnancy outcome that is uniquely challenging to predict and manage. In this paper, we developed machine learning models, corrected for racial bias, that predict the onset of preeclampsia with severe features or eclampsia at discrete time points in a nulliparous pregnant study cohort. To focus on those most at risk, we selected probands with severe preeclampsia (sPE). Those with mild preeclampsia, superimposed preeclampsia, or new-onset hypertension were excluded.

The prospective study cohort to which we applied machine learning is the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-be (nuMoM2b) study, which contains information from eight clinical sites across the US. Maternal serum samples were collected from 1,857 individuals between the first and second trimesters; these individuals form the final cohort.

Our prediction models achieved AUROCs of 0.72 (95% CI, 0.69–0.76), 0.75 (95% CI, 0.71–0.79), and 0.77 (95% CI, 0.74–0.80) for the three study visits, respectively. Our initial models were biased with respect to non-Hispanic black participants, with a high predictive equality ratio of 1.31. We corrected this bias, reducing the ratio to 1.14 and thereby lowering the rate of false positives for non-Hispanic black participants. The exact cause of the bias is still under investigation, but previous studies have recognized PlGF as a potential bias-inducing factor. However, since our model includes various factors that are positively correlated with PlGF, such as blood pressure measurements and BMI, we employed an algorithmic approach to disentangle this bias from the model.
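As an illustration of the fairness metric mentioned above, the following minimal sketch computes a predictive equality ratio on toy data. It assumes the ratio is the false positive rate of one group divided by that of a reference group; the abstract does not spell out the exact definition or reference group used in the paper, and the function names, group labels, and data here are hypothetical.

```python
from typing import Sequence


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FPR = FP / (FP + TN), computed over the true negatives only."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("group contains no negative cases")
    return fp / (fp + tn)


def predictive_equality_ratio(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[str],
    protected: str,
    reference: str,
) -> float:
    """Ratio of the protected group's FPR to the reference group's FPR.

    Assumed definition: a value of 1.0 means both groups have the same
    false positive rate; values above 1.0 mean more false positives in
    the protected group.
    """
    def fpr_for(label: str) -> float:
        idx = [i for i, g in enumerate(group) if g == label]
        return false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )

    reference_fpr = fpr_for(reference)
    if reference_fpr == 0:
        raise ValueError("reference group FPR is zero; ratio undefined")
    return fpr_for(protected) / reference_fpr


# Toy data (hypothetical): 1 = predicted or observed sPE, 0 = not.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
group = ["A"] * 5 + ["B"] * 6

ratio = predictive_equality_ratio(y_true, y_pred, group, "A", "B")
print(f"predictive equality ratio (A vs. B): {ratio:.2f}")
```

In this toy example group A has a false positive rate of 2/4 and group B of 2/5, so the ratio exceeds 1; a bias-mitigation step would aim to move it toward 1.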

The top features of our model underscore the importance of combining several tests, particularly biomarkers (BMI and blood pressure measurements) and ultrasound measurements. Placental analytes (PlGF and endoglin) were strong predictors when screening for the early onset of preeclampsia with severe features in the first two trimesters.

Also Published In

Title
BMC Pregnancy and Childbirth
DOI
https://doi.org/10.1186/s12884-024-06988-w

More About This Work

Academic Units
Computer Science
Obstetrics and Gynecology
Published Here
December 24, 2025

Notes

Preeclampsia, Machine learning, PlGF, Fairness in machine learning, Preeclampsia with severe features, Ensemble model