2025 Theses Doctoral
Early Risk Detection for Infection-Related Hospitalization or Emergency Department Visits Among Home Healthcare Patients
Background:
Home healthcare patients, often elderly with multiple chronic conditions, face high risks of infection-related hospitalizations and emergency department visits. Existing post-acute care risk-prediction models rely solely on structured electronic health record data and neglect information in clinical notes. They also lack the temporal awareness needed to track changing patient risk and rarely assess fairness across diverse patient groups. To address these gaps, this dissertation aimed to: 1) systematically review existing infection models in post-acute care; 2) develop a pipeline to extract structured infection indicators from home healthcare clinical notes; and 3) build and evaluate a sequence-aware deep learning model for dynamic, equitable infection-related risk prediction.
Methods:
This dissertation comprised three studies. First, a systematic review identified common predictors, data sources, and methodological gaps in post-acute care infection models. Second, we developed an information-extraction pipeline using instruction-tuned large language models with parameter-efficient tuning and targeted data augmentation to extract structured infection indicators from home healthcare clinical notes. Third, in a retrospective cohort of home healthcare admissions, we combined structured electronic health record features and text-derived indicators to train sequence-aware deep learning models predicting first-time infection-related hospitalizations and emergency department visits within 30 days of home healthcare admission. We evaluated models on performance (e.g., area under the precision–recall curve), risk-stratification utility, interpretability, and fairness across demographic and socioeconomic subgroups.
Results:
The systematic review highlighted the need for post-acute care infection models to incorporate socio-environmental determinants, leverage multi-modal data, and adopt rigorous evaluation strategies. The infection indicator extraction pipeline achieved a partial micro F1-score of 0.88 and maintained strong format adherence, with data augmentation improving performance on rare indicators and the handling of specialized terminology. The sequence-aware deep learning models, especially a Bi-LSTM integrating both structured and text-derived features, outperformed non-sequential baselines, achieving an area under the precision–recall curve of 0.88 for a four-day prediction window. Key predictors included baseline clinical profiles (e.g., prior hospitalizations, comorbidity burden, functional dependency) and dynamic features (e.g., home visit intensity, pulse rate change, malnutrition status), with the importance of dynamic features peaking in the most recent visits. A three-tier risk stratification concentrated over 77 percent of infection events within the top 5 percent highest-risk group. Model performance remained equitable across evaluated subgroups.
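The risk-stratification figure above (the share of events falling in the highest-risk group) can be illustrated with a minimal sketch. This is not the dissertation's code: the function name `top_fraction_capture`, the tie-handling, and the toy scores and labels are all assumptions made for illustration only.

```python
import math


def top_fraction_capture(scores, events, fraction=0.05):
    """Return the share of all events found among the top `fraction` of
    patients when ranked by predicted risk score (highest first).

    Hypothetical helper for illustration; not taken from the dissertation.
    """
    if len(scores) != len(events):
        raise ValueError("scores and events must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    total_events = sum(events)
    if total_events == 0:
        return 0.0  # no events to capture

    # Size of the highest-risk group, at least one patient.
    n_top = max(1, math.ceil(len(scores) * fraction))

    # Rank patients by score, highest risk first.
    ranked = sorted(zip(scores, events), key=lambda pair: pair[0], reverse=True)
    captured = sum(event for _, event in ranked[:n_top])
    return captured / total_events


# Toy data (made up): 10 patients, 3 events, top 20 percent = 2 patients.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.05, 0.6, 0.7, 0.15]
toy_events = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
toy_capture = top_fraction_capture(toy_scores, toy_events, fraction=0.2)
```

In this toy example the two highest-scoring patients both had events, so the top 20 percent captures two of the three events. With `fraction=0.05` on a real cohort, the same calculation would give a quantity comparable to the reported share.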
Conclusion:
This research offers a robust, actionable approach for dynamic infection-related risk prediction in home healthcare. It demonstrates the feasibility of using instruction-tuned large language models to extract structured infection indicators in resource-constrained settings and validates sequence-aware deep learning that integrates multi-source data for equitable and interpretable risk assessment. These advances support continuous, evidence-based monitoring and proactive management of infection risk, ultimately enabling personalized care planning in home healthcare.
Files
This item is currently under embargo. It will be available starting 2026-07-29.
More About This Work
- Academic Units: Nursing
- Thesis Advisors: Topaz, Maxim; Shang, Jingjing
- Degree: Ph.D., Columbia University
- Published Here: September 3, 2025