Theses Doctoral

Understanding the Utility of Social Risk Factors Documented in Clinical Notes to Predict Hospitalization and Emergency Department Visits in Home Healthcare

Hobensack, Mollie

Background: Approximately 5 million older adults receive home healthcare (HHC) annually in the United States, and nearly 90% of HHC recipients are 65 years or older. HHC encompasses in-home interdisciplinary services such as skilled nursing, social work, and physical, speech, and occupational therapy. One in every five patients is hospitalized during their time in HHC. Researchers have explored machine learning models that use data in the electronic health record (EHR) to aid clinicians in identifying patients at high risk for hospitalization and emergency department (ED) visits. Failure to consider social risk factors can exacerbate health inequities.

Some studies suggest that including social risk factors in machine learning models can help to mitigate bias in model performance among individuals from racial and ethnic minority groups. Prior literature has reported that the majority of social information is documented in clinical notes. In the HHC setting, there is a gap in understanding how social risk factors are documented in clinical notes and whether adding social risk factors to machine learning models can improve model performance. Thus, this dissertation aims to: 1) summarize the literature on machine learning conducted in the HHC setting, 2) extract social risk factors documented in HHC clinical notes, and 3) examine how social risk factors influence machine learning model performance.

Methods: The data for this dissertation come from one HHC agency in New York, New York, and include approximately 65,000 unique patients and 2.3 million clinical notes. The Biopsychosocial Model guided this study by providing a framework to report the features included in the machine learning models. To address the first aim, a scoping review was conducted to summarize the literature on machine learning applied to EHR data in the HHC setting. To address the second aim, a natural language processing system was developed to extract social risk factors from HHC clinical notes. Then, logistic regression was used to examine the association between the social risk factors documented in clinical notes and hospitalization and ED visits. Finally, to address the third aim, social risk factors were included in four machine learning models to predict hospitalization and ED visit risk in HHC. A sub-analysis was conducted to explore the utility of social risk factors in machine learning models across individuals from different racial and ethnic groups.

Results: The results from all three aims suggest that there has been a rise in machine learning applied in HHC, but few studies have incorporated clinical notes. There are gaps in implementing machine learning models in practice and standardizing social risk factors in documentation. HHC clinicians are documenting the following social risk factors in 4% of their clinical notes: Social Environment, Physical Environment, Education and Literacy, Food Insecurity, and Access to Care. These social risk factors are significantly associated with hospitalization and ED visits; however, including them produced only minimal differences in machine learning model performance.

Conclusion: This dissertation study demonstrates the feasibility and utility of leveraging HHC clinicians’ clinical notes to understand social risk factors. Further exploration is needed to tease out the nuances in how HHC clinicians perceive, assess, and document social risk factors in the EHR. Stakeholders are encouraged to standardize social risk factors and develop informatics tools tailored to the HHC setting to improve the identification of patients at risk for hospitalization and ED visits.


This item is currently under embargo. It will be available starting 2024-08-31.

More About This Work

Academic Units
Thesis Advisors: Topaz, Maxim
Degree: Ph.D., Columbia University
Published Here: September 6, 2023