Theses Doctoral

Transformer Models for Clinical Target Prediction using Pathology Report Text

Kefeli, Jenna

Structured electronic health record (EHR) data are commonly incomplete and can lack diagnostic detail. Clinical reports, on the other hand, are typically comprehensive and contain a wealth of detailed medical information. Pathologists invest considerable time and specialized training to create information-rich pathology reports, but the manual review needed to use these reports for clinical or research purposes is a high barrier to their routine use. Automated extraction of clinical targets directly from pathology reports would enable the structured aggregation of relevant patient data that the EHR does not currently capture as a matter of routine. In this dissertation, I apply recently developed transformer models to predict clinical targets from cancer pathology report text.

In the first chapter, I present a pathology report corpus that I fully processed and made publicly available, and I perform a proof-of-concept cancer type classification. In the second chapter, I discuss a set of cancer stage classification models that I fine-tune on the pathology report corpus and then externally validate on reports from Columbia University Irving Medical Center (CUIMC).

In the last chapter, I explore additional applications of this methodology: I develop a generalizable model to classify prostate cancer reports into primary Gleason score categories, apply a transformer model to classify reports into diagnosis categories for a Barrett’s esophagus patient cohort in a low-data setting, and perform a proof-of-concept prediction of adverse drug events from 1D drug representations.

More About This Work

Academic Units
Cellular, Molecular and Biomedical Studies
Thesis Advisors
Tatonetti, Nicholas P.
Degree
Ph.D., Columbia University
Published Here
January 10, 2024