Theses Doctoral

Analysis of Search on Clinical Narrative within the EHR

Natarajan, Karthik

Electronic Health Records (EHRs) are increasingly used in hospital and outpatient settings, and patients are amassing digitized clinical information. On one hand, aggregating all of a patient's clinical information can greatly assist health care workers in making sound decisions. On the other hand, it can result in information overload, making it difficult to browse for information within the health record. Given the time constraints clinicians face, one way to reduce information overload is through a search utility. However, traditional free-text search engines within the EHR can miss documents that do not contain the query terms but are nonetheless relevant to the clinical user's search. This dissertation aims to address this gap by analyzing within-patient search of the EHR and examining various semantic search approaches on clinical narrative. Our work consists of three studies, in which clinical users' search needs are examined, traditional string matching is analyzed, and semantic search approaches on clinical narrative are evaluated.

The first study applied a mixed-methods approach to better understand clinical users' search needs within the EHR. It consisted of a retrospective analysis of search log files and a survey administered to clinical professionals within our institution. The log analysis categorizes how users of a search system query for information, and the survey examines users' search preferences. This study showed that clinical users were very interested in search functionality within the EHR and that different types of users use a search utility differently. Overall, most users searched for specific laboratory tests and diseases within the health record. The last two studies rely on a gold standard that was developed specifically for this dissertation.
The gold standard contained a document collection, a set of queries, and, for each document/query pair, a relevance judgment. This gold standard was used to evaluate and compare different search models on clinical narrative. The second study was an error analysis of the traditional vector-space model search approach. It examined the false positives and false negatives of this approach and categorized the errors in order to identify gaps that semantic approaches may fill. The last study was a systematic evaluation of five semantic search approaches, consisting of distributional semantic approaches and an ontology-based approach. It found that a mixed approach combining topic modeling with the vector-space model was the best-performing search algorithm on our gold standard. Together, these studies lay the foundation for a deeper understanding of information retrieval methods within the electronic health record. Ultimately, this work will allow health care professionals to access pertinent patient information easily, which could result in better health care delivery.
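To make the error analysis of the vector-space model concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes TF-IDF weighting, cosine similarity, lowercase whitespace tokenization, and a "retrieve if score > 0" cutoff, none of which are specified in the abstract. The four notes and their relevance judgments are hypothetical toy data, not drawn from the gold standard. The sketch compares the retrieved set against the judged-relevant set to obtain false positives and false negatives. In this toy case, a negated mention ("no evidence of ...") is retrieved although it was judged not relevant, and an abbreviation ("mi") is missed. These two error types are chosen only to illustrate how string matching can fail, not as the study's reported categories.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or negation handling.
    return text.lower().split()

def tfidf_vectors(docs):
    # Raw term frequency times log(N / document frequency), stored as sparse dicts.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf

def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is all zeros.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, docs, threshold=0.0):
    # Return indices of documents whose cosine score exceeds the (assumed) threshold.
    vectors, idf = tfidf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    return {i for i, v in enumerate(vectors) if cosine(q, v) > threshold}

def error_analysis(retrieved, relevant):
    # False positives: retrieved but not relevant; false negatives: relevant but missed.
    return retrieved - relevant, relevant - retrieved

# Hypothetical toy notes and relevance judgments for the query "myocardial infarction".
docs = [
    "elevated troponin consistent with myocardial infarction",  # 0: relevant, matched
    "acute mi treated with aspirin and heparin",                # 1: relevant, abbreviation only
    "no evidence of myocardial infarction on ecg",              # 2: negated, judged not relevant
    "knee pain after fall",                                     # 3: unrelated
]
relevant = {0, 1}

retrieved = search("myocardial infarction", docs)
fp, fn = error_analysis(retrieved, relevant)
print(sorted(fp), sorted(fn))  # prints: [2] [1]
```

Keeping retrieval and error analysis as separate functions means the same set arithmetic applies to any retrieval model that returns a set of document indices. A semantic variant could therefore be scored against the same judgments without changing the evaluation code.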


  • Natarajan_columbia_0054D_11003.pdf (application/pdf, 3.86 MB)

More About This Work

Academic Units
Biomedical Informatics
Thesis Advisors
Elhadad, Noemie
Ph.D., Columbia University
Published Here
November 2, 2012