Theses Doctoral

Computational Methods for Analyzing Health News Coverage

McFarlane, Delano J.

Researchers who investigate the media's coverage of health have historically relied on keyword searches to retrieve relevant health news coverage, and on manual content analysis methods to categorize and score health news text. These methods are problematic. Manual content analysis is labor intensive, time consuming, and inherently subjective because it relies on human coders to review, score, and annotate content. Retrieving relevant health news coverage using keywords can be challenging because manually defining an optimal keyword query is difficult, especially for complex health topics and media analysis concepts, and the optimal query may vary with when the news was published, the type of news published, and the target audience of the news coverage. This dissertation research investigated computational methods that can assist health news investigators by facilitating these tasks. The first step was to identify the research methods currently used by investigators, and the research questions and health topics researchers tend to investigate. To capture this information, an extensive literature review of health news analyses was performed. No prior literature review of this type and scope could be found in the research literature. This review confirmed that researchers overwhelmingly rely on manual content analysis methods to analyze the text of health news coverage, and on keyword searching to identify relevant health news articles. The second step was to investigate the use of computational methods for facilitating these tasks: classifiers that categorize health news by relevance to the topic of obesity, and by news framing, were developed and evaluated. The obesity news classifier developed for this dissertation outperformed alternative methods, including searching based on keyword appearance. Classifying on the framing of health news proved to be a more difficult task.
The news framing classifiers performed well, but the results suggest that the underlying features of health news coverage that contribute to the framing of health news are a richer and more useful source of framing information than binary news framing classifications. The third step in this dissertation was to use the findings of the literature review and the classifier studies to design the SalientHealthNews system. The purpose of SalientHealthNews is to facilitate the use of computational and data mining techniques for health news investigation, hypothesis testing, and hypothesis generation. To illustrate the use of SalientHealthNews' features and algorithms, it was used to generate preliminary data for a study investigating how framing features vary in health and obesity news coverage that discusses populations with health disparities. This research contributes to the study of the media's coverage of health by providing a detailed description of how health news is studied and what health news topics are investigated, then by demonstrating that certain tasks performed in health news analyses can be facilitated by computational methods, and lastly by describing the design of a system that will facilitate the use of computational and data mining techniques for the study of health news. These contributions should further the study of health news by expanding the methods available to health news analysis researchers, leaving researchers better equipped to evaluate the media's coverage of health accurately and consistently. Knowledge of the quality of health news coverage should in turn lead to better informed health journalists, healthcare providers, and healthcare consumers, ultimately improving individual and public health.


  • McFarlane_columbia_0054D_10234.pdf (application/pdf, 11.9 MB)

More About This Work

Academic Units
Biomedical Informatics
Thesis Advisors
Kukafka, Rita
Degree
Ph.D., Columbia University
Published Here
May 18, 2011