Theses Doctoral

Combining Heterogeneous Databases to Detect Adverse Drug Reactions

Li, Ying

Adverse drug reactions (ADRs) impose a substantial global burden, accounting for considerable mortality, morbidity, and extra costs. In the United States, over 770,000 ADR-related injuries or deaths occur each year in hospitals, which may cost up to $5.6 million each year per hospital. Unanticipated ADRs may occur after a drug has been approved because of its use, or prolonged use, in large and diverse populations. Therefore, post-marketing surveillance of drugs is essential for generating more complete drug safety profiles and for providing a decision-making tool that helps governmental drug administration agencies take action on marketed drugs. Analysis of spontaneous reports of suspected ADRs has traditionally served as a valuable tool in pharmacovigilance. However, because of well-known limitations of spontaneous reports, observational healthcare data, such as electronic health records (EHRs) and administrative claims data, are starting to be used to complement the spontaneous reporting system. Synthesizing ADR evidence from multiple data sources has been conducted by human experts on an ad hoc basis. However, the amount of data from both spontaneous reporting systems (SRSs) and observational healthcare databases is growing exponentially. The revolution in the ability of machines to access, process, and mine databases makes it advantageous to develop an automatic system that obtains integrated evidence by combining these sources.
Towards this goal, this dissertation proposes a framework consisting of three components that generates signal scores based on data from an EHR system and from an SRS, and then integrates the two signal scores into a composite one. The first component is a data-driven, regression-based method that aims to alleviate confounding effects and detect ADRs based on EHRs. The results demonstrate that this component achieves comparable or slightly higher accuracy than methods trained with experts and existing automatic methods. The second component is also a data-driven, regression-based method; it aims to reduce the effect of confounding by co-medication and confounding by indication by using primary suspected, secondary suspected, and concomitant medications, together with indications, on the basis of an SRS. This study demonstrates that it can achieve comparable or slightly better accuracy than the cutting-edge Gamma Poisson Shrinkage (GPS) algorithm, which uses primary suspected medications only. The third component is a computational integration method that normalizes signal scores from each data source and integrates them into a composite signal score. The results demonstrate that the combined ADR evidence achieves better accuracy of drug-ADR detection than individual systems based on either an SRS or an EHR. Furthermore, component three is explored as a tool to assist clinical assessors in pharmacovigilance practice.
The research presented in this dissertation has produced several novel insights and provided new solutions to the challenging problem of pharmacovigilance. The method of reducing confounding effects can be generalized to other EHR systems, and the method for integrating ADR evidence can be extended to include other data sources. In conclusion, this dissertation develops a method to reduce confounding effects in both EHRs and SRSs, and a combined system to synthesize evidence, which could potentially unveil drug safety profiles and novel adverse events in a timely fashion.
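As a rough illustration of the third component described above (normalizing signal scores from each source and integrating them into a composite score), the following minimal sketch may help. The abstract does not state which normalization or integration rule the dissertation uses, so the z-score normalization, the weighted average, the function names, and the toy scores here are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def zscores(scores):
    """Normalize one source's signal scores to zero mean and unit variance.

    ASSUMPTION: z-score normalization is a stand-in; the dissertation's
    actual normalization is not specified in the abstract.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: no pair stands out in this source.
        return {pair: 0.0 for pair in scores}
    return {pair: (value - mu) / sigma for pair, value in scores.items()}


def combine(ehr_scores, srs_scores, ehr_weight=0.5):
    """Combine normalized EHR and SRS scores for pairs present in both.

    ASSUMPTION: a weighted average is a hypothetical integration rule,
    and ehr_weight is a hypothetical parameter.
    """
    ehr_z = zscores(ehr_scores)
    srs_z = zscores(srs_scores)
    shared = ehr_z.keys() & srs_z.keys()
    return {
        pair: ehr_weight * ehr_z[pair] + (1 - ehr_weight) * srs_z[pair]
        for pair in shared
    }


# Toy, made-up scores on different scales for two drug-ADR pairs.
ehr = {("drugA", "rash"): 2.0, ("drugB", "rash"): 0.0}
srs = {("drugA", "rash"): 10.0, ("drugB", "rash"): 30.0}
composite = combine(ehr, srs)
```

Normalizing within each source before combining keeps a source with a larger numeric range from dominating the composite score; any real system would need to validate both choices against a reference standard.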



  • Li_columbia_0054D_12824.pdf (application/pdf, 1.01 MB)

More About This Work

Academic Units
Biomedical Informatics
Thesis Advisors
Friedman, Carol
Ph.D., Columbia University
Published Here
September 16, 2015