Theses Doctoral

Characterizing Immune Responses to Marburg Virus Infection in Animal Hosts Using Statistical Transcriptomic Analysis

Lee, Albert Kim

Marburg virus (MARV), together with Ebola virus, belongs to Filoviridae, a family of viruses that cause life-threatening hemorrhagic fever in humans and non-human primates and for which there is no clinically approved vaccine. For this reason, MARV could cause a pandemic or be used as a weapon of bioterrorism. Strikingly, the virus causes no symptoms in its recently discovered host, Rousettus aegyptiacus. Understanding the interaction between MARV and different animal hosts will improve our understanding of filovirus immunology and support the development of effective therapeutic agents. Although cell lines and primary cells have been used to study gene expression during MARV infection, the transcriptomic view of MARV infection in tissue samples from animal hosts has been uncharted territory. A comprehensive analysis of the transcriptome in hosts and spillover hosts will shed light on immune responses at the molecular level and may allow comparative analysis to explain the phenotypic differences between them. However, there have been gaps in the resources needed to carry out transcriptome research on MARV. For example, the genome and transcriptome of the MARV host Rousettus aegyptiacus had not been available. Furthermore, the statistical machinery needed to analyze multi-tissue, multi-time data was not available. In this dissertation, I introduce two tools that fill these gaps and show how they can be applied for novel biological discovery. In particular, I have built 1) a comprehensive de novo transcriptome reference of Rousettus aegyptiacus and 2) the Multilevel Analysis of Gene Expression (MAGE) pipeline for analyzing RNA-seq data with complex experimental designs. I show the application of MAGE to multi-time, multi-tissue transcriptome data from Macaca mulatta infected with MARV.
In this study, 15 rhesus macaques were infected with MARV Angola via aerosol exposure and sequentially sacrificed over the course of 9 days, and 3 types of lymph node tissue (tracheobronchial, mesenteric, and inguinal) were extracted from each animal and sequenced for gene expression analysis. With the MAGE pipeline, I discovered that the posterior median log2 fold change (log2FC) of genes separates the samples by day post infection and viral load. I discovered a set of genes, such as CD40LG and TMEM197, with interesting trends over time, as well as how similar and different pathways were affected in the three lymph nodes. I also identified biologically meaningful clusters of genes using the topology-based clustering algorithm known as Mapper. Using the MAGE posterior samples, I also determined the genes that are preferentially expressed in tracheobronchial lymph nodes. In addition to new analysis tools and biological findings, I built a gene expression exploration tool that lets biologists examine differential gene expression over time in various immune-related pathways and in the contributing members of those pathways. In conclusion, I have contributed two important components to transcriptome analysis in MARV research and discovered novel biological insights. The MAGE pipeline is modular and extensible and will be useful for transcriptome research with complex experimental designs, which are becoming increasingly prevalent as the cost of sequencing decreases.



More About This Work

Academic Units
Biomedical Informatics
Thesis Advisors
Rabadan, Raul
Degree
Ph.D., Columbia University
Published Here
March 24, 2018