Theses Doctoral

Exploring the Plasmodium falciparum Transcriptome Using Hypergeometric Analysis of Time Series (HATS)

Scanfeld, Daniel

Malaria poses a significant public health and economic threat in many regions of the world, disproportionately affecting children under the age of five in sub-Saharan Africa. Although infection rates have been lowered, malaria remains a serious challenge, causing at least 200 million infections and 655,000 deaths per year, with deleterious effects on economic growth and development. Investigation of the malaria parasite Plasmodium falciparum has entered the post-genomics age, with several strains sequenced and many microarray gene expression studies performed. Gene expression studies allow a full sampling of the genomic repertoire of a parasite, and their detailed analysis will prove invaluable in deciphering novel parasite biology as well as the mechanisms of antimalarial drug resistance.

We have developed a computational pipeline that converts a series of fluorescence readings from a DNA microarray into a meaningful set of biological hypotheses based on the comparison of two lines, generally one that is drug sensitive and one that is drug resistant. Each step of the computational pipeline is described in detail in this thesis, beginning with data normalization and alignment, followed by visualization through dimensionality reduction, and finally a direct analysis of the differences and similarities between the two lines. Comparisons and analyses were performed at both the individual gene and gene set level. An important component of the analytical methods we have developed is a suite of visualization tools that help to easily identify outliers and experimental flaws, measure the significance of predictions, show how lines relate and how well they can be aligned, and demonstrate the results of an analysis.
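As an illustration of the gene set level comparison, the sketch below computes a one-sided hypergeometric enrichment p-value, the kind of test the thesis title alludes to. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function name, its parameters, and the example numbers are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, set_size, selected, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    population: total number of genes measured (assumed input)
    set_size:   number of those genes belonging to the gene set
    selected:   number of genes flagged as differing between two lines
    overlap:    number of flagged genes that fall in the gene set
    """
    upper = min(set_size, selected)
    total = comb(population, selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(population - set_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers, for illustration only: 5000 genes measured,
# a gene set of 40 genes, 100 flagged genes, 6 of which are in the set.
p_value = hypergeom_enrichment_p(5000, 40, 100, 6)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value here would suggest that the gene set is over-represented among the genes that differ between the two lines.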

These visualization tools should be used as a starting point for further biological study to test the resulting hypotheses. We also developed a software tool, Gene Attribute and Set Enrichment Ranking (GASER), which combines a wealth of genomic data from the TDR Targets web site with expression data from a variety of sources, and allows researchers to create sophisticated weighted queries to uncover potential drug targets. Queries in our system can be updated in real time, along with their accompanying gene and gene set lists. We analyzed all possible pairwise combinations of 11 parasite lines to create baseline distributions for gene and gene set enrichment. Using the baseline as a comparison, we identified and discarded spurious results and recognized stochastic genes and gene sets.
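The pairwise baseline can be pictured with a short sketch: enumerate every pair among 11 lines (55 pairs), then judge an observed score against the scores from those pairs. This is an assumption-laden illustration rather than the thesis's method; the line names, the scores, and the empirical p-value formula with its add-one correction are hypothetical choices.

```python
from itertools import combinations


def all_pairs(lines):
    """Return every unordered pair of parasite lines."""
    return list(combinations(lines, 2))


def empirical_p(observed, baseline_scores):
    """Fraction of baseline scores at least as large as the observed score.

    The +1 in numerator and denominator is an assumed correction that
    keeps the estimate away from exactly zero.
    """
    hits = sum(1 for score in baseline_scores if score >= observed)
    return (hits + 1) / (len(baseline_scores) + 1)


# Placeholder names; the thesis's actual line names are not given here.
lines = [f"line_{i}" for i in range(1, 12)]
pairs = all_pairs(lines)
print(len(pairs))  # 11 choose 2 = 55
```

A gene or gene set that scores highly across many of the baseline pairs would receive a large empirical p-value, which is one way to recognize it as stochastic rather than specific to a given comparison.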

We analyzed three major sets of parasite lines: those involving manipulation of the multidrug resistance-1 (PfMDR1) transporter, a key resistance determinant; those involving manipulation of the P. falciparum chloroquine resistance transporter (PfCRT), another important resistance determinant; and finally a set of parasites that had varying sensitivity to artemisinins. This analysis resulted in a rich library of high-scoring genes that may merit further exploration as potential mechanisms of resistance. More specifically, we found that manipulation of pfcrt expression resulted in an up-regulation of tRNA synthetases, which might serve to increase protein production in response to reduced amino acid availability from degraded hemoglobin. We observed that a copy number increase in pfmdr1 resulted in increases in glycerophospholipid metabolism and up-regulation of a number of ABC transporters. Finally, when comparing artemisinin-sensitive to artemisinin-tolerant lines, we found an increased abundance of redox metabolites and of the transcripts involved in redox regulation, a significant reduction in transcription, and altered expression of transcripts encoding core histone proteins. These alterations could help confer an increased tolerance to drug-induced redox perturbation by lowering endogenous redox stress.

We also offer a robust computational tool, Hypergeometric Analysis of Time Series (HATS), to handle challenging biological questions related to the comparison of time series experiments. Our pipeline provides a rigorous method for aligning expression experiments and then determining which genes and gene sets differ most between them. The changes in gene expression level between drug-sensitive and drug-resistant lines offer important clues in our quest to understand mechanisms of resistance and identify new drug targets. Our pipeline allows for comparison of future lines with our base set and holds potential for other organisms, especially those that, like Plasmodium, have a strong time-dependent component. The full Excel files of all the analyses performed in this thesis can be found at: (


  • Scanfeld_columbia_0054D_11072.pdf (application/pdf, 20.5 MB)

More About This Work

Academic Units
Cellular, Molecular and Biomedical Studies
Thesis Advisors
Fidock, David
Degree
Ph.D., Columbia University
Published Here
January 25, 2013