Theses Doctoral

Topics in Genomic Signal Processing

Jajamovich, Guido Hugo

Genomic information is digital in its nature and admits mathematical modeling in order to gain biological knowledge. This dissertation focuses on the development and application of detection and estimation theories for solving problems in genomics by describing biological problems in mathematical terms and proposing a solution in this domain. More specifically, a novel framework for hypothesis testing is presented, where it is desired to decide among multiple hypotheses and where each hypothesis involves unknown parameters. Within this framework, a test is developed to perform both detection and estimation jointly in an optimal sense. The proposed test is then applied to the problem of detecting and estimating periodicities in DNA sequences. Moreover, the problem of motif discovery in DNA sequences is presented, where a set of sequences is observed and it is needed to determine which sequences contain instances (if any) of an unknown motif and estimate their positions. A statistical description of the problem is used and a sequential Monte Carlo method is applied for the inference. Finally, the phasing of haplotypes for diploid organisms is introduced, where a novel mathematical model is proposed. The haplotypes that are used to reconstruct the observed genotypes of a group of unrelated individuals are detected and the haplotype pair for each individual in the group is estimated. The model translates a biological principle, the maximum parsimony principle, to a sparseness condition.


  • thumnail for Jajamovich_columbia_0054D_10588.pdf Jajamovich_columbia_0054D_10588.pdf application/pdf 932 KB Download File

More About This Work

Academic Units
Electrical Engineering
Thesis Advisors
Wang, Xiaodong
Ph.D., Columbia University
Published Here
March 21, 2012