2012 Theses Doctoral
Topics in Genomic Signal Processing
Genomic information is digital in nature and lends itself to mathematical modeling as a means of gaining biological knowledge. This dissertation develops and applies detection and estimation theory to problems in genomics: each biological problem is described in mathematical terms and a solution is proposed in that domain. First, a novel framework for hypothesis testing is presented in which one must decide among multiple hypotheses, each involving unknown parameters. Within this framework, a test is developed that performs detection and estimation jointly in an optimal sense. The proposed test is then applied to the problem of detecting and estimating periodicities in DNA sequences. Second, the problem of motif discovery in DNA sequences is addressed: given a set of observed sequences, the task is to determine which sequences, if any, contain instances of an unknown motif and to estimate the positions of those instances. A statistical description of the problem is used, and a sequential Monte Carlo method is applied for inference. Finally, haplotype phasing for diploid organisms is considered, and a novel mathematical model is proposed. The haplotypes needed to reconstruct the observed genotypes of a group of unrelated individuals are detected, and the haplotype pair of each individual in the group is estimated. The model translates a biological principle, the maximum parsimony principle, into a sparseness condition.
Files
- Jajamovich_columbia_0054D_10588.pdf (application/pdf, 932 KB)
More About This Work
- Academic Units: Electrical Engineering
- Thesis Advisors: Wang, Xiaodong
- Degree: Ph.D., Columbia University
- Published Here: March 21, 2012