Combining Pairwise Sequence Similarity and Support Vector Machines for Remote Protein Homology Detection

Liao, Li; Noble, William Stafford

One key element in understanding the molecular machinery of the cell is to understand the meaning, or function, of each protein encoded in the genome. A very successful means of inferring the function of a previously unannotated protein is via sequence similarity with one or more proteins whose functions are already known. Currently, one of the most powerful such homology detection methods is the SVM-Fisher method of Jaakkola, Diekhans and Haussler (ISMB 2000). This method combines a generative, profile hidden Markov model (HMM) with a discriminative classification algorithm known as a support vector machine (SVM). The current work presents an alternative method for SVM-based protein classification. The method, SVM-pairwise, uses a pairwise sequence similarity algorithm such as Smith-Waterman in place of the HMM in the SVM-Fisher method. The resulting algorithm, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better remote protein homology detection than SVM-Fisher, profile HMMs and PSI-BLAST.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-012-01
Published Here
April 22, 2011