Combining Microarray Expression Data and Phylogenetic Profiles to Learn Gene Functional Categories Using Support Vector Machines

Paul Pavlidis; William Noble Grundy

Title:
Combining Microarray Expression Data and Phylogenetic Profiles to Learn Gene Functional Categories Using Support Vector Machines
Author(s):
Pavlidis, Paul
Grundy, William Noble
Date:
Type:
Technical reports
Department:
Computer Science
Permanent URL:
Series:
Columbia University Computer Science Technical Reports
Part Number:
CUCS-011-00
Publisher:
Department of Computer Science, Columbia University
Publisher Location:
New York
Abstract:
A primary goal in biology is to understand the molecular machinery of the cell. The sequencing projects currently underway provide one view of this machinery. A complementary view is provided by data from DNA microarray hybridization experiments. Synthesizing the information from these disparate types of data requires the development of improved computational techniques. We demonstrate how to apply support vector machines, a machine learning algorithm, to a heterogeneous data set consisting of expression data and phylogenetic profiles derived from sequence similarity searches against a collection of complete genomes. The two types of data provide accurate pictures of overlapping subsets of the gene functional categories present in the cell. Combining the expression data and phylogenetic profiles within a single learning algorithm frequently yields superior classification performance compared to using either data set alone. However, the improvement is not uniform across functional classes. For the data sets investigated here, 23-element phylogenetic profiles typically provide more information than 79-element expression vectors. Often, adding expression data to the phylogenetic profiles introduces more noise than information. Thus, these two types of data should be combined only when there is evidence that the functional classification of interest is clearly reflected in both data sets.
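As a rough illustration of what combining the two data types can look like, the sketch below concatenates a 79-element expression vector with a 23-element phylogenetic profile and compares linear kernel values. The vector lengths come from the abstract; everything else is an assumption made for illustration: the values are synthetic, the names `combine` and `linear_kernel` are hypothetical, and the abstract does not say which kernel or combination scheme the report actually uses.

```python
import random

N_EXPRESSION = 79  # expression vector length, as stated in the abstract
N_PHYLO = 23       # phylogenetic profile length, as stated in the abstract


def combine(expression, profile):
    """Concatenate the two per-gene feature vectors (hypothetical helper)."""
    if len(expression) != N_EXPRESSION or len(profile) != N_PHYLO:
        raise ValueError("unexpected feature vector length")
    return list(expression) + list(profile)


def linear_kernel(x, y):
    """Dot product of two equal-length vectors (assumed kernel choice)."""
    return sum(a * b for a, b in zip(x, y))


def synthetic_gene(rng):
    """Return (expression, profile) filled with made-up values, not real data."""
    expression = [rng.gauss(0.0, 1.0) for _ in range(N_EXPRESSION)]
    profile = [rng.random() for _ in range(N_PHYLO)]
    return expression, profile


rng = random.Random(0)
expr_a, phylo_a = synthetic_gene(rng)
expr_b, phylo_b = synthetic_gene(rng)

combined_a = combine(expr_a, phylo_a)
combined_b = combine(expr_b, phylo_b)

# Kernel on the concatenated vectors versus the sum of per-data-set kernels.
k_combined = linear_kernel(combined_a, combined_b)
k_sum = linear_kernel(expr_a, expr_b) + linear_kernel(phylo_a, phylo_b)

print(len(combined_a), abs(k_combined - k_sum) < 1e-9)
```

Under the linear kernel assumed here, concatenating the vectors is equivalent to summing the two per-data-set kernel values, so every expression feature contributes directly to the similarity between two genes. This is one plausible way to see how uninformative expression features could dilute the signal carried by the phylogenetic profiles.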
Subject(s):
Computer science