Academic Commons

Articles

Hierarchical Dirichlet process model for gene expression clustering

Wang, Xiaodong; Wang, Liming

Clustering is an important data processing tool for interpreting microarray data and for genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet process (HDP). HDP clustering introduces a hierarchical structure into the statistical model, which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm for HDP clustering based on the Chinese restaurant metaphor. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces unnecessary cluster fragmentation.
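
As a rough illustration of the Chinese restaurant metaphor mentioned in the abstract, the sketch below draws a random partition from a plain, single-level Chinese restaurant process prior. It is not the authors' HDP Gibbs sampler: it has no hierarchy and no likelihood term, and the function name `sample_crp`, the concentration parameter `alpha`, and the `seed` argument are assumptions introduced here for illustration only.

```python
import random


def sample_crp(n_customers, alpha, seed=None):
    """Draw table assignments from a Chinese restaurant process prior.

    Hypothetical helper for illustration; alpha must be positive.
    """
    rng = random.Random(seed)
    assignments = []  # table index chosen by each customer, in arrival order
    table_sizes = []  # number of customers currently seated at each table
    for _ in range(n_customers):
        # An existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # customer opens a new table
        else:
            table_sizes[k] += 1    # customer joins an existing table
        assignments.append(k)
    return assignments, table_sizes


if __name__ == "__main__":
    tables, sizes = sample_crp(20, alpha=1.0, seed=0)
    print(tables)
    print(sizes)
```

Under this prior, larger tables attract new customers more strongly, so the number of clusters is not fixed in advance and grows slowly with the number of data points. In a clustering setting each table would correspond to a cluster, and a sampler would additionally weight each choice by how well the data point fits that cluster.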

Files

  • 5d67fb7d8f972e21fe74cbecd2e8dec0.zip (binary/octet-stream, 557 KB)

Also Published In

Title
EURASIP Journal on Bioinformatics and Systems Biology
DOI
https://doi.org/10.1186/1687-4153-2013-5

More About This Work

Academic Units
Electrical Engineering
Publisher
Springer
Published Here
September 8, 2014