Hierarchical Dirichlet process model for gene expression clustering

Wang, Xiaodong; Wang, Liming

Clustering is an important data processing tool for interpreting microarray data and inferring genomic networks. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet process (HDP). HDP clustering introduces a hierarchical structure into the statistical model, which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm for HDP clustering based on the Chinese restaurant metaphor. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result with the standard result and show that the HDP algorithm provides more information and reduces unnecessary clustering fragments.
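
The Chinese restaurant metaphor mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the article's Gibbs sampler and not the full hierarchical model; it only draws cluster ("table") assignments from the prior of a single-level Chinese restaurant process, where each new item joins an existing table with probability proportional to its occupancy, or opens a new table with probability proportional to a concentration parameter. The function name `crp_assignments` and the parameters `alpha` and `seed` are hypothetical, chosen for illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Draw table assignments for n_items customers from a Chinese
    restaurant process prior with concentration alpha (alpha > 0).

    Illustrative only: a single-level prior draw, with no likelihood
    term and no sharing of clusters across groups as in an HDP.
    """
    rng = random.Random(seed)
    assignments = []   # table index chosen by each customer, in arrival order
    table_counts = []  # number of customers currently seated at each table
    for _ in range(n_items):
        # Existing table k has weight n_k; a new table has weight alpha.
        weights = table_counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_counts):
            table_counts.append(1)   # open a new table
        else:
            table_counts[k] += 1     # join an existing table
        assignments.append(k)
    return assignments, table_counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=20, alpha=1.0)
    print("assignments:", labels)
    print("table sizes:", counts)
```

Larger values of `alpha` tend to produce more tables, so the number of clusters is not fixed in advance. A Gibbs sampler for clustering would additionally weight each seating choice by how well the item fits the data already at that table.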




Also Published In

EURASIP Journal on Bioinformatics and Systems Biology

More About This Work

Academic Units
Electrical Engineering
Published Here
September 8, 2014