Open Access | Highly Accessed | Methodology article

Building pathway clusters from Random Forests classification using class votes

Herbert Pang1 and Hongyu Zhao1,2

1Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA

2Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA

BMC Bioinformatics 2008, 9:87 doi:10.1186/1471-2105-9-87

Published: 6 February 2008

Abstract

Background

Recent years have seen the development of various pathway-based methods for the analysis of microarray gene expression data. These approaches have the potential to bring biological insights into microarray studies. A variety of methods have been proposed to construct networks using gene expression data. Because individual pathways do not act in isolation, it is important to understand how different pathways coordinate to perform cellular functions. However, there are no published methods describing how to build pathway clusters that are closely related to traits of interest.

Results

We propose to build pathway clusters from pathway-based classification methods. The proposed methods allow researchers to identify clusters of pathways sharing similar functions. These pathways may or may not share genes. As an illustration, we applied our approach to three human breast cancer microarray data sets and found that it yielded consistent and interpretable results across all three. We further investigated one of the pathway clusters using PubMatrix. Informative genes in this pathway cluster are associated with more publications matching keywords such as estrogen receptor than informative genes in other top pathways. In addition, using the shortest path analysis in GeneGo's MetaCore and the Human Protein Reference Database, we were able to identify the links that connect pathways without shared genes within the pathway cluster.
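The idea of grouping pathways by their class votes can be illustrated with a small sketch. It assumes that a separate Random Forest has already been trained on the genes of each pathway and that, for every sample, the fraction of trees voting for one class has been recorded. The pathway names, the vote values, the distance (1 minus Pearson correlation), the average-linkage rule, and the `max_distance` cutoff are all hypothetical choices made for illustration; the abstract does not specify them, so this should not be read as the authors' exact procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vote vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cluster_pathways(votes, max_distance):
    """Average-linkage agglomerative clustering of pathways.

    votes: dict mapping pathway name -> list of per-sample vote fractions.
    max_distance: merging stops once the closest pair of clusters is farther
    apart than this (an assumed, user-chosen cutoff).
    """
    names = list(votes)
    # Pairwise distance between pathways: 1 - correlation of their vote vectors.
    dist = {
        frozenset((a, b)): 1.0 - pearson(votes[a], votes[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pathway pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical vote fractions for six samples (not data from the paper).
votes = {
    "pathway_A": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
    "pathway_B": [0.85, 0.9, 0.6, 0.3, 0.2, 0.2],
    "pathway_C": [0.2, 0.3, 0.1, 0.8, 0.9, 0.7],
    "pathway_D": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
}

clusters = cluster_pathways(votes, max_distance=0.5)
print(clusters)
```

Because the distance depends only on how similarly two pathways vote across samples, two pathways can land in the same cluster even when they share no genes, which is the property the abstract highlights.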

Conclusion

Our proposed pathway clustering methods allow bioinformaticians and biologists to investigate how informative genes within pathways are related to each other and understand possible crosstalk between pathways in a cluster. Therefore, building pathway clusters may lead to a better understanding of molecular mechanisms affecting a trait of interest, and help generate further biological hypotheses from gene expression data.

