Open Access Highly Accessed Methodology article

Inferring biological functions and associated transcriptional regulators using gene set expression coherence analysis

Tae-Min Kim1,2, Yeun-Jun Chung2,3, Mun-Gan Rhyu2 and Myeong Ho Jung1,4*

Author Affiliations

1 Division of Metabolic Disease, Center for Biomedical Science, National Institute of Health, Nokbun-dong 5, Eunpyung-gu, Seoul, Republic of Korea

2 Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea

3 Integrated Research Center for Genome Polymorphism, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea

4 School of Oriental Medicine, Pusan National University, Busan, Republic of Korea

BMC Bioinformatics 2007, 8:453  doi:10.1186/1471-2105-8-453

Published: 17 November 2007

Abstract

Background

Gene clustering has been widely used to group genes with similar expression patterns in microarray data analysis. Subsequent enrichment analysis using predefined gene sets can provide clues as to which functional themes or regulatory sequence motifs are associated with individual gene clusters. Despite their potential utility, gene clustering and enrichment analysis have been used on separate platforms; thus, the development of an integrative algorithm linking both methods remains highly challenging.

Results

In this study, we propose an algorithm for the discovery of molecular functions and the elucidation of transcriptional logic using two kinds of gene information: functional gene sets and regulatory motif gene sets. The algorithm, termed gene set expression coherence analysis (GSECA), first selects functional gene sets with significantly high expression coherence. Those candidate gene sets are then grouped into a number of functionally related themes, or functional clusters, according to their expression similarities. Each functional cluster is then investigated for the enrichment of transcriptional regulatory motifs using a modified gene set enrichment analysis and regulatory motif gene sets. The method was tested on two publicly available expression profiles representing murine myogenesis and erythropoiesis. For the respective profiles, our algorithm identified myocyte- and erythrocyte-related molecular functions, along with the putative transcriptional regulators for the corresponding molecular functions.
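
The first step, selecting gene sets with significantly high expression coherence, can be illustrated with a minimal sketch. The abstract does not define the coherence score or the significance test, so the sketch below makes two assumptions: coherence is taken as the mean pairwise Pearson correlation among the genes of a set, and significance is estimated by comparison with randomly drawn gene sets of the same size. The function names and the toy profiles are hypothetical and are not taken from the GSECA software.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def coherence(profiles, genes):
    """Assumed coherence score: mean pairwise correlation within a gene set."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def coherence_p_value(profiles, genes, n_perm=1000, seed=0):
    """Empirical p-value against random gene sets of the same size (assumed test)."""
    rng = random.Random(seed)
    observed = coherence(profiles, genes)
    all_genes = sorted(profiles)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(genes))
        if coherence(profiles, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: gene name -> expression values across four time points.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
    "g5": [2.0, 1.0, 4.0, 3.0],
    "g6": [3.0, 3.5, 1.0, 2.0],
}
toy_set = ["g1", "g2"]
score, p_value = coherence_p_value(toy_profiles, toy_set, n_perm=200)
```

In this sketch, gene sets whose empirical p-value falls below a chosen threshold would be kept as candidates for the clustering and motif enrichment steps; the threshold and any multiple-testing correction are left open here because the abstract does not specify them.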

Conclusion

As an integrative and comprehensive method for the analysis of large-scale gene expression profiles, our method is able to generate a set of testable hypotheses of the form: transcriptional regulator X regulates function Y under cellular condition Z. The GSECA algorithm is implemented in a freely available software package.