This article is part of the supplement: The International Conference on Intelligent Biology and Medicine (ICIBM): Systems Biology

Open Access Research

Revealing functionally coherent subsets using a spectral clustering and an information integration approach

Adam J Richards1,2*, John H Schwacke1, Bärbel Rohrer3, L Ashley Cowart1 and Xinghua Lu1,4*

Author affiliations

1 Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC 29425, USA

2 Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA

3 Departments of Ophthalmology and Neurosciences, Medical University of South Carolina, Charleston, SC 29425, USA

4 Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15232, USA

Citation and License

BMC Systems Biology 2012, 6(Suppl 3):S7 doi:10.1186/1752-0509-6-S3-S7

Published: 17 December 2012

Abstract

Background

Contemporary high-throughput analyses often produce lengthy lists of genes or proteins. To guide further investigation, it is desirable to divide such lists into functionally coherent subsets by integrating heterogeneous information about the genes. Here we report a principled approach for managing and integrating multiple data sources within the framework of graph-spectrum analysis to identify coherent gene subsets.
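To make "graph-spectrum analysis" concrete, the following sketch (an illustration, not the authors' code) computes the eigenvalues of the symmetric normalized graph Laplacian for a hypothetical six-gene graph made of two disconnected modules. It relies on a standard property of this Laplacian: the number of zero eigenvalues equals the number of connected components, which is the intuition spectral clustering exploits when modules are only loosely connected. The toy graph and the helper name normalized_laplacian are assumptions introduced for illustration.

```python
import numpy as np


def normalized_laplacian(W: np.ndarray) -> np.ndarray:
    """Return L = I - D^{-1/2} W D^{-1/2} for a symmetric, non-negative affinity matrix W."""
    d = W.sum(axis=1)
    safe_d = np.where(d > 0, d, 1.0)                          # avoid dividing by zero
    inv_sqrt_d = np.where(d > 0, 1.0 / np.sqrt(safe_d), 0.0)  # isolated nodes get 0
    return np.eye(len(W)) - inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]


# Hypothetical toy gene graph: genes 0-2 form one module and genes 3-5 another,
# with every pair inside a module linked and no edges between the modules.
W = np.zeros((6, 6))
for module in ([0, 1, 2], [3, 4, 5]):
    for i in module:
        for j in module:
            if i != j:
                W[i, j] = 1.0

eigvals = np.linalg.eigvalsh(normalized_laplacian(W))  # returned in ascending order
# The multiplicity of the eigenvalue 0 equals the number of connected components.
n_modules = int(np.sum(eigvals < 1e-9))
print(n_modules)  # prints 2
```

For a fully connected three-gene module the nonzero eigenvalues are 1.5, so the spectrum here is 0, 0, 1.5, 1.5, 1.5, 1.5; a small eigengap after the first few eigenvalues is what signals module structure in noisier graphs.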

Results

We investigated several approaches to integrating information derived from different sources, each reflecting a distinct aspect of gene functional relationships: functional annotations of genes in the form of the Gene Ontology, co-mentioning of genes in the literature, and shared transcription factor binding sites among genes. Given a list of genes, we construct a graph of the genes in each information space, kernel-transform the graphs so that they can be integrated, and finally identify functionally coherent subsets using a spectral clustering algorithm. In a series of simulation experiments, known functionally coherent gene sets were mixed and then recovered using our approach.
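The three-step pipeline described above (one graph per information space, a kernel transform, then spectral clustering of the integrated kernel) can be sketched in Python as follows. The abstract does not state which kernel, integration weights, or clustering variant the authors used, so this sketch makes explicit assumptions: a diffusion kernel exp(-beta * L) on the normalized Laplacian, an unweighted average of the per-source kernels, and Ng-Jordan-Weiss-style spectral clustering followed by a simple deterministic k-means. All function names, the parameter beta, and the toy Gene Ontology, literature and TFBS graphs are hypothetical; none of this is the API of the spectralmix module.

```python
import numpy as np


def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric, non-negative affinity matrix W."""
    d = W.sum(axis=1)
    safe_d = np.where(d > 0, d, 1.0)
    inv_sqrt_d = np.where(d > 0, 1.0 / np.sqrt(safe_d), 0.0)
    return np.eye(len(W)) - inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]


def diffusion_kernel(W, beta=1.0):
    """Assumed kernel transform of one graph: K = exp(-beta * L) via eigendecomposition."""
    vals, vecs = np.linalg.eigh(normalized_laplacian(W))
    return (vecs * np.exp(-beta * vals)) @ vecs.T


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(K, k):
    """Ng-Jordan-Weiss-style clustering of a non-negative similarity matrix K."""
    inv_sqrt_d = 1.0 / np.sqrt(K.sum(axis=1))
    M = inv_sqrt_d[:, None] * K * inv_sqrt_d[None, :]
    _, vecs = np.linalg.eigh(M)                       # eigenvalues in ascending order
    U = vecs[:, -k:]                                  # eigenvectors of the k largest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # project rows onto the unit sphere
    return kmeans(U, k)


def integrate_and_cluster(affinities, k, beta=1.0):
    """Kernel-transform each source graph, average the kernels, then cluster."""
    K = np.mean([diffusion_kernel(W, beta) for W in affinities], axis=0)
    return spectral_clustering(K, k)


def graph_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix over n genes from an edge list."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return W


# Hypothetical inputs: six genes in two modules, {0, 1, 2} and {3, 4, 5},
# each seen through an incomplete or noisy information space.
go_graph = graph_from_edges(6, [(0, 1), (0, 2), (1, 2), (3, 4), (4, 5)])
literature_graph = graph_from_edges(6, [(0, 1), (1, 2), (3, 4), (3, 5), (4, 5)])
tfbs_graph = graph_from_edges(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
                                  (2, 3)])  # (2, 3) is a spurious cross-module edge
labels = integrate_and_cluster([go_graph, literature_graph, tfbs_graph], k=2)
print(labels)
```

In this toy example the GO graph misses edge (3, 5), the literature graph misses edge (0, 2), and the TFBS graph contains a spurious edge between genes 2 and 3, yet the averaged kernel should still place genes 0-2 and genes 3-5 in separate clusters. Averaging kernels rather than raw adjacency matrices keeps the integrated matrix symmetric and positive semidefinite, because a non-negative combination of such kernels is again a valid kernel.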

Conclusions

The results indicate that spectral clustering approaches can recover coherent gene modules even under noisy conditions, and that information integration further enhances this capability. When applied to a real-world data set, our methods revealed biologically sensible modules and highlighted the importance of information integration. The implementation of the statistical model is provided under the GNU General Public License as an installable Python module at http://code.google.com/p/spectralmix.
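Scoring how well known gene sets are recovered in simulations like those described above requires a measure that ignores arbitrary cluster IDs. The abstract does not say which measure was used; the sketch below assumes the adjusted Rand index (ARI), which equals 1 for identical partitions and has an expected value of 0 under random labelling. The function name and the toy label vectors are hypothetical, and the function assumes at least two genes.

```python
from math import comb

import numpy as np


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two flat partitions (requires at least two items)."""
    _, t = np.unique(np.asarray(true_labels), return_inverse=True)
    _, p = np.unique(np.asarray(pred_labels), return_inverse=True)
    t, p = t.reshape(-1), p.reshape(-1)
    # Contingency table: rows are known gene sets, columns are recovered clusters.
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=int)
    np.add.at(table, (t, p), 1)
    pairs_cells = sum(comb(int(n), 2) for n in table.ravel())
    pairs_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    pairs_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = pairs_rows * pairs_cols / comb(len(t), 2)
    maximum = (pairs_rows + pairs_cols) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_cells - expected) / (maximum - expected)


# Hypothetical simulation: three known gene sets of three genes each are mixed,
# and a clustering method returns arbitrary cluster IDs.
known = [0, 0, 0, 1, 1, 1, 2, 2, 2]
recovered_perfect = [2, 2, 2, 0, 0, 0, 1, 1, 1]  # same partition, different IDs
recovered_noisy = [2, 2, 0, 0, 0, 0, 1, 1, 1]    # gene 2 assigned to the wrong set
print(adjusted_rand_index(known, recovered_perfect))  # prints 1.0
print(adjusted_rand_index(known, recovered_noisy))
```

A single misassigned gene lowers the score to 9/14 (about 0.64) here, so averaging the ARI over repeated noisy simulations gives one way to compare single-source clustering against integrated clustering.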