Theme discovery from gene lists for identification and viewing of multiple functional groups
1 Department of Neurobiology, A.I. Virtanen-Institute, University of Kuopio P.O. Box 1627, FIN-70211 Kuopio, Finland
2 Department of Computer Science, University of Kuopio P.O. Box 1627, FIN-70211 Kuopio, Finland
3 Bioinformatics Group, Institute of Biotechnology, P.O. Box 56, 00014 University of Helsinki, Finland
BMC Bioinformatics 2005, 6:162 doi:10.1186/1471-2105-6-162
Published: 29 June 2005
High-throughput methods of the genome era produce vast amounts of data in the form of gene lists. These lists are large and difficult to interpret without advanced computational or bioinformatic tools. Most existing methods analyse a gene list as a single entity, although it is composed of multiple gene groups associated with separate biological functions. It is therefore important to define and visualize the gene groups with distinct functionality within a gene list.
To analyse the functional heterogeneity within a gene list, we have developed a method that clusters genes into groups with homogeneous functionalities. The method uses Non-negative Matrix Factorization (NMF) to create several clustering results with varying numbers of clusters. The obtained clustering results are combined into a simple graphical presentation showing the functional groups over-represented in the analysed gene list. We demonstrate its performance on two data sets and show results that improve upon existing methods. The comparison also shows that our method creates a simpler view that aids in the discovery of biological themes within the list and discards less informative classes from the results.
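The clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the input matrix is built, which NMF variant is used, or how genes are assigned to clusters. The sketch assumes a binary gene-by-functional-class matrix, uses the standard multiplicative update rules of Lee and Seung for the factorization, and assigns each gene to the factor with the largest weight. The toy matrix and the names `nmf`, `cluster_genes`, and `annotations` are hypothetical.

```python
import random


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative v (genes x classes) as w * h with k factors.

    Uses multiplicative updates that reduce the squared reconstruction error.
    This is an assumed, generic NMF variant, not necessarily the paper's.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def cluster_genes(v, k, **kwargs):
    """Assign each gene (row of v) to the factor with the largest weight in w."""
    w, _ = nmf(v, k, **kwargs)
    return [max(range(k), key=lambda c: row[c]) for row in w]


# Hypothetical toy data: 6 genes (rows) annotated to 4 functional classes (columns).
annotations = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# One clustering per number of clusters, mirroring the "varying numbers of
# clusters" idea; combining the results into one view is not attempted here.
clusterings = {k: cluster_genes(annotations, k) for k in (2, 3)}
for k, labels in clusterings.items():
    print(k, labels)
```

With two factors, the first three genes and the last three genes should fall into separate clusters, since their annotation profiles do not overlap. Cluster labels are arbitrary, so only the grouping is meaningful, not the label values.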
The presented method and associated software are useful for the identification and interpretation of biological functions associated with gene lists, particularly in the analysis of large lists.