This article is part of the supplement: UT-ORNL-KBRIN Bioinformatics Summit 2008

Open Access Poster presentation

Using a literature-based NMF model for discovering gene functional relationships

Elina Tjioe1, Michael Berry2*, Ramin Homayouni3 and Kevin Heinrich4

Author affiliations

1 Genome Science and Technology Graduate School, University of Tennessee, Knoxville, TN 37996, USA

2 Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA

3 Bioinformatics Program, Department of Biology, University of Memphis, Memphis, TN 38152, USA

4 Computable Genomix LLC, Bartlett, TN 38133, USA

Citation and License

BMC Bioinformatics 2008, 9(Suppl 7):P1  doi:10.1186/1471-2105-9-S7-P1

The electronic version of this article is the complete one and can be found online at:

Published: 8 July 2008

© 2008 Tjioe et al; licensee BioMed Central Ltd.


The rapid growth of the biomedical literature and genomic information presents a major challenge for determining the functional relationships among genes. Several bioinformatics tools have been developed to extract and identify gene relationships from various biological databases. In this study, we develop a Web-based bioinformatics tool called Feature Annotation Using Nonnegative matrix factorization (FAUN) to facilitate both the discovery and classification of functional relationships among genes. FAUN uses the nonnegative matrix factorization (NMF) algorithms described in [1]. We discuss both the computational complexity and the parameterization of NMF for processing gene sets. FAUN is first tested on a small, manually constructed 50-gene (50TG) collection that we, as well as others, have previously used [2]. Screenshots of FAUN feature classification and gene-to-gene correlation for the 50TG collection are shown in Figures 1 and 2. We then apply FAUN to analyze several microarray-derived gene sets obtained from studies of the developing cerebellum in normal and mutant mice. FAUN provides utilities for collaborative knowledge discovery and identification of new gene relationships from text streams and repositories (e.g., MEDLINE). It is particularly useful for the validation and analysis of gene associations suggested by microarray experimentation. The FAUN tool is publicly available online.
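As a rough illustration of the factorization step, the sketch below approximates a small nonnegative term-by-gene matrix A as the product W H, where the columns of W can be read as features (weighted groups of terms) and the columns of H give each gene's feature strengths. It uses the classic multiplicative update rule, which is one of the NMF algorithms surveyed in [1]; the abstract does not say which variant or parameters FAUN uses, so the function name `nmf_multiplicative`, the toy matrix, and the choice of k = 2 are all assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(A, k, iters=300, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix A (terms x genes) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    Illustrative stand-in only; not the FAUN implementation.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))  # term-by-feature weights
    H = rng.random((k, n))  # feature-by-gene weights
    for _ in range(iters):
        # eps keeps the denominators strictly positive
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 6 terms (rows) x 4 genes (columns).
# Genes 0-1 share one vocabulary, genes 2-3 another.
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
    [0, 0, 1, 1],
], dtype=float)

W, H = nmf_multiplicative(A, k=2)
rel_error = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print("relative reconstruction error:", round(rel_error, 3))
print("dominant term index per feature:", W.argmax(axis=0))
```

In practice the rank k, the initialization, and the stopping rule all affect the features obtained, which is presumably why the parameterization of NMF is discussed for gene sets.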

Figure 1. FAUN screenshot 1. The upper window shows the classification features for the 50TG collection and their top associated terms; the lower window shows the use of dominant terms across genes highly associated with the user-selected feature.

Figure 2. FAUN screenshot 2. The right window shows the correlation between genes highly associated with the user-selected feature; the left window shows the feature strength for the genes from the user-selected correlation cell (indicated by the arrow).


For a preliminary assessment of FAUN feature classification, each gene in the 50TG collection was classified based on its most dominant annotated feature or based on some feature weight threshold. The FAUN classification using the strongest feature (per gene) yielded 90% accuracy. A FAUN-based analysis of a new cerebellum gene set has revealed new knowledge: the gene set contains a large component of transcription factors.
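The two classification rules mentioned above can be sketched as follows, assuming a feature-by-gene weight matrix H such as an NMF would produce. The matrix values, the feature names, the expected labels, and the threshold of 0.4 are hypothetical and chosen only to make the example small; they are not taken from the 50TG collection.

```python
import numpy as np

def strongest_feature(H, feature_names):
    """Assign each gene (column of H) to its highest-weight feature."""
    idx = np.argmax(H, axis=0)
    return [feature_names[i] for i in idx]

def features_above_threshold(H, feature_names, threshold):
    """Assign each gene to every feature whose weight meets the threshold."""
    return [
        [feature_names[i] for i in np.flatnonzero(col >= threshold)]
        for col in H.T
    ]

def accuracy(predicted, expected):
    """Fraction of genes whose predicted label matches the expected one."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)

# Hypothetical weights: 2 features (rows) x 3 genes (columns).
H = np.array([
    [0.9, 0.1, 0.4],
    [0.2, 0.8, 0.5],
])
feature_names = ["feature_A", "feature_B"]          # placeholder labels
expected = ["feature_A", "feature_B", "feature_A"]  # placeholder ground truth

predicted = strongest_feature(H, feature_names)
print(predicted)
print(accuracy(predicted, expected))
print(features_above_threshold(H, feature_names, 0.4))
```

The strongest-feature rule gives exactly one label per gene, while the threshold rule can assign several labels or none, which may suit genes with more than one function.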


This work is supported by an NIH subcontract (HD052472) involving the University of Tennessee, the University of Memphis, Oak Ridge National Laboratory, and the University of British Columbia.


  1. Berry MW, Browne M, Langville AN, Pauca VP, Plemmons RJ: Algorithms and Applications for Approximate Nonnegative Matrix Factorization. Computational Statistics & Data Analysis 2007, 52(1):155-173.

  2. Homayouni R, Heinrich K, Wei L, Berry MW: Gene clustering by latent semantic indexing of MEDLINE abstracts. Bioinformatics 2005, 21(1):104-115.