This article is part of the supplement: UT-ORNL-KBRIN Bioinformatics Summit 2009.

LSI based framework to predict gene regulatory information

1 Department of Biology, University of Memphis, Memphis, TN 38152, USA
2 Bioinformatics Program, University of Memphis, Memphis, TN 38152, USA

from UT-ORNL-KBRIN Bioinformatics Summit 2009

BMC Bioinformatics 2009, 10(Suppl 7):A5 doi:10.1186/1471-2105-10-S7-A5

The electronic version of this abstract is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/10/S7/A5
© 2009 Roy et al; licensee BioMed Central Ltd.

Background

Latent Semantic Indexing (LSI), a vector space model for information retrieval, has shown promise in predicting functional relationships between genes using textual information in MEDLINE abstracts. The underlying principle is that genes may be represented as document vectors in a multi-dimensional hyperspace, and the conceptual relationship between any two genes is determined by the cosine of the angle between their vectors [1]. In this study, we sought to extend this concept to the identification of putative transcription factors (TFs) that regulate a group of co-regulated genes. We hypothesized that co-expressed genes identified by microarray experiments are functionally related and that at least some of these genes have previously been linked, explicitly or implicitly, to TFs in the literature. A transcriptional module is then defined as a set of genes clustered together in LSI space with closely related TFs (Figure 1).

We devised a framework using these assumptions to identify transcriptional modules from microarray and promoter motif data (Figure 2). The framework requires two inputs: a set of co-expressed genes from a microarray dataset, and a set of TFs that have consensus motifs in the promoter regions of those genes. The motif-derived TF set is usually large, which makes it difficult to identify the critical TFs. The framework first identifies functionally related clusters of co-expressed genes based on their latent relationships in the literature, and then adds to each cluster the TFs that are closely associated with the genes in that cluster. The putative transcriptional modules are ranked by the degree of relative literature coherence among the entities in them.
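
The steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the term counts, the entity names (`geneA`, `TF1`, and so on), the greedy clustering rule, the 0.5 similarity threshold, and the use of mean pairwise cosine as the coherence score are all assumptions chosen for illustration, since the abstract does not specify them. The sketch builds LSI vectors with a truncated SVD, compares entities by cosine, clusters the co-expressed genes, attaches motif-derived TFs to each cluster, and ranks the resulting modules.

```python
import numpy as np

def lsi_vectors(term_by_entity, k):
    """Return one k-dimensional LSI vector per entity (column) via truncated SVD."""
    _, s, vt = np.linalg.svd(term_by_entity, full_matrices=False)
    # Scale the top-k right singular vectors by their singular values.
    return (s[:k, None] * vt[:k]).T

def cosine_matrix(vectors):
    """Pairwise cosine of the angle between row vectors (assumes no zero vectors)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

def cluster_genes(sim, genes, index, threshold):
    """Greedy grouping (an assumed rule): a gene joins the first cluster
    whose members all have cosine >= threshold with it."""
    clusters = []
    for g in genes:
        for c in clusters:
            if all(sim[index[g], index[m]] >= threshold for m in c):
                c.append(g)
                break
        else:
            clusters.append([g])
    return clusters

def build_modules(sim, index, clusters, tfs, threshold):
    """Attach TFs to each gene cluster and rank modules by coherence."""
    modules = []
    for genes in clusters:
        rows = [index[g] for g in genes]
        # Keep a TF when its mean cosine to the cluster's genes is high enough.
        members = genes + [tf for tf in tfs if sim[index[tf], rows].mean() >= threshold]
        idx = [index[m] for m in members]
        sub = sim[np.ix_(idx, idx)]
        n = len(idx)
        # Coherence (assumed definition): mean off-diagonal pairwise cosine.
        coherence = (sub.sum() - np.trace(sub)) / (n * (n - 1)) if n > 1 else 0.0
        modules.append((members, float(coherence)))
    return sorted(modules, key=lambda m: m[1], reverse=True)

# Hypothetical term counts: rows are terms, columns are entities.
names = ["geneA", "geneB", "TF1", "geneC", "geneD", "TF2"]
index = {name: i for i, name in enumerate(names)}
block = np.array([[2, 1, 0], [1, 1, 1], [0, 1, 2]], dtype=float)
zeros = np.zeros((3, 3))
term_by_entity = np.block([[block, zeros], [zeros, block]])

sim = cosine_matrix(lsi_vectors(term_by_entity, k=2))
clusters = cluster_genes(sim, ["geneA", "geneB", "geneC", "geneD"], index, 0.5)
modules = build_modules(sim, index, clusters, ["TF1", "TF2"], 0.5)

for members, coherence in modules:
    print(members, round(coherence, 3))
```

In this toy matrix `geneA` and `TF1` share only one term directly, yet they end up close in the reduced space because both co-occur with the terms of `geneB`. This is the kind of implicit relationship the framework relies on. With real data the term weighting, the number of retained dimensions, and the thresholds would all need tuning.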
Results and discussion

The LSI-based algorithm allows prediction of TFs based on latent (implicit) relationships in the literature. A preliminary evaluation of our method using previously published knock-out experiments revealed that it has reasonable recall and precision. A more rigorous evaluation of the method will require several additional TF knock-out microarray experiments. This work provides proof of principle that the combination of motif analysis and LSI may be used to identify putative transcriptional modules from microarray data.

References
Figure 1.
Figure 2.