Email updates

Keep up to date with the latest news and content from BMC Systems Biology and BioMed Central.

Open Access Highly Accessed Methodology article

A search engine to identify pathway genes from expression data on multiple organisms

Chunnuan Chen1, Matthew T Weirauch1, Corey C Powell1, Alexander C Zambon2 and Joshua M Stuart1*

Author Affiliations

1 Department of Biomolecular Engineering, University of California, Santa Cruz, California, 95064, USA

2 Department of Medicine, Gladstone Institute of Cardiovascular Disease, San Francisco, California 94158, USA

For all author emails, please log on.

BMC Systems Biology 2007, 1:20  doi:10.1186/1752-0509-1-20

Published: 4 May 2007



The completion of several genome projects showed that most genes have not yet been characterized, especially in multicellular organisms. Although most genes have unknown functions, a large collection of data is available describing their transcriptional activities under many different experimental conditions. In many cases, the coregulatation of a set of genes across a set of conditions can be used to infer roles for genes of unknown function.


We developed a search engine, the Multiple-Species Gene Recommender (MSGR), which scans gene expression datasets from multiple organisms to identify genes that participate in a genetic pathway. The MSGR takes a query consisting of a list of genes that function together in a genetic pathway from one of six organisms: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, and Helicobacter pylori. Using a probabilistic method to merge searches, the MSGR identifies genes that are significantly coregulated with the query genes in one or more of those organisms. The MSGR achieves its highest accuracy for many human pathways when searches are combined across species. We describe specific examples in which new genes were identified to be involved in a neuromuscular signaling pathway and a cell-adhesion pathway.


The search engine can scan large collections of gene expression data for new genes that are significantly coregulated with a pathway of interest. By integrating searches across organisms, the MSGR can identify pathway members whose coregulation is either ancient or newly evolved.