BMC Bioinformatics


Open Access Methodology article

Literature-aided meta-analysis of microarray data: a compendium study on muscle development and disease

Rob Jelier (1,2), Peter AC 't Hoen (2)*, Ellen Sterrenburg (2), Johan T den Dunnen (2), Gert-Jan B van Ommen (2), Jan A Kors (1) and Barend Mons (1)

Author Affiliations

1 Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, The Netherlands

2 Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands


BMC Bioinformatics 2008, 9:291 doi:10.1186/1471-2105-9-291

Published: 24 June 2008

Abstract

Background

Comparative analysis of expression microarray studies is difficult because technical factors strongly influence experimental outcome. Still, the differentially expressed genes identified in different studies may point to the same biological processes. However, manually curated assignment of genes to biological processes, as pursued by the Gene Ontology (GO) consortium, is incomplete and limited. We hypothesised that automatically associating genes with biological processes through thesaurus-controlled mining of Medline abstracts would be more effective. We therefore developed a novel algorithm, LAMA (Literature-Aided Meta-Analysis), to quantify the similarity between transcriptomics studies. We evaluated the algorithm on a large compendium of 102 microarray studies published in the field of muscle development and disease, and compared it with similarity measures based on gene overlap and on over-representation of biological processes assigned by GO.
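The contrast between gene-overlap similarity and literature-based similarity can be illustrated with a toy example. The sketch below is not the published LAMA algorithm: the gene-to-concept weights in the hypothetical `GENE_CONCEPTS` table stand in for associations mined from Medline, and averaging gene vectors into a study profile and comparing profiles by cosine similarity are assumptions chosen for simplicity. It shows how two studies with no differentially expressed genes in common can still be linked through shared biological concepts.

```python
import math

# Hypothetical gene -> concept weights (toy values), standing in for
# associations mined from Medline abstracts with a thesaurus.
GENE_CONCEPTS = {
    "MYOD1": {"muscle regeneration": 0.9, "myoblast differentiation": 0.8},
    "MYOG": {"muscle regeneration": 0.7, "myoblast differentiation": 0.9},
    "DMD": {"muscular dystrophy": 1.0, "muscle regeneration": 0.4},
    "UTRN": {"muscular dystrophy": 0.8, "neuromuscular junction": 0.6},
}


def jaccard(genes_a, genes_b):
    """Gene-overlap similarity: shared genes divided by all genes."""
    a, b = set(genes_a), set(genes_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def study_profile(genes):
    """Average the concept vectors of a study's differentially expressed genes."""
    known = [g for g in genes if g in GENE_CONCEPTS]
    profile = {}
    for gene in known:
        for concept, weight in GENE_CONCEPTS[gene].items():
            profile[concept] = profile.get(concept, 0.0) + weight / len(known)
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse concept profiles."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


study_1 = ["MYOD1", "DMD"]
study_2 = ["MYOG", "UTRN"]
print(jaccard(study_1, study_2))  # 0.0: no genes in common
print(round(cosine(study_profile(study_1), study_profile(study_2)), 3))
```

Here the gene-overlap score is zero, while the concept-profile score is high because both studies point to muscle regeneration, myoblast differentiation and muscular dystrophy.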

Results

While the overlap in both genes and over-represented GO terms was poor, LAMA retrieved many more biologically meaningful links between studies, with a substantially lower influence of technical factors. LAMA correctly grouped muscular dystrophy, regeneration and myositis studies, and linked patient studies with the corresponding mouse model studies. LAMA also retrieved the biological concepts connecting these studies. Among other new discoveries, we associated cullin proteins, a class of proteins involved in ubiquitinylation, with genes down-regulated during muscle regeneration, whereas ubiquitinylation was previously reported to be activated during the inverse process: muscle atrophy.
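One simple way to surface the concepts that connect two studies is to rank the concepts present in both study profiles by how much each contributes to their similarity. The sketch below assumes that each study is already summarised as a concept-to-score mapping; the profiles are hypothetical toy values, not data from the compendium, and the product-of-scores ranking in `connecting_concepts` is an illustrative choice, not the scoring used by LAMA.

```python
# Hypothetical study-level concept profiles (toy values only).
regeneration = {"muscle regeneration": 0.8, "myoblast differentiation": 0.5, "inflammation": 0.3}
dystrophy = {"muscle regeneration": 0.6, "inflammation": 0.4, "fibrosis": 0.7}


def connecting_concepts(p, q, top_k=3):
    """Rank concepts present in both profiles by the product of their scores."""
    shared = set(p) & set(q)
    scored = [(concept, p[concept] * q[concept]) for concept in shared]
    # Highest contribution first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


for concept, score in connecting_concepts(regeneration, dystrophy):
    print(f"{concept}: {score:.2f}")
# muscle regeneration: 0.48
# inflammation: 0.12
```

Concepts found in only one profile, such as fibrosis here, contribute nothing to the score, so only the shared concepts are reported as links between the two studies.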

Conclusion

Our literature-based association analysis can find hidden common biological denominators in microarray studies and circumvents the need for raw data analysis or curated gene annotation databases.