Meta-analysis of muscle transcriptome data using the MADMuscle database reveals biologically relevant gene patterns
1 INSERM, U915, Nantes, F-44000 France
2 Université de Nantes, Faculté de Médecine, Nantes, F-44000, France
3 CHU de Nantes, l'Institut du Thorax, CIC, Nantes, F-44000, France
4 CHU de Nantes, Laboratoire d'Explorations Fonctionnelles, Nantes, F-44000, France
5 CHU de Nantes, Centre de Référence des Maladies Neuromusculaires Rares de l'Enfant et de l'Adulte Nantes-Angers, Nantes, F-44000, France
6 Institut Jacques Monod, UMR7592-CNRS, Paris, F-75013, France
7 Université Paris Diderot-Paris 7, Paris, F-75013, France
8 INSERM, UMR 694, Angers, F-49033, France
9 Université d'Angers, Angers, F-49033, France
10 CHU Angers, Laboratoire de Biochimie et Biologie moléculaire, Angers, F-49033, France
11 Laboratoire d'Informatique de Nantes Atlantique LINA, Ecole Polytechnique, Nantes, F-44000, France
BMC Genomics 2011, 12:113 doi:10.1186/1471-2164-12-113 Published: 16 February 2011
DNA microarray technology has had a great impact on muscle research, and microarray gene expression data have been widely used to identify gene signatures characteristic of the studied conditions. With the rapid accumulation of muscle microarray data, it is of great interest to understand how to compare and combine data across multiple studies. Meta-analysis of transcriptome data is a valuable way to do so, because it highlights gene signatures that are conserved across multiple independent studies. However, it is hampered by the diversity of the available data: different microarray platforms, different gene nomenclatures, different species studied, etc.
We have developed a system dedicated to muscle transcriptome data. It comprises a collection of microarray data and a query tool. Starting from an input gene list, the query tool allows the user to extract similar clusters of co-expressed genes from the database, so that common and relevant gene signatures can be searched more easily. The dedicated database consists of a large compendium of public data (more than 500 data sets) related to muscle (skeletal and heart). These studies cover seven animal species, both invertebrates (Drosophila melanogaster, Caenorhabditis elegans) and vertebrates (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus). After a renormalization step, clusters of co-expressed genes were identified in each data set. The lists of co-expressed genes were annotated using a unified re-annotation procedure, and these gene lists were then compared to find significant overlaps between studies.
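To make the gene list comparison concrete, the sketch below scores the overlap between two lists of co-expressed genes with a one-sided hypergeometric test. This is an illustrative assumption: the text above does not state which statistic is used to call an overlap significant, and the function name `overlap_pvalue`, the `universe_size` parameter, and the gene symbols in the usage lines are hypothetical.

```python
from math import comb


def overlap_pvalue(list_a, list_b, universe_size):
    """Return (k, p): the overlap size k between two gene lists and the
    hypergeometric probability of seeing at least k shared genes by chance.

    Assumption (not from the source): both lists are drawn from a common
    universe of `universe_size` genes that share one unified annotation.
    """
    genes_a, genes_b = set(list_a), set(list_b)
    n_a, n_b = len(genes_a), len(genes_b)
    if max(n_a, n_b) > universe_size:
        raise ValueError("a gene list cannot be larger than the universe")

    k = len(genes_a & genes_b)  # observed number of shared genes

    # Upper tail P(X >= k), where X counts how many of the n_b genes in
    # list B fall among the n_a genes of list A under random sampling.
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / total


# Hypothetical usage: two small clusters from different studies,
# annotated against a shared universe of 10 genes.
cluster_1 = ["MYH3", "MYH8", "SPP1"]
cluster_2 = ["MYH8", "SPP1", "TNNT2"]
shared, p_value = overlap_pvalue(cluster_1, cluster_2, universe_size=10)
```

With real data the universe would be the set of genes measurable in both studies after re-annotation, and the p-values from many pairwise comparisons would need a multiple-testing correction before any overlap is called significant.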
Applied to this large compendium of data sets, meta-analyses demonstrated that patterns conserved between species could be identified. Focusing on a specific pathology (Duchenne Muscular Dystrophy), we validated results across independent studies and revealed robust biomarkers and new pathways of interest. The meta-analyses performed with MADMuscle show the usefulness of this approach. Our method can be applied to all public transcriptome data.