Open Access Research article

Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes

Flavio Mignone1, Anna Anselmo2, Giacinto Donvito3, Giorgio P Maggi3, Giorgio Grillo4 and Graziano Pesole4,5

1Department of Structural Chemistry and Inorganic Stereochemistry, School of Pharmacy, University of Milan, Italy

2Department of Biomolecular Sciences and Biotechnology, University of Milan, Italy

3National Institute of Nuclear Physics, Bari, Italy

4Istituto Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, Bari, Italy

5Dipartimento di Biochimica e Biologia Molecolare, University of Bari, Italy


BMC Genomics 2008, 9:277 doi:10.1186/1471-2164-9-277

Published: 11 June 2008

Abstract

Background

Accurate gene detection and the identification of functional regions remain open problems in the annotation of genomic sequences. These problems affect not only newly sequenced genomes but also those of very well studied organisms such as human and mouse, where, despite great effort, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to this problem. Unfortunately, it is limited by the computational resources required for genome-wide comparisons and by the difficulty of discriminating between conserved coding and non-coding sequences. This discrimination is often based on, and therefore dependent on, the availability of annotated proteins.

Results

In this paper we present the results of a comprehensive comparison of the human and mouse genomes performed with a new high-throughput, grid-based system that allows the rapid detection of conserved sequences and an accurate assessment of their coding potential. By detecting clusters of coding conserved sequences, the system can also accurately identify potential gene loci.

Following this analysis, we created a collection of human-mouse conserved sequence tags and carefully compared our results with reliable annotations in order to benchmark our classifications. Strikingly, we detected several potential gene loci that are supported by EST sequences but do not correspond to any gene annotated so far.

Conclusion

Here we present a new system that allows the comprehensive comparison of genomes to detect conserved coding and non-coding sequences and to identify potential gene loci. Our system does not require any annotated sequence and is therefore suitable for the analysis of new or poorly annotated genomes.

