Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes
1 Department of Structural Chemistry and Inorganic Stereochemistry, School of Pharmacy, University of Milan, Italy
2 Department of Biomolecular Sciences and Biotechnology, University of Milan, Italy
3 National Institute of Nuclear Physics, Bari, Italy
4 Istituto Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, Bari, Italy
5 Dipartimento di Biochimica e Biologia Molecolare, University of Bari, Italy
BMC Genomics 2008, 9:277 doi:10.1186/1471-2164-9-277
Published: 11 June 2008
Accurate detection of genes and identification of functional regions remain open issues in the annotation of genomic sequences. The problem affects newly sequenced genomes as well as those of very well studied organisms such as human and mouse, where, despite great efforts, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to this problem. Unfortunately, it is limited by the computational cost of genome-wide comparisons and by the difficulty of discriminating between conserved coding and non-coding sequences. This discrimination is often based on, and thus dependent on, the availability of annotated proteins.
In this paper we present the results of a comprehensive comparison of the human and mouse genomes, performed with a new high-throughput grid-based system that allows rapid detection of conserved sequences and accurate assessment of their coding potential. By detecting clusters of coding conserved sequences, the system can also accurately identify potential gene loci.
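The idea of locating candidate gene loci from clusters of coding conserved sequences can be illustrated with a minimal sketch. This is not the grid-based system described here: the `Tag` record, the `cluster_coding_tags` function, and the `max_gap` and `min_tags` thresholds are all hypothetical names and values chosen for illustration, and the sketch assumes each conserved tag has already been classified as coding or non-coding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tag:
    """A conserved sequence tag on one genome (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    coding: bool  # assumed to come from an upstream coding-potential step


def cluster_coding_tags(
    tags: List[Tag], max_gap: int = 10_000, min_tags: int = 2
) -> List[Tuple[str, int, int, int]]:
    """Group nearby coding tags into candidate loci.

    Tags on the same chromosome are chained while the gap to the
    current cluster is at most max_gap bases. Clusters with fewer
    than min_tags tags are discarded. Both thresholds are
    illustrative assumptions, not values from the paper.
    Returns (chrom, start, end, number_of_tags) per candidate locus.
    """
    coding = sorted(
        (t for t in tags if t.coding), key=lambda t: (t.chrom, t.start)
    )
    clusters: List[List[Tag]] = []
    current: List[Tag] = []
    for tag in coding:
        if current:
            cluster_end = max(t.end for t in current)
            new_chrom = tag.chrom != current[-1].chrom
            if new_chrom or tag.start - cluster_end > max_gap:
                # Close the current cluster and start a new one.
                if len(current) >= min_tags:
                    clusters.append(current)
                current = []
        current.append(tag)
    if len(current) >= min_tags:
        clusters.append(current)
    return [
        (c[0].chrom, c[0].start, max(t.end for t in c), len(c))
        for c in clusters
    ]
```

Under these assumptions, an isolated coding tag does not form a locus, while two or more coding tags lying close together on the same chromosome are reported as a single candidate region.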
Following this analysis, we created a collection of human-mouse conserved sequence tags and compared our results against reliable annotations to benchmark our classifications. Strikingly, we detected several potential gene loci that are supported by EST sequences but do not correspond to any gene annotated so far.
Here we present a new system that allows comprehensive comparison of genomes to detect conserved coding and non-coding sequences and to identify potential gene loci. Our system does not require any annotated sequence and is therefore suitable for the analysis of new or poorly annotated genomes.