Identitag, a relational database for SAGE tag identification and interspecies comparison of SAGE libraries
1 Équipe Signalisation et identités cellulaires, Centre de Génétique Moléculaire et Cellulaire CNRS UMR 5534, Université Claude Bernard Lyon 1, bâtiment Gregor Mendel, 16 rue Raphaël Dubois, 69622 Villeurbanne cedex France
2 Équipe Bioinformatique et génomique évolutive, Laboratoire Biométrie et Biologie Évolutive CNRS UMR 5558, Université Claude Bernard Lyon 1, bâtiment Gregor Mendel, 16 rue Raphaël Dubois, 69622 Villeurbanne cedex France
BMC Bioinformatics 2004, 5:143 doi:10.1186/1471-2105-5-143
Published: 6 October 2004
Serial Analysis of Gene Expression (SAGE) is a method of large-scale gene expression analysis that has the potential to generate the full list of mRNAs present within a cell population at a given time, together with their frequencies. An essential step in SAGE library analysis is the unambiguous assignment of each 14 bp tag to the transcript from which it was derived. This process, called tag-to-gene mapping, remains a step of SAGE library analysis that needs improvement. Indeed, the existing web sites that provide correspondence between tags and transcripts do not cover all species for which numerous ESTs and cDNAs have already been sequenced.
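As an illustration of what tag-to-gene mapping starts from, the sketch below derives a 14 bp tag from a transcript sequence. It is not taken from Identitag: it assumes the common SAGE convention in which the tag is anchored at the 3'-most NlaIII site (CATG) and comprises those four bases plus the ten that follow. The function name `extract_virtual_tag` and the constants are hypothetical.

```python
from typing import Optional

ANCHOR = "CATG"   # assumed anchoring-enzyme site (NlaIII); not stated in the text
TAG_LENGTH = 14   # 4 bp anchor + 10 bp, matching the 14 bp tags described above


def extract_virtual_tag(transcript: str) -> Optional[str]:
    """Return the 14 bp tag starting at the 3'-most CATG, or None.

    None is returned when the sequence has no CATG site, or when fewer
    than 14 bases remain from the 3'-most site to the end of the sequence.
    """
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)          # 3'-most occurrence of the anchor
    if pos == -1:
        return None
    tag = seq[pos:pos + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


if __name__ == "__main__":
    # Two CATG sites; only the 3'-most one defines the tag.
    print(extract_virtual_tag("AAACATGTTTTCATGACGTACGTAAGG"))
```

A real pipeline would also have to decide how to treat poly(A) tails, sequence orientation, and truncated ESTs; those choices are deliberately left out of this sketch.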
For this reason, we designed and implemented Identitag, a freely available tool for tag identification that can be used in any species for which transcript sequences are available. Identitag is based on a relational database structure, which allows rapid and easy storage and updating of data and, most importantly, makes it possible to define identification parameters precisely. This structure can be viewed as three interconnected modules: the first stores virtual tags extracted from a given list of transcript sequences, the second stores experimental tags observed in SAGE experiments, and the third allows the annotation of the transcript sequences used for virtual tag extraction. It therefore connects an observed tag to a virtual tag and to the sequence it comes from, and then to its functional annotation when available. Databases built for different species can be connected through orthology relationships, thus allowing the comparison of SAGE libraries between species. We successfully used Identitag to identify tags from our chicken SAGE libraries and for interspecies comparison of chicken and human SAGE tags. Identitag sources are freely available at http://pbil.univ-lyon1.fr/software/identitag/.
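The three-module structure described above can be sketched as a small relational schema. The following is a minimal illustration using Python's built-in `sqlite3` module, not Identitag's actual schema: all table names, column names, and sample rows are hypothetical, and the orthology link between species databases is omitted.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables for the three modules."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sequence (
            seq_id  TEXT PRIMARY KEY,
            species TEXT NOT NULL
        );
        -- Module 1: virtual tags extracted from transcript sequences
        CREATE TABLE virtual_tag (
            tag    TEXT NOT NULL,
            seq_id TEXT NOT NULL REFERENCES sequence(seq_id),
            PRIMARY KEY (tag, seq_id)
        );
        -- Module 2: experimental tags observed in SAGE libraries
        CREATE TABLE observed_tag (
            library   TEXT NOT NULL,
            tag       TEXT NOT NULL,
            tag_count INTEGER NOT NULL,
            PRIMARY KEY (library, tag)
        );
        -- Module 3: functional annotation of the transcript sequences
        CREATE TABLE annotation (
            seq_id      TEXT PRIMARY KEY REFERENCES sequence(seq_id),
            description TEXT
        );
    """)
    # Made-up sample data, for illustration only.
    con.executemany("INSERT INTO sequence VALUES (?, ?)",
                    [("seqA", "chicken"), ("seqB", "chicken")])
    con.executemany("INSERT INTO virtual_tag VALUES (?, ?)",
                    [("CATGACGTACGTAA", "seqA"), ("CATGTTTTTTTTTT", "seqB")])
    con.executemany("INSERT INTO observed_tag VALUES (?, ?, ?)",
                    [("lib1", "CATGACGTACGTAA", 12), ("lib1", "CATGGGGGGGGGGG", 3)])
    con.execute("INSERT INTO annotation VALUES (?, ?)",
                ("seqA", "hypothetical example gene"))
    return con


def identify_tags(con: sqlite3.Connection, library: str) -> list:
    """Link each observed tag to its virtual tag, sequence and annotation, if any."""
    return con.execute("""
        SELECT o.tag, o.tag_count, v.seq_id, a.description
        FROM observed_tag AS o
        LEFT JOIN virtual_tag AS v ON v.tag = o.tag
        LEFT JOIN annotation  AS a ON a.seq_id = v.seq_id
        WHERE o.library = ?
        ORDER BY o.tag, v.seq_id
    """, (library,)).fetchall()


if __name__ == "__main__":
    for row in identify_tags(build_demo_db(), "lib1"):
        print(row)
```

The `LEFT JOIN`s keep observed tags that match no virtual tag, so unidentified tags remain visible in the result instead of being silently dropped.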
Identitag is a flexible and powerful tool for tag identification in any single species and for interspecies comparison of SAGE libraries. It opens the way to comparative transcriptomic analysis, an emerging branch of biology.