BMC Genomics

Open Access Research article

Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome

Sébastien Aubourg1*, Marie-Laure Martin-Magniette1,2, Véronique Brunaud1, Ludivine Taconnat1, Frédérique Bitton1, Sandrine Balzergue1, Pauline E Jullien3, Mathieu Ingouff3, Vincent Thareau4, Thomas Schiex5, Alain Lecharny1,4 and Jean-Pierre Renou1*

Author Affiliations

1 Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165-CNRS 8114-UEVE, 2 Rue Gaston Crémieux, 91057 Evry Cedex, France

2 Unité de Mathématiques et Informatique Appliquées (MIA), UMR AgroParisTech-INRA518, 16 Rue Claude Bernard, 75231 Paris Cedex, France

3 Chromatin and Reproduction group, Temasek Lifesciences Laboratory, 1 Research Link, 117604 Singapore

4 Université Paris-Sud, Institut de Biotechnologie des Plantes (IBP), UMR CNRS-UPS, Bâtiment 630, 91405 Orsay Cedex, France

5 Unité de Biométrie et Intelligence Artificielle (BIA), INRA, Chemin de Borde-Rouge-Auzeville, 31326 Castanet-Tolosan Cedex, France

BMC Genomics 2007, 8:401 doi:10.1186/1471-2164-8-401

Published: 2 November 2007

Additional files

Additional file 1:

Information about the identification and function of the 465 novel genes. Columns:
1: ID of GSTs selected outside TAIR models and exhibiting a transcription signal (with a web link to the CATdb database [16] for additional information).
2: Validation of the expression by sequencing the RT-PCR product.
3: ID of CDS models proposed by Eugene at this locus.
4: ID of the gene upstream from the new gene.
5: Correction made in the recent TAIR annotation release 7.
6: ID of the gene downstream from the new gene.
7: Presence of cognate transcripts (EST and/or cDNA from GenBank R.159).
8: Presence of cognate MPSS tags [19].
9: Presence of cognate RACE-PCR products obtained by TIGR [20]; the accession number is given.
10: Presence of PFAM motifs (IDs are given).
11: Presence of homolog(s) in Arabidopsis based on BLASTX (GenBank R.159).
12: Presence of homolog(s) in other species based on BLASTX (GenBank R.159).
13: Putative biochemical function inferred from homology with known proteins.
14: Presence of a previous annotation carried out by AGI members at the BAC level but lost in the TAIR genome annotation.
15: Number of mRNA samples with a positive hybridization signal out of the 522 transcriptomes analyzed.
16: Minimum signal intensity detected.
17: Median signal intensity detected.
18: Maximum signal intensity detected.
19: Sequence of the left primer used in RT-PCR to confirm the transcriptional activity.
20: Sequence of the right primer used in RT-PCR to confirm the transcriptional activity.
21 to 32: Presence of a hybridization signal in the following organs: leaf, root, stem, aerial, cell culture, pollen, seed, whole plant, flower, protoplast, silique, hypocotyl.

Format: XLS. Size: 478KB.

Additional file 2:

Information about extended genes. Columns:
1: ID of GSTs localized in the gene extension (CDS) and exhibiting a transcriptional signal (with a web link to the CATdb database [16] for additional information).
2: Validation of the expression by sequencing the RT-PCR product.
3: ID of CDS models proposed by Eugene.
4: Correction made in the recent TAIR annotation release 7.
5: ID of the TAIR gene models.
6: Side of the extension.
7: Number of putative additional exons in the CDS based on the Eugene prediction.
8: Presence of cognate transcripts (EST and/or cDNA from GenBank R.159).
9: Presence of cognate MPSS tags [19].
10: Presence of cognate RACE-PCR products obtained by TIGR [20]; the accession number is given.
11: Presence of PFAM motifs (IDs are given).
12: Presence of homolog(s) in Arabidopsis based on BLASTX (GenBank R.159).
13: Presence of homolog(s) in other species based on BLASTX (GenBank R.159).
14: Putative biochemical function inferred from homology with known proteins.
15: Presence of a previous extension annotation carried out by AGI members at the BAC level but lost in the TAIR genome annotation.
16: Number of mRNA samples with a positive hybridization signal out of the 522 transcriptomes analyzed.
17: Minimum signal intensity detected.
18: Median signal intensity detected.
19: Maximum signal intensity detected.
20: Sequence of the left primer used in RT-PCR to confirm the transcription of the extension.
21: Sequence of the right primer used in RT-PCR to confirm the transcription of the extension.
22 to 33: Presence of a hybridization signal in the following organs: leaf, root, stem, aerial, cell culture, pollen, seed, whole plant, flower, protoplast, silique, hypocotyl.

Format: XLS. Size: 27KB.

Additional file 3:

Information about the erroneous gene mergings detected. Columns:
1 and 2: IDs of the two GSTs used to detect the gene merging (with a web link to the CATdb database [16] for additional information).
3: ID of the erroneous TAIR gene models.
4: Correction made in the recent TAIR annotation release 7.
5: Number of mRNA samples with opposite hybridization results between the two GSTs.
6: Additional information validating the erroneous gene merging (EST, homologies, MPSS).
7: Function deduced from homology with gene 1 (cognate to GST 1).
8: Function deduced from homology with gene 2 (cognate to GST 2).

Format: XLS. Size: 23KB.
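
Column 5 of Additional file 3 counts the mRNA samples in which the two GSTs of a merged gene model disagree. The following is a minimal sketch of such a count, not the authors' implementation. It assumes that "opposite" means one GST gives a hybridization signal while the other does not, and that each GST has already been scored as detected or not detected in every sample; the detection rule itself is not described here. The function name and the toy data are hypothetical.

```python
from typing import Sequence


def count_opposite_samples(gst1_detected: Sequence[bool],
                           gst2_detected: Sequence[bool]) -> int:
    """Count samples in which exactly one of the two GSTs is detected.

    Assumption: each list holds one boolean detection call per mRNA
    sample, in the same sample order for both GSTs.
    """
    if len(gst1_detected) != len(gst2_detected):
        raise ValueError("both GSTs must be scored on the same samples")
    # a != b is True only when the two calls disagree for that sample
    return sum(a != b for a, b in zip(gst1_detected, gst2_detected))


# Hypothetical detection calls for two GSTs across six samples
gst1 = [True, True, False, False, True, False]
gst2 = [True, False, False, True, False, False]
print(count_opposite_samples(gst1, gst2))
```

Under this assumption, a high count suggests that the two GSTs report on independently regulated transcripts, which is the kind of evidence that would argue against a single merged gene model.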

Additional file 4:

Identifiers in the GEO [27] or ArrayExpress [28] repositories of the 40 transcriptome projects used, with web links to their detailed descriptions in the CATdb database [16].

Format: XLS. Size: 33KB.
