Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome
* Corresponding authors: Sébastien Aubourg aubourg@evry.inra.fr - Jean-Pierre Renou renou@evry.inra.fr
1 Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165-CNRS 8114-UEVE, 2 Rue Gaston Crémieux, 91057 Evry Cedex, France
2 Unité de Mathématiques et Informatique Appliquées (MIA), UMR AgroParisTech-INRA518, 16 Rue Claude Bernard, 75231 Paris Cedex, France
3 Chromatin and Reproduction group, Temasek Lifesciences Laboratory, 1 Research Link, 117604 Singapore
4 Université Paris-Sud, Institut de Biotechnologie des Plantes (IBP), UMR CNRS-UPS, Bâtiment 630, 91405 Orsay Cedex, France
5 Unité de Biométrie et Intelligence Artificielle (BIA), INRA, Chemin de Borde-Rouge-Auzeville, 31326 Castanet-Tolosan Cedex, France
BMC Genomics 2007, 8:401 doi:10.1186/1471-2164-8-401
Published: 2 November 2007
Additional files
Additional file 1:
Information about the identification and function of the 465 novel genes. Columns: 1: ID of the GSTs selected outside TAIR gene models and exhibiting a transcription signal (with a web link to the CATdb database [16] for additional information). 2: Validation of expression by sequencing of the RT-PCR product. 3: ID of the CDS models proposed by Eugene at this locus. 4: ID of the gene upstream of the new gene. 5: Correction made in the recent TAIR annotation release 7. 6: ID of the gene downstream of the new gene. 7: Presence of cognate transcripts (ESTs and/or cDNAs from GenBank R. 159). 8: Presence of cognate MPSS tags [19]. 9: Presence of cognate RACE-PCR products obtained by TIGR [20] (accession number given). 10: Presence of PFAM motifs (IDs given). 11: Presence of homolog(s) in Arabidopsis based on BLASTX (GenBank R. 159). 12: Presence of homolog(s) in other species based on BLASTX (GenBank R. 159). 13: Putative biochemical function inferred from homology with known proteins. 14: Presence of a previous annotation made by AGI members at the BAC level but lost in the TAIR genome annotation. 15: Number of mRNA samples with a positive hybridization signal, out of the 522 transcriptomes analyzed. 16: Minimum signal intensity detected. 17: Median signal intensity detected. 18: Maximum signal intensity detected. 19: Sequence of the left primer used in RT-PCR to confirm transcriptional activity. 20: Sequence of the right primer used in RT-PCR to confirm transcriptional activity. 21 to 32: Presence of a hybridization signal in the following organs, respectively: leaf, root, stem, aerial, cell culture, pollen, seed, whole plant, flower, protoplast, silique, hypocotyl.
Format: XLS; Size: 478 KB
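The organ columns (21 to 32 in Additional file 1, 22 to 33 in Additional file 2) are presence flags in a fixed organ order. The minimal Python sketch below maps one spreadsheet row to the list of organs showing a signal. It rests on two assumptions that the file descriptions do not confirm. First, the row has already been read into a Python sequence, with column 1 at index 0. Second, presence is encoded as the value 1. The name `organs_with_signal` is hypothetical.

```python
from typing import Sequence

# Organ order of the hybridization-signal columns, as listed in the file descriptions.
ORGANS = (
    "leaf", "root", "stem", "aerial", "cell culture", "pollen",
    "seed", "whole plant", "flower", "protoplast", "silique", "hypocotyl",
)


def _is_present(value: object) -> bool:
    """Assumed encoding: a cell equal to 1 (int, float, or "1") means 'signal present'."""
    try:
        return float(value) == 1.0
    except (TypeError, ValueError):
        # Empty cells (None, "") or non-numeric text are treated as 'no signal'.
        return False


def organs_with_signal(row: Sequence[object], first_col: int = 21) -> list[str]:
    """Return the organs flagged as having a hybridization signal.

    row: one spreadsheet row, column 1 at index 0 (hypothetical layout).
    first_col: 1-based column holding the 'leaf' flag
               (21 in Additional file 1, 22 in Additional file 2).
    """
    start = first_col - 1
    flags = row[start:start + len(ORGANS)]
    if len(flags) != len(ORGANS):
        raise ValueError("row is too short to contain all organ columns")
    return [organ for organ, flag in zip(ORGANS, flags) if _is_present(flag)]


if __name__ == "__main__":
    # Columns 1-20 left as placeholders; organ flags start at column 21.
    row = [None] * 20 + [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
    print(organs_with_signal(row))
```

For Additional file 2, the same function applies with `first_col=22`.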
Additional file 2:
Information about extended genes. Columns: 1: ID of the GSTs located in the gene extension (CDS) and exhibiting a transcription signal (with a web link to the CATdb database [16] for additional information). 2: Validation of expression by sequencing of the RT-PCR product. 3: ID of the CDS models proposed by Eugene. 4: Correction made in the recent TAIR annotation release 7. 5: ID of the TAIR gene models. 6: Side of the extension. 7: Number of putative additional exons in the CDS based on the Eugene prediction. 8: Presence of cognate transcripts (ESTs and/or cDNAs from GenBank R. 159). 9: Presence of cognate MPSS tags [19]. 10: Presence of cognate RACE-PCR products obtained by TIGR [20] (accession number given). 11: Presence of PFAM motifs (IDs given). 12: Presence of homolog(s) in Arabidopsis based on BLASTX (GenBank R. 159). 13: Presence of homolog(s) in other species based on BLASTX (GenBank R. 159). 14: Putative biochemical function inferred from homology with known proteins. 15: Presence of a previous annotation of the extension made by AGI members at the BAC level but lost in the TAIR genome annotation. 16: Number of mRNA samples with a positive hybridization signal, out of the 522 transcriptomes analyzed. 17: Minimum signal intensity detected. 18: Median signal intensity detected. 19: Maximum signal intensity detected. 20: Sequence of the left primer used in RT-PCR to confirm transcription of the extension. 21: Sequence of the right primer used in RT-PCR to confirm transcription of the extension. 22 to 33: Presence of a hybridization signal in the following organs, respectively: leaf, root, stem, aerial, cell culture, pollen, seed, whole plant, flower, protoplast, silique, hypocotyl.
Format: XLS; Size: 27 KB
Additional file 3:
Information about detected erroneous gene mergings. Columns: 1 and 2: IDs of the two GSTs used to detect the gene merging (with a web link to the CATdb database [16] for additional information). 3: ID of the erroneous TAIR gene model. 4: Correction made in the recent TAIR annotation release 7. 5: Number of mRNA samples in which the two GSTs show opposite hybridization results. 6: Additional information supporting the erroneous gene merging (ESTs, homologies, MPSS). 7: Function deduced from homology for gene 1 (cognate to GST 1). 8: Function deduced from homology for gene 2 (cognate to GST 2).
Format: XLS; Size: 23 KB
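Column 5 of Additional file 3 counts the samples in which the two GSTs behave in opposite ways. The minimal sketch below illustrates that count under assumptions the file description does not state. It assumes each GST has one detected/not-detected call per mRNA sample, with both lists in the same sample order. It also takes "opposite" to mean that one GST is detected while the other is not. The function name `count_opposite_samples` is hypothetical.

```python
from typing import Sequence


def count_opposite_samples(calls_gst1: Sequence[bool], calls_gst2: Sequence[bool]) -> int:
    """Count samples in which exactly one of the two GSTs is detected.

    Each sequence holds one detection call per mRNA sample, in the same
    sample order (hypothetical input format).
    """
    if len(calls_gst1) != len(calls_gst2):
        raise ValueError("both GSTs must have one call per sample")
    # bool(a) != bool(b) acts as an exclusive OR: True only when one GST is detected.
    return sum(bool(a) != bool(b) for a, b in zip(calls_gst1, calls_gst2))


if __name__ == "__main__":
    gst1 = [True, True, False, False]
    gst2 = [True, False, True, False]
    print(count_opposite_samples(gst1, gst2))
```

Under this reading, a large count means the two GSTs are rarely detected together, which is what one would expect if they belong to two distinct genes wrongly merged into one model.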
Additional file 4:
Identifiers in the GEO [27] or ArrayExpress [28] repositories of the 40 transcriptome projects used, with web links to their detailed descriptions in the CATdb database [16].
Format: XLS; Size: 33 KB
