Open Access Research article

Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

Jérôme Grimplet1*, John Van Hemert2, Pablo Carbonell-Bejerano1,3, José Díaz-Riquelme1, Julie Dickerson2, Anne Fennell4, Mario Pezzotti5 and José M Martínez-Zapater1,3

Author Affiliations

1 Instituto de Ciencias de la Vid y del Vino (CSIC, Universidad de La Rioja, Gobierno de La Rioja), CCT, C/Madre de Dios 51, Logroño, 26006, Spain

2 Bioinformatics and Computational Biology Department, Iowa State University, Ames, IA, 50011, USA

3 Departamento de Genética Molecular de Plantas, Centro Nacional de Biotecnología (CNB-CSIC), C/Darwin 3, Madrid, 28049, Spain

4 Plant Science Department, South Dakota State University, Brookings, SD, 57007, USA

5 Department of Biotechnology, University of Verona, Strada le Grazie 15, Verona, 37134, Italy

BMC Research Notes 2012, 5:213  doi:10.1186/1756-0500-5-213

Published: 3 May 2012

Additional files

Additional file 1 :

The complete grape gene annotation and correspondence between the sets of sequences.

Unique ID: ID from the highest priority level available for a unique gene sequence (priority order: 12X sequence v1 > 12X sequence v0 > 8X sequence > EST from DFCI Grape Gene Index v5 > EST from GrapeGen microarrays); gene names followed by an underscore and a number are theoretical genes corresponding to new genes that were incorrectly merged.
Old 12Xv1 name: former name used for v1 of the 12X sequence.
12Xv0 ID: ID from v0 of the 12X assembly.
Identical genes in 8X or other EST: ID of the corresponding gene from the 8X sequencing, or of an EST sequence that does not match an 8X gene.
Probeset grapegen: probeset ID for the Affymetrix GrapeGen Vitis vinifera Genome Array.
Chromosome position 12X: position of the gene on the chromosome in the 12X sequencing assembly; the first part separated by underscore corresponds to the chromosome number, the middle part to the beginning position, and the last part to the end position.
Cardinality between 12Xv0 and 12Xv1: comment about the accuracy of the gene prediction inferred from the v0 to v1 comparison; “merge” indicates that multiple sequences of the v0 match one sequence of the v1, “partial” indicates that multiple sequences of the v1 match one sequence of the v0, and numbers indicate how many genes from one set match one gene from the other set.
Cardinality between 8X and 12Xv1: comment about the accuracy of the gene prediction inferred from the 8X to 12X comparison; “merge” indicates that multiple sequences of the 8X assembly match one sequence of the 12Xv1 (unless noted otherwise, the 12X assembly gene is correct), “To split” indicates that the 12X gene is incorrect and needs to be split (if there are more than 2 genes, those that need to be grouped are indicated by order in the column “Identical genes in 8X or other EST”), “redundant” indicates multiple 12X genes matching a single 8X gene at the same position, XX indicates no match between 12X and 8X, OK indicates a one-to-one relationship between 12X and 8X, “OK (Split)” indicates a 12X gene matching an 8X gene that was an incorrect merging of multiple genes, and “Ls” indicates a low score between the matches even though they seem to be correct.
Track 12Xv1: the track of the 12Xv1 assembly, either the main track (v1) or the repeat track (v1_r).
Functional annotation: tentative functional annotation; briefname, EC or Kegg ID: the identifier that is used in the networks.
Network: list of the VitisNet networks in which the gene appears.
Functional category: each functional category assigned to the gene; there are up to seven categories for a single gene.
Best Arabidopsis match: best matched hit in Arabidopsis putative proteins.
Gene Ontology (GO): list of the identified GO terms and their description.
Plant Ontology (PO): list of the identified PO terms and their description.
Pfam: list of the domains detected from Pfam.
Smart: list of the domains detected from Smart.
Prosite: list of the domains detected from Prosite.
Psort: list of the cellular localizations detected from Psort.
InterPro domain: list of the domains detected from InterPro.
Accession UniProt for published grapevine protein: UniProt ID for grapevine proteins individually published apart from the genome sequencing.
Chromosome position 8X: position of the gene on the chromosome in the 8X sequencing assembly.
Other Vitis: presence in non-vinifera Vitis species.
cDNA array: ID used in the cDNA array from Mathiason et al. (2009).
TC from VVGI5: list of other TC from the DFCI matching the gene.
GeneChip probesets: probeset ID for the Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array.
Best match against proteins with evidence at protein level: best match when compared with the UniProt database.
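
As an illustration of how two of the columns described above might be handled programmatically, the following minimal Python sketch splits a "Chromosome position 12X" value into its three underscore-separated parts and picks a Unique ID according to the stated priority order. The function names, the dictionary keys used for the sequence sets, and the example values are hypothetical and are not taken from the additional file; the actual cell contents may differ.

```python
from typing import Mapping, NamedTuple, Optional


class GenePosition(NamedTuple):
    chromosome: str
    start: int
    end: int


def parse_position(value: str) -> GenePosition:
    """Split 'chromosome_start_end' into its three parts.

    Splitting from the right is an assumption made here so that a
    chromosome label that itself contains an underscore stays intact.
    """
    chromosome, start, end = value.strip().rsplit("_", 2)
    return GenePosition(chromosome, int(start), int(end))


# Priority order stated in the caption; the key names are hypothetical labels.
PRIORITY = ("12Xv1", "12Xv0", "8X", "DFCI_EST", "GrapeGen_EST")


def pick_unique_id(ids: Mapping[str, str]) -> Optional[str]:
    """Return the ID from the highest-priority sequence set that has one."""
    for source in PRIORITY:
        identifier = ids.get(source)
        if identifier:
            return identifier
    return None


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(parse_position("5_1200_3400"))
    print(pick_unique_id({"8X": "gene_8x", "12Xv0": "gene_v0"}))
```

A gene that is present only in an EST set would fall through to the last matching entry of the priority tuple, and a gene with no identifier at all yields `None`.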

Format: TIFF Size: 513KB

Additional file 2 :

List of the Vitis putative proteins’ functional categories and correspondence with other catalogs. Vitis Functional Category: full name of each functional category. Vitis Functional Category Code: numbered nomenclature of the Vitis Functional Category. Level: hierarchical level of description of the functional category. GO Name: full name of each GO description. GO ID: numbered nomenclature of the GO. VitisNet Network: corresponding VitisNet network. MIPS Funcat Name: full name of each MIPS functional category. MIPS Funcat: numbered nomenclature of the MIPS functional category. Number of genes: number of genes belonging to the category.

Format: XLSX Size: 6.9MB

Additional file 3 :

List of networks available in VitisNet. VVID: VitisNet identification number; gen: number of genes in network; pro: number of proteins in network; met: number of metabolites in network. New networks are italicized.

Format: XLSX Size: 302KB

Additional file 4 :

Analysis workflow for determining cardinality between 8X and 12Xv1 assembly genes. Straight line: representation of the genes from the 12Xv1 assembly. Wavy line: representation of the genes from the 8X assembly. Dotted line: genetic sequence.

Format: DOCX Size: 43KB