
Open Access Research article

Comparative expression profiling in grape (Vitis vinifera) berries derived from frequency analysis of ESTs and MPSS signatures

Alberto Iandolino1,3, Kan Nobuta2, Francisco Goes da Silva1, Douglas R Cook1 and Blake C Meyers2*

Author Affiliations

1 Department of Plant Pathology and College of Agricultural and Environmental Sciences Genomics Facility, University of California, One Shields Avenue, Davis, CA 95616, USA

2 Department of Plant and Soil Sciences & Delaware Biotechnology Institute, University of Delaware, Newark, Delaware 19711, USA

3 Monsanto, 1920 5th Street, Davis, 95616, California, USA


BMC Plant Biology 2008, 8:53  doi:10.1186/1471-2229-8-53


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2229/8/53


Received: 26 June 2007
Accepted: 12 May 2008
Published: 12 May 2008

© 2008 Iandolino et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Vitis vinifera (V. vinifera) is the primary grape species cultivated for wine production, with an industry valued annually in the billions of dollars worldwide. In order to sustain and increase grape production, it is necessary to understand the genetic makeup of grape species. Here we performed mRNA profiling using Massively Parallel Signature Sequencing (MPSS) and combined it with available Expressed Sequence Tag (EST) data. These tag-based technologies, which do not require a priori knowledge of genomic sequence, are well-suited for transcriptional profiling. The sequence depth of MPSS allowed us to capture and quantify almost all the transcripts at a specific stage in the development of the grape berry.

Results

The number and relative abundance of transcripts from stage II grape berries were defined using Massively Parallel Signature Sequencing (MPSS). A total of 2,635,293 17-base and 2,259,286 20-base signatures were obtained, representing at least 30,737 and 26,878 distinct sequences, respectively. The average normalized abundance per signature was ~49 TPM (Transcripts Per Million). Comparisons of the MPSS signatures with available Vitis species' ESTs and a unigene set demonstrated that 6,430 distinct contigs and 2,190 singletons have a perfect match to at least one MPSS signature. Among the matched sequences were ESTs derived from tissues other than berries or from berries at other developmental stages. Additional MPSS signatures not matching known grape ESTs can extend our knowledge of the V. vinifera transcriptome, particularly when these data are used to assist in annotation of whole genome sequences from Vitis vinifera.

Conclusion

The MPSS data presented here not only achieved a higher level of saturation than previous EST-based analyses but, in doing so, expanded the known set of transcripts of grape berries during the unique developmental stage that immediately precedes the onset of ripening. The MPSS dataset also revealed evidence of antisense expression not previously reported in grapes but comparable to that reported in other plant species. Finally, we developed a novel web-based, public resource for utilization of the grape MPSS data [1].

Background

Grape species (Vitis spp.) represent the most widely cultivated and economically important fruit crop in the world [2]. Grape berries are used to produce juice, fresh and dried fruit, and distilled liquor, although wine produced from cultivars of V. vinifera has the highest economic value of all grape products. Grapevine berries are non-climacteric fruits with a characteristic double sigmoid growth curve. The initial phase of exponential berry growth (stage I) is followed by a lag phase (stage II), with growth resuming after the onset of ripening, or "veraison" (stage III). Berry development is characterized by changes in numerous biological processes, including cell division and enlargement, primary and secondary metabolism, and resistance or susceptibility to abiotic or biotic stresses [3,4]. The importance of this plant species to agriculture has made the development of genomic resources a high priority. Among these resources, transcriptional profiling of important grape tissues is a practical option that may reveal transcriptional complexity and changes in this dynamic developmental system.

Massively parallel signature sequencing (MPSS) [5,6] is a sequence-based method for measuring gene expression. The depth of sampling provided by MPSS can identify a nearly complete inventory of transcripts in a given sample. The method is based on a unique process for parallel sequencing, which starts with the cloning of a cDNA library onto 5 μm diameter microbeads, so that one transcript from the original RNA sample is represented on a single bead [5]. From each bead, a 'signature' sequence of 17 or more nucleotides is obtained by successive rounds of sequencing reactions [5-7]. These signatures are derived from, and include, the 3'-most occurrence of a specific restriction enzyme site in a transcript (most often DpnII, producing signatures that start with GATC) [5,6]. The output of the method is conceptually similar to that of a possibly more familiar method, Serial Analysis of Gene Expression (SAGE) [8]. However, MPSS permits the simultaneous sequencing of millions of signatures from a given library [5]. When signatures are matched to the genome to identify specific genes, the abundance of each signature provides a measure of the expression level of the corresponding gene in the sampled tissue. Among several published applications of this technology, we have previously conducted comprehensive transcriptional analyses of the reference plant species Arabidopsis thaliana and rice [7,9]. While MPSS, SAGE, and expressed sequence tags (ESTs) are all sequence-based technologies for transcriptional profiling, MPSS provides a more thorough qualitative and quantitative description of gene expression because of its much greater sampling depth. Although newer sequencing technologies, such as sequencing-by-synthesis (SBS) and 454, offer deeper sequencing and longer read lengths, none has yet demonstrated consistently better results than MPSS for mRNA profiling [10].
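To make the signature definition concrete, the sketch below derives the expected DpnII-anchored signature of a transcript. It is a minimal illustration, not the MPSS pipeline: it assumes the transcript is given 5'-to-3' on the sense strand and that the signature is simply the first 17 (or 20) bases beginning at the 3'-most GATC, and it ignores digestion, adapter ligation, and bead-level sequencing. The function name `expected_signature` is hypothetical.

```python
from typing import Optional

DPNII_SITE = "GATC"  # DpnII recognition site; DpnII-anchored signatures start with it


def expected_signature(transcript: str, length: int = 17) -> Optional[str]:
    """Return the `length`-base signature starting at the 3'-most GATC site.

    Returns None when the transcript has no GATC site, or when fewer than
    `length` bases are available from that site to the end of the sequence.
    """
    seq = transcript.upper()
    pos = seq.rfind(DPNII_SITE)  # rfind returns the 3'-most occurrence
    if pos == -1:
        return None  # no anchor site: such a transcript yields no signature
    signature = seq[pos:pos + length]
    return signature if len(signature) == length else None


# Two GATC sites; only the 3'-most one defines the signature.
tx = "GATC" + "C" * 20 + "gatc" + "A" * 20
print(expected_signature(tx))      # "GATC" followed by 13 A
print(expected_signature(tx, 20))  # "GATC" followed by 16 A
```

Because the 20-base signature is the 17-base signature extended by three bases, the two outputs share a prefix, mirroring how the 20-base data are obtained.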

In this report, we measured gene expression in developing grape berries using MPSS, compared this expression profile with that provided by the current Vitis Unigene set [4], and developed a novel web-based resource for utilization of the grape MPSS data. As a result of this analysis, we were able to annotate thousands of signatures matching predicted genes, quantify the expression level of these genes in the developing berries, compare the expression profiles derived from ESTs and MPSS signature frequencies, and expand the coverage of known transcripts in an important grapevine organ at a specific developmental stage. Because these data are sequence-based, they comprise a resource that will be useful for the annotation of any grape genomic sequence produced in the future.

Results

Analysis of the V. vinifera berry MPSS dataset and signature annotation

An MPSS library was constructed using RNA extracted from stage II berries (green, hard) sampled from field-grown V. vinifera cv. Cabernet Sauvignon. After cloning of the cDNA library onto beads, 17-base and 20-base signatures were generated by MPSS sequencing [5,6]. These are not independent samples: 20-base signatures are obtained by extending previously recorded 17-base signatures by three nucleotides, and because a small fraction of sequencing reactions fails at each additional base, the raw count of sequences is lower for the 20-base data. A total of 2,635,293 17-base and 2,259,286 20-base signatures were produced, corresponding to 30,737 and 26,878 distinct sequences, respectively (Table 1A–C). This represents a discovery rate, or average raw abundance value, of approximately one distinct sequence for every ~49 sequenced cDNA tags.
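The raw-count summary in this paragraph, and the transcripts-per-million (TPM) normalization used throughout, can be sketched as follows. This is an illustration on toy data, assuming each list entry is one sequenced tag and that TPM simply rescales each signature's count to a library of one million tags; `summarize_library` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_library(tags: Iterable[str]) -> Tuple[int, int, float, Dict[str, float]]:
    """Return (total tags, distinct signatures, mean raw abundance, TPM per signature)."""
    counts = Counter(tags)
    total = sum(counts.values())
    distinct = len(counts)
    mean_raw_abundance = total / distinct  # tags sequenced per distinct signature
    tpm = {sig: n * 1_000_000 / total for sig, n in counts.items()}
    return total, distinct, mean_raw_abundance, tpm


# Toy library: three tags of one signature, one tag of another.
total, distinct, mean_raw, tpm = summarize_library(["GATCAAA"] * 3 + ["GATCCCC"])
print(total, distinct, mean_raw)  # 4 2 2.0
print(tpm["GATCAAA"])             # 750000.0
```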

Table 1. Summary statistics of raw 17- and 20-base MPSS signatures from grape berries.

Initially, to link the MPSS signatures to predicted gene annotations, all sites (GATC) that could potentially produce an MPSS signature were identified in the Vitis Unigene dataset available in public databases. This dataset comprised 14,658 contigs (1,307 from non-vinifera Vitis species) and 14,931 singletons (1,080 from non-vinifera Vitis species). All potential signatures starting with the GATC anchor sequence were extracted from both the sense and antisense strands of the grape sequences. A total of 84,834 and 48,490 distinct 17-base potential signatures were identified in the contigs and singletons, respectively, of this version of the Vitis cDNA data. When both datasets were combined, the total number of unique potential signatures was 123,563. This number of in silico-extracted distinct MPSS signatures is approximately six-fold lower than the 753,894 distinct "genomic" MPSS signatures reported for the completed Arabidopsis sequence [11], reflecting the incomplete nature of the grape EST dataset and the absence of intergenic and intron sequences.
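The in silico extraction step can be sketched as a scan of both strands for every GATC anchor. This is a minimal illustration, assuming sequences composed of A, C, G, T, and N (other characters pass through the complement step unchanged) and 17-base signatures; the function names are hypothetical. Because GATC is its own reverse complement, every site on the sense strand is also a site on the antisense strand, but the bases read from it differ.

```python
from typing import Iterable, Set, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def potential_signatures(seq: str, length: int = 17, site: str = "GATC") -> Set[Tuple[str, str]]:
    """Return {(signature, strand)} for every site on either strand that has
    at least `length` bases available from the site onward."""
    found = set()
    for strand, s in (("sense", seq.upper()), ("antisense", reverse_complement(seq))):
        start = s.find(site)
        while start != -1:
            sig = s[start:start + length]
            if len(sig) == length:
                found.add((sig, strand))
            start = s.find(site, start + 1)  # continue scanning for the next site
    return found


def distinct_potential_signatures(seqs: Iterable[str], length: int = 17) -> Set[str]:
    """Union of signature sequences over a collection (e.g. contigs plus singletons)."""
    return {sig for seq in seqs for sig, _strand in potential_signatures(seq, length)}


print(potential_signatures("GATC" + "A" * 13))  # one sense signature
print(potential_signatures("T" * 13 + "GATC"))  # the same sequence, from the antisense strand
```

Taking a union, as `distinct_potential_signatures` does, explains why a combined total can be smaller than the sum of the contig and singleton totals: a signature present in both collections is counted once.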

Observed MPSS signatures were classified based on the output of "reliability" and "significance" filters [11]. The purpose of these filters is to separate high-quality data, represented by signatures encountered above specified frequency thresholds, from background signal generated by very low-abundance MPSS signatures. As with other MPSS datasets, the grape library was generated from four sequencing runs representing two sequencing frames [11]. There were two runs for each of the "two-step" and "four-step" sequencing frames. The reliability filter asks whether a signature is present in more than one sequencing run (of the four total runs); signatures observed in more than one run are considered "reliable". The significance filter identifies as "significant" only those signatures with a normalized abundance greater than three transcripts per million (TPM). The classifications of 17- and 20-base expressed signatures in terms of reliability and significance are shown in Tables 1A–C and 2; 96.8% of all MPSS signatures fell in the "reliable" and "significant" category, consistent with an extremely low abundance for signatures not passing the filters. This value is similar to the 97.5% reported for the Arabidopsis MPSS dataset [11]. Among MPSS signatures with exact sequence matches to EST contigs (Table 2A–B) and singletons (Table 2C–D), unique "reliable" and "significant" signatures represented the largest category (more than 60% of the unique signatures).
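The two filters can be expressed as a small classifier. This sketch follows the definitions given above (reliable: observed in more than one of the four runs; significant: more than 3 TPM), but it assumes, for illustration, that TPM is computed from counts pooled across all runs; the published pipeline [11] may normalize differently. Function names and data are hypothetical.

```python
from typing import Dict, List, Tuple


def classify(run_counts: Dict[str, List[int]], tpm_threshold: float = 3.0) -> Dict[str, Tuple[bool, bool]]:
    """Map each signature to (reliable, significant).

    run_counts: signature -> raw counts in each sequencing run (four runs here).
    """
    library_total = sum(sum(counts) for counts in run_counts.values())
    result = {}
    for sig, counts in run_counts.items():
        reliable = sum(1 for c in counts if c > 0) > 1  # seen in more than one run
        tpm = sum(counts) * 1_000_000 / library_total  # assumption: pooled normalization
        result[sig] = (reliable, tpm > tpm_threshold)
    return result


def category(reliable: bool, significant: bool) -> str:
    """Labels used in the additional files: RS, RnS, nRS, nRnS."""
    return ("R" if reliable else "nR") + ("S" if significant else "nS")


# Toy library of exactly one million tags, four runs per signature.
runs = {
    "GATCAAA": [500_000, 0, 0, 0],                    # abundant but seen in one run only
    "GATCCCC": [1, 1, 0, 0],                          # seen in two runs, but only 2 TPM
    "GATCGGG": [100_000, 100_000, 100_000, 199_998],  # reliable and significant
}
labels = {sig: category(*flags) for sig, flags in classify(runs).items()}
print(labels)  # {'GATCAAA': 'nRS', 'GATCCCC': 'RnS', 'GATCGGG': 'RS'}
```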

Table 2. Distinct MPSS signatures matching EST contigs or singletons classified based on "reliability" and "significance" filters.

Expressed signatures were mapped to grape EST contigs and singletons based on exact matches to the in silico-extracted "potential signatures" (see above). A total of 5,794 and 5,407 contigs were matched by expressed reliable and significant 17-base and 20-base MPSS signatures, respectively (see Additional files 1 and 2). This represented, on average, more than 40% of all known Vitis sp. genes. By contrast, only 14% of singletons in the Vitis sp. EST set were matched by MPSS signatures (Table 2C and 2D). The vast majority of the unmatched Vitis sp. sequences had in silico potential signatures that were not detected in the MPSS data. It is possible that the corresponding genes were not expressed in this sample; alternatively, unmatched contig and singleton EST sequences may represent 5' reads of cDNA clones, and thus fail to represent the 3' regions where the majority of MPSS signatures originate. The disproportionate representation of singleton ESTs among the unmatched set is consistent with this latter interpretation, because singleton ESTs in the Vitis dataset are more often the product of 5' sequencing reactions.
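Exact matching of expressed signatures to the in silico potential signatures amounts to a hash lookup. The sketch below is illustrative only: it assumes potential signatures have already been extracted per sequence (as `{sequence_id: {(signature, strand), ...}}`), uses shortened toy signatures in place of 17-mers, and the function name is hypothetical. It returns the mapping in both directions, which also supports the one-to-one versus one-to-many counts discussed below.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def map_signatures(
    expressed: Iterable[str],
    potential: Dict[str, Set[Tuple[str, str]]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (sequence_id -> matched signatures, signature -> matched sequence_ids)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, sigs in potential.items():
        for sig, _strand in sigs:
            index[sig].add(seq_id)

    by_sequence: Dict[str, Set[str]] = defaultdict(set)
    by_signature: Dict[str, Set[str]] = {}
    for sig in set(expressed):
        hits = index.get(sig)  # exact match only; unmatched signatures are skipped
        if hits:
            by_signature[sig] = set(hits)
            for seq_id in hits:
                by_sequence[seq_id].add(sig)
    return dict(by_sequence), by_signature


potential = {
    "ctg1": {("GATCAAA", "sense")},
    "ctg2": {("GATCAAA", "sense"), ("GATCCCC", "antisense")},
    "ctg3": {("GATCGGG", "sense")},
}
by_seq, by_sig = map_signatures(["GATCAAA", "GATCCCC", "GATCTTT"], potential)
print(sorted(by_seq))                # ['ctg1', 'ctg2']
print(len(by_seq) / len(potential))  # fraction of sequences matched
```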

Additional file 1. Filtered MPSS signatures matching to grape EST contigs. Table A: 17-mer signatures. Table B: 20-mer signatures

Format: DOC


Additional file 2. Number of MPSS signatures matching contigs and singletons. All the unique signatures (both 17- and 20-mer) were categorized into the following eight categories: Reliable (R), not Reliable (nR), Significant (S), not Significant (nS), Reliable and Significant (RS), Reliable but not Significant (RnS), not Reliable but Significant (nRS), and not Reliable and not Significant (nRnS). The number and the frequency of the signatures in each category were identified in both sense and antisense orientations. Panel A: 17-mer MPSS signatures matched to EST contigs. Panel B: 20-mer MPSS signatures matched to EST contigs. Panel C: 17-mer MPSS signatures matched to EST singletons. Panel D: 20-mer MPSS signatures matched to EST singletons.

Format: XLS


Most signatures matched a single contig or singleton, while ~40% matched two or more [see Additional files 1 and 2]. More than 70% of matched contigs and singletons showed a one-to-one assignment to a reliable and significant MPSS signature (Figure 1) [see Additional file 3]. The remaining sequences had one-to-many assignments, with up to 16 different signatures matching a single contig [see Additional file 4]. Sequences of 17–20 bp are rarely duplicated by chance in unrelated genes [7] [see Additional file 5]. Instead, biological factors involving gene duplication or transcript processing may complicate the unambiguous assignment of signatures to transcripts. For example, gene family members with high sequence similarity are likely to yield distinct transcripts containing the same signature, while the use of multiple polyadenylation sites or alternative splice site selection can yield multiple signatures from the same transcription unit. To estimate the frequency of alternative termination, a subset of 5,145 contigs correctly oriented 5' to 3' was selected. From this subset, 975 contigs matched by at least two MPSS signatures were identified. The abundance counts of reliable and significant 17-nucleotide MPSS signatures were transformed to relative frequency values, and the location of each signature was plotted along the 3'-to-5' axis for each of the 975 contigs (Figure 2). Signature frequency per contig decreased exponentially in the 3'-to-5' direction. On average, ~70% of all signatures originated from the 3'-most GATC site, while only ~29% and ~14% of signatures originated from the second and third 3'-most positions (further 5'), respectively. Therefore, most of the transcripts matched by MPSS are the product of polyadenylation at the most distal of all recorded 3' sites. It is possible, however, that the MPSS signatures that did not match ESTs (contigs or singletons) are derived from longer 3' ends for which transcript sequence was not available.
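The positional analysis can be sketched in two steps: rank every GATC site in an oriented contig from the 3' end (rank 1 is the 3'-most site), then convert each contig's signature counts to within-contig fractions and average them per rank. The averaging rule here (contigs lacking a signature at a given rank contribute zero, so the averages sum to one) is an assumption made for illustration; the original analysis may have normalized differently. Function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def site_ranks_from_3prime(contig: str, site: str = "GATC") -> Dict[int, int]:
    """Map each site's start index (sense strand, 5'-to-3') to its rank from the 3' end."""
    seq = contig.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    # Sort descending so the 3'-most site receives rank 1.
    return {pos: rank for rank, pos in enumerate(sorted(positions, reverse=True), start=1)}


def counts_by_rank(contig: str, counts_at_site: Dict[int, int]) -> Dict[int, int]:
    """Re-key signature counts (site start index -> count) by rank from the 3' end."""
    ranks = site_ranks_from_3prime(contig)
    return {ranks[pos]: n for pos, n in counts_at_site.items()}


def mean_relative_frequency(per_contig: List[Dict[int, int]]) -> Dict[int, float]:
    """Average within-contig relative frequencies per rank across all contigs."""
    sums: Dict[int, float] = defaultdict(float)
    for rank_counts in per_contig:
        contig_total = sum(rank_counts.values())
        if contig_total == 0:
            continue  # a contig without counts contributes nothing
        for rank, n in rank_counts.items():
            sums[rank] += n / contig_total
    return {rank: s / len(per_contig) for rank, s in sorted(sums.items())}


contig = "GATC" + "A" * 10 + "GATC" + "A" * 10  # sites at indices 0 and 14
ranked = counts_by_rank(contig, {14: 3, 0: 1})
print(ranked)                                    # {1: 3, 2: 1}
print(mean_relative_frequency([ranked, {1: 1}]))  # {1: 0.875, 2: 0.125}
```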

Figure 1. Frequency distribution of grape ESTs matched by filtered MPSS signatures. Reliable and significant MPSS signatures were matched to EST contigs and EST singletons. Up to 16 and 10 MPSS signatures matched a single EST contig and singleton, respectively. Bar graphs show the proportion of (A) EST contigs and (B) EST singletons matched by each number of MPSS signatures.

Additional file 3. Frequency distribution of grape ESTs matched by MPSS signatures. The tables in this file show the frequency of MPSS signatures matching ESTs. The number of matching signatures ranges from 1 to 16 for EST contigs (panels A and B) and from 1 to 10 for EST singletons (panels C and D). Data in each table are categorized based on the filters used to sort MPSS signatures: RS, reliable and significant; RnS, reliable but non-significant; nRS, non-reliable but significant; nRnS, non-reliable and non-significant. Panel A: 17-mer MPSS signatures matched to EST contigs. Panel B: 20-mer MPSS signatures matched to EST contigs. Panel C: 17-mer MPSS signatures matched to EST singletons. Panel D: 20-mer MPSS signatures matched to EST singletons.

Format: PDF Size: 72 KB


Additional file 4. Example of a grape EST contig matched by multiple MPSS signatures. One EST contig matched by three MPSS signatures is shown. This contig (CTG1027770) has similarity to "putative transcription factor BTF3-like mRNA". Panel A shows all the MPSS signatures identified in this contig, with their abundance levels and coordinates on the contig. Panel B displays the sequence of this contig, with all the sense MPSS signatures from panel A indicated in blue. Uppercase letters indicate the predicted ORF, while lowercase letters indicate the predicted UTRs. The position of the most abundant signature (#2) is consistent with the 3'-most DpnII site, the position measured by MPSS. The other signatures may derive from other transcripts, alternative polyadenylation, or incomplete digestion during construction of the MPSS library.

Format: PDF Size: 51 KB


Additional file 5. Occurrence of identical MPSS signatures in related and unrelated contigs. An example of an MPSS signature with multiple hits to EST contigs is shown. In this example, the MPSS signature "GATCAAGACTGATGAAA" (displayed in red) was identified in three EST contigs, two of which have the same annotation while the third is different. The most closely related Arabidopsis homolog, along with its BLAST expected value, is listed at the beginning of each coding sequence.

Format: PDF


Figure 2. Frequency of reliable and significant 17-mer MPSS signatures in a subset of 5'-to-3' oriented contigs. Signatures were mapped based on their location relative to the 3' end of the EST contigs. Most signatures were found at the 3'-most DpnII site, indicated as position #1 on the x-axis. However, expressed MPSS signatures were found as far 5' as the eighth DpnII site from the 3' end of the contig.

Analysis of sense-antisense expression

Approximately 15% and 11% of the EST contigs and singletons, respectively, were matched by MPSS signatures in both sense and antisense orientations (Tables 3A–B). MPSS signature frequencies were much higher on the sense strand for some sequences, while other sequences had higher MPSS abundances on the antisense strand [see Additional file 6]. Contigs matched in both orientations represented ~12% of the known berry transcriptome (out of a total of 7,828 contigs, including contigs assembled from ESTs sequenced from cDNA libraries other than green stage II berries), and the 2,891 MPSS signatures matching these contigs represented ~52% of the total MPSS abundance. It is possible that sense-antisense transcript pairs are an important transcriptional feature that could provide a mechanism for post-transcriptional gene silencing [12] during this dynamic phase of berry development. Functional categorization of these contigs showed no particular overrepresented category (Figure 3). Moreover, none of these contigs had significant tBLASTx hits in both reading-frame orientations, suggesting that protein coding is a property of only one strand. It is possible that antisense transcripts result from overlapping 3' UTRs of adjacent genes, or from transcription of an overlapping non-coding RNA.
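Identifying contigs matched in both orientations, and the strand that dominates each, is a simple aggregation over signature matches. A minimal sketch, assuming matches are provided as (contig, strand, count) records; names and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sense_antisense_pairs(matches: Iterable[Tuple[str, str, int]]) -> Dict[str, Tuple[int, int]]:
    """Return {contig: (sense_total, antisense_total)} for contigs matched on both strands."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for contig, strand, count in matches:
        if strand not in ("sense", "antisense"):
            raise ValueError(f"unexpected strand label: {strand!r}")
        totals[contig][0 if strand == "sense" else 1] += count
    return {c: (s, a) for c, (s, a) in totals.items() if s > 0 and a > 0}


def dominant_strand(sense_total: int, antisense_total: int) -> str:
    if sense_total == antisense_total:
        return "equal"
    return "sense" if sense_total > antisense_total else "antisense"


matches = [
    ("ctg1", "sense", 10), ("ctg1", "antisense", 2),
    ("ctg2", "sense", 5),  # sense only: excluded from the pairs
    ("ctg3", "antisense", 7), ("ctg3", "sense", 1), ("ctg3", "antisense", 3),
]
pairs = sense_antisense_pairs(matches)
print({c: dominant_strand(*t) for c, t in pairs.items()})  # {'ctg1': 'sense', 'ctg3': 'antisense'}
```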

Table 3. Matched and unmatched Vitis EST contigs and singletons.

Figure 3. Functional categorization of transcripts with both sense and antisense MPSS signatures. EST contigs with both sense and antisense MPSS signatures were categorized based on GO (Gene Ontology) annotation, and the proportion of each category is displayed in pie charts: (A) Cellular component, (B) Molecular function, and (C) Biological process.

Additional file 6. EST contigs with expressed sense and antisense MPSS signatures. All the EST contigs with MPSS signatures matching both sense and antisense orientations are displayed. Each contig was BLASTed against Arabidopsis annotation version 5 (TIGR5), and the potential function of each contig is listed under "blastdef" along with the gene ID ("ginumber") and the BLAST expected value ("evalue"). The contigs originated from two different EST collections (Stage II berry GH and GS) derived from various Vitis species. The EST ID numbers for GH and GS, as well as the species name, are listed under "Berry SII-GH", "Berry SII-GS", and "SPECIES".

Format: XLS


Expression profiles determined by EST and MPSS abundances

To quantify gene expression levels, we used the relative abundance of the 7,686 reliable and significant 17-base MPSS signatures from the stage II berry library. These signatures represent the most robust subset of the MPSS expression data. Although the remaining 1,734 reliable but not significant signatures were not considered in this analysis, prior analysis suggests that these signatures are likely to represent genuine transcripts expressed at very low levels [11]. The transcripts represented by these signatures may be expressed at higher levels in different specific cells or tissue layers that were not sampled.

The MPSS sequences provide an inventory of the transcript population in a given organ or tissue that can be sorted by abundance. These data are particularly powerful when aligned with EST data from related tissues, as they allow sorting by both abundance and predicted gene function. The MPSS-matched set of 5,791 grape EST contigs is derived from a series of cDNA libraries that survey several stages of plant development, as well as responses to biotic and abiotic stress [4]. Of these, 4,753 contigs contained ESTs derived from one or more grape berry tissues, while 1,038 contigs were composed of ESTs from other grape tissues but not from berries (Table 4A). A total of 1,242 EST contigs matched by MPSS signatures were composed of ESTs found in only a single grape tissue; of these, 555 corresponded to berry-specific EST contigs. The remaining 687 contigs were derived exclusively from leaves, flowers, petioles, stems, buds, or even roots. The other 4,548 cDNA contigs and sequences were detected in two or more grape organs (Table 4A). Only three MPSS-matched EST contigs were found in all seven grape cDNA libraries. In a similar analysis of the EST singletons, the vast majority corresponded to transcripts previously observed exclusively in berry cDNA libraries, but only 207 were from stage II berry libraries (Table 4B). Among the contigs and singletons not previously associated with berry libraries were those derived from flower and leaf cDNA libraries. MPSS signatures provided valuable information to confirm the presence and relative transcriptional levels of these transcripts. Many of them may previously have been misidentified as tissue-specific on the basis of EST data alone, because EST sequencing was not deep enough to detect these low-abundance transcripts in other tissues. The MPSS data demonstrate that the inventory of genes in a given tissue is complex and that there may be substantially more overlap among diverse tissues than previously characterized, overlap that can be identified only by very deep sequencing.
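The tissue-of-origin tallies can be reproduced with a small classification over the set of tissues contributing ESTs to each contig. This sketch is illustrative and assumes library names have already been collapsed into tissue labels, with all berry libraries (any stage) grouped as "berry"; names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def tissue_class(tissues: Set[str]) -> str:
    """Classify a contig by the tissues whose libraries contributed its ESTs."""
    if not tissues:
        raise ValueError("contig has no tissue of origin")
    if len(tissues) == 1:
        return "berry only" if "berry" in tissues else "single non-berry tissue"
    return "two or more tissues"


def tally(contig_tissues: Dict[str, Set[str]]) -> Tuple[Counter, int]:
    """Return (counts per class, number of contigs containing any berry EST)."""
    classes = Counter(tissue_class(t) for t in contig_tissues.values())
    with_berry = sum(1 for t in contig_tissues.values() if "berry" in t)
    return classes, with_berry


contigs = {
    "ctg1": {"berry"},
    "ctg2": {"leaf"},
    "ctg3": {"berry", "leaf"},
    "ctg4": {"flower", "root"},
}
classes, with_berry = tally(contigs)
print(classes["berry only"], classes["single non-berry tissue"], classes["two or more tissues"])  # 1 1 2
print(with_berry)  # 2
```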

Table 4. Grape ESTs derived from distinct tissue types matched by MPSS signatures (only from Vitis vinifera).

One advantage of tag-based transcriptional profiling technologies such as ESTs, SAGE, and MPSS is that the targets are not preselected prior to analysis. While the discovery rate of new transcripts using EST-based approaches is limited by the extent of sequencing effort and redundancy within a given cDNA library, unmatched or low-abundance MPSS signatures could be used as primers in PCR-based methods to expand the current set of known genes for Vitis [13]. There were 18,631 distinct 17-base MPSS signatures that did not match known grape EST sequences, of which 5,900 were both significant and reliable; these are the most likely to represent novel genes or transcript variants not previously identified as transcribed. We tested this hypothesis using the available grape genome sequence, composed of 57,662 contigs containing 487,125,096 base pairs [14]. In total, 20,661 17-mer and 17,867 20-mer distinct MPSS signatures matched genome contig sequences. Among these, 9,125 17-mer and 7,771 20-mer distinct MPSS signatures matched only genomic contigs and not ESTs. Taking the 17-mer signatures as the benchmark, 44% of the genome-matched signatures (9,125 of 20,661) represent transcripts not recorded in the existing public EST resource.
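The comparison with the genome reduces to set operations on distinct signatures. In the minimal sketch below (function name hypothetical, toy sets), the 44% figure corresponds to the fraction of genome-matched 17-mer signatures that have no EST match, computed from the counts reported above as 9,125 / 20,661.

```python
from typing import Set


def genome_only_fraction(genome_matched: Set[str], est_matched: Set[str]) -> float:
    """Fraction of genome-matched signatures that match no EST sequence."""
    if not genome_matched:
        raise ValueError("no genome-matched signatures")
    genome_only = genome_matched - est_matched
    return len(genome_only) / len(genome_matched)


# Toy sets: two of four genome-matched signatures lack an EST match.
print(genome_only_fraction({"s1", "s2", "s3", "s4"}, {"s1", "s2", "s9"}))  # 0.5

# Counts reported in the text for distinct 17-mer signatures.
print(f"{9_125 / 20_661:.1%}")  # 44.2%
```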

In silico expression profiles derived from EST (Table 5) and MPSS signature frequencies (Table 6) showed both differences and commonalities in the relative abundance of the top-ranked genes. For example, a common feature of both datasets is the relatively high abundance of several chitinases, metallothionein-like and storage proteins, as well as a putative transcription factor and an elongation factor 1-α. On the other hand, two hexameric polyubiquitins and a plasma membrane aquaporin were among the top-ranked genes based on MPSS signatures but not based on EST counts, and the opposite was true (present among top ESTs, not among MPSS signatures) for a non-specific lipid transfer protein A. A similar pattern emerges from the analysis of singleton ESTs that matched abundant MPSS signatures (Table 7). Among such singleton ESTs were transcripts related to cell wall modification (xyloglucan-specific fungal endoglucanase inhibitor protein and an extensin-like protein), abiotic/biotic stress factors (catalase and hydroperoxide oxidase), a eukaryotic translation initiation factor, and several poorly annotated transcripts.

Table 5. Most highly expressed grape EST contigs in the grape berry stage II libraries, based on MPSS signature abundance.

Table 6. Most highly expressed grape EST contigs in the grape berry stage II libraries based on EST frequency.

Table 7. Top 20 grape EST singletons based on MPSS signature abundance.

Marked differences were observed in the relative abundance of contigs estimated from EST versus MPSS signature counts. While a total of 195 contigs accounted for approximately 50% of the ESTs sequenced from the two berry stage II libraries, only 10 contigs accounted for the same proportion of the filtered MPSS signatures. The top 20 contigs ranked by MPSS frequency accounted for 410,925 signatures (56.7% of all signatures matching EST contigs), suggesting a steeper abundance curve and perhaps a lower level of diversity in the MPSS data. In contrast, the 20 most frequent contigs based on EST counts represented only 29.4% of the total ESTs for these two libraries.
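The concentration measures in this paragraph (how many top-ranked contigs account for half of all counts, and the share captured by the top 20) can be computed as below. This is a sketch on toy counts; the function names are hypothetical, and it assumes counts have already been aggregated per contig.

```python
from typing import Sequence


def contigs_for_fraction(counts: Sequence[int], fraction: float = 0.5) -> int:
    """Smallest number of top-ranked contigs whose counts reach `fraction` of the total."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for k, c in enumerate(ordered, start=1):
        running += c
        if running >= fraction * total:
            return k
    return len(ordered)


def top_n_share(counts: Sequence[int], n: int = 20) -> float:
    """Fraction of all counts captured by the `n` most abundant contigs."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


toy = [5, 50, 10, 20, 5, 10]
print(contigs_for_fraction(toy))        # 1
print(contigs_for_fraction(toy, 0.75))  # 3
print(top_n_share(toy, 2))              # 0.7
```

A smaller number of contigs for the same fraction, as reported for MPSS (10 contigs) versus ESTs (195 contigs), indicates a steeper, more concentrated abundance distribution.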

As might be expected, MPSS signatures sequenced from V. vinifera berries stage II also matched several non-vinifera EST singletons and contigs in the Vitis Unigene set. Although the transcriptome of the non-vinifera species has been minimally characterized, a comparison of the top-ranked transcripts based on MPSS signature frequency (Tables 8 and 9) showed remarkable similarities between the different species.

Table 8. Most highly expressed grape EST contigs from non-vinifera libraries based on MPSS signature abundance.

Table 9. Most highly expressed grape EST contigs from non-vinifera libraries based on MPSS signature abundance.

A website for access to the grape MPSS data

To facilitate public access and utilization of the MPSS data, we developed a database and web-based interface [15]. The database and interface are a customized version of a previously described website [16]. Unlike the Arabidopsis and rice MPSS sites, which utilize the complete genomic sequence of these species, our grape database focuses on EST contigs. This required the development of specialized tools and methods. For example, the incomplete nature of ESTs required a BLAST tool that allows users to identify the closest grape sequence to their gene of interest. The MPSS data can be accessed by entering a grape contig identifier or EST code, an MPSS signature sequence, a grape sequence of interest, or a list of contig identifiers. The transcriptional data provided by this website may be used as the starting point for analyses of individual genes or gene families in grape.

Discussion

We have explored expression patterns at a specific stage in grape berry development by comparing and combining two tag-based methods: ESTs and MPSS. Both approaches described similar patterns of transcript abundance, although there were some clear differences, perhaps associated with the methods themselves. In principle, owing to deeper sequencing, the MPSS data should provide a more thorough and quantitative representation of the absolute transcript population, in terms of both representation and relative abundance, than that from ESTs [7,11]. This is particularly true when the number of cDNA clones sequenced from any given library is low, or for genes expressed at only low levels in the sampled tissues. For EST frequency to represent absolute transcript frequency, sequencing efforts must be large and sampling must be unbiased. The difficulty of achieving saturation for libraries constructed from a specific tissue may be overcome by combining library information available in public databases, if those resources are large enough. However, the different protocols used for library construction and EST sequencing, the lack of complete control over growing conditions and genotype, and even the lack of standardized guidelines to describe a particular developmental stage make it difficult to achieve unbiased sampling. On the other hand, MPSS analysis is also subject to bias. For example, some highly transcribed genes (based on EST frequency analysis) were unmatched by any MPSS signature, possibly due to either the lack of a GATC site in the sequence or a technological artifact. The lack of suitable DpnII sites in some Arabidopsis transcripts is one source of negative results in MPSS transcriptional profiles compared against other high-throughput technologies [17].
In addition, MPSS substantially underestimates expression for signatures either containing the recognition site for the Type IIS restriction endonuclease BbvI (used in MPSS sequencing), or signatures containing certain four-nucleotide words in the sequencing frames [11]. The formerly high cost of tag-based methods limited biological replication as part of the experimental approach; such data would be highly desirable to determine the degree of biological variation and technical noise derived from these technologies [7]. This may be more achievable with the next generation of technologies as costs are reduced. The combined application of multiple approaches for transcriptional profiling is likely to provide the most robust determination of transcript levels.
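Two of the biases just described can be screened for in silico. The sketch below flags transcripts with no GATC site (no signature can be produced) and expected signatures that contain a BbvI recognition site (GCAGC, or GCTGC on the opposite strand), whose abundance MPSS may underestimate. It assumes the signature is the 17 bases from the 3'-most GATC, does not model the problematic four-nucleotide words mentioned above, and uses hypothetical names.

```python
from typing import List

DPNII_SITE = "GATC"
BBVI_SITES = ("GCAGC", "GCTGC")  # BbvI site and its reverse complement (assumed both matter)

NO_SITE = "no GATC site: transcript cannot yield an MPSS signature"
BBVI_FLAG = "signature contains a BbvI site: abundance may be underestimated"


def mpss_caveats(transcript: str, length: int = 17) -> List[str]:
    """Return warnings for a transcript's expected DpnII-anchored signature."""
    seq = transcript.upper()
    pos = seq.rfind(DPNII_SITE)  # 3'-most anchor site
    if pos == -1:
        return [NO_SITE]
    signature = seq[pos:pos + length]
    return [BBVI_FLAG] if any(site in signature for site in BBVI_SITES) else []


print(mpss_caveats("ACGTACGTAC"))                # no-site warning
print(mpss_caveats("GATC" + "GCAGC" + "A" * 8))  # BbvI warning
print(mpss_caveats("GATC" + "A" * 13))           # []
```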

In the grape MPSS dataset, when multiple signatures matched one contig, they usually varied substantially in abundance. These data were consistent with the most abundant MPSS signature being derived from the predominant form of the transcript among the ESTs [1]. An assessment of alternative transcript polyadenylation based on MPSS in diverse tissues and treatments could provide insight into this mechanism of gene regulation by identifying differentially terminated transcripts. The annotation and analysis of signatures matching multiple contigs is more difficult, but these data could be validated using microarrays with specifically designed probes to determine the relative expression of all matched genes, or by repeating the MPSS experiment using a different "anchoring enzyme" such as NlaIII (CATG) instead of DpnII (GATC).
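Whether a second anchoring enzyme would resolve a shared signature can be checked in silico. The sketch below compares DpnII (GATC) and NlaIII (CATG) anchored signatures for a group of contigs; it assumes, for illustration, that an NlaIII-anchored signature would be the 17 bases beginning at the 3'-most CATG, by analogy with the DpnII case. Names and toy sequences are hypothetical.

```python
from typing import Dict, Optional

ANCHORS = {"DpnII": "GATC", "NlaIII": "CATG"}


def anchored_signature(seq: str, anchor: str, length: int = 17) -> Optional[str]:
    """Signature of `length` bases starting at the 3'-most `anchor`, or None."""
    s = seq.upper()
    pos = s.rfind(anchor)
    if pos == -1 or pos + length > len(s):
        return None
    return s[pos:pos + length]


def distinguishable(contigs: Dict[str, str], enzyme: str, length: int = 17) -> bool:
    """True if every contig yields a signature and no two signatures are identical."""
    sigs = [anchored_signature(seq, ANCHORS[enzyme], length) for seq in contigs.values()]
    return None not in sigs and len(set(sigs)) == len(sigs)


# Two toy contigs that share their 3' region (same DpnII signature) but differ upstream.
group = {
    "ctgA": "CATG" + "A" * 13 + "TT" + "GATC" + "C" * 13,
    "ctgB": "CATG" + "G" * 13 + "TT" + "GATC" + "C" * 13,
}
print(distinguishable(group, "DpnII"))   # False
print(distinguishable(group, "NlaIII"))  # True
```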

Genome-wide duplications may drive genome diversification and speciation in the plant kingdom [18]. Gene- and organ-specific silencing and unequal expression levels have been reported in upland cotton for homeologous genes resulting from whole-genome polyploidization [19-21], and a similar phenomenon may account for yellow-seeded commercial soybean cultivars [22]. The extent to which duplication-associated changes in gene expression play a role in grapevine phenotypes is largely unknown. Given the ancestral polyploid nature of the grape genome [23-25], duplication events leading to interactions or silencing among homeologous genes may have occurred. Comparison of the EST and MPSS transcriptional profiling data revealed evidence of extensive antisense expression. Initial whole-transcriptome analyses in mammalian systems indicated that up to 20% of all transcripts formed sense-antisense (S/AS) pairs [26-31]. More recent analyses, from a large-scale mouse cDNA sequencing project [32] and a high-resolution transcriptional map of human chromosomes [33], revealed that S/AS pairs exist for up to 72% and 50% of all mouse and human transcripts, respectively. S/AS frequencies observed in the berry transcriptome are similar to those reported in Arabidopsis, where approximately 22% of all known genes have tissue-specific natural antisense transcript pairs [7]. Because different genes and genomic regions contribute unequally to the formation of S/AS pairs [32], whole-transcriptome analysis would provide a more accurate description of the extent of this phenomenon in grape than the limited transcriptome coverage of this study allowed.
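The antisense evidence rests on a simple test: once contigs are oriented 5' to 3' (see Methods), a signature that matches only the reverse complement of a contig points to transcription from the opposite strand. The sketch below is an illustrative classifier under that assumption, with hypothetical function names and toy sequences; it reports "both" separately because palindromic or repeated sequence can match either strand and needs manual inspection.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def classify_orientation(signature: str, contig: str) -> str:
    """Classify a perfect signature match against a contig assumed oriented 5'->3'."""
    sig, seq = signature.upper(), contig.upper()
    sense = sig in seq
    antisense = revcomp(sig) in seq
    if sense and antisense:
        return "both"  # e.g. palindromic or repeated sequence
    if sense:
        return "sense"
    if antisense:
        return "antisense"  # candidate S/AS evidence
    return "none"


sig = "GATCAAAAGCAATCTTG"
print(classify_orientation(sig, "TTTGATCAAAAGCAATCTTGCCC"))  # sense
print(classify_orientation(sig, "AACAAGATTGCTTTTGATCAA"))    # antisense
```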

Two distinct sources of native antisense expression have been identified: cis- and trans-encoded antisense transcripts [27-29]. Cis-encoded antisense transcripts derive from the opposite strand of the same genetic locus as the sense RNA; they tend to overlap the sense strand completely, forming long, perfectly matched RNA duplexes [28]. Approximately 50% of human sense-antisense pairs fall within this category [29]. Trans-encoded antisense transcripts derive from other loci and tend to overlap the sense strand of the original locus only partially [27,28]. The function of endogenous populations of dsRNA or small RNAs in grape remains to be elucidated by more detailed experiments, which are best performed using short-read sequencing methods [34].

Tag-based transcriptional profiling approaches provide unique advantages for the discovery of novel expressed sequences. MPSS signatures derived from a single stage of berry development revealed as many as 6,345 potentially novel transcripts in grape. These transcripts could be characterized more fully, expanding the set of known and experimentally verified Vitis genes, either by PCR-based approaches [13] or, ultimately, by aligning the signatures to grape genomic sequence. In the absence of complete genome sequence information, PCR-based approaches may become particularly important for transcripts that are difficult to identify with EST-based approaches because of their low copy number or the technical limitations of RNA-dependent cDNA synthesis. Whole-genome sequencing of V. vinifera, combined with data-rich tag-based (EST and MPSS signature frequencies) and microarray-based transcriptional data, will greatly contribute to our understanding of the complex relationships among genome organization, transcriptional activity, and phenotype. Because automated genome annotation systems are error-prone and are greatly improved by incorporating experimental data, the EST and MPSS data will prove invaluable in the coming years for gene discovery and the annotation of genomic sequences.
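The two-step logic above (a signature with no EST match, then support from genomic sequence) can be sketched as follows. This is an illustration under loud assumptions, not the study's procedure: the function names are hypothetical, matching is exact string search on both strands, and the "genome" is a toy string; a real analysis would use an indexed aligner against the assembled grape genome.

```python
from typing import Dict, Iterable, List, Tuple


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def genomic_hits(signature: str, genome: str) -> List[Tuple[int, str]]:
    """All perfect matches as (0-based start on the + strand, strand)."""
    hits = []
    g = genome.upper()
    for strand, query in (("+", signature.upper()), ("-", revcomp(signature))):
        start = g.find(query)
        while start != -1:
            hits.append((start, strand))
            start = g.find(query, start + 1)  # allow overlapping matches
    return hits


def candidate_novel(
    signatures: Iterable[str], ests: Iterable[str], genome: str
) -> Dict[str, List[Tuple[int, str]]]:
    """Signatures absent from every EST (both strands) but present in the genome."""
    est_blob = "|".join(e.upper() for e in ests)  # '|' stops matches spanning two ESTs
    novel = {}
    for sig in signatures:
        s = sig.upper()
        if s in est_blob or revcomp(s) in est_blob:
            continue  # already represented among the ESTs
        hits = genomic_hits(s, genome)
        if hits:
            novel[sig] = hits
    return novel


sigs = ["GATCAAAAGCAATCTTG", "GATCGGGGTTTTCCCCA", "GATCTTTTTTTTTTTTT"]
ests = ["CCGATCAAAAGCAATCTTGAAAAA"]
genome = "AAAAGATCGGGGTTTTCCCCAAAAA"
print(candidate_novel(sigs, ests, genome))  # {'GATCGGGGTTTTCCCCA': [(4, '+')]}
```

In the toy data, the first signature is already represented by an EST, the second is EST-unmatched but genome-supported, and the third matches neither, so only the second is reported.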

Conclusion

We have performed a complete transcriptional analysis of V. vinifera berries in transition to the ripening stage using MPSS combined with EST data. Approximately 30,000 distinct signatures, each representing a distinct transcript, were identified from the MPSS data and mapped onto EST sequences. The number of MPSS signatures matching a single EST ranged from one to 16, suggesting the existence of numerous alternative transcripts in V. vinifera. In addition, a large set of MPSS signatures matching ESTs in the antisense orientation was identified. Although antisense transcripts have been reported in many plant species, these are the first data to suggest their existence in V. vinifera. Large numbers of MPSS signatures that do not match ESTs were also identified. While a small proportion of these could be due to sequencing errors, we believe most reflect the low depth of sequence coverage in the current EST dataset; this interpretation is supported by the fact that the proportion of signatures matching V. vinifera sequences nearly doubled when whole-genome sequence data were incorporated. High-capacity, short-read sequencing technologies, in particular next-generation gigabase methods, have the potential to contribute an important element to the ongoing annotation of the V. vinifera genome sequence. The grape MPSS data are accessible from the University of Delaware MPSS website [1], and the EST datasets are available through the UC Davis College of Agricultural and Environmental Sciences Genomics Facility (CGF) website [35].

Methods

Plant material and sampling procedures

The cDNA used for MPSS sequencing was constructed from stage II (green, hard) berries sampled from field-grown V. vinifera cv. Cabernet Sauvignon clone 8 vines in the Tyree Teaching Vineyard, UC Davis, CA. Berries were sampled from multiple clusters and from different positions within individual clusters to ensure a representative sample. A subsample of berries at this stage was used to generate a cDNA library and expressed sequence tags (ESTs), as reported previously [4]. For additional details on sample handling and storage, see Goes da Silva et al. [4].

MPSS data generation and analysis

All MPSS was performed essentially as described previously [5,6], with the library produced and sequenced at Illumina, Inc. (formerly Solexa, Inc.; Hayward, CA). The raw and normalized MPSS data are available at the University of Delaware MPSS website [1]. We compared MPSS signatures to the V. vinifera ESTs available at the UC Davis CGF website [35] and assigned signatures to each sequence for which a perfect match was identified. The number of matches of a signature to the EST dataset was recorded as the "hits" for that signature. We merged the sequencing runs and calculated a single normalized abundance as reported earlier [11]. Contigs were oriented 5' to 3' using batch BLASTX searches and by analyzing the subject indexes of the first and last EST in each contig. Data analysis was conducted in MS Excel (Microsoft, Seattle, WA) and the SAS V.8 statistical package (The SAS Institute, Cary, NC), or in a customized MySQL database [16]; figures were prepared in SigmaPlot version 8.0 (Systat Software Inc., San Jose, CA).
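The "hits" count and the merged abundance can be sketched as below. This is a hypothetical reconstruction for illustration only: it assumes hits are counted as the number of ESTs containing a perfect match on either strand, and that runs are merged by pooling raw counts and scaling to transcripts per million (TPM). The published procedure [11] may apply additional filters or a different merging rule that is not reproduced here, and the shortened signatures are toy values.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_hits(signatures: Iterable[str], ests: List[str]) -> Dict[str, int]:
    """Number of ESTs containing a perfect match to each signature (either strand)."""
    upper_ests = [e.upper() for e in ests]
    hits = {}
    for sig in signatures:
        s, rc = sig.upper(), revcomp(sig)
        hits[sig] = sum(1 for e in upper_ests if s in e or rc in e)
    return hits


def merged_tpm(runs: Iterable[Mapping[str, int]]) -> Dict[str, float]:
    """Pool raw counts across sequencing runs, then scale to transcripts per million."""
    pooled: Counter = Counter()
    for run in runs:
        pooled.update(run)  # Counter.update adds counts rather than replacing them
    total = sum(pooled.values())
    if total == 0:
        return {sig: 0.0 for sig in pooled}
    return {sig: count * 1_000_000 / total for sig, count in pooled.items()}


print(merged_tpm([{"GATCAAA": 3, "GATCCCC": 1}, {"GATCAAA": 1, "GATCGGG": 5}]))
# {'GATCAAA': 400000.0, 'GATCCCC': 100000.0, 'GATCGGG': 500000.0}
```

Pooling before normalizing weights each run by its sequencing depth; averaging per-run TPM values instead would weight runs equally, which is one of the design choices the assumed procedure leaves open.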

Authors' contributions

AI performed research and analyzed data; KN performed computational research; FGdS analyzed data; DRC and BCM designed the experiments. All of the authors participated in the writing of the manuscript.

Acknowledgements

We thank Huizhuan Wu and Mayumi Nakano for their work on the grape MPSS web interface, and Richi Gupta, Anna Leslie and Brian Chan for bioinformatics assistance. This work was supported by research grants from the NSF Plant Genome Research Program (awards #0110528 and #0321437 to B.C.M.), the USDA-ARS (SCA 58-5302-2-788 to D.R.C.), and the California Department of Food and Agriculture (Contract 02-0150 to D.R.C.).

References

  1. Grape MPSS Database [http://mpss.udel.edu/grape/]

  2. Vivier MA, Pretorius IS: Genetically tailored grapevines for the wine industry.

    Trends Biotechnol 2002, 20(11):472.

  3. Ollat N, Diakou-Verdin P, Carde JP, Barrieu F, Gaudillére JP, Moing A: Grape berry development: a review.

    Journal International des Sciences de la Vigne et du Vin 2002, 36:109-131.

  4. Goes da Silva F, Iandolino A, Al-Kayal F, Bohlmann MC, Cushman MA, Lim H, Ergul A, Figueroa R, Kabuloglu EK, Osborne C, Rowe J, Tattersall E, Leslie A, Xu J, Baek J, Cramer GR, Cushman JC, Cook DR: Characterizing the Grape Transcriptome. Analysis of Expressed Sequence Tags from Multiple Vitis Species and Development of a Compendium of Gene Expression during Berry Development.

    Plant Physiol 2005, 139(2):574-597.

  5. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K: Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays.

    Nat Biotechnol 2000, 18(6):630-634.

  6. Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao JI, Luo S, Kirchner JJ, Eletr S, DuBridge RB, Burcham T, Albrecht G: In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs.

    Proc Natl Acad Sci USA 2000, 97(4):1665-1670.

  7. Meyers BC, Vu TH, Tej SS, Ghazal H, Matvienko M, Agrawal V, Ning J, Haudenschild CD: Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing.

    Nat Biotechnol 2004, 22(8):1006-1011.

  8. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial Analysis of Gene Expression.

    Science 1995, 270(5235):484-487.

  9. Nobuta K, Venu RC, Lu C, Belo A, Vemaraju K, Kulkarni K, Wang W, Pillay M, Green PJ, Wang G, Meyers BC: An expression atlas of rice mRNAs and small RNAs.

    Nat Biotechnol 2007, 25(4):473.

  10. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors.

    Nature 2005, 437(7057):376.

  11. Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Delcola S: The use of MPSS for whole-genome transcriptional analysis in Arabidopsis.

    Genome Res 2004, 14:1641-1653.

  12. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK: Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis.

    Cell 2005, 123(7):1279-1291.

  13. Chen J, Sun M, Lee S, Zhou G, Rowley JD, Wang SM: Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags.

    Proc Natl Acad Sci USA 2002, 99(19):12257-12262.

  14. Genoscope [http://www.cns.fr/externe/English/Projets/Projet_ML/organisme_ML.html]

  15. Nakano M, Nobuta K, Vemaraju K, Tej SS, Skogen JW, Meyers BC: Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA.

    Nucleic Acids Res 2006, 34(Database issue):D731-5.

  16. Meyers BC, Lee DK, Vu TH, Tej SS, Edberg SB, Matvienko M, Tindell LD: Arabidopsis MPSS. An online resource for quantitative expression analysis.

    Plant Physiol 2004, 135(2):801-813.

  17. Coughlan SJ, Agrawal V, Meyers BC: A comparison of global gene expression measurement technologies in Arabidopsis thaliana.

    Comparative and Functional Genomics 2004, 5(3):245-252.

  18. Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, Albert VA, Ma H, dePamphilis CW: Widespread genome duplications throughout the history of flowering plants.

    Genome Res 2006, 16(6):738-749.

  19. Adams KL, Cronn R, Percifield R, Wendel JF: Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing.

    Proc Natl Acad Sci USA 2003, 100(8):4649-4654.

  20. Adams KL, Wendel JF: Allele-Specific, Bidirectional Silencing of an Alcohol Dehydrogenase Gene in Different Organs of Interspecific Diploid Cotton Hybrids.

    Genetics 2005, 171(4):2139-2142.

  21. Udall JA, Swanson JM, Nettleton D, Percifield RJ, Wendel JF: A Novel Approach for Characterizing Expression Levels of Genes Duplicated by Polyploidy.

    Genetics 2006, 173(3):1823-1827.

  22. Tuteja JH, Clough SJ, Chan WC, Vodkin LO: Tissue-Specific Gene Silencing Mediated by a Naturally Occurring Chalcone Synthase Gene Cluster in Glycine max.

    Plant Cell 2004, 16(4):819-835.

  23. The French-Italian Public Consortium for Grapevine Genome Characterization: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla.

    Nature 2007, 449:463.

  24. Olmo HP: Grapes. In Evolution of crop plants. Edited by Simmonds NW. London: Longman; 1976:294-298.

  25. Soltis DE, Soltis PS, Bennett MD, Leitch IJ: Evolution of genome size in the angiosperms.

    Am J Bot 2003, 90(11):1596-1603.

  26. Fahey ME, Moore TF, Higgins DG: Overlapping antisense transcription in the human genome.

    Comparative and Functional Genomics 2002, 3(3):244-253.

  27. Chen J, Sun M, Kent WJ, Huang X, Xie H, Wang W, Zhou G, Shi RZ, Rowley JD: Over 20% of human transcripts might form sense-antisense pairs.

    Nucleic Acids Res 2004, 32(16):4812-4820.

  28. Shendure J, Church G: Computational discovery of sense-antisense transcription in the human and mouse genomes.

    Genome Biology 2002, 3(9):research0044.

  29. Rosok O, Sioud M: Systematic identification of sense-antisense transcripts in mammalian cells.

    Nat Biotechnol 2004, 22(1):104.

  30. Yelin R, Dahary D, Sorek R, Levanon EY, Goldstein O, Shoshan A, Diber A, Biton S, Tamir Y, Khosravi R, Nemzer S, Pinner E, Walach S, Bernstein J, Savitsky K, Rotman G: Widespread occurrence of antisense transcription in the human genome.

    Nat Biotechnol 2003, 21(4):379-386.

  31. Kiyosawa H, Yamanaka I, Osato N, Kondo S, Hayashizaki Y: Antisense Transcripts With FANTOM2 Clone Set and Their Implications for Gene Regulation.

    Genome Res 2003, 13(6b):1324-1334.

  32. Riken Genome Exploration Research Group and Genome Science Group and the FANTOM Consortium, Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, Suzuki H, Carninci P, Hayashizaki Y, Wells C, Frith M, Ravasi T, Pang KC, Hallinan J, Mattick J, Hume DA, Lipovich L, Batalov S, Engstrom PG, Mizuno Y, Faghihi MA, Sandelin A, Chalk AM, Mottagui-Tabar S, Liang Z, Lenhard B, Wahlestedt C: Antisense Transcription in the Mammalian Transcriptome.

    Science 2005, 309(5740):1564-1566.

  33. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR: Transcriptional Maps of 10 Human Chromosomes at 5-Nucleotide Resolution.

    Science 2005, 308(5725):1149-1154.

  34. Meyers BC, Souret FF, Lu C, Green PJ: Sweating the small stuff: microRNA discovery in plants.

    Curr Opin Biotechnol 2006, 17(2):139.

  35. UC Davis College of Agricultural and Environmental Sciences Genomics Facility [http://cgf.ucdavis.edu]