Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database
- Equal contributors
1 Institute of Integrative Biology (IBZ), ETH Zurich, Universitaetstrasse 16, Zürich, 8092, Switzerland
2 Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR 5558, Université Lyon 1, Villeurbanne, F-69622 cedex, France
3 DTAMB/PRABI, IFR41, Université Lyon 1, Bâtiment Gregor Mendel, Villeurbanne, F-69622 cedex, France
BMC Genomics 2011, 12:376 doi:10.1186/1471-2164-12-376Published: 26 July 2011
The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database.
A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch webcite.
The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms.