Open Access Research article

De novo assembly and characterization of the carrot transcriptome reveals novel genes, new markers, and genetic diversity

Massimo Iorizzo (1), Douglas A Senalik (1,2), Dariusz Grzebelus (3), Megan Bowman (1), Pablo F Cavagnaro (4), Marta Matvienko (6,7), Hamid Ashrafi (5), Allen Van Deynze (5) and Philipp W Simon (1,2)*

Author Affiliations

1 Department of Horticulture, University of Wisconsin, 1575 Linden Drive, Madison, WI 53706, USA

2 USDA-Agricultural Research Service, Vegetable Crops Research Unit, University of Wisconsin, 1575 Linden Drive, Madison, WI 53706, USA

3 Department of Genetics, Plant Breeding and Seed Science, Agricultural University of Krakow, Al. 29 Listopada 54, 31-425 Krakow, Poland

4 CONICET and INTA EEA La Consulta, CC8 La Consulta (5567), Mendoza, Argentina

5 Seed Biotechnology Center, University of California, 1 Shields Ave, Davis, CA, USA

6 Genome Center, University of California, 1 Shields Ave, Davis, CA, USA

7 Current address: Life Technologies, 850 Lincoln Center Circle, Foster City, CA, USA


BMC Genomics 2011, 12:389 doi:10.1186/1471-2164-12-389

Published: 2 August 2011


Background

Among next-generation sequencing technologies, platforms such as Illumina and SOLiD produce shorter reads than 454 or Sanger sequencing, but at higher coverage and lower cost per sequenced nucleotide. A current challenge is to develop efficient strategies for using short-read platforms in de novo assembly and marker development. The aim of this study was to develop a de novo assembly of carrot ESTs from multiple genotypes using the Illumina platform, and to identify polymorphisms.

Results

A de novo assembly of transcriptome sequence from four genetic backgrounds produced 58,751 contigs and singletons. Over 50% of these assembled sequences were annotated, allowing the detection of transposable elements and new carrot anthocyanin genes. The presence of multiple genetic backgrounds in our assembly allowed the identification of 114 computationally polymorphic SSRs and 20,058 SNPs at a depth of coverage of 20× or more. Polymorphisms occurred predominantly between inbred lines, except in the cultivated × wild RIL pool, which had high intra-sample polymorphism. About 90% of the tested SSR primers and 88% of the tested SNP primers amplified a product, of which 70% and 46%, respectively, were of the expected size. Of the verified SSR and SNP markers, 84% and 82%, respectively, were polymorphic. About 25% of the genotyped SNPs were polymorphic in two diverse mapping populations.
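Because the validation rates above are conditional (each applies only to the primers that passed the previous step), they can be chained to estimate what fraction of all tested primers ended up as usable polymorphic markers. The sketch below simply multiplies the approximate percentages quoted in the abstract. It assumes that "verified" markers are those whose primers amplified a product of the expected size, which the abstract does not state explicitly; the names `RATES` and `cumulative_rates` are illustrative only.

```python
# Approximate conditional pass rates quoted in the abstract.
# Assumption: "verified" = primers that amplified a product of the expected size.
RATES = {
    "SSR": {"amplified": 0.90, "expected_size": 0.70, "polymorphic": 0.84},
    "SNP": {"amplified": 0.88, "expected_size": 0.46, "polymorphic": 0.82},
}


def cumulative_rates(stages):
    """Chain conditional pass rates into fractions of all tested primers.

    Returns (amplified, expected size, expected size and polymorphic)."""
    amplified = stages["amplified"]
    expected = amplified * stages["expected_size"]
    polymorphic = expected * stages["polymorphic"]
    return amplified, expected, polymorphic


for marker, stages in RATES.items():
    amp, exp_size, poly = cumulative_rates(stages)
    # Percent formatting with one decimal place
    print(f"{marker}: amplified {amp:.1%}, expected size {exp_size:.1%}, "
          f"verified and polymorphic {poly:.1%}")
```

Under this reading, roughly 63% of SSR primers and 40% of SNP primers gave a product of the expected size, and roughly 53% and 33%, respectively, yielded verified polymorphic markers. Because the input percentages are themselves rounded ("about 90%"), these products are estimates rather than reported results.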

Conclusions

This study confirmed the potential of short-read platforms for de novo EST assembly and for the identification of genetic polymorphisms in carrot. In addition, we produced the first large-scale transcriptome of carrot, a species lacking genomic resources.