Identification and analysis of common bean (Phaseolus vulgaris L.) transcriptomes by massively parallel pyrosequencing
1 College of Agriculture & Related Sciences, Delaware State University, Dover, DE 19901, USA
2 Department of Plant & Soil Sciences and Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA
3 W. M. Keck Center for Comparative and Functional Genomics, University of Illinois, Urbana-Champaign, IL 61801, USA
4 Center for Integrated Biological and Environmental Research, Delaware State University, Dover, DE 19901, USA
BMC Plant Biology 2011, 11:135 doi:10.1186/1471-2229-11-135. Published: 11 October 2011
Common bean (Phaseolus vulgaris) is the most important food legume in the world. Although this crop is an important source of dietary protein in both the developed and the developing world, the resources available for common bean are limited. Global transcriptome analysis helps clarify gene expression, genetic variation, and gene structure annotation, among other features. However, the number of common bean sequences and their descriptions are very limited, which greatly hinders genome and transcriptome research. Here we used 454 pyrosequencing to obtain a substantial transcriptome dataset for common bean.
We obtained 1,692,972 reads with an average read length of 207 nucleotides (nt). These reads were assembled into 59,295 unigenes, comprising 39,572 contigs and 19,723 singletons, plus a further 35,328 singletons shorter than 100 bp. When the unigenes were compared with the common bean ESTs deposited in GenBank, 31,664 of them (53.40%) had no match and can be considered new common bean transcripts. Functional annotation of the unigenes, based on Gene Ontology (GO) assignments from hits to Arabidopsis and soybean, indicated coverage of a broad range of GO categories. The common bean unigenes were also compared with bean bacterial artificial chromosome (BAC) end sequences: a total of 12,724 unigenes (21%), including 9,199 contigs and 3,256 singletons, matched 8,823 BAC-end sequences. In addition, a large number of simple sequence repeats (SSRs) and transcription factors were identified.
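The unigene bookkeeping and the percentages above can be checked directly from the reported counts. The following is a minimal sketch that assumes the counts stated in this abstract are exact; the constant names and the `percent` helper are hypothetical and chosen only for illustration. It confirms that contigs plus retained singletons equal the unigene total and recomputes the share of novel and BAC-end-matched unigenes.

```python
# Counts as reported in the abstract (assumed exact as stated).
CONTIGS = 39_572
SINGLETONS = 19_723      # singletons counted as unigenes (the 35,328 < 100 bp are excluded)
UNIGENES = 59_295
NOVEL = 31_664           # unigenes with no match to GenBank common bean ESTs
BAC_MATCHED = 12_724     # unigenes matching BAC-end sequences


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimal places."""
    return round(100 * part / whole, 2)


# The unigene set is made up of the contigs plus the retained singletons.
assert CONTIGS + SINGLETONS == UNIGENES

novel_pct = percent(NOVEL, UNIGENES)      # share of unigenes that are new transcripts
bac_pct = percent(BAC_MATCHED, UNIGENES)  # share of unigenes hitting BAC ends

print(f"novel unigenes: {novel_pct}%")
print(f"BAC-end matched unigenes: {bac_pct}%")
```

The novel share rounds to the 53.40% quoted in the text, and the BAC-end share (about 21.46%) is consistent with the rounded 21% figure.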
This work provides the first large-scale identification of the common bean transcriptome by 454 pyrosequencing and has increased the number of Phaseolus vulgaris ESTs by 150%. The resulting dataset will provide a platform for functional genomics in common bean and related legumes and will aid the development of molecular markers for tagging genes of interest. These sequences will also support better annotation of the common bean genome, whose whole-genome sequencing is ongoing.