Open Access Research article

Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

Frank A Feltus1,2*, Christopher A Saski1, Keithanne Mockaitis3, Niina Haiminen4, Laxmi Parida4, Zachary Smith3, James Ford3, Margaret E Staton1, Stephen P Ficklin1, Barbara P Blackmon1, Chun-Huai Cheng1, Raymond J Schnell5, David N Kuhn5 and Juan-Carlos Motamayor5,6

Author Affiliations

1 Clemson University Genomics Institute, Clemson University, 51 New Cherry Street, Clemson, SC 29634, USA

2 Department of Genetics & Biochemistry, Clemson University, 51 New Cherry Street, Clemson, SC 29634, USA

3 Center for Genomics and Bioinformatics, Indiana University, 915 E. Third Street, Bloomington, IN 47405, USA

4 IBM T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA

5 Subtropical Horticulture Research Station, USDA-ARS, 13601 Old Cutler Road, Miami, FL 33158, USA

6 Mars Incorporated, 800 High Street, Hackettstown, NJ 07840, USA

BMC Genomics 2011, 12:379  doi:10.1186/1471-2164-12-379

Published: 27 July 2011



BAC-based physical maps provide a framework for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library.
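
The clone-selection step described above can be illustrated with a minimal sketch. It assumes the minimum tiling path is available as a list of clones with start and end coordinates on the physical map. The `Clone` type, the `select_pool` function, and all clone names and coordinates are hypothetical and are not taken from the study.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str   # hypothetical clone identifier
    start: int  # start coordinate on the physical map (bp)
    end: int    # end coordinate on the physical map (bp)


def select_pool(tiling_path: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Return the tiling-path clones that overlap the interval [region_start, region_end)."""
    return [c for c in tiling_path if c.start < region_end and c.end > region_start]


# Hypothetical tiling path; names and coordinates are invented for illustration.
tiling_path = [
    Clone("BAC_A", 0, 150_000),
    Clone("BAC_B", 130_000, 280_000),
    Clone("BAC_C", 260_000, 410_000),
    Clone("BAC_D", 400_000, 550_000),
]

# Clones overlapping a target interval would be the ones pooled for library preparation.
pool = select_pool(tiling_path, 200_000, 405_000)
print([c.name for c in pool])
```

In this sketch a clone is kept whenever any part of it falls inside the target interval, so clones that only partly cover the edges of the region are included in the pool.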


This pooled BAC approach was taken to sequence and assemble a QTL-rich region of ~3 Mbp, represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. Furthermore, we found that an assembly of sufficient quality to serve as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight.


Our results, like those of other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced, allowing biological discovery to take place before a high-quality whole-genome assembly is completed.

Keywords: next-generation sequencing; QTL sequencing; fungal disease resistance; chocolate