Open Access Methodology article

Improved linkage analysis of Quantitative Trait Loci using bulk segregants unveils a novel determinant of high ethanol tolerance in yeast

Jorge Duitama1, Aminael Sánchez-Rodríguez2, Annelies Goovaerts3, Sergio Pulido-Tamayo2,4,5, Georg Hubmann3, María R Foulquié-Moreno3, Johan M Thevelein3*, Kevin J Verstrepen1* and Kathleen Marchal2,4,5*

Author Affiliations

1 VIB Laboratory of Systems Biology & Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, Leuven B-3001, Belgium

2 Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, Leuven B-3001, Belgium

3 VIB Department of Molecular Microbiology & Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, Leuven B-3001, Belgium

4 Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium

5 Department of Information Technology, Ghent University, IMinds, VIB, Gent 9052, Belgium


BMC Genomics 2014, 15:207  doi:10.1186/1471-2164-15-207

Published: 19 March 2014



Bulk segregant analysis (BSA) coupled to high-throughput sequencing is a powerful method to map genomic regions associated with phenotypes of interest. It relies on crossing two parents, one inferior and one superior for a trait of interest. Segregants displaying the trait of the superior parent are pooled, and their DNA is extracted and sequenced. Genomic regions linked to the trait are identified by searching the pool for overrepresented alleles, which normally originate from the superior parent. BSA data analysis is non-trivial because of sequencing, alignment and screening errors.
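
As a rough illustration of the idea of searching a pool for overrepresented alleles, the sketch below scores each marker independently with a one-sided binomial test against the 50% frequency expected without linkage. This is not the method proposed in this article. The function names, the read counts and the 0.5 null frequency are assumptions made for the example only.

```python
from math import comb

def binom_sf(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def scan_markers(counts, p_null=0.5):
    """Score each marker on its own.

    counts: list of (position, reads_with_superior_allele, total_reads).
    Returns a list of (position, allele_frequency, one_sided_p_value).
    """
    scored = []
    for pos, sup, total in counts:
        freq = sup / total                      # observed superior-allele frequency
        pval = binom_sf(sup, total, p_null)     # chance of seeing >= sup reads under no linkage
        scored.append((pos, freq, pval))
    return scored

# Hypothetical read counts at three marker sites (made up for illustration).
counts = [(1000, 10, 20), (2000, 18, 20), (3000, 20, 20)]
results = scan_markers(counts)
for pos, freq, pval in results:
    print(pos, round(freq, 2), pval)
```

Because every marker is tested in isolation, a single site with a sequencing or alignment error can look linked or unlinked by chance, which is one reason per-marker scores are noisy.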


To increase the power of the BSA technology and obtain a better distinction between spuriously and truly linked regions, we developed EXPLoRA (EXtraction of over-rePresented aLleles in BSA), an algorithm for BSA data analysis that explicitly models the dependency between neighboring marker sites by exploiting the properties of linkage disequilibrium through a Hidden Markov Model (HMM).
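
To make the idea of modeling dependency between neighboring markers concrete, here is a minimal two-state HMM sketch with forward-backward smoothing. It is not the EXPLoRA model. The two states ("unlinked" and "linked"), the binomial emissions, the superior-allele frequencies 0.5 and 0.95, the 0.9 probability of staying in the same state, and the read counts are all assumptions chosen for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k superior-allele reads out of n at allele frequency p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_linked(counts, p_states=(0.5, 0.95), stay=0.9, prior=(0.5, 0.5)):
    """Posterior probability of the 'linked' state at each marker.

    counts: list of (superior_reads, total_reads), ordered along the chromosome.
    State 0 = unlinked, state 1 = linked (assumed frequencies in p_states).
    """
    trans = [[stay, 1 - stay], [1 - stay, stay]]
    emis = [[binom_pmf(k, n, p) for p in p_states] for k, n in counts]
    T = len(counts)

    # Forward pass, rescaled at every step to avoid underflow.
    fwd, prev = [], None
    for t in range(T):
        if t == 0:
            a = [prior[s] * emis[0][s] for s in range(2)]
        else:
            a = [emis[t][s] * sum(prev[r] * trans[r][s] for r in range(2))
                 for s in range(2)]
        z = sum(a)
        prev = [x / z for x in a]
        fwd.append(prev)

    # Backward pass, also rescaled; the scale cancels in the posterior.
    bwd = [[1.0, 1.0] for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(trans[s][r] * emis[t + 1][r] * bwd[t + 1][r] for r in range(2))
             for s in range(2)]
        z = sum(b)
        bwd[t] = [x / z for x in b]

    # Combine both passes and normalize per marker.
    post = []
    for t in range(T):
        g = [fwd[t][s] * bwd[t][s] for s in range(2)]
        post.append(g[1] / sum(g))
    return post

# Hypothetical counts: three unlinked-looking markers, then a linked run
# containing one intermediate marker (15 of 20 reads).
counts = [(10, 20), (9, 20), (11, 20), (18, 20), (19, 20), (15, 20), (18, 20), (19, 20)]
post = posterior_linked(counts)
for (k, n), p in zip(counts, post):
    print(f"{k}/{n}", round(p, 3))
```

Under these assumed parameters, the intermediate marker is assigned to the linked state because both of its neighbors are strongly linked, whereas judged on its own count it would favor the unlinked state. This is the kind of smoothing that borrowing information from neighboring sites provides.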

Reanalyzing a BSA dataset for high ethanol tolerance in yeast allowed us to reliably identify QTLs linked to this phenotype that could not be identified with statistical significance in the original study. Experimental validation of one of the least pronounced linked regions, by identifying its causative gene VPS70, confirmed the potential of our method.


EXPLoRA performs at least as well as state-of-the-art methods and is robust even at low signal-to-noise ratios, i.e. when the true linkage signal is diluted by sampling or screening errors, or when few segregants are available.