ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data

Osvaldo Zagordi1,2*, Arnab Bhattacharya1, Nicholas Eriksson3 and Niko Beerenwinkel1,2

Author Affiliations

1 Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland

2 SIB Swiss Institute of Bioinformatics, Switzerland

3 23andMe, Mountain View, CA 94043, USA

BMC Bioinformatics 2011, 12:119  doi:10.1186/1471-2105-12-119

Published: 26 April 2011



With next-generation sequencing technologies, experiments that were considered prohibitive only a few years ago are now possible. However, while these technologies have the ability to produce enormous volumes of data, the sequence reads are prone to error. This poses fundamental hurdles when genetic diversity is investigated.


We developed ShoRAH, a computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. The software was run on simulated data and on real data obtained in wet lab experiments to assess its reliability.


ShoRAH is implemented in C++, Python, and Perl and has been tested under Linux and Mac OS X. Source code is available under the GNU General Public License.