Research article

Improving mammalian genome scaffolding using large insert mate-pair next-generation sequencing

Sebastiaan van Heesch1, Wigard P Kloosterman2, Nico Lansu1, Frans-Paul Ruzius1, Elizabeth Levandowsky3, Clarence C Lee3, Shiguo Zhou4, Steve Goldstein4, David C Schwartz4, Timothy T Harkins3, Victor Guryev1,5* and Edwin Cuppen1,2*

Author Affiliations

1 Hubrecht Institute/KNAW and University Medical Center Utrecht, Uppsalalaan 8, Utrecht 3584 CT, The Netherlands

2 Department of Medical Genetics, UMC Utrecht, Universiteitsweg 100, Utrecht, 3584 GG, The Netherlands

3 Life Technologies Inc., Advanced Applications Group, 500 Cummings Center, Beverly, MA, 01915, USA

4 Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW-Biotechnology Center, University of Wisconsin-Madison, Madison, WI, 53706, USA

5 Present address: Laboratory of Genome Structure and Ageing, European Research Institute for the Biology of Ageing; RuG and UMC Groningen, Antonius Deusinglaan 1, Groningen 9713 AV, The Netherlands

BMC Genomics 2013, 14:257  doi:10.1186/1471-2164-14-257

Published: 16 April 2013



Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses.


Here, we systematically assessed the utility of paired-end and mate-pair (MP) next-generation sequencing libraries with insert sizes ranging from 170 bp to 25 kb for genome coverage and for improving the scaffolding of a mammalian genome (Rattus norvegicus). Despite their lower library complexity, large-insert MP libraries (20 or 25 kb) provided very high physical genome coverage and efficiently spanned repeat elements in the genome. Medium-sized (5, 8 or 15 kb) MP libraries were much more efficient for genome structure analysis than the more commonly used shorter-insert paired-end and 3 kb MP libraries. Furthermore, combining medium- and large-insert libraries resulted in a 3-fold increase in N50 during scaffolding. Finally, we show that our data can be used to evaluate and improve contig order and orientation in the current rat reference genome assembly.
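The two metrics named above, N50 and physical genome coverage, can be illustrated with a minimal sketch. It assumes the standard definitions: N50 is the length at which scaffolds of that size or longer hold at least half of all assembled bases, and physical coverage is the average number of inserts spanning a base. The function names and all numbers are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total assembled bases is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def physical_coverage(n_pairs, insert_size, genome_size):
    """Average number of inserts spanning each base of the genome."""
    return n_pairs * insert_size / genome_size


def sequence_coverage(n_pairs, read_length, genome_size):
    """Average number of sequenced bases per genome base (two reads per pair)."""
    return n_pairs * 2 * read_length / genome_size


# Hypothetical scaffold lengths before and after joining the two largest.
before = [100, 80, 60, 40, 20]
after = [180, 60, 40, 20]
print(n50(before), n50(after))

# Hypothetical library: 1 million pairs, 20 kb inserts, 50 bp reads, 2.5 Gb genome.
print(physical_coverage(1_000_000, 20_000, 2_500_000_000))
print(sequence_coverage(1_000_000, 50, 2_500_000_000))
```

In this toy case a single join raises N50 from 80 to 180, and the same number of read pairs gives far higher physical coverage than sequence coverage, which is why a low-complexity large-insert library can still span much of a genome.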


We conclude that applying combinations of mate-pair libraries with insert sizes that match the distributions of repetitive elements improves contig scaffolding and can contribute to the finishing of draft genomes.

Keywords: Genome structure; Genome scaffolding; Mate-pair next-generation sequencing; Contig assembly; Rat genome