Methodology article (Open Access)

Identification of medium-sized genomic deletions with low coverage, mate-paired restricted tags

Qiang Gong [1,2], Yong Tao [1], Jian-Rong Yang [3], Jun Cai [1], Yunfei Yuan [4], Jue Ruan [1], Jin Yang [5], Hailiang Liu [3], Wanghua Li [1,2], Xuemei Lu [1,5], Shi-Mei Zhuang [3], San Ming Wang [6] and Chung-I Wu [1,7]*

Author Affiliations

1 Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, P.R. China

2 University of Chinese Academy of Sciences, Beijing, P.R. China

3 Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Science, Sun Yat-sen University, Guangzhou, P.R. China

4 Department of Hepatobiliary Oncology, Cancer Center, Sun Yat-sen University, Guangzhou, P.R. China

5 Chinese Academy of Sciences Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, P.R. China

6 Department of Genetics, Cell Biology & Anatomy, College of Medicine, University of Nebraska Medical Center, Nebraska, USA

7 Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA


BMC Genomics 2013, 14:51  doi:10.1186/1471-2164-14-51

Published: 24 January 2013



Genomic deletions are known to be widespread in many species. Various sequencing-based approaches for identifying deletions have been developed, but their power to detect deletions affecting medium-sized regions is limited when sequencing coverage is low.


We present a cost-effective method for identifying medium-sized deletions in genomic regions with low sequencing coverage. Two mate-paired libraries were constructed separately from human cancerous tissue to generate paired short reads (ditags) from restriction fragments produced by digestion with a 4-base restriction enzyme. A total of 3 Gb of paired reads (1.0× genome size) was collected, and 175 deletions were inferred by identifying ditags with disordered alignments to the reference genome sequence. Sanger sequencing confirmed an overall detection accuracy of 95%. Deletions detected independently in both libraries indicated good reproducibility.
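The abstract does not spell out how a ditag's alignment reveals a deletion, so the following Python sketch only illustrates one plausible reading of the idea. It assumes that the two tags of a ditag normally map to neighbouring restriction sites in the reference, and that a ditag joining non-neighbouring sites suggests that the intervening sites are missing from the sample. The recognition motif `CATG`, the toy reference sequence, and the function names `find_sites`, `classify_ditag` and `deletion_window` are all hypothetical choices made for this illustration and are not taken from the study.

```python
from typing import List, Tuple


def find_sites(reference: str, motif: str) -> List[int]:
    """Return the 0-based start position of every occurrence of motif."""
    sites = []
    pos = reference.find(motif)
    while pos != -1:
        sites.append(pos)
        pos = reference.find(motif, pos + 1)
    return sites


def classify_ditag(left_site: int, right_site: int) -> str:
    """Classify a ditag by the indices of the two sites its tags map to.

    Assumption: an intact restriction fragment links two consecutive
    sites, so a gap of more than one site index hints at a deletion.
    """
    gap = right_site - left_site
    if gap == 1:
        return "concordant"
    if gap > 1:
        return "candidate_deletion"
    return "other"


def deletion_window(sites: List[int], left_site: int, right_site: int,
                    motif_len: int) -> Tuple[int, int]:
    """Half-open reference interval that must contain the deleted segment."""
    return sites[left_site] + motif_len, sites[right_site]


if __name__ == "__main__":
    # Toy reference with three hypothetical CATG sites.
    reference = "AACATGTTTTCATGGGGGCATGCC"
    sites = find_sites(reference, "CATG")
    print(sites)
    # A ditag whose tags map to the first and third sites skips the middle one.
    print(classify_ditag(0, 2))
    print(deletion_window(sites, 0, 2, len("CATG")))
```

In this toy case the ditag links the first and third sites, so the candidate deletion lies somewhere between the end of the first site and the start of the third. A real pipeline would also need to handle sequencing errors, repeats and ambiguous alignments, none of which are modelled here.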


We provide an approach to accurately identify medium-sized deletions in large genomes with low sequence coverage. It can be applied in studies of comparative genomics and in the identification of germline and somatic variants.

Keywords: Medium-sized deletion; Restriction enzymes; Next generation sequencing; Structural variation