Open Access Methodology article

Discovery of transgene insertion sites by high throughput sequencing of mate pair libraries

Anuj Srivastava¹*, Vivek M Philip¹, Ian Greenstein², Lucy B Rowe³, Mary Barter³, Cathleen Lutz² and Laura G Reinholdt²

Author Affiliations

1 Computational Sciences, The Jackson Laboratory, Bar Harbor, ME USA

2 Genetic Resource Sciences, The Jackson Laboratory, Bar Harbor, ME USA

3 Genome Technologies, The Jackson Laboratory, Bar Harbor, ME USA

BMC Genomics 2014, 15:367  doi:10.1186/1471-2164-15-367

Published: 14 May 2014



Transgenesis by random integration of a transgene into the genome of a zygote has become a reliable and powerful method for creating new mouse strains that express exogenous genes, including human disease genes, tissue-specific reporter genes, or genes that allow for tissue-specific recombination. Nearly 6,500 transgenic alleles have been created by random integration in embryos over the last 30 years, but for the vast majority of these strains, the transgene insertion sites remain uncharacterized.


To obtain a complete understanding of how insertion sites might contribute to phenotypic outcomes, to manage transgenic strains more cost-effectively, and to fully understand the mechanisms of instability in transgene expression, we have developed a methodology and a scoring scheme for discovering transgene insertion sites from high-throughput sequencing data.
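The abstract does not spell out the scoring scheme, so the following is only a generic illustration of one common way to use read pairs for insertion-site discovery, not the authors' method. It assumes reads have already been aligned (for example, against a reference that includes the transgene as an extra contig) and that every pair with one mate on the transgene and the other on the host genome has been reduced to the host-side mate's chromosome and position. The host-side mates are grouped by proximity and each group is scored by its number of supporting pairs. `HostMate`, `CandidateSite`, `cluster_insertion_sites`, and the `max_gap`, `min_support` and `masked` parameters are hypothetical names with illustrative defaults. `masked` lets the caller skip host regions that share sequence with the transgene, where such pairs may come from the endogenous copy rather than from an insertion.

```python
from collections import defaultdict
from typing import NamedTuple


class HostMate(NamedTuple):
    """Host-genome alignment of a mate whose partner aligned to the transgene."""
    chrom: str
    pos: int  # leftmost alignment coordinate of the host-side mate


class CandidateSite(NamedTuple):
    chrom: str
    start: int    # leftmost host-side mate in the cluster
    end: int      # rightmost host-side mate in the cluster
    support: int  # number of transgene-host pairs in the cluster


def cluster_insertion_sites(mates, max_gap=2000, min_support=3, masked=()):
    """Cluster host-side mates and score each cluster by its pair count.

    max_gap, min_support and masked are hypothetical, illustrative
    parameters. masked holds (chrom, start, end) host regions that share
    sequence with the transgene; mates falling there are ignored.
    """
    by_chrom = defaultdict(list)
    for m in mates:
        if any(m.chrom == c and s <= m.pos < e for c, s, e in masked):
            continue  # likely the endogenous copy, not an insertion junction
        by_chrom[m.chrom].append(m.pos)

    sites = []
    for chrom, positions in sorted(by_chrom.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                count += 1  # close enough to extend the current cluster
            else:
                if count >= min_support:
                    sites.append(CandidateSite(chrom, start, prev, count))
                start, count = pos, 1  # begin a new cluster
            prev = pos
        if count >= min_support:
            sites.append(CandidateSite(chrom, start, prev, count))
    # Rank candidate sites by support, highest first.
    return sorted(sites, key=lambda s: s.support, reverse=True)


if __name__ == "__main__":
    mates = [
        HostMate("chr5", 1000), HostMate("chr5", 1500), HostMate("chr5", 2600),
        HostMate("chr5", 90000),
        HostMate("chr11", 500), HostMate("chr11", 800),
        HostMate("chr11", 1200), HostMate("chr11", 1300),
    ]
    for site in cluster_insertion_sites(mates):
        print(site.chrom, site.start, site.end, site.support)
    # chr11 500 1300 4
    # chr5 1000 2600 3
```

Here the four chr11 pairs outrank the three chr5 pairs, and the isolated chr5 mate at position 90,000 falls below `min_support` and is dropped. Fixed-gap single-linkage grouping is the simplest choice. A real pipeline would likely also weigh mate orientation, mapping quality and duplicate pairs.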


As with other molecular approaches to transgene insertion site discovery, high-throughput sequencing of standard paired-end libraries is hindered by low signal-to-noise ratios. This problem is exacerbated when the transgene includes sequences that are also present in the host genome. We found that high-throughput sequencing data from mate-pair libraries are more informative than data from standard paired-end libraries. We also show examples of genomic regions that harbor transgenes; these regions share a preponderance of repetitive sequences.
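One way to see why longer inserts can help is an idealized model, offered as an illustrative assumption rather than an analysis from the paper. If fragment start positions are uniform across a genome of size G, a pair has one mate wholly in the transgene and the other wholly in flanking host sequence only when the junction falls in the unsequenced gap between the two reads, roughly L - 2r bases for insert size L and read length r. The expected number of such pairs per junction is then about N(L - 2r)/G for N read pairs. The 100 million pairs, 100 bp reads, 400 bp paired-end insert, 3 kb mate-pair insert and roughly 2.7 Gb mouse genome below are assumed round numbers, and `expected_chimeric_pairs` is a hypothetical helper.

```python
def expected_chimeric_pairs(n_pairs, insert_size, read_len, genome_size):
    """Approximate expected number of pairs with one mate wholly on each
    side of a single transgene-host junction (idealized uniform model)."""
    # The junction must fall between the end of the first read and the
    # start of the second, a window of about insert_size - 2 * read_len bases.
    gap = insert_size - 2 * read_len
    if gap <= 0:
        return 0.0  # reads overlap or abut: no unsequenced gap to span
    # Each pair's start position is uniform over the genome, so the chance
    # that a given pair straddles the junction this way is gap / genome_size.
    return n_pairs * gap / genome_size


if __name__ == "__main__":
    N_PAIRS = 100_000_000   # assumed sequencing depth
    MOUSE_GENOME = 2.7e9    # approximate mouse genome size, bp
    READ_LEN = 100          # assumed read length, bp

    pe = expected_chimeric_pairs(N_PAIRS, 400, READ_LEN, MOUSE_GENOME)
    mp = expected_chimeric_pairs(N_PAIRS, 3000, READ_LEN, MOUSE_GENOME)
    print(f"paired-end: {pe:.1f}")       # paired-end: 7.4
    print(f"mate-pair:  {mp:.1f}")       # mate-pair:  103.7
    print(f"ratio:      {mp / pe:.1f}")  # ratio:      14.0
```

Under these assumptions the mate-pair library yields about 14 times as many informative pairs per junction at equal depth. The model ignores library complexity, duplicate pairs and mappability. Mappability matters here because the host sequence around insertion sites tends to be repeat-rich, and a longer insert may give the host-side mate a better chance of landing in uniquely mappable sequence.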

Keywords: High-throughput sequencing; Mate pair library; Transgenic; Transgene insertion sites