Methodology article, Open Access

Barnacle: detecting and characterizing tandem duplications and fusions in transcriptome assemblies

Lucas Swanson [1,2], Gordon Robertson [1], Karen L Mungall [1], Yaron S Butterfield [1], Readman Chiu [1], Richard D Corbett [1], T Roderick Docking [1], Donna Hogge [3], Shaun D Jackman [1], Richard A Moore [1], Andrew J Mungall [1], Ka Ming Nip [1], Jeremy DK Parker [1], Jenny Qing Qian [1], Anthony Raymond [1], Sandy Sung [1], Angela Tam [1], Nina Thiessen [1], Richard Varhol [1], Sherry Wang [1], Deniz Yorukoglu [1,2,5], YongJun Zhao [1], Pamela A Hoodless [3,4], S Cenk Sahinalp [2], Aly Karsan [1] and Inanc Birol [1,2,4]*

Author Affiliations

1 Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada

2 School of Computing Science, Simon Fraser University, Burnaby, Canada

3 Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, Canada

4 Department of Medical Genetics, University of British Columbia, Vancouver, Canada

5 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA

BMC Genomics 2013, 14:550  doi:10.1186/1471-2164-14-550

Published: 14 August 2013

Abstract

Background

Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers.

Results

We describe Barnacle, a production-grade analysis tool that detects such chimeras in de novo assemblies of RNA-seq data, and supports prioritizing them for review and validation by reporting the relative coverage of co-occurring chimeric and wild-type transcripts. We demonstrate its application to large-scale disease studies by identifying PTDs in MLL, ITDs in FLT3, and reciprocal fusions between PML and RARA in two deeply sequenced acute myeloid leukemia (AML) RNA-seq datasets.
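To make the idea of prioritizing by relative coverage concrete, the following is a minimal sketch, not Barnacle's actual implementation. It assumes that each candidate event comes with one coverage value for the chimeric transcript and one for the co-occurring wild-type transcript, and it ranks candidates by the chimeric share of total coverage. The record type `ChimeraCandidate`, the functions `chimeric_fraction` and `prioritize`, the exact formula, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChimeraCandidate:
    """Hypothetical record for one predicted chimeric event."""
    name: str
    chimeric_coverage: float   # coverage supporting the chimeric transcript
    wildtype_coverage: float   # coverage of the co-occurring wild-type transcript


def chimeric_fraction(c: ChimeraCandidate) -> float:
    """Share of total coverage attributable to the chimeric transcript.

    Assumption: 'relative coverage' is modelled here as
    chimeric / (chimeric + wild-type); the tool may define it differently.
    """
    total = c.chimeric_coverage + c.wildtype_coverage
    return c.chimeric_coverage / total if total > 0 else 0.0


def prioritize(candidates: List[ChimeraCandidate]) -> List[ChimeraCandidate]:
    """Order candidates for manual review, highest chimeric fraction first."""
    return sorted(candidates, key=chimeric_fraction, reverse=True)


# Made-up illustrative values, not results from any dataset.
candidates = [
    ChimeraCandidate("event_A", chimeric_coverage=30.0, wildtype_coverage=90.0),
    ChimeraCandidate("event_B", chimeric_coverage=10.0, wildtype_coverage=190.0),
    ChimeraCandidate("event_C", chimeric_coverage=60.0, wildtype_coverage=60.0),
]
ranked = prioritize(candidates)

for c in ranked:
    print(f"{c.name}\t{chimeric_fraction(c):.2f}")
```

Ranking by a fraction rather than by raw chimeric coverage keeps highly expressed genes from dominating the review list, which is one plausible reason to report chimeric and wild-type coverage side by side.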

Conclusions

Our analyses of real and simulated data sets show that, with appropriate filter settings, Barnacle makes highly specific predictions for three types of chimeric transcripts that are important in a range of cancers: PTDs, ITDs, and fusions. High specificity makes manual review and validation efficient, which is necessary in large-scale disease studies. Characterizing an extended range of chimera types will help generate insights into progression, treatment, and outcomes for complex diseases.

Keywords:
Transcriptome assembly; Chimeric transcripts; Fusion; Partial tandem duplication; PTD; Internal tandem duplication; ITD; RNA-seq; Transcriptome