This article is part of the supplement: Italian Society of Bioinformatics (BITS): Annual Meeting 2012
State of art fusion-finder algorithms are suitable to detect transcription-induced chimeras in normal tissues?
- Equal contributors
1 University of Torino, Bioinformatics & Genomics unit, Molecular Biotechnology Center, Via Nizza 52, 10126 Torino, Italy
2 University of Torino, Department of Computer Science, Corso Svizzera 185, 10149 Torino, Italy
3 University of Torino, Unit of Cancer Epidemiology, Department of Biomedical Sciences and Human Oncology, Via Santena 7, 10126 Torino, Italy
BMC Bioinformatics 2013, 14(Suppl 7):S2 doi:10.1186/1471-2105-14-S7-S2
Published: 22 April 2013
RNA-seq has the potential to discover genes created by chromosomal rearrangements. Fusion genes, also known as "chimeras", are formed by the breakage and re-joining of two different chromosomes. Chimeras have been implicated in the development of cancer. A few earlier publications also reported fusion events in normal tissue, but with very limited overlap among their results. More recently, two fusion genes in normal tissues were detected using both RNA-seq and protein data.
Because the results on chimeras in normal tissue are heterogeneous, we decided to evaluate the efficacy of state-of-the-art fusion finders in detecting chimeras in RNA-seq data from normal tissues.
We compared the performance of six fusion-finder tools: FusionHunter, FusionMap, FusionFinder, MapSplice, deFuse and TopHat-fusion. To evaluate sensitivity, we used a synthetic dataset of fusion products, called the positive dataset; in these experiments FusionMap, FusionFinder, MapSplice and TopHat-fusion detected more than 78% of the fusion genes. All tools were error-prone, with high variability among them, and identified some fusion genes not present in the synthetic dataset. To better investigate the rate of false chimera detection, we used synthetic datasets free of fusion products, called negative datasets. The negative datasets have different read lengths and quality scores, which makes it possible to detect the dependency of the tools on both features. FusionMap, FusionFinder, MapSplice, deFuse and TopHat-fusion were error-prone; only the FusionHunter results were free of false positives. FusionMap gave the best compromise between specificity in the negative datasets and sensitivity in the positive dataset.
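The evaluation logic described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`normalize`, `score_calls`), the placeholder gene names, and the choice to treat a fusion as an unordered gene pair are all assumptions made for illustration. Against a positive dataset, it reports the fraction of known fusions that a tool recovered and the number of extra calls; against a negative dataset, the truth set is empty, so every call counts as a false positive and sensitivity is undefined.

```python
from typing import Dict, Iterable, Optional, Tuple

FusionPair = Tuple[str, str]


def normalize(pair: FusionPair) -> FusionPair:
    """Treat A-B and B-A as the same fusion (an illustrative assumption)."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def score_calls(
    truth: Iterable[FusionPair], calls: Iterable[FusionPair]
) -> Dict[str, Optional[float]]:
    """Compare a tool's fusion calls with the fusions known to be in the data.

    Sensitivity is None when the truth set is empty (a negative dataset).
    """
    truth_set = {normalize(p) for p in truth}
    call_set = {normalize(p) for p in calls}
    true_positives = len(truth_set & call_set)
    false_positives = len(call_set - truth_set)
    sensitivity = true_positives / len(truth_set) if truth_set else None
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "sensitivity": sensitivity,
    }


# Hypothetical toy data; the gene names are placeholders, not from the study.
positive_truth = [("GENE1", "GENE2"), ("GENE3", "GENE4"),
                  ("GENE5", "GENE6"), ("GENE7", "GENE8")]
tool_calls = [("GENE2", "GENE1"), ("GENE3", "GENE4"),
              ("GENE5", "GENE6"), ("GENE9", "GENE10")]

positive_result = score_calls(positive_truth, tool_calls)
negative_result = score_calls([], [("GENE9", "GENE10")])
print(positive_result)
print(negative_result)
```

In this toy case the tool recovers three of the four simulated fusions and reports one chimera that is not in the dataset; on the fusion-free input, its single call is a false positive.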
We observed that the tools depend on read length, quality score and the number of reads supporting each chimera. It is therefore important to select the software carefully, based on the structure of the RNA-seq data under analysis. Furthermore, the sensitivity of chimera detection tools does not seem sufficient to provide results consistent with those obtained in normal tissues on the basis of fusion events extracted from published data.