This article is part of the supplement: Proceedings of the Second Annual RECOMB Satellite Workshop on Massively Parallel Sequencing (RECOMB-seq 2012)
PAIR: polymorphic Alu insertion recognition
School of Science and Engineering, Reykjavik University, Reykjavík, 101, Iceland
BMC Bioinformatics 2012, 13(Suppl 6):S7 doi:10.1186/1471-2105-13-S6-S7
Published: 19 April 2012
Alu polymorphisms are some of the most common polymorphisms in the genome, yet few methods have been developed for their detection.
We consider the problem of identifying sites containing polymorphic Alu insertions, and present algorithms to discover these polymorphisms using paired-end high-throughput sequencing data from multiple individuals.
We give efficient and practical algorithms that detect polymorphic Alus, both those that are inserted with respect to the reference genome and those that are deleted. The algorithms have linear time complexity and run quickly on a standard desktop machine, operating on the output of standard sequencing analysis tools.
In our simulated dataset we are able to locate 98.1% of Alus inserted with respect to the reference and 97.7% of Alus deleted. Our simulations also show an excellent correlation between the deletions detected in parents and children. We further run our algorithms on publicly available data from the 1000 Genomes Project and find several thousand Alu polymorphisms in each individual.