This article is part of the supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)

Open Access Research

Short read DNA fragment anchoring algorithm

Wendi Wang*, Peiheng Zhang and Xinchun Liu

Author Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, PR China

BMC Bioinformatics 2009, 10(Suppl 1):S17  doi:10.1186/1471-2105-10-S1-S17

Published: 30 January 2009



Emerging next-generation sequencing methods based on PCR technology considerably increase genome sequencing speed and also reduce its cost. They have been used to address a broad range of bioinformatics problems. Because the reliable output sequence length of next-generation sequencing technologies is limited, we are generally confined to studying gene fragments of 30~50 bp, which are shorter than traditional gene fragments. Anchoring gene fragments in a long reference sequence is an essential prerequisite for further assembly and analysis. Given the sheer number of fragments produced by next-generation sequencing technologies and the huge size of reference sequences, anchoring can rapidly become a computational bottleneck.

Results and discussion

We compared the efficiency of BLAT, SOAP and EMBF. Efficiency is defined as the total number of output results divided by the time consumed to retrieve them. The data show that our algorithm, EMBF, is 3~4 times more efficient than SOAP and at least 150 times more efficient than BLAT. Moreover, when the reference sequence size is increased, the efficiency of SOAP can degrade by as much as 30%, while the efficiency of EMBF tends to increase.
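The efficiency metric defined above can be illustrated with a minimal Python sketch. The function names `efficiency` and `speedup` and all numeric values are assumptions made for illustration only. They are not taken from the paper's benchmark or from any of the compared tools.

```python
def efficiency(result_count: int, elapsed_seconds: float) -> float:
    """Efficiency as defined in the text: output results per unit of time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return result_count / elapsed_seconds


def speedup(efficiency_a: float, efficiency_b: float) -> float:
    """How many times more efficient tool A is than tool B."""
    if efficiency_b <= 0:
        raise ValueError("efficiency_b must be positive")
    return efficiency_a / efficiency_b


# Hypothetical run statistics (NOT measurements from the paper).
tool_a = efficiency(result_count=1_200_000, elapsed_seconds=300.0)
tool_b = efficiency(result_count=1_000_000, elapsed_seconds=1000.0)

# Ratio of the two efficiencies, i.e. the "times advantage" of A over B.
ratio = speedup(tool_a, tool_b)
print(f"A: {tool_a:.1f} results/s, B: {tool_b:.1f} results/s, ratio: {ratio:.1f}x")
```

Because the metric counts output results and not input fragments, a tool that reports more hits in the same time scores higher. This is consistent with the emphasis on result completeness in the comparison.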


In conclusion, we consider EMBF more suitable for short-fragment anchoring problems in which result completeness and accuracy are paramount and the reference sequences are relatively large.