Open Access Highly Accessed Software

ESTclean: a cleaning tool for next-gen transcriptome shotgun sequencing

Hongseok Tae², Dongsung Ryu¹, Suhas Sureshchandra¹ and Jeong-Hyeon Choi¹,²*

Author Affiliations

1 Cancer Center, Department of Biostatistics, Georgia Health Sciences University, Augusta, GA 30912, USA

2 The Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47401, USA

BMC Bioinformatics 2012, 13:247  doi:10.1186/1471-2105-13-247

Published: 26 September 2012

Abstract

Background

With the advent of next-generation sequencing (NGS) technologies, full cDNA shotgun sequencing has become a major approach in the study of transcriptomes, and several different 454 sequencing protocols have been developed. Because each protocol attaches its own short DNA tags or adapters to the ends of cDNA fragments for labeling or sequencing, each protocol leaves behind different contaminating sequences, which can lead to mis-assembly and inaccurate sequence products if they are not removed.
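
To make the contamination problem concrete, the following Python sketch shows one simple way to strip a tag from the 5' end of a read and an adapter from its 3' end, using ungapped matching with a small mismatch allowance. The function names, the parameters (`max_mismatches`, `min_overlap`, `max_mismatch_rate`) and the example sequences are hypothetical illustrations; they are not ESTclean's actual algorithm and not real 454 adapters.

```python
# Hypothetical sketch of end-anchored tag/adapter trimming; not ESTclean's code.


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_5prime_tag(read: str, tag: str, max_mismatches: int = 1) -> str:
    """Remove `tag` from the start of `read` if it matches with at most
    `max_mismatches` substitutions; otherwise return the read unchanged."""
    if len(read) >= len(tag) and hamming(read[:len(tag)], tag) <= max_mismatches:
        return read[len(tag):]
    return read


def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 5,
                        max_mismatch_rate: float = 0.1) -> str:
    """Cut the read at the leftmost position where a prefix of `adapter`
    aligns (ungapped) to the remainder of the read within the allowed
    mismatch rate. Adapter fragments shorter than `min_overlap` at the
    very end of the read are not searched for."""
    for start in range(len(read) - min_overlap + 1):
        # Compare only as much adapter as fits before the read ends.
        overlap = min(len(adapter), len(read) - start)
        segment = read[start:start + overlap]
        if hamming(segment, adapter[:overlap]) <= max_mismatch_rate * overlap:
            return read[:start]
    return read
```

For example, `trim_3prime_adapter("GGGCCCAGATCGGAAG", "AGATCGGAAGAGC")` finds the partial adapter starting at position 6 and returns `"GGGCCC"`, while `trim_5prime_tag("ACGTTGGG", "ACGTA")` tolerates the single mismatch and returns `"GGG"`.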

Results

We have designed and implemented a new program for cleaning raw sequences, available both as a graphical user interface and as a batch script. The cleaning process consists of several modules: barcode trimming, sequencing adapter trimming, amplification primer trimming, poly-A tail trimming, vector screening and low-quality region trimming. These modules can be combined to suit different sequencing applications.
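
The modular design described above can be illustrated with a small Python sketch in which each module is a function from a read (bases plus Phred quality scores) to a trimmed read, and a pipeline is simply an ordered list of such functions. Only two modules are shown. `trim_poly_a`, `trim_low_quality`, `make_pipeline` and their thresholds are hypothetical, and the module order used in the example is an assumption rather than ESTclean's documented order.

```python
from typing import Callable, List, Tuple

# Hypothetical types for this sketch: a read is (bases, Phred qualities).
Read = Tuple[str, List[int]]
Module = Callable[[Read], Read]


def trim_poly_a(read: Read, min_run: int = 8) -> Read:
    """Remove a 3' run of A bases if it is at least `min_run` long."""
    seq, qual = read
    end = len(seq)
    while end > 0 and seq[end - 1] == "A":
        end -= 1
    if len(seq) - end >= min_run:
        return seq[:end], qual[:end]
    return read


def trim_low_quality(read: Read, min_q: int = 20) -> Read:
    """Trim bases from both ends until a base with quality >= min_q is reached."""
    seq, qual = read
    start, end = 0, len(seq)
    while start < end and qual[start] < min_q:
        start += 1
    while end > start and qual[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def make_pipeline(modules: List[Module]) -> Module:
    """Chain cleaning modules; each one receives the previous module's output."""
    def run(read: Read) -> Read:
        for module in modules:
            read = module(read)
        return read
    return run


# Example: quality trimming first, then poly-A trimming (assumed order).
clean = make_pipeline([trim_low_quality, trim_poly_a])
seq = "GCGCTTGC" + "A" * 10
qual = [10] + [35] * 17
print(clean((seq, qual)))  # ('CGCTTGC', [35, 35, 35, 35, 35, 35, 35])
```

Running quality trimming first means a low-quality base after the tail cannot prevent the poly-A run from being recognized; a different protocol might call for a different set or order of modules, which is the point of keeping them composable.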

Conclusions

ESTclean is a software package not only for cleaning cDNA sequences but also for helping to develop sequencing protocols, as it provides summary tables and figures for sequencing quality control in a graphical user interface. It outperforms other tools in cleaning reads from complicated sequencing protocols that use barcodes and multiple amplification primers.
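
As a rough illustration of the kind of per-sample quality-control summary mentioned above, the sketch below groups reads by barcode label and reports read counts, mean lengths before and after cleaning, and the fraction of reads whose cleaned length reaches a minimum cutoff. The record layout, the `min_len` cutoff and the column names are hypothetical; ESTclean's actual tables and figures may report different statistics.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record layout: (barcode/sample label, raw length, cleaned length).
Record = Tuple[str, int, int]


def summarize(records: List[Record], min_len: int = 50) -> Dict[str, Dict[str, float]]:
    """Build a per-sample summary table from per-read length records."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for label, raw_len, clean_len in records:
        groups[label].append((raw_len, clean_len))

    table: Dict[str, Dict[str, float]] = {}
    for label, rows in sorted(groups.items()):
        kept = [c for _, c in rows if c >= min_len]
        table[label] = {
            "reads": len(rows),
            "mean_raw_len": mean(r for r, _ in rows),
            "mean_clean_len": mean(c for _, c in rows),
            "kept_fraction": len(kept) / len(rows),
        }
    return table


# Print the table as tab-separated rows.
for label, row in summarize([("S1", 100, 80), ("S1", 120, 40), ("S2", 90, 90)]).items():
    print(label, *(f"{k}={v}" for k, v in row.items()), sep="\t")
```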