Open Access Methodology article

An optimized procedure greatly improves EST vector contamination removal

Yi-An Chen1, Chang-Chun Lin1, Chin-Di Wang1, Huan-Bin Wu1,2 and Pei-Ing Hwang1,2*

Author Affiliations

1 Bioinformatics Core Laboratory, Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan

2 Lab of Mathematics in Biology, Institute of Statistical Sciences, Academia Sinica, Taipei, Taiwan

BMC Genomics 2007, 8:416  doi:10.1186/1471-2164-8-416

Published: 13 November 2007

Additional files

Additional file 1:

BLAST evaluation of the vector trimming results obtained with the three trimming programs using either vector form. This Excel file shows the BLAST analysis results, filtered according to the criteria used throughout this report. The name of each worksheet indicates the bioinformatic program and the vector form used for trimming. The worksheet "column descript" describes what each column name represents. This file contains all the source data used to derive Tables 3 and 4 in the main text.

Format: XLS Size: 635KB

Additional file 2:

Effect of vectors on vector trimming performance by three programs. This table is the same as Table 4 in the main text, except that it reports the number, rather than the percentage, of incompletely trimmed ESTs.

Format: PDF Size: 29KB

Additional file 3:

Artifact vector trimming found with ESTs from dbEST at NCBI. The error rate of dbEST, with emphasis on vector contamination, was investigated by running BLAST on ESTs randomly sampled from dbEST at NCBI, either against UniVec (worksheet "601_UniVec") or against the sequences of their cloning vectors (worksheet "601_22vector"). The Excel file shows the BLAST results filtered according to the criteria described in Methods. Note that in worksheet "601_22vector", only the 35,363 EST sequences cloned into the 22 most prevalent vectors were used for BLAST analysis (see Methods for details). The worksheet "col des" provides a description of each column.

Format: XLS Size: 1.2MB