Open Access Methodology article

Incidence of "quasi-ditags" in catalogs generated by Serial Analysis of Gene Expression (SAGE)

Sergey V Anisimov1 and Alexei A Sharov2

Section for Neuronal Survival, Wallenberg Neuroscience Center, Lund University, 221 84 Lund, Sweden

Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD, 21224, USA

BMC Bioinformatics 2004, 5:152 doi:10.1186/1471-2105-5-152

Published: 18 October 2004

Abstract

Background

Serial Analysis of Gene Expression (SAGE) is a functional genomic technique that quantitatively analyzes the cellular transcriptome. The analysis of SAGE libraries relies on the identification of ditags in sequencing files; however, the software used to examine SAGE libraries cannot distinguish authentic ditags from false ones ("quasi-ditags").

Results

We provide examples of quasi-ditags that originate from cloning and sequencing artifacts (i.e. genomic contamination or random combinations of nucleotides) and are included in SAGE libraries. We employed a mathematical model to predict the frequency of quasi-ditags in random nucleotide sequences, and our data show that clones containing two or fewer ditags (which include chromosomal cloning artifacts) should be excluded from the analysis of SAGE catalogs.

Conclusions

Cloning and sequencing artifacts that contaminate SAGE libraries could be eliminated with a simple pre-screening procedure, increasing the reliability of the data.
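
The pre-screening rule stated above (exclude clones containing two or fewer ditags) might be sketched as follows. This is a minimal illustration rather than the authors' software: the anchoring-enzyme site `CATG` (NlaIII), the accepted ditag length window of 20 to 26 bp, and the helper names `count_ditags` and `prescreen` are all assumptions introduced for the example.

```python
import re

# Assumed parameters (not taken from the article):
ANCHOR = "CATG"                   # NlaIII site, a common SAGE anchoring enzyme
MIN_DITAG, MAX_DITAG = 20, 26     # illustrative ditag length window, in bp


def count_ditags(seq, anchor=ANCHOR, lo=MIN_DITAG, hi=MAX_DITAG):
    """Count segments between consecutive anchor sites whose length
    falls inside the assumed ditag window."""
    seq = seq.upper()
    # A lookahead finds every anchor start, including overlapping matches.
    sites = [m.start() for m in re.finditer(f"(?={re.escape(anchor)})", seq)]
    count = 0
    for left, right in zip(sites, sites[1:]):
        gap = right - (left + len(anchor))  # bases between the two anchors
        if lo <= gap <= hi:
            count += 1
    return count


def prescreen(clones, min_ditags=3):
    """Keep only clones with at least `min_ditags` ditags, i.e. drop
    clones containing two or fewer."""
    return {name: seq for name, seq in clones.items()
            if count_ditags(seq) >= min_ditags}
```

Here `clones` is a hypothetical mapping from clone name to its sequence read. The threshold `min_ditags=3` follows directly from the exclusion rule above, while the anchor site and length window should be adjusted to the protocol actually used.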


© 1999-2009 BioMed Central Ltd unless otherwise stated. Part of Springer Science+Business Media.