BMC Bioinformatics
Methodology article
Incidence of "quasi-ditags" in catalogs generated by Serial Analysis of Gene Expression (SAGE)
Sergey V Anisimov1 and Alexei A Sharov2
1 Section for Neuronal Survival, Wallenberg Neuroscience Center, Lund University, 221 84 Lund, Sweden
2 Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD, 21224, USA
BMC Bioinformatics 2004, 5:152 doi:10.1186/1471-2105-5-152
Published: 18 October 2004
Abstract
Background
Serial Analysis of Gene Expression (SAGE) is a functional genomic technique that quantitatively analyzes the cellular transcriptome. The analysis of SAGE libraries relies on the identification of ditags in sequencing files; however, the software used to examine SAGE libraries cannot distinguish authentic ditags from false ones ("quasi-ditags").
Results
We provide examples of quasi-ditags that originate from cloning and sequencing artifacts (i.e. genomic contamination or random combinations of nucleotides) and are included in SAGE libraries. We employed a mathematical model to predict the frequency of quasi-ditags in random nucleotide sequences, and our data show that clones containing two or fewer ditags (which include chromosomal cloning artifacts) should be excluded from the analysis of SAGE catalogs.
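To illustrate how quasi-ditags can arise by chance in random nucleotide sequences, the following is a minimal simulation sketch, not the mathematical model used in the article. It assumes that the anchoring-enzyme site is CATG (NlaIII, which is commonly used in SAGE) and that a span between two adjacent sites counts as a ditag when it is 20 to 24 bases long. Both parameters, and the function names count_quasi_ditags and simulate, are illustrative assumptions that are not stated in the abstract.

```python
import random

# Assumption: anchoring-enzyme recognition site (NlaIII); not stated in the abstract.
ANCHOR = "CATG"


def count_quasi_ditags(seq, min_len=20, max_len=24):
    """Count spans between adjacent ANCHOR sites whose length lies in
    [min_len, max_len]. The length window is an illustrative assumption."""
    # Collect the start position of every ANCHOR occurrence.
    sites = []
    pos = seq.find(ANCHOR)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(ANCHOR, pos + 1)

    count = 0
    for left, right in zip(sites, sites[1:]):
        # Number of bases strictly between the two anchor sites.
        span = right - (left + len(ANCHOR))
        if min_len <= span <= max_len:
            count += 1
    return count


def simulate(n_seqs=1000, length=500, seed=0):
    """Average number of chance ditag-like spans per random sequence,
    assuming the four bases are independent and equally likely."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_seqs):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        total += count_quasi_ditags(seq)
    return total / n_seqs
```

Calling simulate() with different sequence lengths gives a rough sense of how often a random insert would yield one or two ditag-like spans under these assumptions.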
Conclusions
Cloning and sequencing artifacts that contaminate SAGE libraries could be eliminated with a simple pre-screening procedure, increasing the reliability of the data.
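A pre-screening step of this kind could look like the sketch below, which drops clones containing two or fewer ditags before the catalog is analyzed. It assumes that ditag counts per clone are already available as a dictionary; the function name prescreen_clones, the clone identifiers, and the input layout are hypothetical and are not taken from the article or from any SAGE software.

```python
def prescreen_clones(ditag_counts, min_ditags=3):
    """Split clones into kept and excluded sets by ditag count.

    ditag_counts: dict mapping a clone identifier to its number of ditags
                  (hypothetical input layout).
    min_ditags:   smallest count that is kept; 3 excludes clones with
                  two or fewer ditags, following the abstract's threshold.
    """
    kept = {clone: n for clone, n in ditag_counts.items() if n >= min_ditags}
    excluded = {clone: n for clone, n in ditag_counts.items() if n < min_ditags}
    return kept, excluded


# Example with made-up clone names and counts.
kept, excluded = prescreen_clones(
    {"clone_a": 25, "clone_b": 2, "clone_c": 0, "clone_d": 3}
)
```

Returning the excluded clones alongside the kept ones makes it possible to inspect what was removed, for example to check whether the discarded inserts match genomic sequence.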