Open Access, Highly Accessed, Correspondence

Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics

Barry R Zeeberg*1, Joseph Riss*2, David W Kane3, Kimberly J Bussey1, Edward Uchio4, W Marston Linehan4, J Carl Barrett2 and John N Weinstein1

1Genomics & Bioinformatics Group, Laboratory of Molecular Pharmacology, Center for Cancer Research (CCR), National Cancer Institute (NCI), National Institutes of Health (NIH), Bldg 37 Rm 5041, NIH, 9000 Rockville Pike, Bethesda, MD 20892 USA

2Laboratory of Biosystems and Cancer, CCR, Bldg 37 Rm 5032, NIH, 9000 Rockville Pike, Bethesda, MD 20892 USA

3SRA International, 4300 Fair Lakes CT, Fairfax, VA 22033 USA

4Urologic Oncology Branch, Bldg 10 Rm 2B47, National Institutes of Health, Bethesda, MD 20892 USA

* Contributed equally

BMC Bioinformatics 2004, 5:80 doi:10.1186/1471-2105-5-80

Published: 23 June 2004

Abstract

Background

When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names.

Results

A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered.
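To make the two conversion types concrete, the following is a minimal sketch of a pre-check that flags identifiers likely to be reinterpreted when a file is opened with default settings. It is not one of the authors' scripts. The function name `excel_risk` and both regular expressions are assumptions chosen for illustration, and the patterns only approximate Excel's parsing rules, which also vary with version and locale. The sample symbols (SEPT2, DEC1, and the Riken-style identifier 2310009E13) are typical of the affected names.

```python
import re

# Heuristic (assumed) pattern: a month name or abbreviation followed by a
# 1-2 digit number, e.g. SEPT2 or DEC1, which a spreadsheet may read as a date.
DATE_LIKE = re.compile(
    r"(?:JAN|FEB|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|JUL|JULY|AUG|SEP|SEPT|"
    r"OCT|NOV|DEC)[-/ ]?\d{1,2}",
    re.IGNORECASE,
)

# Heuristic (assumed) pattern: digits, the letter E, then digits, e.g. the
# Riken-style identifier 2310009E13, which may be read as a number in
# scientific notation.
FLOAT_LIKE = re.compile(r"\d+(?:\.\d+)?E[+-]?\d+", re.IGNORECASE)


def excel_risk(identifier):
    """Return 'date', 'float', or None for a single identifier string."""
    text = identifier.strip()
    if DATE_LIKE.fullmatch(text):
        return "date"
    if FLOAT_LIKE.fullmatch(text):
        return "float"
    return None


if __name__ == "__main__":
    for name in ["SEPT2", "DEC1", "2310009E13", "TP53"]:
        print(name, excel_risk(name))
```

A check of this kind is useful only before the file is opened in a spreadsheet, because the conversions cannot be undone afterwards.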

Conclusions

Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem.
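As an illustration of how an existing gene list might be screened for this kind of contamination, here is a short hedged sketch. It is not one of the scripts provided by the authors. The function name `find_suspect_entries` and the two patterns are hypothetical, and they assume the converted values appear in forms such as "2-Sep" or "2.31E+13"; the exact displayed form depends on Excel version and locale. Because the conversions are irreversible, the sketch can only flag entries for manual review, not restore the original names.

```python
import re

# Assumed displayed form of a date-converted name, e.g. "2-Sep" or "1-Dec",
# optionally followed by a year.
CONVERTED_DATE = re.compile(
    r"\d{1,2}-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(?:-\d{2,4})?",
    re.IGNORECASE,
)

# Assumed displayed form of a float-converted identifier, e.g. "2.31E+13".
CONVERTED_FLOAT = re.compile(r"\d(?:\.\d+)?E[+-]\d+", re.IGNORECASE)


def find_suspect_entries(values):
    """Return (index, value) pairs for entries that look like converted names."""
    suspects = []
    for index, value in enumerate(values):
        text = value.strip()
        if CONVERTED_DATE.fullmatch(text) or CONVERTED_FLOAT.fullmatch(text):
            suspects.append((index, value))
    return suspects


if __name__ == "__main__":
    column = ["TP53", "2-Sep", "2.31E+13", "BRCA1", "1-Dec"]
    for index, value in find_suspect_entries(column):
        print(f"row {index}: {value!r} may be a converted gene name")
```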

