
Comments (6)

Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics

Barry R Zeeberg*, Joseph Riss*, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett and John N Weinstein

BMC Bioinformatics 2004, 5:80 doi:10.1186/1471-2105-5-80

Not only Excel

Heikki Lehvaslaiho   (30 June 2004)  European Bioinformatics Institute

I quickly tested a few common open-source spreadsheet programs (OpenOffice.org Calc, Gnumeric and KSpread) for this automatic symbol conversion.

The following crude text table indicates whether the conversion happens by default in these programs. "date" means that strings of the DEC1 type get converted to dates; "float" means that RIKEN identifiers of the type "2310009E13" get converted to floating-point numbers.

program.........."date"...."float"
calc.............yes.......yes
gnumeric.........no........yes
kspread..........no........yes
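As a rough illustration of the two conversion types in the table, the following Python sketch flags identifiers that look like they could be read as a date or as a floating-point number before a file is opened in a spreadsheet. The function name `conversion_risk` and both patterns are assumptions made for this example; they do not reproduce the exact parsing rules of any of the programs tested.

```python
import re

# Month-like prefix followed by one or two digits, e.g. "DEC1" or "SEPT2".
# The prefix list is an illustrative assumption, not an exhaustive rule set.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)

# Digits, a single "E", then digits, e.g. "2310009E13",
# which a spreadsheet may read as scientific notation.
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def conversion_risk(identifier):
    """Return "date", "float", or None for a single identifier string."""
    text = identifier.strip()
    if DATE_LIKE.match(text):
        return "date"
    if FLOAT_LIKE.match(text):
        return "float"
    return None


if __name__ == "__main__":
    for name in ["DEC1", "2310009E13", "TP53"]:
        print(name, conversion_risk(name))
```

A check like this could be run over an identifier column before import, so that risky columns can be imported explicitly as text.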

Be careful out there!

Competing interests

None declared


Well spotted

Andrew Clegg   (21 July 2004)  Birkbeck

One to pin up on lab walls everywhere. I shudder to think how many pieces of work this might have affected.

Competing interests

None declared


Special Interest group on spreadsheet risks

Patrick O'Beirne   (26 July 2004)  Eusprig

The European Spreadsheet Risk Interest Group (EUSPRIG) discusses the prevention and detection of spreadsheet errors. You can read about the emergence of the discipline of Spreadsheet Engineering and other related information at our website <a href="http://www.eusprig.org">www.eusprig.org</a>. We have just completed our fifth international conference and now have a corpus of approximately 100 peer-reviewed papers in our subject domain.

For more reports of spreadsheet errors, see

<a href="http://www.eusprig.org/stories.htm">our stories</a>

We're not specifically a group for discussing Excel bugs and workarounds; the <a href="http://peach.ease.lsoft.com/archives/excel-l.html">Excel-L list</a> is a very busy source of information on these, as is, of course, the MS Knowledgebase.

We are very interested in hearing from users about how they mitigate spreadsheet risks, what good practices they adopt, and so on. We are working with the ECDL Foundation on a syllabus of good practice for end users.

Patrick O'Beirne, chair, Eusprig

Competing interests

none


Good point.

Carol Bult   (27 July 2004)  The Jackson Laboratory

The article raises a very good point. I've experienced similar behavior in Excel with other data types. I would add that it is always a good idea to carry a unique numeric database ID along with gene names/symbols. Database accession IDs may be less likely to be munged by Excel (unless the IDs are alphanumeric!), and since they are usually unique and permanent, they can be used to restore and/or update lists of gene names/symbols (which change all of the time).
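The restore step described above might look something like the following minimal Python sketch. It assumes each row carries a stable ID next to its gene symbol and that a trusted ID-to-symbol mapping is available. The function name `restore_symbols`, the keys `id` and `symbol`, and the example IDs are all made up for illustration and do not come from any real database.

```python
def restore_symbols(rows, reference):
    """Return copies of rows with 'symbol' taken from the trusted reference.

    rows      -- list of dicts with the hypothetical keys 'id' and 'symbol'
    reference -- dict mapping a stable ID to the trusted gene symbol
    """
    restored = []
    for row in rows:
        fixed = dict(row)  # copy, so the input rows are left untouched
        # Keep the existing symbol when the ID is missing from the reference.
        fixed["symbol"] = reference.get(row["id"], row["symbol"])
        restored.append(fixed)
    return restored


if __name__ == "__main__":
    # Made-up IDs; "1-Dec" stands for a symbol that a spreadsheet turned into a date.
    rows = [
        {"id": "1001", "symbol": "1-Dec"},
        {"id": "1002", "symbol": "TP53"},
    ]
    reference = {"1001": "DEC1", "1002": "TP53"}
    for row in restore_symbols(rows, reference):
        print(row["id"], row["symbol"])
```

IDs are kept as strings here so that the lookup does not depend on how a numeric column was parsed on import.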

Competing interests

No competing interests


19 probe sets in Affymetrix's human U133Plus2.0

Chao Lu   (28 July 2004)  Hospital for Sick Children, Toronto

A good point. Many people did not pay attention to this 'small' error.

Here is a list of 19 probe sets whose gene symbols are corrupted when the annotation file (Affymetrix annotation of June 23, 2004) is opened in Excel:

1570394_at ===> 1-Sep

200902_at ===> 15-Sep

208999_at ===> 8-Sep

209000_s_at ===> 8-Sep

212413_at ===> 6-Sep

212414_s_at ===> 6-Sep

212415_at ===> 6-Sep

212698_s_at ===> 10-Sep

213666_at ===> 6-Sep

214298_x_at ===> 6-Sep

214720_x_at ===> 10-Sep

220781_at ===> 1-Dec

221129_at ===> 2-Apr

223362_s_at ===> 3-Sep

225814_at ===> 1-Sep

226627_at ===> 8-Sep

227034_at ===> 10-Sep

227552_at ===> 1-Sep

233632_s_at ===> 1-Sep
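A list like the one above could be produced automatically by scanning the gene symbol column for values that look like spreadsheet-formatted dates. The following Python sketch assumes the day-month form shown in the list (for example "1-Sep"). The function name `find_mangled` and the pattern are assumptions for illustration, and other locales or date formats would need a different pattern.

```python
import re

# Matches values such as "1-Sep" or "15-Sep": one or two digits, a hyphen,
# then a three-letter English month abbreviation.
MANGLED = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$"
)


def find_mangled(pairs):
    """Return the (probe_set, symbol) pairs whose symbol looks like a date."""
    return [(probe, symbol) for probe, symbol in pairs if MANGLED.match(symbol)]


if __name__ == "__main__":
    # The first two pairs come from the list above; the third is a made-up
    # control row with an unaffected symbol.
    pairs = [
        ("220781_at", "1-Dec"),
        ("221129_at", "2-Apr"),
        ("hypothetical_at", "TP53"),
    ]
    for probe, symbol in find_mangled(pairs):
        print(probe, symbol)
```

This only detects the damage after the fact. Recovering the original symbol still requires an untouched source, such as the original annotation file.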

Competing interests

None declared


And the lesson is...

Neil Saunders   (11 April 2008)  University of Queensland

And that's why bioinformaticians don't use Excel for this purpose. Or more generally, don't use spreadsheets as "databases".

Competing interests

None declared



© 1999-2008 BioMed Central Ltd unless otherwise stated