This article is part of the supplement: Workshop on Advances in Bio Text Mining

Open Access Oral presentation

Pitfalls in applying text mining to scientific literature

Jean-Marc Neefs

Author Affiliations

Janssen Pharmaceutica, 2340 Beerse, Belgium

BMC Bioinformatics 2010, 11(Suppl 5):O4  doi:10.1186/1471-2105-11-S5-O4


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/11/S5/O4


Published: 6 October 2010

© 2010 Neefs; licensee BioMed Central Ltd.

Oral presentation

Numbers and data mining are easy. Our numerical system has 10 digits, any combination of them is possible, and every measured value can be captured in a number. Large quantities of measurements can be analysed efficiently using incredibly powerful calculators, and the resulting information can be shown in simple, clear graphs.

Text is hard. Hundreds of letters and millions of different combinations can be used in the personal interpretation of information, in words and phrases that reflect one's personality rather than objective measurements. Depending on context and language, the same expression can carry totally different information, or no meaning at all.

Text mining requires 'education' at different levels: in providing information; in capturing, storing and retrieving that information; and in interpreting the results of the mining process.

I will provide a few examples of text mining tools in daily practice.