Open Access Highly Accessed Methodology article

Information extraction from full text scientific articles: Where are the keywords?

Parantu K Shah 1,2, Carolina Perez-Iratxeta 1,2, Peer Bork 1,2,* and Miguel A Andrade 1,2,3

Author affiliations

1 Biocomputing, European Molecular Biology Laboratory, Heidelberg, Germany

2 Department of Bioinformatics, Max Delbrück Center for Molecular Medicine, Berlin-Buch, Germany

3 Present address: Bioinformatics group, Ottawa Health Research Institute, Ottawa, Canada

Citation and License

BMC Bioinformatics 2003, 4:20  doi:10.1186/1471-2105-4-20

Published: 29 May 2003



To date, many methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full text articles, which offer larger sources of data, are now available in electronic form. This raises the questions of whether the effort of scanning full text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant.


In this work we addressed those questions by showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is very heterogeneous.


Although the abstract contains the highest ratio of keywords to total words, other sections of the article may be a better source of biologically relevant data.
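To make the keywords-per-word ratio concrete, the following minimal sketch computes it for each section of an article. It is an illustration only, not the authors' method: the function name `keyword_density`, the simple tokenizer, the exact-match keyword lookup, and the toy section texts and keyword list are all assumptions made for this example.

```python
import re


def keyword_density(text, keywords):
    """Return (keyword_hits, total_words, hits / total_words) for one section.

    Assumption: a "word" is a run of letters, digits, or hyphens, and a
    keyword counts only when it matches a whole word, case-insensitively.
    """
    words = re.findall(r"[A-Za-z0-9-]+", text.lower())
    vocabulary = {k.lower() for k in keywords}
    hits = sum(1 for w in words if w in vocabulary)
    ratio = hits / len(words) if words else 0.0
    return hits, len(words), ratio


# Toy, made-up section texts and keyword list (hypothetical, not data from the article).
sections = {
    "abstract": "BRCA1 interacts with RAD51 in DNA repair",
    "methods": (
        "Cells were lysed and BRCA1 RAD51 and TP53 proteins were separated "
        "on a gel before antibody detection of each protein in all samples"
    ),
}
keywords = ["BRCA1", "RAD51", "TP53"]

density = {name: keyword_density(text, keywords) for name, text in sections.items()}

for name, (hits, total, ratio) in density.items():
    print(f"{name}: {hits} keywords in {total} words (ratio {ratio:.3f})")
```

In this toy input the abstract has the higher ratio, while the longer section contains more keyword occurrences in absolute terms, which is the kind of trade-off the text describes.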

Keywords: Information extraction; full text article; keyword; gene name; data mining; text mining