Information extraction from full text scientific articles: Where are the keywords?
1 Biocomputing, European Molecular Biology Laboratory, Heidelberg, Germany
2 Department of Bioinformatics, Max Delbrück Center for Molecular Medicine, Berlin-Buch, Germany
3 Present address: Bioinformatics group, Ottawa Health Research Institute, Ottawa, Canada
BMC Bioinformatics 2003, 4:20 doi:10.1186/1471-2105-4-20
Published: 29 May 2003
To date, many methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full-text articles in electronic form, which offer larger sources of data, are now available. This raises the questions of whether scanning full-text articles is worth the effort, and whether the information that can be extracted from the different sections of an article is relevant.
In this work we addressed those questions, showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is very heterogeneous.
Although the abstract contains the highest ratio of keywords to total words, other sections of the article may be a better source of biologically relevant data.
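The distinction between keyword ratio and keyword yield can be illustrated with a minimal sketch. This is not the procedure used in the study: the function name `keyword_density`, the tokenization rule, the keyword set, and the two section texts are all hypothetical and chosen only to show how a short section can have the highest ratio of keywords to words while a longer section contains more keywords overall.

```python
import re

def keyword_density(text, keywords):
    """Return (keyword_hits, word_count, ratio) for one section of text.

    Assumption: words are runs of letters and digits, compared in lower case.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    hits = sum(1 for w in words if w in keywords)
    ratio = hits / len(words) if words else 0.0
    return hits, len(words), ratio

# Hypothetical keyword list and toy section texts (not data from the article).
keywords = {"kinase", "p53", "apoptosis"}
sections = {
    "abstract": "p53 regulates apoptosis",
    "methods": (
        "Cells were lysed, the kinase was purified, and apoptosis and p53 "
        "levels were measured by standard methods in triplicate"
    ),
}

densities = {name: keyword_density(text, keywords)
             for name, text in sections.items()}

for name, (hits, total, ratio) in densities.items():
    print(f"{name}: {hits} keywords in {total} words (ratio {ratio:.2f})")
```

In this toy input the abstract has the higher ratio (2 keywords in 3 words), while the methods text has more keywords in absolute terms (3 in 19 words).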