|
Using BioMed Central's open access full-text corpus for text mining research
BioMed Central has so far published 35238
articles of peer-reviewed biomedical research, all of which are covered
by our open access license agreement,
which allows free distribution and re-use of the full-text article, including
the highly structured XML version.
As a result, BioMed Central's research article corpus
is ideally suited for use by text mining researchers.
An XSLT preview stylesheet, which will render any BioMed
Central article XML file into HTML, is now available:
preview.xsl (37K)
Sample code for developers, demonstrating the use of the stylesheet,
is also available:
How to download BioMed Central's corpus
1. By FTP
Server: ftp.biomedcentral.com
Directory: /content/
Username: datamining
Password: $8Xguppy
| File/directory | Description |
| /content/index.xml | An index of all research articles, in timestamp order (the timestamp is the date on which the XML became available) |
| /content/articles/ | A subdirectory containing the full-text XML file for each article, each named after its unique identifier, i.e. [ui].xml |
| /content/articles.zip | A single ZIP-compressed file containing all the full-text XML files. Remember to set FTP transfer mode to BINARY |
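The FTP download described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names (`fetch_binary`, `download_corpus`, `iter_article_xml`) are made up for this example, the password is passed in by the caller, and the internal layout of the ZIP file is an assumption (the sketch simply yields every entry whose name ends in `.xml`). `ftplib`'s `retrbinary` switches the connection to binary mode before retrieving the file, which takes care of the BINARY reminder.

```python
import zipfile
from ftplib import FTP

# Connection details as listed on this page.
HOST = "ftp.biomedcentral.com"
USER = "datamining"
ZIP_PATH = "/content/articles.zip"


def fetch_binary(ftp, remote_path, out):
    """Stream remote_path into the writable binary file object `out`.

    retrbinary() sends TYPE I (binary mode) before RETR, so the ZIP
    file is not corrupted by ASCII-mode newline translation.
    """
    ftp.retrbinary("RETR " + remote_path, out.write)


def download_corpus(password, dest="articles.zip"):
    """Log in and save the whole-corpus ZIP file to `dest` (hypothetical helper)."""
    with FTP(HOST) as ftp:
        ftp.login(USER, password)
        with open(dest, "wb") as out:
            fetch_binary(ftp, ZIP_PATH, out)
    return dest


def iter_article_xml(zip_source):
    """Yield (entry_name, xml_bytes) for each .xml entry in the archive.

    Assumption: the archive layout is not documented here, so entries
    are matched purely by their .xml extension.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                yield name, archive.read(name)
```

A typical session would call `download_corpus(password)` once and then loop over `iter_article_xml("articles.zip")`, passing each article's bytes to an XML parser.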
2. Via the Open Archives Initiative Protocol for Metadata Harvesting (OAI protocol)
The OAI protocol is an HTTP/XML web service standard for the exchange of data
between archives and repositories. Full-text XML is one of the metadata
formats that the BioMed Central OAI protocol interface supports. See
BioMed Central's OAI page for more details.
To download all open access research articles via BioMed Central's OAI
interface, use the following OAI 'set':
articletype:research
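As a rough sketch of how a harvester might use this set, the Python below builds an OAI `ListRecords` request URL and extracts the `resumptionToken` that the protocol uses for paging. The verb, argument names, and XML namespace come from the OAI protocol itself. The base URL and metadata prefix are left as parameters because they are not given here; check BioMed Central's OAI page for the real values. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
RESEARCH_SET = "articletype:research"  # the set named on this page


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build a ListRecords request URL for the research-article set.

    base_url and metadata_prefix are supplied by the caller: the real
    endpoint and the full-text format's prefix are not stated here.
    """
    if resumption_token:
        # In the OAI protocol, resumptionToken is an exclusive argument:
        # follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": RESEARCH_SET,
        }
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken in a response, or None if harvesting is done."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A harvesting loop would fetch the first URL, process the records, then keep requesting `list_records_url(base_url, prefix, token)` until `next_token` returns `None`.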
Publish your text mining research with BioMed Central
BioMed Central is keen to publish high quality research in the
area of text mining and biomedical literature analysis.
See this list of recent publications on this topic that have appeared in BioMed Central's journals.
All research articles published by BioMed Central are
covered by our open access policy, and
so are freely available without subscription.
For more information about submitting an article, visit
the BMC Bioinformatics home page.
More information
For more information on using BioMed Central's articles
for text mining purposes, contact info@biomedcentral.com.
Useful links
- BioNLP - a collection of resources relating to textual analysis of the biological literature
- BioLINK - a special interest group on text mining, which in 2003 ran a text mining competition that made use of BioMed Central's corpus
- BLIMP - a collection of links to publications on the subject of biomedical text mining
- Data mining Open Access research - an article in the 8 September 2003 edition of Open Access Now
|