Text mining improves the ranking of journal articles for curation. A test set of 354 articles slated for curation were first ranked by two different methods: (a) via each article's PubMed identification number in descending order (which typically reflects the publication date from newest to oldest paper) and (b) via the rank order determined by our rule-based text-mining application. The articles were then reviewed by a biocurator who determined that 167 of the papers contained relevant data (curated, black bars) while 187 of them did not (rejected, white bars). For presentation, the 354 articles are grouped into progressive quartiles (1st, 2nd, 3rd, and 4th) each containing 89 papers. The overall percent of total curated papers (167) vs. rejected papers (187) are shown distributed over each quartile. The text-mining tool (b) effectively ranked the more relevant papers into the first and second quartile and the less relevant papers to the third and fourth quartile compared to the less informed criteria of PubMed identification numbers (a).
Wiegers et al. BMC Bioinformatics 2009 10:326 doi:10.1186/1471-2105-10-326