Open Access Methodology article

Discovery of protein-protein interactions using a combination of linguistic, statistical and graphical information

James W Cooper1* and Aaron Kershenbaum2

Author Affiliations

1 Text Analytics, IBM Thomas J Watson Research Center, PO Box 704, Yorktown Heights, NY 10598, USA

2 Bioinformatics, IBM Thomas J Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA

BMC Bioinformatics 2005, 6:143  doi:10.1186/1471-2105-6-143

Published: 7 June 2005



The rapid publication of important research in the biomedical literature makes it increasingly difficult for researchers to keep current with significant work in their area of interest.


This paper reports a scalable method for the discovery of protein-protein interactions in Medline abstracts, using a combination of text analytics, statistical and graphical analysis, and a set of easily implemented rules. Applying these techniques to 12,300 abstracts yielded a precision of 0.61 and a recall of 0.97 (f = 0.74); when two-hop and three-hop relations discovered by graphical analysis were also allowed, the precision rose to 0.74 (f = 0.83).
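As a reading aid, the f values quoted above can be related to precision and recall through the standard F-measure, the harmonic mean of the two. The sketch below is an illustration only and is not taken from the paper: the function name `f_measure` and the `beta` parameter are assumptions, and it assumes the reported f is the balanced (beta = 1) measure. Recomputing from the rounded figures gives values about 0.01 higher than those reported, presumably because the reported values were derived from unrounded counts.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    With beta == 1 this is the balanced F-measure (F1).
    The name and signature are illustrative, not from the paper.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as 0 to avoid dividing by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Rounded figures quoted in the abstract (recall 0.97 in both cases).
direct = f_measure(0.61, 0.97)      # direct relations only
multi_hop = f_measure(0.74, 0.97)   # allowing two-hop and three-hop relations

print(round(direct, 2), round(multi_hop, 2))
```

Because the harmonic mean is dominated by the smaller of its two inputs, the gain in precision from 0.61 to 0.74 at an unchanged recall of 0.97 accounts for essentially all of the improvement in f.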


This combination of linguistic and statistical approaches appears to provide the highest precision and recall thus far reported in detecting protein-protein relations using text analytic approaches.