Research article

Clustering cliques for graph-based summarization of the biomedical research literature

Han Zhang1,2*, Marcelo Fiszman2, Dongwook Shin2, Bartlomiej Wilkowski3,4 and Thomas C Rindflesch2

Author Affiliations

1 Department of Medical Informatics, China Medical University, Shenyang, Liaoning 110001, China

2 National Library of Medicine, Bethesda, MD 20894, USA

3 DTU Informatics, Technical University of Denmark, Kongens Lyngby, Denmark

4 Danish National Biobank, National Health Surveillance & Research, Statens Serum Institut, Copenhagen, Denmark


BMC Bioinformatics 2013, 14:182  doi:10.1186/1471-2105-14-182

Published: 7 June 2013



Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts).


SemRep was used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments, filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings.
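The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy predications, the thresholds `min_freq`, `min_degree` and `min_shared`, the use of raw degree in place of normalized degree centrality, and the simple "merge the pair sharing the most arguments" rule are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Made-up toy predications (subject, predicate, object); not data from the paper.
frequent = [
    ("Metformin", "TREATS", "Diabetes"),
    ("Metformin", "INTERACTS_WITH", "Insulin"),
    ("Insulin", "TREATS", "Diabetes"),
    ("Obesity", "PREDISPOSES", "Diabetes"),
    ("Obesity", "AFFECTS", "Insulin"),
    ("Aspirin", "TREATS", "Stroke"),
    ("Aspirin", "TREATS", "Hypertension"),
    ("Hypertension", "PREDISPOSES", "Stroke"),
    ("Placebo", "TREATS", "Stroke"),
]
rare = [("Aspirin", "COEXISTS_WITH", "Diabetes")]
predications = frequent * 2 + rare  # frequent ones occur twice, the rare one once


def build_graph(preds, min_freq):
    """Undirected argument graph from predications seen at least min_freq times."""
    adj = {}
    for (subj, _pred, obj), n in Counter(preds).items():
        if n >= min_freq and subj != obj:
            adj.setdefault(subj, set()).add(obj)
            adj.setdefault(obj, set()).add(subj)
    return adj


def filter_by_degree(adj, min_degree):
    """Keep nodes with at least min_degree neighbours (single pass, raw degree)."""
    keep = {node for node, nbrs in adj.items() if len(nbrs) >= min_degree}
    return {node: adj[node] & keep for node in keep}


def maximal_cliques(adj, min_size=3):
    """Bron-Kerbosch enumeration (no pivoting) of maximal cliques."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def cluster_cliques(cliques, min_shared):
    """Agglomerative merging: join the two clusters sharing the most arguments."""
    clusters = [[c] for c in cliques]

    def arguments(cluster):
        return set().union(*cluster)

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            shared = len(arguments(clusters[i]) & arguments(clusters[j]))
            if shared >= min_shared and (best is None or shared > best[0]):
                best = (shared, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected


adj = filter_by_degree(build_graph(predications, min_freq=2), min_degree=2)
cliques = maximal_cliques(adj)
themes = cluster_cliques(cliques, min_shared=2)
for theme in themes:
    print(sorted(set().union(*theme)))
```

In this toy example the rare predication is dropped by the frequency threshold, the `Placebo` node is dropped by the degree filter, and the two cliques that share `Diabetes` and `Insulin` merge into one theme while the stroke-related clique stays separate.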


For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10 percentage points higher than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65, respectively.
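For readers unfamiliar with silhouette-based validity, the textbook silhouette coefficient combines cohesion and separation as shown below. It is an assumption that the overall validity score reported here corresponds to this definition, and the symbols a(i), b(i) and s(i) are standard notation rather than notation taken from the paper.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

Here a(i) is the mean distance from item i to the other members of its own cluster (cohesion), and b(i) is the smallest mean distance from i to the members of any other cluster (separation). Values near 1 indicate a well-placed item, and values near or below 0 indicate overlapping or misassigned clusters.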

Keywords: Clique clustering; Graph-based summarization; Multi-document summarization; Semantic predications