Open Access Research article

Content-based histopathology image retrieval using CometCloud

Xin Qi1,2*, Daihou Wang2,3, Ivan Rodero3, Javier Diaz-Montes3, Rebekah H Gensure1, Fuyong Xing5, Hua Zhong1, Lauri Goodell1, Manish Parashar3, David J Foran1,2,4 and Lin Yang5

Author Affiliations

1 Department of Pathology and Laboratory Medicine, Rutgers - Robert Wood Johnson Medical School, 675 Hoes Lane, Piscataway, NJ, USA

2 Center for Biomedical Imaging and Informatics, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA

3 Rutgers Discovery Informatics Institute and NSF Cloud and Autonomic Computing Center, Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA

4 Department of Radiology, Rutgers - Robert Wood Johnson Medical School, Piscataway, NJ, USA

5 Division of Biomedical Informatics, Department of Biostatistics, Department of Computer Science, University of Kentucky, Lexington, KY, USA


BMC Bioinformatics 2014, 15:287  doi:10.1186/1471-2105-15-287

Published: 26 August 2014



Advances in digital imaging technology have brought levels of accuracy that support more reliable image analysis, including content-based image retrieval, image segmentation, and classification. They have also dramatically increased the volume and rate at which data are generated. Together, these facts make querying and sharing non-trivial and render centralized solutions infeasible. Moreover, these data are often distributed and must be shared across multiple institutions, which requires decentralized solutions. In this context, a new generation of data- and information-driven applications must be developed to take advantage of the national advanced cyber-infrastructure (ACI), which enables investigators to interact seamlessly and securely with data distributed across geographically disparate resources. This paper presents the development and evaluation of a novel content-based image retrieval (CBIR) framework. The methods were tested extensively using both peripheral blood smears and renal glomeruli specimens. The datasets and performance were evaluated by two pathologists to determine the concordance.


The CBIR algorithms that were developed can reliably retrieve the candidate image patches whose intensity and morphological characteristics are most similar to those of a given query image. The methods described in this paper can reliably discriminate among subtle staining differences and spatial pattern distributions. Integrating a newly developed dual-similarity relevance feedback module into the CBIR framework improved the CBIR results substantially. By aggregating the computational power of high performance computing (HPC) and cloud resources, we demonstrated that the method can be executed in minutes on the Cloud, compared to weeks using standard computers.


In this paper, we present a set of newly developed CBIR algorithms and validate them using two different pathology applications, both of which are regularly evaluated in the practice of pathology. Comparative experimental results demonstrate excellent performance throughout the course of a set of systematic studies. Additionally, we present and evaluate a framework to enable the execution of these algorithms across distributed resources. We show that searching the dataset in parallel for images with similar content significantly reduces the overall computational time, which supports the practical utility of the proposed CBIR algorithms.

Keywords: Histopathology; Digital pathology; Content-based image retrieval; High performance computing
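
The parallel search summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use CometCloud. It assumes each image patch has already been reduced to a short numeric feature vector, uses Euclidean distance as a stand-in similarity measure, and uses a local thread pool in place of distributed workers. All names (`search_shard`, `parallel_top_k`, the `patch_*` ids) are hypothetical. The idea shown is only the general pattern: partition the dataset, let each worker return its local best matches, then merge the partial results.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_shard(query, shard, k):
    """Return the k entries of one shard closest to the query as (distance, id) pairs."""
    scored = ((euclidean(query, feats), image_id) for image_id, feats in shard)
    return heapq.nsmallest(k, scored)


def parallel_top_k(query, dataset, k=3, n_shards=4):
    """Split the dataset into shards, search them concurrently, merge the partial results."""
    shards = [dataset[i::n_shards] for i in range(n_shards)]
    # Assumption: threads stand in for the distributed workers of a real deployment.
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partial = pool.map(lambda s: search_shard(query, s, k), shards)
        candidates = [hit for hits in partial for hit in hits]
    # The global top k is always contained in the union of the per-shard top k.
    return heapq.nsmallest(k, candidates)


# Toy data: hypothetical 3-bin feature vectors keyed by made-up patch ids.
dataset = [
    ("patch_a", [0.9, 0.1, 0.0]),
    ("patch_b", [0.1, 0.8, 0.1]),
    ("patch_c", [0.8, 0.2, 0.0]),
    ("patch_d", [0.0, 0.1, 0.9]),
    ("patch_e", [0.7, 0.2, 0.1]),
]
query = [0.88, 0.12, 0.0]
top = parallel_top_k(query, dataset, k=2, n_shards=2)
print([image_id for _, image_id in top])  # ['patch_a', 'patch_c']
```

Because every shard returns its own k best candidates, the merged result matches what a single sequential scan would return, regardless of how the dataset is partitioned.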