
This article is part of the supplement: UT-ORNL-KBRIN Bioinformatics Summit 2010

Open Access Poster presentation

Application of Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to identify disease-specific biomarker genes

Hung-Chung Huang (1, 2), Siyuan Zheng (1, 2, 3) and Zhongming Zhao (1, 2, 3)*

Author Affiliations

1 Functional Genomics Shared Resource, Vanderbilt University Medical Center, Nashville, TN 37232, USA

2 Bioinformatics Resource Center, Vanderbilt University Medical Center, Nashville, TN 37203, USA

3 Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA


BMC Bioinformatics 2010, 11(Suppl 4):P23  doi:10.1186/1471-2105-11-S4-P23

The electronic version of this article is the complete one.

Published: 23 July 2010

© 2010 Zhao et al; licensee BioMed Central Ltd.


DNA microarrays have been widely applied in cancer research to improve diagnosis and prediction of disease states. Traditionally, most microarray studies aim to identify differentially expressed genes (DEGs) by comparing average gene expression levels between two groups (e.g., treated vs. control, or disease vs. non-disease) using statistical tests such as the t-test and Significance Analysis of Microarrays (SAM) [1,2].
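For contrast with the distribution-based approach of this study, the following minimal Python sketch computes Welch's two-sample t statistic, one common form of the t-test used to compare mean expression between two groups. The function name `welch_t` and all expression values are hypothetical illustrations, not data from this study. The second call uses two groups with identical means but different spreads: the t statistic is zero even though the two distributions clearly differ.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic for the difference in mean expression.

    statistics.variance() returns the sample variance (n - 1 denominator).
    Hypothetical helper for illustration only.
    """
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical log2 expression values for a single gene
disease = [8.1, 8.4, 7.9, 8.6, 8.3]
normal = [7.0, 7.2, 6.9, 7.4, 7.1]
print(welch_t(disease, normal))  # large positive t: disease mean is higher

# Same mean (5.0), different spread: a mean-based test sees no difference
print(welch_t([4.0, 6.0, 4.0, 6.0], [5.0, 5.1, 4.9, 5.0]))
```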

Materials and methods

In this study, we defined the gene expression profile (GEP) of a gene as the distribution of the log2-transformed normalized expression signal intensities of that gene across samples in comparable microarray studies. We hypothesized that biomarker genes distinguishing disease samples from normal samples might form distinct GEPs in the two comparison groups. We applied Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to compare GEPs between normal and disease states, and then applied this approach to disease-related (e.g., cancer) studies to identify candidate biomarker genes. The GEPs of these biomarkers in normal and disease samples might then be used to diagnose or monitor a patient's disease state through routine gene expression analysis.
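The abstract does not specify exactly how PCC and KSD are computed between two GEPs. The following minimal Python sketch assumes that each GEP is summarized as a histogram of a gene's log2 expression values over a range shared by both groups (the bin count `n_bins=10` is an arbitrary choice), that PCC is the Pearson correlation between the normal and disease histograms, and that KSD is the standard two-sample Kolmogorov-Smirnov statistic, i.e., the largest gap between the two empirical cumulative distribution functions. Under these assumptions, a low PCC and a high KSD both indicate that the gene's expression distribution differs between groups. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def gep_histogram(log2_values, lo, hi, n_bins):
    """Fraction of samples in each of n_bins equal-width bins on [lo, hi] (assumes hi > lo)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in log2_values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)  # clamp edge values
        counts[idx] += 1
    return [c / len(log2_values) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ks_distance(a, b):
    """Two-sample KS statistic: max |F_a(t) - F_b(t)| over all observed values t."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for t in sa + sb:
        fa = bisect_right(sa, t) / len(sa)  # empirical CDF of a at t
        fb = bisect_right(sb, t) / len(sb)  # empirical CDF of b at t
        d = max(d, abs(fa - fb))
    return d


def compare_geps(normal, disease, n_bins=10):
    """Return (PCC of binned GEPs, KSD) for one gene's log2 values in two groups."""
    lo, hi = min(normal + disease), max(normal + disease)
    h_normal = gep_histogram(normal, lo, hi, n_bins)
    h_disease = gep_histogram(disease, lo, hi, n_bins)
    return pearson(h_normal, h_disease), ks_distance(normal, disease)


# Hypothetical log2 values: fully separated groups, then identical groups
normal = [1.0, 2.0, 3.0, 4.0]
disease = [5.0, 6.0, 7.0, 8.0]
pcc, ksd = compare_geps(normal, disease)
pcc_same, ksd_same = compare_geps(normal, normal)
print(pcc, ksd)
print(pcc_same, ksd_same)
```

Evaluating the empirical CDFs only at observed values is sufficient, because the step functions can only change at those points.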

Results and conclusion

We applied the PCC and KSD metrics to three prostate cancer microarray datasets that were generated in the same study and are available in the GEO database (a total of 81 normal samples and 90 prostate cancer samples) [3]. Using the cutoffs KSD > 0.4 and PCC < 0.7, we identified 230 candidate biomarker genes. Gene Ontology (GO) analysis showed that the top-ranked candidate biomarker genes for prostate cancer were highly enriched in molecular function categories such as “cytoskeletal protein binding”. We used the two top-ranked genes (ACTA1, encoding an actin subunit, and HPN, encoding hepsin) to illustrate how prostate cancer might be diagnosed and monitored using marker genes. Furthermore, we selected the 20 most significantly up-regulated and the 20 most significantly down-regulated genes ranked by PCC and KSD. Gene pairs consisting of one up-regulated and one down-regulated gene consistently achieved the best prediction performance (Table 1). Our study provides a promising tool for identifying potential biomarker genes for disease diagnosis and prognosis.
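Given per-gene PCC and KSD values, the cutoff, ranking, and pairing steps might look like the following Python sketch. Only the cutoffs (KSD > 0.4, PCC < 0.7), the top-20 selection, and the idea of pairing one up-regulated with one down-regulated gene come from the study. The input layout, the use of the mean log2 difference to call a gene up- or down-regulated, the ranking rule (KSD descending, then PCC ascending), and the pair classifier (the up-regulated gene's log2 expression minus the down-regulated gene's, thresholded for best accuracy) are assumptions for illustration; the abstract does not describe how prediction accuracy was computed. Gene names and values are hypothetical.

```python
from itertools import product


def select_candidates(metrics, ksd_cutoff=0.4, pcc_cutoff=0.7):
    """Keep genes with KSD > ksd_cutoff and PCC < pcc_cutoff.

    Hypothetical input layout: gene -> (pcc, ksd, mean_diff), where mean_diff
    is disease-minus-normal mean log2 expression.
    """
    return {g: m for g, m in metrics.items() if m[1] > ksd_cutoff and m[0] < pcc_cutoff}


def top_up_down(candidates, n=20):
    """Split candidates by direction; rank each side by KSD desc, then PCC asc (assumed)."""
    def rank_key(g):
        pcc, ksd, _ = candidates[g]
        return (-ksd, pcc)

    up = sorted((g for g, m in candidates.items() if m[2] > 0), key=rank_key)[:n]
    down = sorted((g for g, m in candidates.items() if m[2] < 0), key=rank_key)[:n]
    return up, down


def pair_accuracy(up_values, down_values, labels):
    """Hypothetical pair classifier: score = up - down per sample; predict disease
    (label 1) when score > threshold, picking the threshold with the best accuracy.
    The threshold is tuned on the scored samples, so the result is optimistic."""
    scores = [u - d for u, d in zip(up_values, down_values)]
    best = 0.0
    for t in scores:
        correct = sum((s > t) == bool(y) for s, y in zip(scores, labels))
        best = max(best, correct / len(labels))
    return best


# Hypothetical per-gene metrics: (pcc, ksd, mean_diff)
metrics = {
    "GENE_A": (0.2, 0.8, 1.5),
    "GENE_B": (0.5, 0.6, -1.2),
    "GENE_C": (0.9, 0.1, 0.3),
    "GENE_D": (0.3, 0.7, 0.9),
}
candidates = select_candidates(metrics)
up, down = top_up_down(candidates)
pairs = list(product(up, down))  # every (up-regulated, down-regulated) pair
acc = pair_accuracy([9.0, 9.5, 6.0, 6.2], [5.0, 5.1, 7.0, 7.3], [1, 1, 0, 0])
print(pairs, acc)
```

Because the threshold is tuned on the same samples it is scored on, a cross-validated estimate would give a more realistic measure of each pair's prediction performance.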

Table 1. Top 10 gene pairs with the highest prediction accuracies for prostate cancer (PCa) diagnosis.


  1. Jafari P, Azuaje F: An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors.

    BMC Med Inform Decis Mak 2006, 6:27.

  2. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response.

    Proc Natl Acad Sci USA 2001, 98:5116-5121.

  3. Chandran UR, Ma C, Dhir R, Bisceglia M, Lyons-Weiler M, Liang W, Michalopoulos G, Becich M, Monzon FA: Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process.

    BMC Cancer 2007, 7:64.