This article is part of the supplement: Proceedings of the 2012 International Conference on Intelligent Computing (ICIC 2012)

Open Access Proceedings

Diagnostic prediction of complex diseases using phase-only correlation based on virtual sample template

Shu-Lin Wang, Yaping Fang and Jianwen Fang*

Author Affiliations

Applied Bioinformatics Laboratory, the University of Kansas, 2034 Becker Drive, Lawrence, KS 66047, USA

BMC Bioinformatics 2013, 14(Suppl 8):S11 doi:10.1186/1471-2105-14-S8-S11

Published: 9 May 2013

Abstract

Motivation

Complex diseases perturb the interaction and regulation networks of living systems, producing dynamic equilibrium states that differ among diseases and from the normal state. Identifying the gene expression patterns that correspond to these equilibrium states is therefore of great benefit to the diagnosis and treatment of complex diseases. However, the high dimensionality and small sample size of the available complex disease gene expression datasets remain a major challenge for discovering such patterns.

Results

Here we present a phase-only correlation (POC) based classification method for recognizing the type of a complex disease. First, a virtual sample template is constructed for each subclass by averaging all training samples that belong to that subclass. The label of a test sample is then determined by measuring the similarity between the test sample and each template. This novel method detects the similarity of the overall patterns that emerge from differentially expressed genes or proteins while ignoring small mismatches.
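The two steps described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code assumes the standard one-dimensional POC definition (inverse Fourier transform of the normalized cross-power spectrum), assumes that the peak of the POC function serves as the similarity score, and assumes that the test sample takes the label of the best-matching template. The function names `poc_peak`, `build_templates` and `classify` and the stabilizing constant `eps` are hypothetical and are not taken from the article.

```python
import numpy as np


def poc_peak(x, y, eps=1e-12):
    """Peak of the 1-D phase-only correlation between two feature vectors.

    Assumption: the POC peak height is used as the similarity score.
    `eps` is a hypothetical guard against division by zero.
    """
    X = np.fft.fft(np.asarray(x, dtype=float))
    Y = np.fft.fft(np.asarray(y, dtype=float))
    cross = X * np.conj(Y)
    # Dividing by the magnitude keeps only the phase of each frequency bin.
    poc = np.fft.ifft(cross / (np.abs(cross) + eps)).real
    return float(poc.max())


def build_templates(X_train, y_train):
    """Average the training samples of each subclass into one virtual template."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    return {
        label: X_train[y_train == label].mean(axis=0)
        for label in np.unique(y_train)
    }


def classify(sample, templates):
    """Return the label whose template has the highest POC peak with the sample."""
    return max(templates, key=lambda label: poc_peak(sample, templates[label]))
```

In this sketch, `X_train` would be a samples-by-features array restricted to the initially selected features, and `classify(test_sample, build_templates(X_train, y_train))` returns the predicted subclass. Because the normalization discards spectral magnitudes, the score contains no tunable parameter apart from the numerical guard `eps`.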

Conclusions

Experimental results on seven publicly available complex disease datasets, including microarray and protein array data, demonstrate that the proposed POC-based disease classification method is effective for diagnosing complex diseases and robust with regard to the number of initially selected features. Its recognition accuracy is better than or comparable to that of other state-of-the-art machine learning methods. In addition, the proposed method requires neither parameter tuning nor data scaling, which can effectively reduce the occurrence of over-fitting and bias.