Open Access | Highly Accessed | Research article

Computational protein biomarker prediction: a case study for prostate cancer

Michael Wagner1, Dayanand N Naik2, Alex Pothen3, Srinivas Kasukurti3, Raghu Ram Devineni3, Bao-Ling Adam4, O John Semmes4 and George L Wright Jr4

1 Cincinnati Children's Hospital Research Foundation and Department of Biomedical Engineering, University of Cincinnati, Cincinnati, OH 45229, USA

2 Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, USA

3 Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA

4 Department of Microbiology and Molecular Cell Biology, Eastern Virginia Medical School, Norfolk, VA 23507, USA

BMC Bioinformatics 2004, 5:26 doi:10.1186/1471-2105-5-26

Published: 11 March 2004

Abstract

Background

Recent technological advances in mass spectrometry pose challenges for computational mathematics and statistics: the mass spectral data must be processed into predictive models with clinical and biological significance. We discuss several classification-based approaches to finding protein biomarker candidates using protein profiles obtained via mass spectrometry, and we assess their statistical significance. Our overall goal is to implicate peaks that have a high likelihood of being biologically linked to a given disease state, and thus to narrow the search for biomarker candidates.

Results

Thorough cross-validation studies and randomization tests are performed on a prostate cancer dataset with over 300 patients, obtained at the Eastern Virginia Medical School using SELDI-TOF mass spectrometry. We obtain average classification accuracies of 87% on a four-group classification problem using a two-stage linear SVM-based procedure and just 13 peaks, with other methods performing comparably.

Conclusions

Modern feature selection and classification methods are powerful techniques for both the identification of biomarker candidates and the related problem of building predictive models from protein mass spectrometric profiles. Cross-validation and randomization are essential tools that must be performed carefully in order not to bias the results unfairly. However, only a biological validation and identification of the underlying proteins will ultimately confirm the actual value and power of any computational predictions.
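The caution about cross-validation and randomization can be illustrated with a minimal sketch, which is not the authors' procedure. It assumes a toy two-class dataset, a simple separation score for ranking peaks, and a nearest-centroid classifier standing in for the linear SVM; every function name and parameter below is hypothetical. The point it shows is that peak selection is repeated inside each training fold, so the held-out samples never influence which peaks are chosen, and that the same pipeline is rerun on shuffled labels to estimate how often such an accuracy arises by chance.

```python
import random
import statistics


def make_toy_data(n_per_class=20, n_features=20, shift=4.0, seed=1):
    """Synthetic 'spectra': Gaussian noise, with two informative peaks (0 and 1)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                row[0] += shift
                row[1] += shift
            X.append(row)
            y.append(label)
    return X, y


def select_features(X, y, k):
    """Rank features by an illustrative separation score; return the top k indices."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in classes]
        means = [statistics.fmean(g) for g in groups]
        spread = sum(statistics.pstdev(g) for g in groups) + 1e-12
        scores.append((max(means) - min(means)) / spread)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def nearest_centroid_predict(X_train, y_train, X_test, feats):
    """Stand-in classifier (not an SVM): assign each sample to the closest class mean."""
    centroids = {}
    for c in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == c]
        centroids[c] = [statistics.fmean([r[j] for r in rows]) for j in feats]

    def dist(row, c):
        return sum((row[j] - m) ** 2 for j, m in zip(feats, centroids[c]))

    return [min(centroids, key=lambda c: dist(row, c)) for row in X_test]


def cv_accuracy(X, y, k_features=3, n_folds=5, seed=0):
    """Cross-validated accuracy with feature selection redone in every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Selection sees the training fold only; using all samples here
        # would leak information and inflate the accuracy estimate.
        feats = select_features(X_tr, y_tr, k_features)
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in test_idx], feats)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)


def permutation_p_value(X, y, n_perm=50, seed=0):
    """Randomization test: rerun the whole pipeline on shuffled class labels."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if cv_accuracy(X, y_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


X, y = make_toy_data()
acc, p = permutation_p_value(X, y)
print(f"CV accuracy {acc:.2f}, permutation p-value {p:.3f}")
```

Moving the `select_features` call outside the fold loop, so that it sees every sample, is the kind of shortcut that would bias the estimate; on data with many more peaks than patients the optimism can be substantial.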

