This article is part of the supplement: Proceedings of the 2009 AMIA Summit on Translational Bioinformatics

Open Access Proceedings

Knowledge-based variable selection for learning rules from proteomic data

Jonathan L Lustgarten1*, Shyam Visweswaran1, Robert P Bowser2, William R Hogan13 and Vanathi Gopalakrishnan1

Author Affiliations

1 Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Ave, Parkvale M-183, Pittsburgh, PA, USA

2 Department of Pathology, University of Pittsburgh, S-417 BST, 200 Lothrop Street, Pittsburgh, PA 15261, USA

3 University of Pittsburgh Medical Center, Pittsburgh, PA, USA


BMC Bioinformatics 2009, 10(Suppl 9):S16  doi:10.1186/1471-2105-10-S9-S16

Published: 17 September 2009



The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to improve the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In particular, we use the Empirical Proteomics Ontology Knowledge Base (EPO-KB), which contains previously identified and validated proteomic biomarkers, to select m/z values in a proteomic dataset before analysis, with the aim of increasing performance.
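Selecting m/z values with a knowledge base amounts to keeping only those dataset features whose m/z lies close to a mass recorded for a known biomarker. The sketch below is illustrative only. The function name `select_mz_by_knowledge`, the plain list of knowledge-base masses and the 500 ppm matching tolerance are all hypothetical assumptions, not the EPO-KB interface or the tolerance used in this work.

```python
from bisect import bisect_left


def select_mz_by_knowledge(dataset_mzs, kb_masses, tol_ppm=500.0):
    """Return indices of dataset m/z values lying within tol_ppm of any
    knowledge-base biomarker mass.

    Hypothetical helper: tol_ppm=500.0 is an illustrative assumption.
    """
    kb_sorted = sorted(kb_masses)
    selected = []
    for i, mz in enumerate(dataset_mzs):
        tol = mz * tol_ppm * 1e-6  # absolute tolerance at this m/z
        # First knowledge-base mass >= mz - tol; it matches if it is also <= mz + tol.
        j = bisect_left(kb_sorted, mz - tol)
        if j < len(kb_sorted) and kb_sorted[j] <= mz + tol:
            selected.append(i)
    return selected
```

For example, `select_mz_by_knowledge([1000.0, 2000.0, 3000.0, 4000.0], [2000.5, 3500.0])` keeps only index `1`, because 2000.5 falls within 500 ppm (±1.0) of 2000.0. Sorting the knowledge-base masses once lets each feature be checked with a binary search instead of a scan of the whole list.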


We show that using EPO-KB as a pre-processing method (specifically, selecting all biomarkers found in the same biofluid as the proteomic dataset and no others) reduces dimensionality by 95% and yields a statistically significant improvement in performance over both no variable selection and random variable selection.
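The biofluid restriction and the random-selection comparison can be sketched as follows, under stated assumptions. Here knowledge-base entries are modelled as hypothetical dictionaries with `"mass"` and `"biofluid"` keys; this is not EPO-KB's actual schema. The random baseline draws the same number of features as the knowledge-based subset, and `dimensionality_reduction` computes the fraction of features discarded, so keeping 50 of 1,000 features corresponds to a 95% reduction.

```python
import random


def biofluid_masses(kb_entries, biofluid):
    """Keep masses only for knowledge-base entries recorded in the given biofluid.

    Hypothetical schema: each entry is a dict with "mass" and "biofluid" keys.
    """
    return [e["mass"] for e in kb_entries if e["biofluid"] == biofluid]


def random_baseline(n_features, k, seed=0):
    """Draw k distinct feature indices uniformly at random, matching the size
    of a knowledge-based subset so the two selections are comparable."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_features), k))


def dimensionality_reduction(n_features, k):
    """Fraction of the original features removed when k of n_features are kept."""
    return 1.0 - k / n_features


kb = [
    {"mass": 2000.5, "biofluid": "serum"},
    {"mass": 3500.0, "biofluid": "CSF"},
]
print(biofluid_masses(kb, "serum"))
print(dimensionality_reduction(1000, 50))
```

Fixing the random seed makes the baseline reproducible; in practice one would repeat the random draw many times to estimate the variability of the baseline.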


Knowledge-based variable selection, even with a sparsely populated resource such as EPO-KB, increases the overall performance of rule learning for disease classification from high-dimensional proteomic mass spectra.