Open Access Research article

Confident Predictability: Identifying reliable gene expression patterns for individualized tumor classification using a local minimax kernel algorithm

Lee K Jones1*, Fei Zou1, Alexander Kheifets1, Konstantin Rybnikov1, Damon Berry1 and Aik Choon Tan2*

Author affiliations

1 Department of Mathematical Sciences, University of Massachusetts, Lowell, MA, USA

2 Division of Medical Oncology, Department of Medicine, University of Colorado Denver School of Medicine, Anschutz Medical Campus, Aurora, CO, USA


Citation and License

BMC Medical Genomics 2011, 4:10  doi:10.1186/1755-8794-4-10

Published: 24 January 2011



Molecular classification of tumors can be achieved by global gene expression profiling. Most machine learning classification algorithms furnish global error rates for the entire population. A few algorithms provide an estimate of the probability of malignancy for each queried patient, but the accuracy of these estimates is unknown. Local minimax learning, by contrast, provides such probability estimates together with the best finite-sample bounds on expected mean squared error, on an individual basis for each queried patient. This allows a significant percentage of patients to be identified as confidently predictable, a condition ensuring that the machine learning algorithm has an error rate below the tolerable level when applied to those patients.


We devise a new learning method that implements (i) feature selection using the k-TSP algorithm and (ii) classifier construction by local minimax kernel learning. We test our method on three publicly available gene expression datasets and achieve a significantly lower error rate for a substantial, identifiable subset of patients. Our final classifiers are simple to interpret and can make predictions on an individual basis with an individualized confidence level.
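As a rough illustration of step (i), the pair-scoring idea behind k-TSP can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `top_scoring_pairs`, the 0/1 label encoding, and the toy data are hypothetical, and ties are broken by index order rather than by the rank-based secondary score used in the published k-TSP algorithm. Each gene pair (i, j) is scored by how much the frequency of "gene i is expressed below gene j" differs between the two classes, and the k best disjoint pairs are kept.

```python
import numpy as np


def top_scoring_pairs(X, y, k=3):
    """Return up to k disjoint gene pairs (i, j, score), best first.

    X: array of shape (n_samples, n_genes) with expression values.
    y: binary class labels (0 or 1), one per sample (assumed encoding).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # less[s, i, j] is True when gene i is expressed below gene j in sample s.
    less = X[:, :, None] < X[:, None, :]

    # Per-class frequency of the ordering "gene i < gene j".
    p0 = less[y == 0].mean(axis=0)
    p1 = less[y == 1].mean(axis=0)
    score = np.abs(p0 - p1)

    # Rank all pairs with i < j by descending score (stable for ties).
    rows, cols = np.triu_indices(X.shape[1], k=1)
    order = np.argsort(-score[rows, cols], kind="stable")

    pairs, used = [], set()
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if i in used or j in used:
            continue  # keep the selected pairs disjoint
        pairs.append((i, j, float(score[i, j])))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


# Toy data (hypothetical): the ordering of genes 0 and 1 flips between classes.
X_demo = [[1, 2, 5, 6],
          [1, 3, 5, 7],
          [2, 1, 5, 6],
          [3, 1, 5, 7]]
y_demo = [0, 0, 1, 1]
best = top_scoring_pairs(X_demo, y_demo, k=1)
```

The dense pairwise comparison uses memory proportional to samples × genes², so a sketch like this is only practical after the gene set has been pre-filtered to a manageable size.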


Patients whom the classifiers confidently predict to have cancer can receive immediate and appropriate treatment, whilst patients confidently predicted to be healthy can be spared unnecessary treatment. We believe that our method can be a useful tool for translating gene expression signatures into clinical practice for personalized medicine.