Methodology article (Open Access)

Regularized binormal ROC method in disease classification using microarray data

Shuangge Ma (1), Xiao Song (1) and Jian Huang (2)

(1) Department of Biostatistics, University of Washington, Seattle, WA 98195, USA

(2) Department of Statistics & Actuarial Science and Program in Public Health Genetics, University of Iowa, Iowa City, IA 52242, USA

BMC Bioinformatics 2006, 7:253. doi:10.1186/1471-2105-7-253

Published: 9 May 2006

Abstract

Background

An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease diagnosis and prognosis. Thus it is of interest to develop efficient statistical methods that can simultaneously identify important biomarkers from such high-throughput genomic data and construct appropriate classification rules. It is also of interest to develop methods for evaluation of classification performance and ranking of identified biomarkers.

Results

The ROC (receiver operating characteristic) technique has been widely used in disease classification with low dimensional biomarkers. Compared with the empirical ROC approach, the binormal ROC is computationally more affordable and robust in small sample size cases. We propose using the binormal AUC (area under the ROC curve) as the objective function for two-sample classification, and the scaled threshold gradient directed regularization method for regularized estimation and biomarker selection. Tuning parameter selection is based on V-fold cross validation. We develop Monte Carlo based methods for evaluating the stability of individual biomarkers and overall prediction performance. Extensive simulation studies show that the proposed approach can generate parsimonious models with excellent classification and prediction performance, under most simulated scenarios including model mis-specification. Application of the method to two cancer studies shows that the identified genes are reasonably stable with satisfactory prediction performance and biologically sound implications. The overall classification performance is satisfactory, with small classification errors and large AUCs.
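To make the objective function concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear score built from a coefficient vector and uses the standard binormal result: if the score is normally distributed within each class, the AUC equals Φ((m1 − m0) / sqrt(v1 + v0)), where m and v are the class means and variances of the score. The function names, the toy data and the coefficient vector `beta` are hypothetical and chosen only for illustration. The empirical AUC is included for comparison.

```python
import math
import random
from statistics import mean, variance


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_scores(rows, beta):
    """Project each sample (a list of gene expressions) onto beta."""
    return [sum(b * x for b, x in zip(beta, row)) for row in rows]


def binormal_auc(diseased, healthy, beta):
    """AUC under the assumption that the score is normal within each class."""
    s1 = linear_scores(diseased, beta)
    s0 = linear_scores(healthy, beta)
    spread = math.sqrt(variance(s1) + variance(s0))
    return normal_cdf((mean(s1) - mean(s0)) / spread)


def empirical_auc(diseased, healthy, beta):
    """Fraction of (diseased, healthy) pairs ranked correctly; ties count 1/2."""
    s1 = linear_scores(diseased, beta)
    s0 = linear_scores(healthy, beta)
    wins = sum((a > b) + 0.5 * (a == b) for a in s1 for b in s0)
    return wins / (len(s1) * len(s0))


# Hypothetical toy data: 3 genes, only the first one differs between classes.
rng = random.Random(0)
healthy = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
diseased = [[rng.gauss(1.5, 1)] + [rng.gauss(0, 1) for _ in range(2)]
            for _ in range(40)]

beta = [1.0, 0.0, 0.0]  # hypothetical coefficients selecting gene 1 only
print(binormal_auc(diseased, healthy, beta),
      empirical_auc(diseased, healthy, beta))
```

The binormal version needs only two means and two variances per evaluation, whereas the empirical version compares every diseased/healthy pair. It is also a smooth function of `beta`, which is what makes gradient-based regularized estimation practical. Both versions are unchanged when `beta` is multiplied by a positive constant, so only the direction of the coefficient vector matters.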

Conclusion

In comparison to existing methods, the proposed approach is computationally more affordable without losing the optimality possessed by the standard ROC method.

