This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Medical Genomics

Open Access Research

A comparison of statistical methods for the detection of hepatocellular carcinoma based on serum biomarkers and clinical variables

Mengjun Wang1, Anand Mehta1, Timothy M Block1, Jorge Marrero2, Adrian M Di Bisceglie3 and Karthik Devarajan4*

Author Affiliations

1 3508 Old Easton Rd, Doylestown, PA, 18902, USA

2 Division of Gastroenterology, University of Michigan, 3912 Taubman Center, Ann Arbor, MI 48109, USA

3 Saint Louis University School of Medicine, 1402 S. Grand FDT 12th Floor, St. Louis, MO 63104, USA

4 Department of Biostatistics and Bioinformatics, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 18901, USA

BMC Medical Genomics 2013, 6(Suppl 3):S9  doi:10.1186/1755-8794-6-S3-S9

Published: 11 November 2013



Background

Surgery is currently the best curative treatment for hepatocellular carcinoma (HCC), but it requires that the lesion be detected and removed at an early stage. Unfortunately, most cases of HCC are detected at an advanced stage because accurate biomarkers for the surveillance of those at risk are lacking. Biomarkers that can detect HCC early are therefore expected to play an important role in its successful treatment.


Methods

In this study, we analyzed levels of alpha fetoprotein, Golgi protein, fucosylated alpha-1-anti-trypsin, and fucosylated kininogen in serum from 113 patients with cirrhosis and in 164 serum samples from patients with cirrhosis plus HCC. We applied two methods, stepwise penalized logistic regression (stepPLR) and model-based classification and regression trees (mob), and included clinical and demographic factors such as age and gender, to determine whether these improved algorithms could increase the detection of cancer.
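To make the idea of a penalized logistic regression concrete, the following is a minimal sketch in plain Python. It is not the stepPLR implementation used in the study: it assumes an L2 (ridge) penalty on the slope coefficients, fits by simple gradient descent, and omits the stepwise variable selection entirely. The function names (`fit_penalized_lr`, `predict_proba`), the penalty weight `lam`, and the two synthetic predictors are all hypothetical and chosen only for illustration; no patient data are used.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_penalized_lr(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Minimize mean log-loss + (lam / 2) * sum(w_j ** 2) by gradient descent.

    X is a list of feature rows, y a list of 0/1 labels.
    The intercept is left unpenalized. Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # The lam * wj term is the gradient of the L2 penalty.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, x):
    # Predicted probability of the positive class for one feature row.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Synthetic data (assumption): one standardized "marker" and one
# standardized "age" value per sample; label 1 stands in for HCC.
random.seed(0)
X, y = [], []
for _ in range(100):
    label = random.randint(0, 1)
    marker = random.gauss(1.0 if label else -1.0, 1.0)
    age = random.gauss(0.3 if label else -0.3, 1.0)
    X.append([marker, age])
    y.append(label)

b, w = fit_penalized_lr(X, y, lam=0.1)
b_strong, w_strong = fit_penalized_lr(X, y, lam=5.0)
```

Comparing `w` with `w_strong` shows the effect of the penalty: a larger `lam` shrinks the coefficients toward zero, which is what stabilizes the fit when predictors are correlated or samples are few.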

Results and discussion

Multiple biomarkers performed better than individual biomarkers. Using several statistical methods, we were able to detect HCC on a background of cirrhosis with an area under the receiver operating characteristic curve (AUC) of at least 0.95. Based on three-fold cross-validation and leave-one-out cross-validation, stepPLR and mob demonstrated better predictive performance than logistic regression (LR), penalized LR, and classification and regression trees (CART), the methods used in our prior study. In addition, mob provided an unparalleled, intuitive interpretation of the results and potential cut-points for biomarker levels. Including age and gender improved the overall performance of both methods across all models considered, and the stratified male-only subset gave the best overall performance across all methods and models considered.
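The two evaluation ingredients named above, the area under the ROC curve and k-fold or leave-one-out cross-validation, can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the evaluation code used in the study: the helper names `auc` and `k_fold_indices` are hypothetical, the scores and labels are made up, and the fold assignment is a simple interleaving without shuffling or stratification.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    This is the fraction of (case, control) pairs in which the case has
    the higher score, with ties counted as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k folds over n samples.

    Setting k == n gives leave-one-out cross-validation.
    """
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


# Made-up classifier scores; label 1 stands in for HCC, 0 for cirrhosis only.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)

three_fold = list(k_fold_indices(len(scores), 3))
leave_one_out = list(k_fold_indices(len(scores), len(scores)))
```

In a full evaluation, a model would be fitted on each `train` index set and scored on the matching `test` set, and the held-out scores would then be passed to `auc`.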


Conclusions

Beyond the use of multiple biomarkers, incorporating age and gender into the statistical models significantly improved their predictive performance in the detection of HCC.