Methodology article

A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers

Oliver P Günther (1), Virginia Chen (1), Gabriela Cohen Freue (1,2), Robert F Balshaw (1,12,2), Scott J Tebbutt (1,10,11,7), Zsuzsanna Hollander (1,3), Mandeep Takhar (1), W Robert McMaster (1,4,9), Bruce M McManus (1,10,3,7), Paul A Keown (1,3,5,6) and Raymond T Ng (1,8)*

Author Affiliations

1 NCE CECR Prevention of Organ Failure (PROOF) Centre of Excellence, Vancouver, BC, V6Z 1Y6, Canada

2 Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada

3 Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, V6T 2B5, Canada

4 Immunity and Infection Research Centre, Vancouver, BC, V5Z 3J5, Canada

5 Immunology Laboratory, Vancouver General Hospital, Vancouver, BC, V5Z 1M9, Canada

6 Department of Medicine, University of British Columbia, Vancouver, BC, V5Z 1M9, Canada

7 James Hogg Research Centre, St. Paul’s Hospital, University of British Columbia, Vancouver, BC, V6Z 1Y6, Canada

8 Department of Computer Science, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada

9 Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada

10 Institute for HEART+LUNG Health, Vancouver, BC, V6Z 1Y6, Canada

11 Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC, V5Z 1M9, Canada

12 Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada


BMC Bioinformatics 2012, 13:326. doi:10.1186/1471-2105-13-326

Published: 8 December 2012



Background

Biomarker panels derived separately from genomic and proteomic data, using a variety of computational methods, have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. Ensemble classifiers have been applied successfully in various analytical domains to combine individual classifiers so that the performance of the ensemble exceeds that of the individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble?


Results

The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline covers data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. It ensures that the classifiers later combined in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients.

The second part of the paper examines five ensembles ranging in size from 2 to 10 individual classifiers. Ensemble performance is characterized by the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, and specificity. These measures are derived from each member classifier's probability of acute rejection, combined using one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance: it improved sensitivity and AUC beyond the best values observed for any of its individual classifiers, while staying within the range of observed specificity. The Vote Threshold aggregation method improved sensitivity for all five ensembles, but typically at the cost of decreased specificity.
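The two aggregation methods are named above but not defined in detail. The Python sketch below shows one plausible reading, and it rests on several assumptions that are not taken from the paper. Each member classifier is assumed to output a probability of acute rejection (AR). Average Probability is assumed to call AR when the mean member probability is at least 0.5. Vote Threshold is assumed to let each member vote AR when its probability is at least 0.5, and to call AR when at least `min_votes` members vote AR. The 0.5 cutoff, `min_votes`, the use of the vote fraction as a ranking score for AUC, and all function names are hypothetical. The toy probabilities are invented for illustration and are not study data. AUC is computed with the rank (Mann-Whitney) formulation, in which ties count one half.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-classifier cutoff (assumption): a member "votes" acute
# rejection (AR) when its predicted probability of AR is at least this value.
CUTOFF = 0.5


def average_probability(probs: Sequence[float], cutoff: float = CUTOFF) -> bool:
    """Average Probability (as assumed here): call AR if the mean member probability reaches the cutoff."""
    return sum(probs) / len(probs) >= cutoff


def vote_fraction(probs: Sequence[float], cutoff: float = CUTOFF) -> float:
    """Fraction of members voting AR; used here (assumption) as a score for ranking samples."""
    return sum(p >= cutoff for p in probs) / len(probs)


def vote_threshold(probs: Sequence[float], min_votes: int = 1, cutoff: float = CUTOFF) -> bool:
    """Vote Threshold (as assumed here): call AR if at least `min_votes` members vote AR."""
    votes = sum(p >= cutoff for p in probs)
    return votes >= min_votes


def sensitivity_specificity(calls: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (all AR samples); specificity = TN / (all non-AR samples)."""
    tp = sum(c and t for c, t in zip(calls, truth))
    tn = sum(not c and not t for c, t in zip(calls, truth))
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    return tp / n_pos, tn / n_neg


def auc(scores: Sequence[float], truth: Sequence[bool]) -> float:
    """AUC as the probability that a random AR sample outscores a random non-AR sample (ties count 1/2)."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: one row per sample, one column per member classifier.
    member_probs: List[List[float]] = [
        [0.9, 0.7, 0.6],  # AR
        [0.4, 0.6, 0.3],  # AR
        [0.2, 0.1, 0.3],  # non-AR
        [0.6, 0.2, 0.1],  # non-AR
    ]
    truth = [True, True, False, False]

    avg_scores = [sum(p) / len(p) for p in member_probs]
    avg_calls = [average_probability(p) for p in member_probs]
    sens, spec = sensitivity_specificity(avg_calls, truth)
    print(f"Average Probability: sens={sens:.2f} spec={spec:.2f} AUC={auc(avg_scores, truth):.3f}")
    # -> Average Probability: sens=0.50 spec=1.00 AUC=1.000

    vote_scores = [vote_fraction(p) for p in member_probs]
    vote_calls = [vote_threshold(p, min_votes=1) for p in member_probs]
    sens, spec = sensitivity_specificity(vote_calls, truth)
    print(f"Vote Threshold:      sens={sens:.2f} spec={spec:.2f} AUC={auc(vote_scores, truth):.3f}")
    # -> Vote Threshold:      sens=1.00 spec=0.50 AUC=0.875
```

On this toy data, requiring only a single AR vote raises sensitivity from 0.50 to 1.00, while specificity falls from 1.00 to 0.50. This is the same direction of trade-off reported above for Vote Threshold. Raising `min_votes` would move the rule back toward higher specificity.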


Conclusions

Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.

Keywords: Biomarkers; Computational; Pipeline; Genomics; Proteomics; Ensemble; Classification