Open Access research article

Identification of a biomarker panel for colorectal cancer diagnosis

Amaia García-Bilbao¹, Rubén Armañanzas², Ziortza Ispizua¹, Begoña Calvo³, Ana Alonso-Varona⁴, Iñaki Inza⁵, Pedro Larrañaga², Guillermo López-Vivanco³, Blanca Suárez-Merino¹ and Mónica Betanzos¹*

Author Affiliations

1 GAIKER Technology Centre, Parque Tecnológico, Edificio 202, 48170 Zamudio, (Bizkaia), Spain

2 Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660 Boadilla del Monte, Spain

3 Medical Oncology Service, Hospital de Cruces, Plaza de Cruces s/n, 48903 Barakaldo, (Bizkaia), Spain

4 Department of Cell Biology and Histology, School of Medicine and Dentistry, University of the Basque Country, 48940 Leioa, (Bizkaia), Spain

5 Department of Computer Science and Artificial Intelligence, Computer Science Faculty, University of the Basque Country, 20018 San Sebastián, (Gipuzkoa), Spain


BMC Cancer 2012, 12:43  doi:10.1186/1471-2407-12-43

Published: 26 January 2012



Malignancies arising in the large bowel cause the second-largest number of cancer deaths in the Western world. Despite the progress made over recent decades, colorectal cancer remains one of the most frequent and deadly neoplasms in Western countries.


A genomic study of human colorectal cancer was carried out on 31 tumour samples, corresponding to different stages of the disease, and 33 non-tumour samples. The tumour samples were hybridised against a reference pool of non-tumour samples using Agilent Human 1A 60-mer oligo microarrays, and the results were validated by qRT-PCR. In the subsequent bioinformatics analysis, gene networks were built by means of Bayesian classifiers, variable selection and bootstrap resampling. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables.
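The idea of ranking variables by their consensus across bootstrap-induced models can be illustrated with a minimal Python sketch. It is not the authors' pipeline: instead of Bayesian network classifiers it swaps in a simple filter score (the hypothetical `feature_score`, a standardised difference of class means), keeps the top `k` features in each bootstrap replicate, and ranks features by how often they are selected. The function names, `k`, `n_boot` and the toy data are all illustrative assumptions.

```python
import random
from statistics import mean, pstdev

def feature_score(values, labels):
    """Standardised absolute difference of class means.

    Assumption: this simple filter score stands in for the Bayesian
    classifiers used in the study.
    """
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    if not pos or not neg:
        return 0.0  # a replicate missing one class carries no signal
    spread = pstdev(pos) + pstdev(neg)
    return abs(mean(pos) - mean(neg)) / (spread + 1e-9)

def bootstrap_consensus(X, y, k, n_boot=100, seed=0):
    """Rank features by how often they fall in the top k across bootstrap replicates.

    X is a list of samples (each a list of feature values); y holds 0/1 labels.
    Returns the feature ranking and each feature's selection frequency.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample samples with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        scores = [feature_score([row[j] for row in Xb], yb) for j in range(p)]
        top = sorted(range(p), key=lambda f: scores[f], reverse=True)[:k]
        for j in top:
            counts[j] += 1
    freq = [c / n_boot for c in counts]
    ranking = sorted(range(p), key=lambda f: freq[f], reverse=True)
    return ranking, freq

# Synthetic toy data (not from the study): only feature 0 differs between classes.
toy_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[3.0 * yi + toy_rng.gauss(0, 1), toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for yi in y]
ranking, freq = bootstrap_consensus(X, y, k=1, n_boot=50)
print(ranking, freq)
```

Selection frequency acts as the consensus measure here: a feature that is chosen in nearly every resample sits at the top of the hierarchy, while features chosen only by chance drift to the bottom.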


After exhaustive pre-processing to ensure data quality (missing-value imputation, probe quality filtering, data smoothing and intraclass variability filtering), the final dataset comprised 8,104 probes. Next, supervised classification and data analysis were carried out to identify the most relevant genes. Two of these genes are directly involved in cancer progression, and in colorectal cancer in particular. Finally, a supervised classifier was induced to classify new, unseen samples.
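Two of the pre-processing steps can be sketched as follows, assuming a probes-by-samples matrix in which `None` marks a missing value. Missing values are filled with the probe's mean over all samples (one simple imputation choice; the study's exact method is not given in this abstract), and a probe is kept only if its average within-class standard deviation stays at or below a threshold. The helper names and the `max_sd` cut-off are hypothetical.

```python
from statistics import mean, pstdev

def impute_missing(matrix):
    """Replace None entries in each probe (row) with that probe's mean observed value.

    Assumption: a global probe mean is used, so class labels do not influence imputation.
    """
    out = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = mean(observed) if observed else 0.0
        out.append([fill if v is None else v for v in row])
    return out

def intraclass_filter(matrix, labels, max_sd):
    """Return indices of probes whose average within-class standard deviation is <= max_sd.

    Assumption: max_sd is a hypothetical cut-off; every class must have at least one sample.
    """
    classes = sorted(set(labels))
    kept = []
    for i, row in enumerate(matrix):
        sds = [pstdev([v for v, lab in zip(row, labels) if lab == c]) for c in classes]
        if mean(sds) <= max_sd:
            kept.append(i)
    return kept

# Toy probes-by-samples matrix (not from the study).
probes = [
    [1.0, None, 1.2, 3.0, 3.1, 2.9],   # consistent within each class, one missing value
    [0.5, 4.0, -2.0, 3.0, -1.0, 5.0],  # noisy within each class
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = non-tumour, 1 = tumour
clean = impute_missing(probes)
kept = intraclass_filter(clean, labels, max_sd=0.5)
print(clean[0][1], kept)
```

The filter discards probes that vary a lot among samples of the same class, since such probes are unlikely to separate tumour from non-tumour samples reliably.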


We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values ranging from 0.955 to 0.997).
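For reference, accuracy and AUC can be computed as in the sketch below, which uses made-up classifier scores rather than the study's data. AUC is obtained through its rank interpretation: the fraction of (cancerous, non-cancerous) sample pairs in which the cancerous sample receives the higher score, with ties counted as one half. The 0.5 decision threshold and the helper names are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    pos = [s for s, lab in zip(scores, y_true) if lab == 1]
    neg = [s for s, lab in zip(scores, y_true) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (not the study's results): 1 = cancerous, 0 = non-cancerous.
y_true = [0, 0, 0, 1, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold
print(accuracy(y_true, preds), auc(y_true, toy_scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality over all thresholds, which is why both figures are reported.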