Open Access Research article

The use of genetic programming in the analysis of quantitative gene expression profiles for identification of nodal status in bladder cancer

Anirban P Mitra¹, Arpit A Almal², Ben George³, David W Fry², Peter F Lenehan², Vincenzo Pagliarulo⁴, Richard J Cote¹, Ram H Datar¹* and William P Worzel²

  • * Corresponding author: Ram H Datar

  • † Equal contributors

Author Affiliations

1 Department of Pathology, University of Southern California Keck School of Medicine, 2011 Zonal Avenue, HMR 312, Los Angeles CA 90033, USA

2 Genetics Squared Inc., 210 South 5th Avenue, Suite A, Ann Arbor MI 48104, USA

3 Department of Internal Medicine, Gundersen Lutheran Medical Center, 1900 South Avenue, La Crosse WI 54601, USA

4 Dipartimento Emergenza e Trapianti d'Organo, Sezione di Urologia, Università di Bari, Piazza G. Cesare 11, Bari 70124, Italy


BMC Cancer 2006, 6:159  doi:10.1186/1471-2407-6-159

Published: 16 June 2006



Previous studies of bladder cancer have shown nodal involvement to be an independent indicator of prognosis and survival. This study aimed to develop an objective method for detecting nodal metastasis from molecular profiles of primary urothelial carcinoma tissues.


The study included primary bladder tumor tissues from 60 patients across different stages and 5 control tissues of normal urothelium. The entire cohort was divided into training and validation sets, each comprising node-positive and node-negative subjects. Quantitative expression profiling was performed for a panel of 70 genes using standardized competitive RT-PCR. The expression values of the training set samples were then run through genetic programming, an iterative machine learning process, which employed an N-fold cross-validation technique to generate classifier rules of limited complexity. These rules were then used in a voting algorithm to classify the validation set samples as associated or not associated with nodal metastasis.
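The cross-validation and voting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not describe the genetic programming engine, the value of N, how folds were formed, or how votes were tallied. Here each evolved rule is assumed to be a Python callable that maps one sample's expression profile to node positive (`True`) or node negative (`False`). The helper `n_fold_indices` builds simple contiguous folds, and `vote` takes a plain majority with ties counted as node negative. The gene names `GENE_A`, `GENE_B`, `GENE_C` and the rule thresholds are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a profile maps gene name -> standardized expression value,
# and a rule maps a profile to True (node positive) or False (node negative).
Profile = Dict[str, float]
Rule = Callable[[Profile], bool]


def n_fold_indices(n_samples: int, n_folds: int) -> List[List[int]]:
    """Split indices 0..n_samples-1 into n_folds contiguous, near-equal folds."""
    if not 1 < n_folds <= n_samples:
        raise ValueError("need 1 < n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def vote(rules: Sequence[Rule], profile: Profile) -> bool:
    """Majority vote of the rule ensemble; a tie is treated as node negative (assumption)."""
    positive = sum(1 for rule in rules if rule(profile))
    return positive > len(rules) - positive


# Hypothetical rules of limited complexity, for illustration only.
rules: List[Rule] = [
    lambda p: p["GENE_A"] > p["GENE_B"],
    lambda p: p["GENE_B"] > 2.0 * p["GENE_C"],
    lambda p: p["GENE_C"] < 0.5,
]
sample: Profile = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 0.8}

print(n_fold_indices(10, 3))  # → [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(vote(rules, sample))    # → False (only 1 of 3 rules fires)
```

In an N-fold scheme, a candidate rule would be evolved on N−1 folds and scored on the held-out fold, rotating through every fold; the evolutionary search itself is omitted here. In practice, folds are often shuffled and stratified by class rather than contiguous.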


The classifier rules generated using all 70 genes achieved 81% accuracy on the validation set when compared with pathological nodal status. The rules showed a strong predilection for ICAM1, MAP2K6 and KDR, yielding gene expression motifs that cumulatively suggested the pattern ICAM1 > MAP2K6 > KDR in node-positive cases. The motifs also showed CDK8 expression to be low relative to ICAM1, and ANXA5 expression to be relatively high on its own, in node-positive tumors. Rules generated using only ICAM1, MAP2K6 and KDR were comparably robust: a single representative rule achieved 90% accuracy when used alone on the validation set, suggesting a crucial role for these genes in nodal metastasis.
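The ICAM1 > MAP2K6 > KDR motif can be read as a simple ordering rule, and validation accuracy is the fraction of samples whose predicted nodal status matches the pathological status. The sketch below assumes a literal strict ordering of the three expression values; the abstract does not give the exact form of the evolved rules (including the representative rule that reached 90%), so `motif_icam1_map2k6_kdr` is a hypothetical stand-in. The expression values and labels are also hypothetical, not data from the study.

```python
from typing import Dict, List, Sequence

Profile = Dict[str, float]


def motif_icam1_map2k6_kdr(profile: Profile) -> bool:
    """Hypothetical rule: predict node positive when ICAM1 > MAP2K6 > KDR."""
    # Python's chained comparison means ICAM1 > MAP2K6 and MAP2K6 > KDR.
    return profile["ICAM1"] > profile["MAP2K6"] > profile["KDR"]


def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of samples whose predicted nodal status matches pathology."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical expression values and nodal labels, NOT data from the study.
validation: List[Profile] = [
    {"ICAM1": 5.0, "MAP2K6": 2.0, "KDR": 1.0},  # motif satisfied
    {"ICAM1": 1.0, "MAP2K6": 2.0, "KDR": 3.0},  # motif not satisfied
    {"ICAM1": 4.0, "MAP2K6": 1.0, "KDR": 2.0},  # MAP2K6 < KDR, motif not satisfied
    {"ICAM1": 3.0, "MAP2K6": 2.5, "KDR": 0.5},  # motif satisfied
]
pathological_node_positive = [True, False, True, True]

predictions = [motif_icam1_map2k6_kdr(p) for p in validation]
print(predictions)                                        # → [True, False, False, True]
print(accuracy(predictions, pathological_node_positive))  # → 0.75
```

A single ordering rule like this is easy to interpret, which is one reason rules of limited complexity are attractive; an ensemble vote over many such rules trades some of that transparency for robustness.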


Our study demonstrates the use of standardized quantitative gene expression values from primary bladder tumor tissues as inputs to a genetic programming system that generates classifier rules for determining nodal status. Our method also suggests that ICAM1, MAP2K6, KDR, CDK8 and ANXA5, in unique mathematical combinations, are involved in the progression towards nodal positivity. Further studies are needed to identify more class-specific signatures and to confirm the role of these genes in the evolution of nodal metastasis in bladder cancer.