This article is part of the supplement: UT-ORNL-KBRIN Bioinformatics Summit 2011

Meeting abstract (Open Access)

Gene expression based prototype for automatic tumor prediction

Atiq Islam1, Khan M Iftekharuddin2* and Olusegun E George3

Author Affiliations

1 Ebay Applied Research, Ebay Inc., San Jose, CA 95125, USA

2 Department of Electrical and Computer Engineering, University of Memphis, Memphis, TN 38152, USA

3 Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, USA

BMC Bioinformatics 2011, 12(Suppl 7):A15  doi:10.1186/1471-2105-12-S7-A15

Published: 5 August 2011

© 2011 Islam et al; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Automatic detection of tumors is a challenging task because of the heterogeneous phenotypic and genotypic behavior of cells within tumor types [1-3]. In recent years, a number of studies reported in the literature have exploited microarray gene expression data to predict tissue/tumor types with high confidence [3-14]. However, in predicting tissue types, these studies explicitly considered neither the correlation among genes nor the probable subgroups within the known groups. In this work, our primary objective is to develop an automated prediction scheme for tumors based on DNA microarray gene expression measurements of tissue samples.

Material and methods

The workflow to build the tumor prototypes is shown in Fig. 1. To account for the various sources of variation in array measurements, we first estimate tumor-specific gene expression measures using a two-way ANOVA model. Marker genes are then identified using the Wilcoxon [15] and Kruskal-Wallis [16] tests. Next, we group the highly correlated marker genes together and obtain eigen-gene expression measures [10] from each individual gene group. At the end of this step, we replace the gene expression measurements with eigen-gene expression values that conserve correlations among the strongly correlated genes. We then divide the tissue samples of known tumor types into subgroups. The CS measure [17] is used to obtain the optimal number of gene groups and of tissue subgroups within each tissue type. The centroids of these subgroups of tissue samples represent the prototype of the corresponding tumor type. Finally, any new tissue sample is assigned the tumor type of the closest centroid.
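As an illustration of the eigen-gene step, the sketch below summarizes one group of correlated genes by a single expression profile across samples. It assumes that an eigen-gene is the first principal component of the centered gene group, which is a common reading of the term but is not spelled out in this abstract. The function name `eigengene` and the toy data are hypothetical, and power iteration is used only to keep the example free of external libraries.

```python
import math

def eigengene(group, n_iter=100):
    """Return a unit-length profile (one value per sample) for a gene group.

    group: list of gene rows; each row holds expression values for the same
    samples. Assumption: the eigen-gene is the leading right singular vector
    of the row-centered matrix, found here by power iteration on G^T G.
    """
    # Center each gene across samples.
    centered = []
    for gene in group:
        mu = sum(gene) / len(gene)
        centered.append([x - mu for x in gene])

    # Start from the first gene's centered profile; the result then keeps a
    # non-negative projection onto that gene, which fixes the sign.
    v = list(centered[0])
    for _ in range(n_iter):
        # scores = G v (one score per gene)
        scores = [sum(g_j * v_j for g_j, v_j in zip(g, v)) for g in centered]
        # w = G^T scores (one value per sample)
        w = [sum(s * g[j] for s, g in zip(scores, centered)) for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("gene group has no variance across samples")
        v = [x / norm for x in w]
    return v

# Hypothetical toy group: two perfectly correlated genes over four samples.
toy_group = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
toy_profile = eigengene(toy_group)
```

In a full pipeline, one such profile per gene group would replace the individual gene measurements before the tissue samples are subgrouped.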

Figure 1. Simplified workflow to build the tumor prototypes.
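The last two steps of the workflow, building prototypes from subgroup centroids and assigning a new sample to the closest centroid, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain k-means stands in for the subgrouping method, which the abstract does not name, and the number of subgroups per tumor type is passed in directly rather than chosen with the CS measure [17]. All function names and the toy data are hypothetical.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans_centroids(rows, k, n_iter=100, seed=0):
    """Plain k-means; an assumed stand-in for the subgrouping step."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for row in rows:
            nearest = min(range(k), key=lambda j: math.dist(row, centers[j]))
            clusters[nearest].append(row)
        # Keep the old center if a cluster ends up empty.
        updated = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers

def build_prototypes(samples_by_type, n_subgroups):
    """Map each tumor type to the list of its subgroup centroids."""
    return {
        tumor_type: kmeans_centroids(rows, n_subgroups[tumor_type])
        for tumor_type, rows in samples_by_type.items()
    }

def predict_type(prototypes, sample):
    """Return the tumor type that owns the centroid closest to the sample."""
    distance, tumor_type = min(
        (math.dist(sample, center), tumor_type)
        for tumor_type, centers in prototypes.items()
        for center in centers
    )
    return tumor_type

# Hypothetical toy data: type "A" contains two well-separated subgroups.
toy_samples = {
    "A": [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.3, 4.8]],
    "B": [[10.0, 0.0], [10.2, 0.1]],
}
toy_prototypes = build_prototypes(toy_samples, {"A": 2, "B": 1})
```

Representing a tumor type by several centroids instead of one lets a new sample match a subgroup even when the overall mean of that type lies far from every individual sample.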


To evaluate the proposed tumor prediction scheme, we use five different gene microarray datasets [3-5,7-9], all of which were obtained using Affymetrix technology. We use the leave-one-out cross-validation method. Table 1 summarizes our experimental results for all the datasets, giving relevant intermediate results along with the final classification accuracy. Table 2 compares the performance of our proposed prediction scheme with that of the methods reported in the original works [3,5,7-9] in which the corresponding datasets were published. For completeness, we also compare our classification accuracies with those of a supervised clustering method [4].

Table 1. Experimental results with different datasets.

Table 2. Comparison of methods.
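The leave-one-out evaluation used for these results can be sketched as follows: each tissue sample is held out once, a classifier is built from the remaining samples, and the held-out sample is then predicted. The sketch assumes a simple one-centroid-per-class classifier as a placeholder for the full scheme; the function names and toy data are hypothetical. The abstract does not state whether marker selection and gene grouping were repeated inside each fold, so that detail is left out here.

```python
import math

def fit_centroids(X, y):
    """Compute one centroid per class label (placeholder classifier)."""
    rows_by_label = {}
    for row, label in zip(X, y):
        rows_by_label.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in rows_by_label.items()
    }

def predict_label(model, row):
    """Return the label of the centroid closest to the given sample."""
    return min(model, key=lambda label: math.dist(row, model[label]))

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit_centroids(X_train, y_train)
        if predict_label(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical toy data: two well-separated classes of three samples each.
X_toy = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
y_toy = ["A", "A", "A", "B", "B", "B"]
toy_accuracy = leave_one_out_accuracy(X_toy, y_toy)
```

Leave-one-out is a natural choice when the number of tissue samples is small, since every sample but one is used for training in each round.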


In this work, we propose a novel, seamless, and integrated technique for automatic tumor detection using Affymetrix microarray gene expression data. We normalize the data by estimating tumor-specific gene expression measures using an ANOVA model. Furthermore, our tumor prediction scheme exploits molecular information such as probable correlations among genes and probable unknown subgroups within known tumor types. We demonstrate the efficacy of our proposed scheme using five different Affymetrix gene expression datasets.


The research in this paper is supported in part through research grants [RG-01-0125, TG-04-0026] provided by the Whitaker Foundation with Khan M. Iftekharuddin as the principal investigator.


  1. NCI Brain Tumor Progress Review Group [http:/ / find_people/ groups/ brain_tumor_prg/ BTPRGReport.htm]

  2. Yang Y, Guccione S, Bednarski MD: Comparing genomic and histologic correlations to radiographic changes in tumors: A murine SCC VII model study.

    Academic Radiology 2003, 10(10):1165-1175.

  3. Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JY, Goumnerova LC, Black PM, Lau C, Allen JC, Zagzag D, Olson JM, Curran T, Wetmore C, Biegel JA, Poggio T, Mukherjee S, Rifkin R, Califano A, Stolovitzky G, Louis DN, Mesirov JP, Lander ES, Golub TR: Prediction of central nervous system embryonal tumor outcome based on gene expression.

    Nature 2002, 415:436-442.

  4. Dettling M, Buhlmann P: Supervised clustering of genes.

    Genome Biology 2002, 3(12):1-15.

  5. Alon U, Barkai N, Notterman D, Gish K, Ybarra S, Mack D, Levine A: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.

    Proceedings of the National Academy of Sciences 1999, 96(12):6745-6750.

  6. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling.

    Nature 2000, 403:503-511.

  7. Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J, Caligiuri M, Bloomfield C, Lander E: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring.

    Science 1999, 286:531-537.

  8. West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson J, Marks J, Nevins J: Predicting the clinical status of human breast cancer by using gene expression profiles.

    Proc Natl Acad Sci 2001, 98:11462-11467.

  9. Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, Tamayo P, Renshaw A, D’Amico A, Richie J: Gene expression correlates of clinical prostate cancer behavior.

    Cancer Cell 2002, 1:203-209.

  10. Shen R, Ghosh D, Chinnaiyan A, Meng Z: Eigengene-based linear discriminant model for tumor classification using gene expression microarray data.

    Bioinformatics 2006, 22(21):2635-2642.

  11. Sandberg R, Ernberg I: Assessment of tumor characteristic gene expression in cell lines using a tissue similarity index (TSI).

    Proceedings of the National Academy of Sciences USA 2005, 102(6):2052-2057.

  12. Poisson LM, Ghosh D: Statistical issues and analyses of in vivo and in vitro genomic data in order to identify clinically relevant profiles.

    Cancer Informatics 2007, 3:231-243.

  13. Fromke C, Hothorn LA, Kropf S: Nonparametric relevance-shifted multiple testing procedures for analysis of high-dimensional multivariate data with small sample sizes.

    BMC Bioinformatics 2008, 9:54.

  14. Islam A, Iftekharuddin KM, George EO: Class specific gene expression estimation and classification in microarray data.

    Proceedings of IEEE International Joint Conference on Neural Networks (IJCNN) 2008, 1678-1685.

  15. Wilcoxon F: Individual comparisons by ranking methods.

    Biometrics 1945, 1:80-83.

  16. NIST/SEMATECH e-Handbook of Statistical Methods

  17. Chou C, Su M, Lai E: A new cluster validity measure for clusters with different densities.

    IASTED International Conference on Intelligent Systems and Control 2003, 276-281.