Methodology article

Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm

Kun-Huang Chen1*, Kung-Jeng Wang1, Min-Lung Tsai2, Kung-Min Wang3, Angelia Melani Adrian1, Wei-Chung Cheng4,5, Tzu-Sen Yang6,7, Nai-Chia Teng8, Kuo-Pin Tan9 and Ku-Shang Chang2

Author Affiliations

1 Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, R.O.C

2 Department of Food Science, Yuanpei University, No. 306, Yuanpei Street, Hsinchu 300, Taiwan, R.O.C

3 Department of Surgery, Shin-Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan, R.O.C

4 Pediatric Neurosurgery, Department of Surgery, Cheng Hsin General Hospital, Taipei 11220, Taiwan, R.O.C

5 Genomic Research Center, National Yang-Ming University, Taipei 11221, Taiwan, R.O.C

6 School of Dental Technology, Taipei Medical University, Taipei 110, Taiwan, R.O.C

7 Taiwan Research Center for Biomedical Implants and Microsurgery Devices, Taipei Medical University, Taipei 110, Taiwan, R.O.C

8 School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei, Taiwan, R.O.C

9 MBA, School of Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, R.O.C


BMC Bioinformatics 2014, 15:49  doi:10.1186/1471-2105-15-49

Published: 20 February 2014



In microarray data analysis, selecting a small number of informative genes, out of thousands, that may contribute to the occurrence of cancer is an important problem. Many researchers have applied computational intelligence methods to analyze gene expression data.


To select genes efficiently from thousands of candidates that can contribute to identifying cancers, this study develops a novel method that combines particle swarm optimization with a decision tree as the classifier. We compare the performance of the proposed method with that of other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) in experiments on 11 gene expression cancer datasets.
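The combination described above can be illustrated with a small wrapper-style sketch: a binary particle swarm searches over gene subsets, and each subset is scored by a decision tree. This is not the authors' implementation. The binary (sigmoid-based) position update, the depth-1 tree (a decision stump) standing in for a full decision tree, the subset-size penalty, the parameter values, and the synthetic data are all assumptions made for illustration, and scoring uses training accuracy rather than cross-validation to keep the example short.

```python
import math
import random


def stump_accuracy(X, y, features):
    """Training accuracy of the best depth-1 decision tree over the given features."""
    best, n = 0.0, len(y)
    for f in features:
        for thr in sorted({row[f] for row in X}):
            # Rule: predict class 1 when value > thr; (n - hits) covers the flipped rule.
            hits = sum((row[f] > thr) == bool(label) for row, label in zip(X, y))
            best = max(best, hits / n, (n - hits) / n)
    return best


def fitness(mask, X, y, penalty=0.01):
    """Accuracy of the selected subset minus a small cost per gene (assumed form)."""
    features = [i for i, bit in enumerate(mask) if bit]
    if not features:
        return 0.0
    return stump_accuracy(X, y, features) - penalty * len(features)


def binary_pso(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a gene subset (bit mask) that maximizes fitness."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, X, y) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_genes):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # Sigmoid of the velocity gives the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i], X, y)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def make_toy_data(n_samples=40, n_genes=8, informative=2, seed=1):
    """Synthetic expression matrix: one well-separated gene, the rest pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[informative] = 10.0 * label + rng.gauss(0.0, 1.0)
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = binary_pso(X, y)
    print("selected genes:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

On real microarray data the fitness function would more plausibly use cross-validated accuracy of a full decision tree, since training accuracy on a few dozen samples with thousands of genes overfits easily.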


Based on statistical analysis, our proposed method outperforms other popular classifiers for all test datasets, and is comparable to SVM for certain specific datasets. Further, housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide high discriminative power in cancer classification.

Keywords: Gene expression; Cancer; Particle swarm optimization; Decision tree classifier