Open Access Methodology article

Novel methods to identify biologically relevant genes for leukemia and prostate cancer from gene expression profiles

Austin H Chen1*, Yin-Wu Tsau2 and Ching-Heng Lin2

Author affiliations

1 Department of Medical Informatics, Tzu Chi University, No.701, Sec. 3, Jhongyang Rd. Hualien City, Hualien County 97004, Taiwan

2 Graduate Institute of Medical Informatics, Tzu Chi University, No.701, Sec. 3, Jhongyang Rd. Hualien City, Hualien County 97004, Taiwan

Citation and License

BMC Genomics 2010, 11:274  doi:10.1186/1471-2164-11-274

Published: 30 April 2010

Abstract

Background

High-throughput microarray experiments now permit researchers to screen thousands of genes simultaneously and to compare their expression levels in normal and cancerous tissues. In this paper, we address the challenge of selecting a relevant and manageable subset of genes from a large microarray dataset. Currently, most gene selection methods focus on identifying a set of genes that improves classification accuracy. Few, if any, of the genes in these small sets, however, are biologically relevant (i.e., supported by medical evidence). To address this critical issue, we propose two novel methods that identify genes biologically relevant to cancer.

Results

We propose two novel techniques, random forest gene selection (RFGS) and the support vector sampling technique (SVST). By comparing them with six other methods developed in this paper, we demonstrate experimentally that RFGS and SVST identify more biologically relevant genes in patients with leukemia or prostate cancer. Among the top 25 genes selected by SVST, 15 were biologically relevant in patients with leukemia and 13 in patients with prostate cancer. The RFGS method, while less effective than SVST, still identified an average of 9 biologically relevant genes for both leukemia and prostate cancer. Traditional statistical methods, by contrast, identified fewer than 8 biologically relevant genes for leukemia and fewer than 8 for prostate cancer, so our methods yield significantly better results.
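
The comparison above rests on a simple count: how many of the top-ranked genes from each method also appear in a set of genes with supporting medical evidence. The following is a minimal sketch of that count, not the authors' implementation. The function name `count_relevant`, the gene identifiers, and the reference set are hypothetical placeholders chosen for illustration; the cutoff `k` defaults to 25 to match the top-25 lists described above.

```python
def count_relevant(ranked_genes, relevant_genes, k=25):
    """Count how many of the top-k ranked genes appear in the reference set.

    ranked_genes   -- gene identifiers ordered from most to least important
    relevant_genes -- set of identifiers considered biologically relevant
    k              -- number of top-ranked genes to inspect
    """
    top_k = ranked_genes[:k]
    return sum(1 for gene in top_k if gene in relevant_genes)


# Hypothetical rankings produced by two gene selection methods.
ranking_a = ["G1", "G2", "G3", "G4", "G5"]
ranking_b = ["G9", "G2", "G8", "G7", "G5"]

# Hypothetical set of genes with supporting medical evidence.
reference = {"G1", "G2", "G5"}

hits_a = count_relevant(ranking_a, reference, k=5)
hits_b = count_relevant(ranking_b, reference, k=5)
print(f"method A: {hits_a} relevant genes, method B: {hits_b} relevant genes")
```

Under this measure, a method is preferred when its top-ranked list contains more genes from the reference set, which is the sense in which SVST (15 and 13 of 25) is reported to outperform the other methods.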

Conclusions

Our proposed SVST and RFGS methods are novel approaches that identify a greater number of biologically relevant genes than traditional statistical methods. These methods have been successfully applied to both leukemia and prostate cancer. Research in biology and medicine should benefit from the identification of biologically relevant genes, which can confirm recent discoveries in cancer research or suggest new avenues for exploration.