Evaluation of gene importance in microarray data based upon probability of selection
1 Pacific Tuberculosis and Cancer Research Organization, Pasadena, California, USA
2 University of Florida, Gainesville, Florida, USA
BMC Bioinformatics 2005, 6:67 doi:10.1186/1471-2105-6-67Published: 22 March 2005
Microarray devices permit a genome-scale evaluation of gene function. This technology has catalyzed biomedical research and development in recent years. As many important diseases can be traced down to the gene level, a long-standing research problem is to identify specific gene expression patterns linking to metabolic characteristics that contribute to disease development and progression. The microarray approach offers an expedited solution to this problem. However, it has posed a challenging issue to recognize disease-related genes expression patterns embedded in the microarray data. In selecting a small set of biologically significant genes for classifier design, the nature of high data dimensionality inherent in this problem creates substantial amount of uncertainty.
Here we present a model for probability analysis of selected genes in order to determine their importance. Our contribution is that we show how to derive the P value of each selected gene in multiple gene selection trials based on different combinations of data samples and how to conduct a reliability analysis accordingly. The importance of a gene is indicated by its associated P value in that a smaller value implies higher information content from information theory. On the microarray data concerning the subtype classification of small round blue cell tumors, we demonstrate that the method is capable of finding the smallest set of genes (19 genes) with optimal classification performance, compared with results reported in the literature.
In classifier design based on microarray data, the probability value derived from gene selection based on multiple combinations of data samples enables an effective mechanism for reducing the tendency of fitting local data particularities.