Identification of potential gene switches for breast cancer. We analyzed the integrated dataset (E-TABM-185), which contains 5,896 samples from about 300 different conditions, to search for bimodality in gene expression profiles. Genes are ranked by their ΔAIC values, which represent the significance of bimodality in their expression profiles. The top 10% (about 2,000 genes with the highest ΔAIC values) are selected to compute the separation D with respect to breast cancer; among these, 17 genes are found to be expressed in a distinctive state in breast cancer compared with all other conditions. An independent dataset (GSE15852, the dotted rectangular box in the figure) is then used to examine the expression profiles of these 17 genes. This dataset has 43 pairs of samples, each pair consisting of a tumor tissue and its adjacent non-tumorous tissue from the same patient. Twelve of the 17 genes show different distributions between the breast cancer samples and their paired normal samples. These 12 genes (ESR1, SPDEF, IRX5, ERBB3, ERBB2, CRABP2, RAB25, FXYD3, TACSTD2, DSP, AGR2, CDH1) fall into two types of expression patterns. Type 1 genes show bimodality within the breast cancer samples and are differentially expressed in some, but not all, of the paired breast cancer and normal samples. In other words, for Type 1 the normal samples are in the OFF mode, while the breast cancer samples contain both ON and OFF states. Type 2 genes show predominantly one mode in the breast cancer samples (ON) and the other in the normal samples (OFF); these genes are therefore differentially expressed in almost all breast cancer/normal pairs.
Wu et al. BMC Genomics 2011 12:547 doi:10.1186/1471-2164-12-547
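The ranking step in the caption can be illustrated with a short sketch. The caption does not say how ΔAIC is computed, so the code below makes an assumption: ΔAIC is taken as the AIC of a single Gaussian fit minus the AIC of a two-component Gaussian mixture fit, so that larger values favor bimodality. The function names, the simple EM fit, and the gene labels `GENE_BIMODAL` and `GENE_UNIMODAL` are hypothetical and use synthetic data; this is not the authors' implementation.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def mean_and_var(values):
    """Maximum-likelihood mean and variance, with a small floor on the variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)


def loglik_one_gaussian(values):
    """Log-likelihood of the data under a single fitted Gaussian."""
    mean, var = mean_and_var(values)
    return sum(gaussian_logpdf(v, mean, var) for v in values)


def component_logs(x, weights, means, variances):
    """Log of weight times density for each of the two mixture components."""
    return [math.log(weights[k]) + gaussian_logpdf(x, means[k], variances[k])
            for k in range(2)]


def loglik_two_gaussians(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by EM; return its log-likelihood."""
    xs = sorted(values)
    half = len(xs) // 2
    # Initialize the two means from the lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    _, var = mean_and_var(xs)
    variances = [var, var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            logs = component_logs(x, weights, means, variances)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            norm = sum(probs)
            resp.append([p / norm for p in probs])
        # M-step: update weights, means and variances (with numerical floors).
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-9)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)
            weights[k] = max(nk / len(xs), 1e-9)
    total = 0.0
    for x in xs:
        logs = component_logs(x, weights, means, variances)
        top = max(logs)
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total


def delta_aic(values):
    """Assumed definition: AIC(one Gaussian) - AIC(two-Gaussian mixture)."""
    if len(values) < 4:
        raise ValueError("need at least 4 samples")
    aic_one = 2 * 2 - 2 * loglik_one_gaussian(values)   # mean, variance
    aic_two = 2 * 5 - 2 * loglik_two_gaussians(values)  # 2 means, 2 variances, 1 weight
    return aic_one - aic_two


def top_bimodal_genes(profiles, fraction=0.10):
    """Rank genes by ΔAIC (descending) and keep the top fraction."""
    ranked = sorted(profiles, key=lambda g: delta_aic(profiles[g]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Synthetic example: one clearly bimodal profile and one unimodal profile.
rng = random.Random(0)
profiles = {
    "GENE_BIMODAL": [rng.gauss(0, 1) for _ in range(100)]
                    + [rng.gauss(6, 1) for _ in range(100)],
    "GENE_UNIMODAL": [rng.gauss(0, 1) for _ in range(200)],
}
print(top_bimodal_genes(profiles, fraction=0.5))
```

On real data, `profiles` would map each gene to its expression values across all samples, and `fraction=0.10` would correspond to the top 10% cut described in the caption. The separation D and the Type 1/Type 2 assignment are not sketched here because the caption does not define them.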