Overview of the data analysis process. Genomic data were collected from the TCGA and Broad FireHose websites. Each sample has complete data for mutations, copy number alteration, gene expression and methylation. Gene expression measurements from microarrays and RNA-Seq were combined by principal component analysis into a single measure per gene and then concatenated with the data from the other assays. Missing values were imputed using the median across samples. Finally, regularized regression (elastic net) was used to identify a minimal set of features that delineated clinical stage, the extent of tumor invasion into the colon, metastasis in lymph nodes, metastasis in other organs and microsatellite instability (MSI).
Lee et al. BMC Medical Genomics 2013 6:54 doi:10.1186/1755-8794-6-54
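
The three computational steps named in the caption (combining two expression platforms by principal component analysis, median imputation, and elastic-net feature selection) can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline. The function names, the toy data, the penalty values `alpha` and `l1_ratio`, and the squared-error loss are all assumptions made for illustration; the caption does not state which loss or software was used.

```python
import math
import statistics

def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in column]

def first_pc_scores(a, b):
    """Combine two paired measurements of one gene (e.g. microarray and
    RNA-Seq across samples) into one score per sample by projecting onto
    the first principal component of their 2x2 covariance matrix."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    s_aa = sum(x * x for x in ca) / (n - 1)
    s_bb = sum(y * y for y in cb) / (n - 1)
    s_ab = sum(x * y for x, y in zip(ca, cb)) / (n - 1)
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * s_ab, s_aa - s_bb)
    wa, wb = math.cos(theta), math.sin(theta)
    return [wa * x + wb * y for x, y in zip(ca, cb)]

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for elastic-net regression with squared-error loss.
    Assumes the columns of X and the response y are already centered
    (no intercept is fitted). Minimizes
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
                       + 0.5*alpha*(1 - l1_ratio)*||w||_2^2."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w initially zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(x * x for x in col) / n
            if z == 0.0:
                continue  # constant (all-zero) feature carries no signal
            rho = sum(x * (r + x * w[j]) for x, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(col, resid)]
                w[j] = new_w
    return w

# Toy data (hypothetical): feature 0 drives the response, feature 1 does not.
X = [[-1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [-2.0, -2.0, 2.0, 2.0]
weights = elastic_net(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
```

In this toy case only the first feature keeps a non-zero coefficient, which is the sense in which an elastic-net fit yields a reduced feature set. For categorical outcomes such as clinical stage or MSI status, a logistic or multinomial loss would be the more natural choice than the squared-error loss assumed here.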