Additional File 1.

Example of the scoring procedure for two genes in a simulated gene expression data set. For each gene in the data set, Pearson's correlation coefficient was computed between that gene and every other gene. A p-value for the univariate hazard ratio was also calculated for each gene. To score an individual gene, a scatter plot was constructed comparing the log of the hazard ratio p-value of each of the other genes with its correlation to the gene being scored. Each graph includes only genes that are positively correlated with the gene being scored, and each point represents the data for one of these genes. The covariance (cov) and Pearson's correlation coefficient (corr), both computed from the data shown on the graph, are displayed on each figure together with the composite score. The composite score was computed as follows:

Score = AbsoluteValue(Cov) * Corr
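
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hazard ratio p-values are taken as precomputed inputs, the logarithm is assumed to be base 10 (the legend says only "log"), the sample (n - 1) covariance is used, and a score is returned only when at least three positively correlated genes are available (a threshold chosen here for illustration). The function names `covariance`, `pearson`, and `composite_score` are hypothetical.

```python
import math


def covariance(x, y):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Pearson's correlation coefficient; NaN if either input is constant."""
    sx = math.sqrt(covariance(x, x))
    sy = math.sqrt(covariance(y, y))
    if sx == 0 or sy == 0:
        return math.nan
    return covariance(x, y) / (sx * sy)


def composite_score(gene, expression, hr_pvalues, min_points=3):
    """Score one gene as |cov| * corr over its positively correlated genes.

    expression: list of per-gene expression profiles (one list per gene).
    hr_pvalues: univariate hazard ratio p-value for each gene (assumed given).
    min_points: assumed minimum number of points needed for a score.
    """
    corr_to_gene = []  # x-axis: correlation with the gene being scored
    log_pvalue = []    # y-axis: log10 of the hazard ratio p-value (base assumed)
    for other, profile in enumerate(expression):
        if other == gene:
            continue  # the gene is compared with every *other* gene
        r = pearson(expression[gene], profile)
        if r > 0:  # keep only positively correlated genes
            corr_to_gene.append(r)
            log_pvalue.append(math.log10(hr_pvalues[other]))
    if len(corr_to_gene) < min_points:
        return math.nan
    cov = covariance(corr_to_gene, log_pvalue)
    corr = pearson(corr_to_gene, log_pvalue)
    return abs(cov) * corr
```

Calling `composite_score(g, expression, hr_pvalues)` for each gene index `g` gives one score per gene. With the raw log of the p-value on the y-axis, a gene whose most strongly correlated partners have the smallest p-values yields a negative corr and therefore a negative score; the legend does not state the sign convention used for ranking, so the sketch returns the raw value.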

This approach was repeated for every gene in the data set.

A. Scatterplot for a gene that is highly correlated with a set of genes whose expression is also associated with outcome. This is the top-scoring gene in the simulated data set. Note that many genes that are highly correlated with the gene of interest also have a small, significant hazard ratio p-value.

B. Scatterplot for a low-scoring gene that is not correlated with genes associated with the outcome.

Format: TIFF. Size: 1.8 MB.

Mosley et al. BMC Medical Genomics 2008 1:11   doi:10.1186/1755-8794-1-11