Table 2 

Empirically estimated effects and covariance 

p   | Bayes Acc. | n   | Prev. | %t   | Full data Accuracy | Opt. vs. t = 2/3 | Opt. vs. t = 1/2
0.9 | 0.962      | 240 | 50%   | 58.3 | 0.961              | 0.001            | 0.002
0.6 | 0.861      | 240 | 50%   | 54.2 | 0.860              | 0.003            | 0.002


Simulation results based on empirical estimates of the covariance matrix and effect sizes. Columns: p is the weight on the diagonal matrix; Bayes Acc. is the optimal accuracy possible; n is the total sample size; Prev. is the prevalence of the most prevalent group; %t is the optimal percentage of samples allocated to training; Full data Accuracy is the mean accuracy when n = 240; Opt. vs. t = 2/3 is the root mean squared difference (RMSD) between the optimal rule and the 2/3rds-to-training rule; and Opt. vs. t = 1/2 is the RMSD between the optimal rule and the 1/2-to-training rule. The sample covariance matrix S was calculated from [12]. Effect sizes were estimated by the Empirical Bayes method of [10], with effect sizes shrunk to 80% of the empirical size. We followed methods similar to those previously proposed ([16], [17], [18]) to obtain nonsingular covariance matrix estimates, namely $\hat{\Sigma} = p \, \mathrm{diag}(S) + (1 - p) \, S$, where diag(S) is a matrix with the diagonal elements of S on its diagonal and zeros elsewhere. Bayes accuracy is the optimal accuracy for a linear classifier in the population, which is $\Phi\left(\sqrt{\delta^{\prime} \Sigma^{-1} \delta}\right)$ (e.g., [13]), where $\Phi$ is the standard normal cumulative distribution function and $\delta$ is a vector of half-distances between the class means. The number of informative genes was selected to achieve realistic Bayes (optimal) accuracies, with all other gene effects set to zero. The genes with the largest standardized fold changes were selected as informative.
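
The covariance shrinkage and Bayes accuracy described in the caption can be illustrated with a minimal Python sketch. It assumes the nonsingular estimate is the convex combination p * diag(S) + (1 - p) * S, and that the Bayes accuracy for two equally prevalent classes is the standard normal CDF evaluated at sqrt(delta' Sigma^{-1} delta). The function names, the 3-gene matrix, and the half-distance vector are hypothetical and chosen only for illustration; they are not the data behind Table 2.

```python
import math

import numpy as np


def shrink_covariance(S, p):
    """Assumed shrinkage form: p * diag(S) + (1 - p) * S.

    Diagonal entries are unchanged; off-diagonal entries are scaled by (1 - p).
    """
    S = np.asarray(S, dtype=float)
    return p * np.diag(np.diag(S)) + (1.0 - p) * S


def bayes_accuracy(delta, sigma):
    """Phi(sqrt(delta' Sigma^{-1} delta)), assuming equal class prevalence.

    delta is the vector of half-distances between the two class means.
    """
    delta = np.asarray(delta, dtype=float)
    quad = float(delta @ np.linalg.solve(sigma, delta))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(math.sqrt(quad) / math.sqrt(2.0)))


# Hypothetical 3-gene example: one informative gene, two with zero effect.
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
delta = np.array([0.8, 0.0, 0.0])

for p in (0.9, 0.6):
    sigma = shrink_covariance(S, p)
    print(f"p = {p}: Bayes accuracy = {bayes_accuracy(delta, sigma):.3f}")
```

Larger p pulls the estimate toward the diagonal matrix, which keeps it well conditioned when the number of genes exceeds the sample size. In this sketch p = 1 recovers diag(S) and p = 0 returns S unchanged.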

Dobbin and Simon BMC Medical Genomics 2011 4:31 doi:10.1186/1755-8794-4-31