Figure 1.

Fold increase in speed (ratio of user wall times) of R code resulting from changes to the sequential code (in a) and from parallelization (a and b). a) Timings from the functions "gdcvpl" (original code) and its equivalent "tauBestP" (SignS), which use cross-validation to find the best parameters. b, c, d) Timings from analyses that include cross-validation of the final model. Numbers on top of points: user wall times in seconds. Benchmarks were obtained on an otherwise idle cluster of 30 nodes, each with two dual-core AMD Opteron 2.2 GHz CPUs and 6 GB RAM, running Debian GNU/Linux with a stock 2.6.8 kernel, version 7.1.2 of LAM/MPI, and version 2.1.4 (patched) of R. DLBCL data set from [4]; when the number of arrays, n, is ≤ 160 and the number of genes, p, is ≤ 7399, we use the first n arrays and the first p genes of the data set. For p > 7399, we expand the data set by creating new genes from the existing (real) ones with added Gaussian noise.

Diaz-Uriarte BMC Bioinformatics 2008 9:30   doi:10.1186/1471-2105-9-30
