Table 4

Run Time SAMQA on Hadoop vs. Standard Server

SAMQA Run

Single Core Server (in hours)

Hadoop Cluster (aggregate in hours)


1 Sample (23GB)

1460.22

18.25


10 Samples (405GB)

1615.00

20.19


100 Samples (3904GB)

14435.42

180.44


The SAMQA toolset consists of eleven different technical and biologically relevant tests run over each BAM file. Using the Hadoop cluster and data as referenced in Table 3, all of the tests can be run in parallel decreasing the overall time for analysis (files used for this test are listed in additional file 2).

Robinson et al. BMC Genomics 2011 12:419   doi:10.1186/1471-2164-12-419

Open Data