Table 3

Run Time for SAMQA on Hadoop

Validation Test

One Patient, aggregate time in minutes (over 23GB)

Ten Patients, aggregate time in minutes (over 405GB)

One Hundred Patients, aggregate time in minutes (over 3,904GB)


Mapping Ratio Test

1.10

6.98

66.91


Sequence Validation

5.40

66.11

621.54


Read Group Validation

8.40

86.38

670.79


Mapping Quality Test

9.03

57.27

706.02


Read Statistics

11.28

67.49

753.00


File Structure Validation

15.10

59.57

700.67


Chromosome Check

63.15

102.02

768.37


Pearson Coverage Correlation

279.41

280.50

3211.33


Pearson Mapping Quality Correlation

702.28

484.92

3327.94


This table lists required time to completion at multiples of ten samples over an 80-core computational cluster running Hadoop. Each validation step includes a full Map-and-Reduce pass. All times are given in aggregate time to completion, assuming each file is scheduled for analysis after the previous one completes. The files selected are listed in additional file 2. File size is the only consideration for speed comparisons.

Robinson et al. BMC Genomics 2011 12:419   doi:10.1186/1471-2164-12-419

Open Data