Table 1

Comparison of search times for standard X!Tandem and Hydra
Mode    Scans   Nodes (Cores)  DB Name  Proteins (K)  Peptides (M)  Dot product (M)  Time (min)
Hadoop  16000   43 (344)       ecoli    5.4           1.3           164              9.8
Hadoop  256000  43 (344)       ecoli    5.4           1.3           23395            338
Tandem  4663    1 (4)          human    222           168           477              29
Hadoop  4663    43 (344)       human    222           168           477              4.7
Tandem  184880  1 (4)          nr       4370          692           3291             2280
Hadoop  184880  43 (344)       nr       4370          692           3291             15.4
Tandem  184880  1 (4)          nr       16392         1248          13167            8410
Hadoop  184880  43 (344)       nr       16392         1248          13167            52.7

Example comparison of run times for searches of varying complexity using the standard X!Tandem implementation and Hydra. The Scans column gives the number of spectra searched. The Nodes (Cores) column gives the resources used: the first number is the number of machines and the second is the total number of cores. DB Name is the species database used, Proteins (K) is the number of proteins in the database, and Dot product (M) is the number of dot-product calculations actually performed. The times show that Hydra, unlike X!Tandem, is able to scale nearly linearly with the size of the problem. However, because of its startup costs, Hydra is not suited to small searches. The PRIDE accession numbers for the spectra used were 10295 and 7962.
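The paired Tandem and Hadoop rows can be compared directly by dividing one run time by the other. The following is a minimal sketch of that arithmetic, not part of the original analysis. The run times and core counts are copied from the table; the row labels, the `speedup` helper, and the comparison against the ratio of core counts are illustrative assumptions.

```python
# Run times in minutes for the paired rows of Table 1:
# (label, Tandem time on 4 cores, Hadoop time on 344 cores).
# The labels are illustrative; the numbers are copied from the table.
runs = [
    ("human, 222K proteins", 29.0, 4.7),
    ("nr, 4370K proteins", 2280.0, 15.4),
    ("nr, 16392K proteins", 8410.0, 52.7),
]

# Ratio of cores used by the Hadoop runs to cores used by the Tandem runs.
CORE_RATIO = 344 / 4


def speedup(tandem_min: float, hadoop_min: float) -> float:
    """Return how many times faster the Hadoop run finished."""
    return tandem_min / hadoop_min


speedups = {label: speedup(t, h) for label, t, h in runs}

for label, s in speedups.items():
    # The second figure is the speedup expressed as a fraction of the core ratio.
    print(f"{label}: {s:.1f}x faster, {s / CORE_RATIO:.2f} of the {CORE_RATIO:.0f}x core ratio")
```

The smallest search shows the smallest speedup, which is consistent with the caption's remark about startup costs.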

Lewis et al. BMC Bioinformatics 2012 13:324   doi:10.1186/1471-2105-13-324