This article is part of the supplement: Selected articles from the First IEEE International Conference on Computational Advances in Bio and medical Sciences (ICCABS 2011): Bioinformatics
Multi-dimensional characterization of electrostatic surface potential computation on graphics processors
1 Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, USA
2 Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA
3 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA
BMC Bioinformatics 2012, 13(Suppl 5):S4 doi:10.1186/1471-2105-13-S5-S4Published: 12 April 2012
Calculating the electrostatic surface potential (ESP) of a biomolecule is critical towards understanding biomolecular function. Because of its quadratic computational complexity (as a function of the number of atoms in a molecule), there have been continual efforts to reduce its complexity either by improving the algorithm or the underlying hardware on which the calculations are performed.
We present the combined effect of (i) a multi-scale approximation algorithm, known as hierarchical charge partitioning (HCP), when applied to the calculation of ESP and (ii) its mapping onto a graphics processing unit (GPU). To date, most molecular modeling algorithms perform an artificial partitioning of biomolecules into a grid/lattice on the GPU. In contrast, HCP takes advantage of the natural partitioning in biomolecules, which in turn, better facilitates its mapping onto the GPU. Specifically, we characterize the effect of known GPU optimization techniques like use of shared memory. In addition, we demonstrate how the cost of divergent branching on a GPU can be amortized across algorithms like HCP in order to deliver a massive performance boon.
We accelerated the calculation of ESP by 25-fold solely by parallelization on the GPU. Combining GPU and HCP, resulted in a speedup of at most 1,860-fold for our largest molecular structure. The baseline for these speedups is an implementation that has been hand-tuned SSE-optimized and parallelized across 16 cores on the CPU. The use of GPU does not deteriorate the accuracy of our results.