Open Access Technical Note

hPDB – Haskell library for processing atomic biomolecular structures in protein data bank format

Michał Jan Gajda

Author Affiliations

NMR-2, Max Planck Institute for Biophysical Chemistry, Am Faßberg 11, Göttingen, Germany

BMC Research Notes 2013, 6:483  doi:10.1186/1756-0500-6-483

Published: 23 November 2013

Abstract

Background

The Protein Data Bank (PDB) file format is used for the majority of biomolecular structure data available today. Haskell is a lazy functional language that offers a high-level class-based type system, a growing collection of useful libraries, and a reputation for efficiency.

Findings

I present a fast library for processing biomolecular data in the Protein Data Bank format. Benchmarks indicate that this library is faster than other frequently used Protein Data Bank parsers. The library also features a convenient iterator mechanism and a simple API modeled after BioPython.

Conclusion

I set a new standard for convenience and efficiency of Protein Data Bank processing in a Haskell library, and release it as open source.

Keywords:
Structural biology; Protein DataBank file format; Parallel parser; Parser efficiency; Column-based parsing