Open Access Technical Note

hPDB – Haskell library for processing atomic biomolecular structures in protein data bank format

Michał Jan Gajda

Author Affiliations

NMR-2, Max Planck Institute for Biophysical Chemistry, Am Faßberg 11, Göttingen, Germany

BMC Research Notes 2013, 6:483. doi:10.1186/1756-0500-6-483

Published: 23 November 2013



The Protein Data Bank file format is used for the majority of biomolecular data available today. Haskell is a lazy functional language with a high-level, class-based type system, a growing collection of useful libraries, and a reputation for efficiency.


I present a fast library for processing biomolecular data in the Protein Data Bank format, together with benchmarks indicating that it is faster than other frequently used Protein Data Bank parsers. The library also features a convenient iterator mechanism and a simple API modeled after BioPython.


I set a new standard for convenience and efficiency of Protein Data Bank processing in a Haskell library, and release it as open source.

Keywords: Structural biology; Protein DataBank file format; Parallel parser; Parser efficiency; Column-based parsing