Open Access Research article

Two Pfam protein families characterized by a crystal structure of protein lpg2210 from Legionella pneumophila

Penelope Coggill1,2*, Ruth Y Eberhardt1,2, Robert D Finn3, Yuanyuan Chang4,5, Lukasz Jaroszewski4,5, Adam Godzik4,5, Debanu Das5,6, Qingping Xu5,6, Herbert L Axelrod5,6, L Aravind7, Alexey G Murzin8 and Alex Bateman2

  • * Corresponding author: Penelope Coggill

  • † Equal contributors

Author Affiliations

1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK

2 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK

3 Howard Hughes Medical Institute, Janelia Farm Research Campus, 19700 Helix Drive, Ashburn VA 20147, USA

4 Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, CA 92037, USA

5 Joint Center for Structural Genomics, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA

6 Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA

7 National Center for Biotechnology Information, National Library of Medicine, Building 38A, Bethesda, MD 20894, USA

8 MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK

BMC Bioinformatics 2013, 14:265  doi:10.1186/1471-2105-14-265

Published: 3 September 2013



Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis we hope to provide insight into novel biology.


We analyze a previously uncharacterized Pfam protein family called DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information pertaining to this family. This protein additionally includes the first representative structure of another Pfam family called the YARHG domain [Pfam:PF13308]. The Pfam family DUF4424 adopts a 19-stranded beta-sandwich fold that shows similarity to the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structure analysis allows us to recognize distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria.


Based on our analyses, we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may play a role in binding a moiety in proximity to peptidoglycan, such as a hydrophobic outer membrane lipid or lipopolysaccharide.

Keywords: Domain of unknown function; Protein family; Protein structure; DUF4424; YARHG domain; Sequence analysis