Open Access Research article

Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase

Anna Sheydina1,2, Ruth Y Eberhardt3,4, Daniel J Rigden5, Yuanyuan Chang1,2, Zhanwen Li2, Christian C Zmasek2, Herbert L Axelrod1,6 and Adam Godzik1,2,7*

Author Affiliations

1 Joint Center for Structural Genomics, 10550 North Torrey Pines Road, BCC-206, La Jolla, California 92037, USA

2 Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, La Jolla, CA 92037, USA

3 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK

4 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK

5 Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK

6 Stanford Synchrotron Radiation Lightsource, Menlo Park, CA 94025, USA

7 Center for Research in Biological Systems, University of California, 9500 Gilman Dr., La Jolla, CA 92093-0446, USA

BMC Bioinformatics 2014, 15:112  doi:10.1186/1471-2105-15-112

Published: 17 April 2014

Abstract

Background

Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism.

Results

BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length homologs and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold of the kind typically found in the C-terminal domains of other glycosyl hydrolases, where such domains are usually involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationships and their possible functional implications.
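
The split of PSI-BLAST homologs into full-length and N-terminal-only groups can be illustrated with a minimal Python sketch. The abstract does not state the criteria actually used, so everything here is an assumption made for illustration: the `Hit` record, the `classify` function, the 0.7 coverage cutoff, and the query length and domain boundary in the usage example are hypothetical and are not the real BT_1012 values. The sketch also looks only at how much of the query each alignment covers; a real analysis would probably also compare the homolog's own sequence length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment between the query and a homolog (hypothetical record)."""
    subject_id: str
    query_start: int  # 1-based, inclusive position on the query
    query_end: int    # 1-based, inclusive position on the query


def coverage(hit: Hit, region_start: int, region_end: int) -> float:
    """Fraction of the query region [region_start, region_end] covered by the hit."""
    overlap = min(hit.query_end, region_end) - max(hit.query_start, region_start) + 1
    return max(overlap, 0) / (region_end - region_start + 1)


def classify(hit: Hit, query_length: int, boundary: int, min_cov: float = 0.7) -> str:
    """Label a hit by which query domains it covers.

    `boundary` is the last residue of the N-terminal domain (assumed known);
    `min_cov` is an arbitrary illustrative cutoff, not a value from the paper.
    """
    n_cov = coverage(hit, 1, boundary)
    c_cov = coverage(hit, boundary + 1, query_length)
    if n_cov >= min_cov and c_cov >= min_cov:
        return "full-length"
    if n_cov >= min_cov:
        return "N-terminal only"
    return "other"


# Illustrative numbers only: a 500-residue query with a domain boundary at 300.
hits = [Hit("homolog_a", 5, 490), Hit("homolog_b", 10, 290), Hit("homolog_c", 350, 480)]
labels = {h.subject_id: classify(h, query_length=500, boundary=300) for h in hits}
```

With these made-up coordinates, `homolog_a` spans both domains, `homolog_b` aligns only to the N-terminal region, and `homolog_c` falls into neither group.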

Conclusions

Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively.

Keywords:
Glycoside hydrolase; Carbohydrate metabolism; 3D structure; Protein family; Protein function prediction; Domain of unknown function; DUF