Assessing the ability of sequence-based methods to provide functional insight within membrane integral proteins: a case study analyzing the neurotransmitter/Na+ symporter family
- Equal contributors
1 Department of Computer Science and Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28262, USA
2 Biological Sciences Department, California State Polytechnic University, Pomona, CA 91768, USA
3 Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
BMC Bioinformatics 2007, 8:397 doi:10.1186/1471-2105-8-397Published: 17 October 2007
Efforts to predict functional sites from globular proteins is increasingly common; however, the most successful of these methods generally require structural insight. Unfortunately, despite several recent technological advances, structural coverage of membrane integral proteins continues to be sparse. ConSequently, sequence-based methods represent an important alternative to illuminate functional roles. In this report, we critically examine the ability of several computational methods to provide functional insight within two specific areas. First, can phylogenomic methods accurately describe the functional diversity across a membrane integral protein family? And second, can sequence-based strategies accurately predict key functional sites? Due to the presence of a recently solved structure and a vast amount of experimental mutagenesis data, the neurotransmitter/Na+ symporter (NSS) family is an ideal model system to assess the quality of our predictions.
The raw NSS sequence dataset contains 181 sequences, which have been aligned by various methods. The resultant phylogenetic trees always contain six major subfamilies are consistent with the functional diversity across the family. Moreover, in well-represented subfamilies, phylogenetic clustering recapitulates several nuanced functional distinctions. Functional sites are predicted using six different methods (phylogenetic motifs, two methods that identify subfamily-specific positions, and three different conservation scores). A canonical set of 34 functional sites identified by Yamashita et al. within the recently solved LeuTAa structure is used to assess the quality of the predictions, most of which are predicted by the bioinformatic methods. Remarkably, the importance of these sites is largely confirmed by experimental mutagenesis. Furthermore, the collective set of functional site predictions qualitatively clusters along the proposed transport pathway, further demonstrating their utility. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. However, when the methods do provide overlapping results, specificity is shown to increase dramatically (e.g., sites predicted by any three methods have both accuracy and coverage greater than 50%).
The results presented herein clearly establish the viability of sequence-based bioinformatic strategies to provide functional insight within the NSS family. As such, we expect similar bioinformatic investigations will streamline functional investigations within membrane integral families in the absence of structure.