This article is part of the supplement: SNP-SIG 2012: Identification and annotation of SNPs in the context of structure, function, and disease
WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation
1 Division of Informatics, Department of Pathology, University of Alabama at Birmingham, Birmingham AL, USA
2 S-IN Soluzioni Informatiche Srl, Vicenza, 36100, Italy
3 Department of Computer Science, University of Bologna, Bologna, 40126, Italy
4 Laboratory of Biocomputing, Department of Biology, University of Bologna, Bologna, 40126, Italy
5 Departments of Bioengineering and Genetics, Stanford University, Stanford, CA, USA
BMC Genomics 2013, 14(Suppl 3):S6 doi:10.1186/1471-2164-14-S3-S6
Published: 28 May 2013
SNPs&GO is a method for predicting deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM). For a given protein, the input comprises the protein sequence and/or its three-dimensional structure (when available), a set of target variations, and the protein's functional Gene Ontology (GO) terms. For each protein variation, the server outputs the probability that the variation is associated with human disease.
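The input/output contract described above can be pictured as a small data model. The sketch below is purely illustrative and is not the server's actual interface. The class `ProteinQuery`, the function `label`, the single-letter variation notation (e.g. "R123W": wild-type residue, 1-based position, mutant residue), and the 0.5 probability cutoff for calling a variation "Disease" are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProteinQuery:
    """Hypothetical container for the server inputs (not the real WS-SNPs&GO API)."""
    sequence: Optional[str] = None       # amino acid sequence, one-letter codes
    pdb_structure: Optional[str] = None  # e.g. a PDB identifier, when a structure exists
    variations: List[str] = field(default_factory=list)  # assumed notation, e.g. "R123W"
    go_terms: List[str] = field(default_factory=list)    # e.g. "GO:0005524"

    def validate(self) -> None:
        """Require a sequence and/or structure; check variations against the sequence."""
        if self.sequence is None and self.pdb_structure is None:
            raise ValueError("a sequence and/or a structure is required")
        if self.sequence is None:
            return
        for v in self.variations:
            wild_type, position = v[0], int(v[1:-1])  # e.g. "K2E" -> "K", 2
            if not 1 <= position <= len(self.sequence):
                raise ValueError(f"{v}: position outside the sequence")
            if self.sequence[position - 1] != wild_type:
                raise ValueError(f"{v}: wild-type residue does not match the sequence")


def label(probabilities: Dict[str, float], threshold: float = 0.5) -> Dict[str, str]:
    """Map per-variation disease probabilities to labels (assumed 0.5 cutoff)."""
    return {v: "Disease" if p > threshold else "Neutral" for v, p in probabilities.items()}
```

For example, `ProteinQuery(sequence="MKR", variations=["K2E"]).validate()` passes. Calling `label({"K2E": 0.73})` returns `{"K2E": "Disease"}` under the assumed cutoff.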
The server consists of two main components: updated versions of the sequence-based SNPs&GO (recently rated as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d. Both the sequence-based and structure-based algorithms were extensively tested on a large set of annotated variations extracted from the SwissVar database. On a balanced dataset of more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, a correlation coefficient of 0.61, and an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of 0.88. For the subset of ~6,600 variations mapped onto protein structures available in the Protein Data Bank (PDB), the structure-based method achieves 84% overall accuracy, a correlation coefficient of 0.68, and an AUC of 0.91. When tested on a new blind set of variations, the server reaches 79% and 83% overall accuracy with sequence-based and structure-based inputs, respectively.
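The performance figures above are standard binary-classification measures. The sketch below shows one way to compute them from per-variation probabilities. It rests on three assumptions: "correlation coefficient" is taken to mean the Matthews correlation coefficient (MCC), a variation is predicted as disease-related when its probability exceeds 0.5, and the label encoding is 1 = disease-related, 0 = neutral. The helper name `binary_metrics` is hypothetical.

```python
import math
from typing import List, Tuple


def binary_metrics(y_true: List[int], y_score: List[float],
                   threshold: float = 0.5) -> Tuple[float, float, float]:
    """Return (overall accuracy, Matthews correlation coefficient, ROC AUC).

    y_true: 1 = disease-related variation, 0 = neutral (assumed encoding).
    y_score: predicted probability that the variation is disease-related.
    """
    y_pred = [1 if s > threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # AUC = probability that a random positive scores above a random negative
    # (ties count as 1/2), equivalent to the area under the ROC curve
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return accuracy, mcc, auc
```

On a toy example with `y_true = [1, 1, 1, 0, 0, 0]` and `y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]`, this gives an accuracy of 4/6, an MCC of 1/3, and an AUC of 8/9. The pairwise AUC is quadratic in the dataset size. For tens of thousands of variations, a rank-based formula or a library routine would be preferable.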
WS-SNPs&GO is a valuable tool that integrates, in a single framework, information derived from protein sequence, structure, evolutionary profile, and function. WS-SNPs&GO is freely available at