This article is part of the supplement: SNP-SIG 2012: Identification and annotation of SNPs in the context of structure, function, and disease

Open Access Research

WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

Emidio Capriotti¹*, Remo Calabrese², Piero Fariselli³, Pier Luigi Martelli⁴, Russ B Altman⁵ and Rita Casadio⁴*

Author Affiliations

1 Division of Informatics, Department of Pathology, University of Alabama at Birmingham, Birmingham AL, USA

2 S-IN Soluzioni Informatiche Srl, Vicenza, 36100, Italy

3 Department of Computer Science, University of Bologna, Bologna, 40126, Italy

4 Laboratory of Biocomputing, Department of Biology, University of Bologna, Bologna, 40126, Italy

5 Departments of Bioengineering and Genetics, Stanford University, Stanford, CA, USA

BMC Genomics 2013, 14(Suppl 3):S6  doi:10.1186/1471-2164-14-S3-S6

Published: 28 May 2013

Abstract

Background

SNPs&GO is a method for predicting deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM). For a given protein, its input comprises the sequence and/or its three-dimensional structure (when available), a set of target variations, and the protein's functional Gene Ontology (GO) terms. For each protein variation, the server outputs the probability that the variation is associated with a human disease.
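Each target variation is a single residue substitution in the input sequence. The sketch below shows how such a variation list might be validated against the sequence before submission. It assumes a compact notation of wild-type residue, 1-based position, and mutant residue (e.g. "R123H"); the notation, the `SAP` type, and `parse_sap` are hypothetical illustrations, not the server's documented input format or code.

```python
import re
from dataclasses import dataclass

# Hypothetical compact notation: wild-type residue, 1-based position, mutant residue.
# Only the 20 standard amino acids are accepted.
SAP_PATTERN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


@dataclass(frozen=True)
class SAP:
    wild_type: str
    position: int  # 1-based residue index in the protein sequence
    mutant: str


def parse_sap(token: str, sequence: str) -> SAP:
    """Parse one variation and check that it is consistent with the sequence."""
    match = SAP_PATTERN.match(token.strip().upper())
    if match is None:
        raise ValueError(f"malformed variation: {token!r}")
    wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
    if wt == mut:
        raise ValueError(f"not a substitution: {token!r}")
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position out of range: {token!r}")
    if sequence[pos - 1] != wt:
        # The stated wild-type residue must match the residue in the sequence.
        raise ValueError(f"wild-type mismatch at {pos}: sequence has {sequence[pos - 1]}")
    return SAP(wt, pos, mut)
```

For the toy sequence "MKTAYIAKQR", `parse_sap("A4V", seq)` returns `SAP(wild_type='A', position=4, mutant='V')`, while "K4V" is rejected because residue 4 is alanine. Catching such mismatches up front matters because a variation that does not match the sequence cannot be meaningfully scored.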

Results

The server consists of two main components: updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d program. The sequence- and structure-based algorithms were extensively tested on a large set of annotated variations extracted from the SwissVar database. On a balanced dataset of more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, a 0.61 correlation coefficient, and an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of 0.88. On the subset of ~6,600 variations mapped onto protein structures available in the Protein Data Bank (PDB), the structure-based method achieves 84% overall accuracy, a 0.68 correlation coefficient, and an AUC of 0.91. On a new blind set of variations, the server reaches 79% and 83% overall accuracy with sequence-based and structure-based inputs, respectively.
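The three figures quoted for each method (overall accuracy, correlation coefficient, and ROC AUC) can be computed from per-variation labels and predicted probabilities. The sketch below assumes that the "correlation coefficient" is the Matthews correlation coefficient and that a variation is called disease-related when its probability is at least 0.5; both are illustrative assumptions, and the function names are hypothetical. AUC is computed here as the probability that a randomly chosen disease variation scores higher than a randomly chosen neutral one (ties count one half), which equals the area under the ROC curve.

```python
import math


def confusion_counts(labels, predictions):
    """Count TP, TN, FP, FN for binary labels (1 = disease, 0 = neutral)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(labels, predictions):
    tp, tn, fp, fn = confusion_counts(labels, predictions)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(labels, predictions):
    tp, tn, fp, fn = confusion_counts(labels, predictions)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]  # toy probabilities, not real server output
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(accuracy(labels, predictions), matthews_cc(labels, predictions), roc_auc(labels, scores))
```

On this toy set, accuracy is 4/6, the MCC is 1/3, and the AUC is 8/9. The pairwise AUC loop is quadratic and is only meant to make the definition explicit; for tens of thousands of variations a sort-based computation would be preferable.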

Conclusions

WS-SNPs&GO is a valuable tool that integrates, in a single framework, information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go.