Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Research article

FISH Amyloid – a new method for finding amyloidogenic segments in proteins based on site specific co-occurence of aminoacids

Pawel Gasior and Malgorzata Kotulska*

Author Affiliations

Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, 50-370 Wroclaw, Poland

For all author emails, please log on.

BMC Bioinformatics 2014, 15:54  doi:10.1186/1471-2105-15-54

Published: 24 February 2014

Abstract

Background

Amyloids are proteins capable of forming fibrils whose intramolecular contact sites assume densely packed zipper pattern. Their oligomers can underlie serious diseases, e.g. Alzheimer’s and Parkinson’s diseases. Recent studies show that short segments of aminoacids can be responsible for amyloidogenic properties of a protein. A few hundreds of such peptides have been experimentally found but experimental testing of all candidates is currently not feasible. Here we propose an original machine learning method for classification of aminoacid sequences, based on discovering a segment with a discriminative pattern of site-specific co-occurrences between sequence elements. The pattern is based on the positions of residues with correlated occurrence over a sliding window of a specified length. The algorithm first recognizes the most relevant training segment in each positive training instance. Then the classification is based on maximal distances between co-occurrence matrix of the relevant segments in positive training sequences and the matrix from negative training segments. The method was applied for studying sequences of aminoacids with regard to their amyloidogenic properties.

Results

Our method was first trained on available datasets of hexapeptides with the amyloidogenic classification, using 5 or 6-residue sliding windows. Depending on the choice of training and testing datasets, the area under ROC curve obtained the value up to 0.80 for experimental, and 0.95 for computationally generated (with 3D profile method) datasets. Importantly, the results on 5-residue segments were not significantly worse, although the classification required that algorithm first recognized the most relevant training segments. The dataset of long sequences, such as sup35 prion and a few other amyloid proteins, were applied to test the method and gave encouraging results. Our web tool FISH Amyloid was trained on all available experimental data 4-10 residues long, offers prediction of amyloidogenic segments in protein sequences.

Conclusions

We proposed a new original classification method which recognizes co-occurrence patterns in sequences. The method reveals characteristic classification pattern of the data and finds the segments where its scoring is the strongest, also in long training sequences. Applied to the problem of amyloidogenic segments recognition, it showed a good potential for classification problems in bioinformatics.

Keywords:
Machine learning; Amyloid; Intramolecular contact sites; Hot spot