This article is part of the supplement: Selected Articles on Computational Vaccinology

Open Access Research

Evaluation and integration of existing methods for computational prediction of allergens

Jing Wang1,2, Yabin Yu3, Yunan Zhao4, Dabing Zhang1 and Jing Li2,5*

Author Affiliations

1 Bor Luh Food Safety Center, National Center for Molecular Characterization of Genetically Modified Organisms, State Key Laboratory of Hybrid Rice, School of Life Science and Biotechnology, Shanghai Jiao Tong University, China

2 Department of Bioinformatics & Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, China

3 School of Medicine, Shanghai Jiao Tong University, China

4 School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China

5 Shanghai Center for Bioinformation Technology, China

BMC Bioinformatics 2013, 14(Suppl 4):S1  doi:10.1186/1471-2105-14-S4-S1

Published: 8 March 2013



Allergy involves a series of complex reactions and factors that contribute to the development of the disease and the triggering of its symptoms, which include rhinitis, asthma, atopic eczema, skin sensitivity, and even acute and fatal anaphylactic shock. Predicting and evaluating potential allergenicity is therefore important for the safety evaluation of foods and other environmental factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance, relative merits, and shortcomings have not been compared systematically.


To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and a large number of putative non-allergens. The three most widely used computational approaches to allergen prediction, namely the sequence-, motif- and SVM-based (Support Vector Machine) methods, were systematically compared using defined parameters. The SVM-based method outperformed the other two, with higher accuracy and specificity. The sequence-based method using the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has a higher sensitivity, over 98%, but low specificity. The advantage of the motif-based method is its ability to visualize the key motif within the allergen. Notably, the performance of both the FAO/WHO sequence-based method and the motif-eliciting strategy could be improved by optimizing their parameters. To facilitate allergen prediction, we integrated these three methods into a web-based application, proAP, which provides a global search of known allergens and a powerful tool for allergen prediction. Flexible parameter setting and batch prediction were also implemented. proAP can be accessed at webcite.
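As an illustration of how a sequence-based rule and the reported metrics fit together, the following is a minimal sketch and not the implementation used in proAP. It assumes the commonly cited FAO/WHO rule that a query is flagged when it shares at least 6 contiguous identical amino acids with a known allergen; the companion rule (more than 35% identity over an 80-residue window) requires an alignment tool and is omitted here. The function names and toy sequences are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Set


def kmers(seq: str, k: int = 6) -> Set[str]:
    """Return the set of all contiguous k-residue substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_exact_kmer(query: str, allergens: Iterable[str], k: int = 6) -> bool:
    """Flag a query that shares k contiguous identical residues with any allergen.

    Assumption: k = 6 mirrors the commonly cited FAO/WHO short-match rule;
    k is exposed as a parameter because the threshold can be tuned.
    """
    query_kmers = kmers(query, k)
    return any(query_kmers & kmers(a, k) for a in allergens)


def evaluate(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy from paired labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty class) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # allergens correctly flagged
        "specificity": ratio(tn, tn + fp),  # non-allergens correctly cleared
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


if __name__ == "__main__":
    known_allergens = ["MKTLLVAGGHE"]        # hypothetical toy sequence
    queries = ["QQLLVAGGQQ", "AAAAAAAAAA"]   # hypothetical toy queries
    flags = [shares_exact_kmer(q, known_allergens) for q in queries]
    print(flags)
    print(evaluate(flags, [True, False]))
```

A short exact-match rule of this kind flags many sequences, which is consistent with the high sensitivity and low specificity reported above for the FAO/WHO criteria; raising `k` trades sensitivity for specificity.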


This study comprehensively evaluated sequence-, motif- and SVM-based computational approaches to allergen prediction and optimized their parameters to obtain better performance. These findings may provide helpful guidance for researchers working on allergen prediction. Furthermore, we integrated these methods into a web application, proAP, which makes it much easier for users to perform customizable allergen search and prediction.