ElliPro: a new structure-based tool for the prediction of antibody epitopes

Ponomarenko, Julia; Bui, Huynh-Hoa; Li, Wei; Fusseder, Nicholas; Bourne, Philip E; Sette, Alessandro; Peters, Bjoern

doi:10.1186/1471-2105-9-514

Software
Open access
Published: 02 December 2008


BMC Bioinformatics volume 9, Article number: 514 (2008)


Abstract

Background

Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. The aim of this work was to implement that method as a web-tool and evaluate its performance on discontinuous epitopes known from the structures of antibody-protein complexes.

Results

Here we present ElliPro, a web-tool that implements Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in a given protein sequence or structure. ElliPro has been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732, when the most significant prediction was considered for each protein. Since the rank of the best prediction was at most in the top three for more than 70% of proteins and never exceeded five, ElliPro is considered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro.

Conclusion

The results from ElliPro suggest that further research on antibody epitopes considering more features that discriminate epitopes from non-epitopes may further improve predictions. As ElliPro is based on the geometrical properties of protein structure and does not require training, it might be more generally applied for predicting different types of protein-protein interactions.

Background

An antibody epitope, also known as a B-cell epitope or antigenic determinant, is a part of an antigen recognized by either a particular antibody molecule or a particular B-cell receptor of the immune system [1]. For a protein antigen, an epitope may be either a short peptide from the protein sequence, called a continuous epitope, or a patch of atoms on the protein surface, called a discontinuous epitope. While continuous epitopes can be directly used for the design of vaccines and immunodiagnostics, the objective of discontinuous epitope prediction is to design a molecule that can mimic the structure and immunogenic properties of an epitope and replace it either in the process of antibody production (in which case the epitope mimic can be considered a prophylactic or therapeutic vaccine) or in antibody detection in medical diagnostics or experimental research [2, 3].

While continuous epitopes can be predicted using sequence-dependent methods built on available collections of immunogenic peptides (for review see [4]), discontinuous epitopes, which predominate when a whole protein, pathogenic virus, or bacterium is recognized by the immune system, are difficult to predict or identify from functional assays without knowledge of the three-dimensional (3D) structure of the protein [5, 6]. The first attempts at epitope prediction based on 3D protein structure began in 1984, when a correlation was established between crystallographic temperature factors and several known continuous epitopes of tobacco mosaic virus protein, myoglobin and lysozyme [7]. A correlation between antigenicity, solvent accessibility, and flexibility of antigenic regions in proteins was also found [8]. Thornton and colleagues [9] proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. Regions with high protrusion index values were shown to correspond to the experimentally determined continuous epitopes in myoglobin, lysozyme and myohaemerythrin [9].

Here we present ElliPro (derived from Ellipsoid and Protrusion), a web-tool that implements a modified version of Thornton's method [9] and, together with a residue clustering algorithm, the MODELLER program [10] and the Jmol viewer, allows the prediction and visualization of antibody epitopes in protein sequences and structures. ElliPro has been tested on a benchmark dataset of epitopes inferred from 3D structures of antibody-protein complexes [11] and compared with six structure-based methods, including the only two existing methods developed specifically for epitope prediction, CEP [12] and DiscoTope [13]; two protein-protein docking methods, DOT [14] and PatchDock [15]; and two structure-based methods for protein-protein binding site prediction, PPI-PRED [16] and ProMate [17]. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro.

Implementation

The tool input

ElliPro is implemented as a web-accessible application and accepts two types of input data: protein sequence or structure (Fig. 1, Step 1). In the first case, the user may input either a protein SwissProt/UniProt ID or a sequence in either FASTA format or single-letter codes, and select threshold values for the BLAST e-value and the number of structural templates from PDB that will be used to model a 3D structure of the submitted sequence (Fig. 1, Step 2a). In the second case, the user may input either a four-character PDB ID or submit their own file in PDB format (Fig. 1, Step 2b). If the submitted structure consists of more than one protein chain, ElliPro will ask the user to select the chain(s) upon which to base the calculation. The user can change the threshold values of the parameters used by ElliPro for epitope prediction, namely, the minimum residue score (protrusion index), denoted here as S, between 0.5 and 1.0, and the maximum distance, denoted as R, in the range 4–8 Å.

3D Structure Modeling

If a protein sequence is used as input, ElliPro searches for the protein or its homologues in PDB [18], using a BLAST search [19]. If a protein cannot be found in PDB that matches the BLAST criteria, MODELLER [10] is run to predict the protein 3D structure. The user may change the threshold values for the BLAST e-value and the number of templates that MODELLER uses as input (Fig. 1, Step 2a).

ElliPro Method

ElliPro implements three algorithms performing the following tasks: (i) approximation of the protein shape as an ellipsoid [20]; (ii) calculation of the residue protrusion index (PI) [9]; and (iii) clustering of neighboring residues based on their PI values.

Thornton's method for continuous epitope prediction was based on the first two algorithms and only considered Cα atoms [9]. It approximated the protein surface as an ellipsoid, which can vary in size to include different percentages of the protein atoms; for example, the 90% ellipsoid includes 90% of the protein atoms. For each residue, a protrusion index (PI) was defined as the percentage of the protein atoms enclosed in the ellipsoid at which the residue first lies outside the ellipsoid; for example, all residues that are outside the 90% ellipsoid will have PI = 9 (or 0.9 in ElliPro). In implementing the first two algorithms, ElliPro differs from Thornton's method by considering each residue's center of mass rather than its Cα atom.
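The protrusion index can be illustrated with a short sketch. The source does not spell out the numerical procedure, so the code below rests on an assumption: the family of ellipsoids is taken to be the concentric, similarly shaped ellipsoids defined by the second moments of the residue centers, standing in for the inertia-based ellipsoid of [20]. The function names `invert3` and `protrusion_indices` are hypothetical, and the PI of a residue is approximated as the fraction of residue centers lying strictly inside the ellipsoid that passes through that residue.

```python
def invert3(m):
    """Invert a 3x3 matrix (list of rows) using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("points are (nearly) coplanar; ellipsoid is degenerate")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def protrusion_indices(centers):
    """Approximate PI (0..1) for each residue center of mass.

    ASSUMPTION: ellipsoid shape comes from the covariance (second moments)
    of the centers; this is an illustration, not ElliPro's actual code.
    """
    n = len(centers)
    mean = [sum(p[k] for p in centers) / n for k in range(3)]
    dev = [[p[k] - mean[k] for k in range(3)] for p in centers]
    cov = [[sum(v[i] * v[j] for v in dev) / n for j in range(3)] for i in range(3)]
    inv = invert3(cov)
    # Squared "ellipsoidal radius": equal values lie on the same ellipsoid.
    q = [
        sum(v[i] * inv[i][j] * v[j] for i in range(3) for j in range(3))
        for v in dev
    ]
    # PI = fraction of centers enclosed by the ellipsoid through this residue.
    return [sum(1 for other in q if other < qi) / n for qi in q]


# Toy coordinates (hypothetical): an octahedron, its center, and one outlier.
toy_centers = [
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1), (0, 0, 0), (5, 0, 0),
]
toy_pi = protrusion_indices(toy_centers)
```

In this toy example the outlying point at (5, 0, 0) receives the highest index, which is the behavior the method relies on: residues that stick out of the bulk of the protein score close to 1.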

The third algorithm, for clustering residues, defines a discontinuous epitope based on the threshold values for the protrusion index S and the distance R between residues' centers of mass. All protein residues with PI values greater than S are considered when calculating discontinuous epitopes. Clustering separate residues into discontinuous epitopes involves three steps that are recursively repeated until distinct clusters with no overlapping residues are formed. First, primary clusters are formed from single residues and their neighboring residues within the distance R. Second, secondary clusters are formed from primary clusters where at least three centers of mass are within the distance R from each other. Third, tertiary clusters are formed from secondary clusters which contain common residues. These tertiary clusters of residues represent distinct discontinuous epitopes predicted in the protein. The score for each epitope is defined as the PI value averaged over the epitope residues.
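The role of the two thresholds can be sketched in a few lines. The code below is a deliberately simplified stand-in, not the three-step primary/secondary/tertiary procedure described above: it keeps residues with PI greater than S, links any two of them whose centers of mass lie within R, and reports each connected group as one predicted epitope scored by its mean PI. The function name `cluster_epitopes`, the input layout, and the toy coordinates are all hypothetical.

```python
import math
from collections import deque


def cluster_epitopes(residues, s=0.5, r=6.0):
    """Group protruding residues into candidate epitopes.

    residues: list of (residue_id, pi, (x, y, z)) with center-of-mass coords.
    Returns a list of (score, sorted_residue_ids), highest score first.
    ASSUMPTION: plain single-linkage grouping, simpler than ElliPro's scheme.
    """
    selected = [res for res in residues if res[1] > s]  # PI threshold S
    unvisited = set(range(len(selected)))
    clusters = []
    while unvisited:
        start = unvisited.pop()
        members = [start]
        queue = deque([start])
        while queue:  # breadth-first walk over residues within distance R
            cur = queue.popleft()
            near = [
                j for j in unvisited
                if math.dist(selected[cur][2], selected[j][2]) <= r
            ]
            for j in near:
                unvisited.remove(j)
                members.append(j)
                queue.append(j)
        ids = sorted(selected[m][0] for m in members)
        score = sum(selected[m][1] for m in members) / len(members)  # mean PI
        clusters.append((score, ids))
    clusters.sort(reverse=True)
    return clusters


# Toy input (hypothetical): residue 4 fails the PI cut, residue 5 is isolated.
toy_residues = [
    (1, 0.9, (0.0, 0.0, 0.0)),
    (2, 0.8, (4.0, 0.0, 0.0)),
    (3, 0.7, (8.0, 0.0, 0.0)),
    (4, 0.3, (10.0, 0.0, 0.0)),
    (5, 0.6, (30.0, 0.0, 0.0)),
]
toy_epitopes = cluster_epitopes(toy_residues, s=0.5, r=6.0)
```

With S = 0.5 and R = 6 Å the toy input yields two groups: residues 1 to 3 (mean PI 0.8) and the isolated residue 5. Raising S or lowering R shrinks or splits the groups, which is why the number of predicted epitopes depends on both thresholds.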

3D visualization of Predicted Epitopes

An open-source molecular viewer Jmol [21] was used to visualize linear and discontinuous epitopes on the protein 3D structure. An example of epitope visualization is shown in Fig. 2.

Results and Discussion

For evaluation of ElliPro performance and comparison with other methods we used a previously established benchmark approach for discontinuous epitopes [11]. We tested ElliPro on a dataset of 39 epitopes present in 39 protein structures where only one discontinuous epitope was known based on 3D structures of two-chain antibody fragments with one-chain protein antigens [11].

Depending on the threshold values for parameters R and S, ElliPro predicted a different number of epitopes in each protein; for an R of 6 Å and S of 0.5, the average number of predicted epitopes in each protein analyzed was 4, ranging from 2 to 8. For example, for Plasmodium vivax ookinete surface protein Pvs25 [PDB: 1Z3G, chain A], ElliPro predicted four epitopes with scores of 0.763, 0.701, 0.645, and 0.508, respectively (Fig. 2).

For each predicted epitope in each protein, we calculated the correctly (TP) and incorrectly (FP) predicted epitope residues and the non-epitope residues, which were defined as all other protein residues (TN and FN). The statistical significance of a prediction, that is, the difference between observed and expected frequencies of an actual epitope/non-epitope residue in the predicted epitope/non-epitope, was determined using Fisher's exact test (right-tailed). The prediction was considered significant if the P-value was ≤ 0.05. Then, for each prediction the following parameters were calculated:

Sensitivity (recall or true positive rate (TPR)) = TP/(TP + FN) – a proportion of correctly predicted epitope residues (TP) with respect to the total number of epitope residues (TP+FN).

Specificity (or 1 – false positive rate (FPR)) = 1 - FP/(TN + FP) – a proportion of correctly predicted non-epitope residues (TN) with respect to the total number of non-epitope residues (TN+FP).

Positive predictive value (PPV) (precision) = TP/(TP + FP) – a proportion of correctly predicted epitope residues (TP) with respect to the total number of predicted epitope residues (TP+FP).

Accuracy (ACC) = (TP + TN)/(TP + FN + FP + TN) – a proportion of correctly predicted epitope and non-epitope residues with respect to all residues.

Area under the ROC Curve (AUC) – area under a graph representing a dependency of TPR against FPR; that is, sensitivity against 1-specificity. The AUC gives the general performance of the method and is "equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance" [22].
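The definitions above translate directly into code. The sketch below uses the confusion counts that the text reports for the first predicted epitope of Pvs25 (TP = 13, FP = 13, TN = 156, FN = 4). The function name `prediction_metrics` is hypothetical, and the last entry is an assumption: for a single binary prediction the ROC curve has one operating point, and the area under the resulting two-segment curve is (sensitivity + specificity)/2. The source does not state that this is how the per-prediction AUC was obtained.

```python
def prediction_metrics(tp, fp, tn, fn):
    """Residue-level metrics for one predicted epitope."""
    sensitivity = tp / (tp + fn)   # TPR, recall
    specificity = tn / (tn + fp)   # 1 - FPR
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # precision
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # ASSUMPTION: AUC of a single operating point (trapezoid rule).
        "single_point_auc": (sensitivity + specificity) / 2,
    }


# Counts quoted in the text for Pvs25 [PDB: 1Z3G, chain A], R = 6 Å, S = 0.5.
pvs25 = prediction_metrics(tp=13, fp=13, tn=156, fn=4)
for name, value in pvs25.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the sensitivity, specificity, accuracy and single-point AUC agree with the values quoted for this example (0.76, 0.92, 0.91 and 0.84).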

For example, for the first predicted epitope in Plasmodium vivax ookinete surface protein Pvs25 [PDB:1Z3G, chain A] (Fig. 2), for an R of 6Å and S of 0.5, TP = 13, FP = 13, TN = 156, FN = 4, P-value = 5.55E-10, giving a sensitivity of 0.76, a specificity of 0.92, an accuracy of 0.91, and an AUC of 0.84. The results and detailed statistics of ElliPro performance for each epitope and other threshold values for R and S are provided in the supplementary materials [see Additional file 1].
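The significance test can be sketched in the same spirit. A right-tailed Fisher's exact test on the 2×2 table of predicted versus actual epitope residues is the upper tail of a hypergeometric distribution: the probability of seeing at least TP actual epitope residues among the predicted ones if residues were drawn at random. The function name `fisher_right_tailed` is hypothetical, and the code is a minimal illustration rather than the routine used by the authors.

```python
from math import comb


def fisher_right_tailed(tp, fp, tn, fn):
    """P(X >= tp) under the hypergeometric null for a 2x2 table."""
    n_total = tp + fp + tn + fn
    n_epitope = tp + fn        # actual epitope residues
    n_predicted = tp + fp      # residues in the predicted epitope
    upper = min(n_epitope, n_predicted)
    # Exact integer arithmetic; converted to float only at the end.
    tail = sum(
        comb(n_epitope, k) * comb(n_total - n_epitope, n_predicted - k)
        for k in range(tp, upper + 1)
    )
    return tail / comb(n_total, n_predicted)


# Counts quoted in the text for the first predicted epitope of Pvs25.
p_value = fisher_right_tailed(tp=13, fp=13, tn=156, fn=4)
print(f"P-value: {p_value:.3g}")
```

For these counts the P-value falls far below the 0.05 threshold, consistent with the prediction being reported as significant.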

The statistics averaged over all epitopes and overall statistics calculated from FP, FN, TP, and TN values summarized for the whole pool of epitope and non-epitope residues are presented in Table 1 and Fig. 3. The results for the methods other than ElliPro have been obtained as described in [11]. ElliPro performed best, by AUC values, with the score S set at 0.7 and the distance R set at 6Å when the prediction with the highest score was considered for each protein and with the score S set at 0.5 and the distance R set at 6Å when the best by significance or average prediction was taken into account. Results are described using these thresholds (Table 1, Fig. 3); the results at other threshold values are provided in the supplementary materials [see Additional file 1].
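The difference between statistics averaged over epitopes and overall statistics computed from pooled counts can be seen in a small sketch. The counts below are made-up placeholders chosen only to show the effect; they are not values from Table 1 or Additional file 1, and the function names are hypothetical.

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)


def averaged_sensitivity(counts):
    """Mean of per-epitope sensitivities: each epitope weighs equally."""
    return sum(sensitivity(c["TP"], c["FN"]) for c in counts) / len(counts)


def overall_sensitivity(counts):
    """Sensitivity from pooled counts: each residue weighs equally."""
    tp = sum(c["TP"] for c in counts)
    fn = sum(c["FN"] for c in counts)
    return sensitivity(tp, fn)


# HYPOTHETICAL counts: a small, well-predicted epitope and a large, poorly
# predicted one.
toy_counts = [{"TP": 8, "FN": 2}, {"TP": 2, "FN": 18}]
print(averaged_sensitivity(toy_counts))
print(overall_sensitivity(toy_counts))
```

The averaged figure treats the two epitopes equally, whereas the pooled figure is dominated by the larger epitope, so the two summaries can differ noticeably for the same set of predictions.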

Table 1 Overall performance of ElliPro in comparison with other methods.

ElliPro's top predictions, that is, those with the highest scores, correlated poorly with the discontinuous epitopes known from 3D structures of antibody-protein complexes (Table 1, overall statistics, AUC = 0.523). DiscoTope and the first models from the docking methods performed better, giving AUC values above 0.6, whereas the protein-protein binding site prediction methods, ProMate and PPI-PRED, performed worse. However, when the first predictions with the highest score were considered, ElliPro was the best among all the methods based on specificity (1-specificity = 0.047) and comparable with DiscoTope by precision (PPV = 0.158) (Table 1, overall statistics).

In a next set of metrics, we compared the performance between prediction methods when choosing the best hit within the top 10 predictions of each method. This approach takes into account that each antigen harbors multiple distinct binding sites for different antibodies. Therefore it is expected that the top predicted site is not necessarily recognized by the specific antibody used in the dataset. This comparison directly applies only to the docking methods DOT and PatchDock as well as ElliPro. For DiscoTope, only one epitope is predicted, while for CEP no ranking is available to identify the top 10 predictions.

The docking methods DOT and PatchDock have an intrinsic advantage in this comparison over ElliPro, because they use structures of both protein antigen and antibody from the same antibody-protein complex in order to predict binding sites. To our surprise, when the best significant prediction was considered for each protein, ElliPro nevertheless gave the highest AUC value of 0.732, the highest sensitivity of 0.601 and the second highest precision value of 0.29 among all the compared methods (Table 1; Fig. 3, red circle). The docking methods gave AUC values of 0.693 for DOT and 0.656 for PatchDock when the best prediction of the top ten was likewise considered (Table 1, overall statistics; Fig. 3). The average number of predicted epitopes for the analyzed proteins was four, with the rank of the best prediction at most fifth; for more than half of the proteins the rank was first or second, and for more than 70% of all proteins it was first, second, or third [see Additional file 1].

ElliPro is based on simple concepts. First, regions protruding from the globular surface of the protein are more available for interaction with an antibody [9]; second, those protrusions can be determined by treating the protein as a simple ellipsoid [20]. Obviously, the ellipsoid approximation does not always hold, especially for multi-domain or large single-domain proteins. However, no correlation was found between ElliPro performance and either the protein size, which varied from 51 to 429 residues with an average value of 171, or the number of domains (8 proteins among the 39 analyzed contained more than one domain) (data not shown).

Conclusion

ElliPro is a web-based tool for the prediction of antibody epitopes in protein antigens of a given sequence or structure. It implements a previously developed method that represents the protein structure as an ellipsoid and calculates protrusion indexes for protein residues outside of the ellipsoid. ElliPro was tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best (AUC value of 0.732) when the most significant prediction was considered for each protein. Since the rank of the best prediction was at most three in more than 70% of proteins and never exceeded five, ElliPro is considered a potentially useful research tool for identifying antibody epitopes in protein antigens.

While ElliPro was tested on antibody-protein binding sites, it might be interesting to test it on other protein-protein interactions since it implements a method that is based on geometrical properties of protein structure and does not require training.

Comparison with DiscoTope, which is based on training and utilizes epitope features such as amino acid propensities, residue solvent accessibility, spatial distribution, and inter-molecular contacts, suggests that further research on antibody epitopes which considers more features that discriminate epitopes from non-epitopes may improve the prediction of antibody epitopes.

Availability and requirements

Project name: ElliPro
Project home page: http://tools.immuneepitope.org/tools/ElliPro
Operating system(s): Platform independent
Programming language: Java
Other requirements: None
License: None
Any restrictions to use by non-academics: None

Abbreviations

PI: protrusion index
TP: true positives
FP: false positives
TN: true negatives
FN: false negatives
ROC: Receiver Operating Characteristics
AUC: area under the ROC curve

References

1. Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, et al.: The design and implementation of the immune epitope database and analysis resource. Immunogenetics 2005, 57(5):326–336. 10.1007/s00251-005-0803-5
2. Bijker MS, Melief CJ, Offringa R, van der Burg SH: Design and development of synthetic peptide vaccines: past, present and future. Expert Rev Vaccines 2007, 6(4):591–603. 10.1586/14760584.6.4.591
3. Gomara MJ, Haro I: Synthetic peptides for the immunodiagnosis of human diseases. Curr Med Chem 2007, 14(5):531–546. 10.2174/092986707780059698
4. Greenbaum JA, Andersen PH, Blythe M, Bui HH, Cachau RE, Crowe J, Davies M, Kolaskar AS, Lund O, Morrison S, et al.: Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. J Mol Recognit 2007, 20(2):75–82. 10.1002/jmr.815
5. Laver WG, Air GM, Webster RG, Smith-Gill SJ: Epitopes on protein antigens: misconceptions and realities. Cell 1990, 61(4):553–556. 10.1016/0092-8674(90)90464-P
6. Van Regenmortel MHV: Mapping Epitope Structure and Activity: From One-Dimensional Prediction to Four-Dimensional Description of Antigenic Specificity. Methods 1996, 9(3):465–472. 10.1006/meth.1996.0054
7. Westhof E, Altschuh D, Moras D, Bloomer AC, Mondragon A, Klug A, Van Regenmortel MH: Correlation between segmental mobility and the location of antigenic determinants in proteins. Nature 1984, 311(5982):123–126. 10.1038/311123a0
8. Novotny J, Handschumacher M, Haber E, Bruccoleri RE, Carlson WB, Fanning DW, Smith JA, Rose GD: Antigenic determinants in proteins coincide with surface regions accessible to large probes (antibody domains). Proc Natl Acad Sci USA 1986, 83(2):226–230. 10.1073/pnas.83.2.226
9. Thornton JM, Edwards MS, Taylor WR, Barlow DJ: Location of 'continuous' antigenic determinants in the protruding regions of proteins. EMBO J 1986, 5(2):409–413.
10. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A: Comparative Protein Structure Modeling With MODELLER. In Current Protocols in Bioinformatics. 5.6.1–5.6.30. John Wiley & Sons, Inc; 2006.
11. Ponomarenko JV, Bourne PE: Antibody-protein interactions: benchmark datasets and prediction tools evaluation. BMC Struct Biol 2007, 7: 64. 10.1186/1472-6807-7-64
12. Kulkarni-Kale U, Bhosle S, Kolaskar AS: CEP: a conformational epitope prediction server. Nucleic Acids Res 2005, 33(Web Server issue):W168–171. 10.1093/nar/gki460
13. Haste Andersen P, Nielsen M, Lund O: Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci 2006, 15(11):2558–2567. 10.1110/ps.062405906
14. Mandell JG, Roberts VA, Pique ME, Kotlovyi V, Mitchell JC, Nelson E, Tsigelny I, Ten Eyck LF: Protein docking using continuum electrostatics and geometric fit. Protein Eng 2001, 14(2):105–113. 10.1093/protein/14.2.105
15. Schneidman-Duhovny D, Inbar Y, Polak V, Shatsky M, Halperin I, Benyamini H, Barzilai A, Dror O, Haspel N, Nussinov R, et al.: Taking geometry to its edge: fast unbound rigid (and hinge-bent) docking. Proteins 2003, 52(1):107–112. 10.1002/prot.10397
16. Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2005, 21(8):1487–1494. 10.1093/bioinformatics/bti242
17. Neuvirth H, Raz R, Schreiber G: ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol 2004, 338(1):181–199. 10.1016/j.jmb.2004.02.040
18. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28(1):235–242. 10.1093/nar/28.1.235
19. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410.
20. Taylor WR, Thornton JM, Turnell WG: An ellipsoidal approximation of protein shape. Journal of Molecular Graphics 1983, 1: 30–38. 10.1016/0263-7855(83)80001-0
21. Jmol [http://jmol.sourceforge.net]
22. Fawcett T: An introduction to ROC analysis. Pattern Recognition Letters 2006, 27: 861–874. 10.1016/j.patrec.2005.10.010

Acknowledgements

The work was supported by the National Institutes of Health Contract HHSN26620040006C.

Author information

Authors and Affiliations

San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, California, 92093, USA
Julia Ponomarenko & Philip E Bourne
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California, 92093, USA
Julia Ponomarenko & Philip E Bourne
Isis Pharmaceuticals, Inc., 1896 Rutherford Road, Carlsbad, California, 92008, USA
Huynh-Hoa Bui
La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, California, 92037, USA
Alessandro Sette & Bjoern Peters


Corresponding author

Correspondence to Julia Ponomarenko.

Additional information

Authors' contributions

HHB conceived, designed and programmed the tool. JVP tested the tool and wrote the manuscript. WL and NF participated in programming the tool. PEB, BP and AS contributed to writing the manuscript. All authors have read and approved the final version of the manuscript.

Electronic supplementary material

12859_2008_2499_MOESM1_ESM.xls

Additional file 1: The detailed statistics on the prediction results for 39 epitopes analyzed. This table provides additional information that complements the Table 1 and Figure 3. (XLS 514 KB)


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Ponomarenko, J., Bui, HH., Li, W. et al. ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics 9, 514 (2008). https://doi.org/10.1186/1471-2105-9-514


Received: 24 September 2008
Accepted: 02 December 2008
Published: 02 December 2008
DOI: https://doi.org/10.1186/1471-2105-9-514
