The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustments of parameters, such as the distance thresholds used for each target, have often been required to prevent over prediction. Here we describe a novel method FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and list of templates as inputs. A simple web interface is also provided allowing access to non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.
The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve a significantly higher mean Matthews Correlation Coefficient (MCC) scores and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.
The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure based function prediction using 3D models of proteins.
Because of a protein's essential cellular role, it is important to fully understand its structure and interactions. Predicting the location of the binding site and the ligand binding residues, is a necessary step towards elucidating how a protein functions . The determination of the ligand binding site residues in a protein is also important, because substrate specificity of an enzyme is determined by the fine details of the binding site residues, such as side chain orientation and physiochemical properties .
Several different protein ligand binding site prediction methods have been developed, which mostly fall into two major categories: sequence-based methods and structure-based methods . The sequence-based methods rely on identifying conserved residues that may be structurally or functionally important and include methods such as firestar (CASP9 group FN315) , WSsas , FRcons , ConFunc (CASP8 - FN437) , ConSurf , FPSDP (CASP8 - FN242)  and INTERPID . The structure-based methods can be further subdivided into the geometric methods utilised by FINDSITE  and SiteHunter (CASP8 - FN163) ; energetic methods utilised by SITEHOUND  and Q-SiteFinder  and miscellaneous methods that utilise information from; homology modelling, such as 3D LigandSite  (CASP9- FN017, FN415, FN057, FN072), Mariner1 (CASP8 - FN450)  and MetSite ; surface accessibility used by the LIGSITEcsc  method and physiochemical properties utilised by SCREEN .
In the CASP6 experiment, a function prediction category was included for the first time, where groups were required to predict Enzyme Commission numbers (EC) and Gene Ontology (GO) terms . As a result of the difficulty in assessing these terms, the CASP7 assessors made a decision to modify the function prediction category . Consequently in CASP8, the function prediction was included in a different format, with ligand binding site residues assessed, as many CASP targets were found to crystallize with biologically relevant ligands .
In CASP8, the top methods in the function prediction category were the methods by the Lee group  and the Sternberg group . The Lee group had two main methods: a manual method (FN407) and an extended deadline server method (FN293). The Lee extended deadline server method relied on manual configuration of distance cut-offs for each target and, at the time of writing, a server based on this method is not publicly available. The Sternberg group was also registered as a manual prediction group (FN202), however since CASP8, the group has produced a publicly available server, 3D LigandSite, which is reported to be similar in performance to their manual prediction method from CASP8 .
The Lee group's manual and server methods from CASP8 only differed in the top 3D models used for prediction. The methods were both based on 3D superposition of structurally similar proteins that contain ligands. Residues were considered to be in contact with the ligands if the distance between them was less than 0.5Å + the Van der Waals radii, and the cut-offs for including residues in predictions were manually altered depending on the target .
The Sternberg group's manual predictions in CASP8, and their 3D LigandSite  server, used a very similar method  to the Lee group for ligand binding site prediction. The Sternberg group used the 3D-Jury method  in order to select from amongst the CASP8 server models followed by structural superposition of models and related templates. In addition, residue conservation was considered using ConFunc .
However, for the 3D LigandSite  server method, Wass et al. used the top 3D models from the Phyre server  and MAMMOTH  for structural superposition of similar templates with bound ligands. The residue conservation scoring was found to cause significant over prediction of residues  and so data from ConFunc is not directly included in the current 3D LigandSite server predictions . To the best of our knowledge the 3D LigandSite server is the best publicly available fully automated method for prediction of ligand binding residues that has been independently benchmarked on the CASP8 data set and it produces comparable results to that of group FN202 . Hence in this study we firstly compare our novel approach, FunFOLD, against all the groups at CASP8, paying particular attention to the comparison with the predictions from CASP8 group FN202.
In CASP9, the top few ranked methods in the function prediction category were by the Zhang group (FN096, FN339) and the firestar group (FN035, FN315). The I-TASSER-FUNCTION and Zhang human methods are again based on model-to-template superposition, to predict the ligand binding site residues. However, at the time of writing no publicly available server has been produced that implements the I-TASSER-FUNCTION method. The firestar (FN315) server, which is publicly available, also utilized sequence conservation from PSI-BLAST  profile alignments in order to predict ligand binding site residues .
According to the official CASP9 assessment , it was difficult to find statistically significant separations between the top 11 or so groups in the function prediction category, which included - Zhang (FN096), I-TASSER-FUNCTION (FN339), firestar (FN315), FAMSSEC (FN113), CNIO-firestar (FN035), Sternberg (FN110), Seok (FN242), Jones-UCL (FN104), Seok-server (FN452), Lee (FN114), McGuffin (FN094) and gws (FN236). Hence we also compare FunFOLD against all groups at CASP9, paying particular attention to the comparison with these top performing groups and their associated servers: I-TASSER-FUNCTION (FN339), firestar (FN315), Seok-server (FN452), gws (FN236) and the 3D LigandSite servers (FN017, FN415, FN057, FN072).
The FunFOLD method is similar in concept to other groups' methods, such as the Lee group and the Sternberg group methods, which use protein structure superposition of distantly related templates to a modelled protein to identify ligand binding sites. However, the FunFOLD algorithm uses a novel automated method for ligand clustering and identification of binding residues. We also investigate the use of ModFOLDclust2  to select different starting models from amongst the pool of alternative server models and gauge the affect on accuracy.
The FunFOLD method for predicting ligand binding site residues is based on the concept that, ligand containing templates from the PDB with the same folds (according to TMalign ) as the 3D model of the target protein under analysis, may contain similar binding sites. The FunFOLD standalone method takes as its input a 3D model of the protein under analysis and a list of template PDB IDs, which can be obtained from the templates used to build the model. The prototype version of the FunFOLD server method, which was developed during the CASP9 prediction season (listed as the group "IntFOLD-FN" - FN425 (server)), queried the FireDB  in order to identify if ligands in each template PDB file were biologically relevant. This method worked well, but it relied on the FireDB database, which has been updated since CASP8. Although it was unlikely that the identified biological ligands within templates would have changed since CASP8 (Mike Tress pers. comm.), there was a small chance the list of relevant ligands in FireDB could have been influenced by new related PDB entries since CASP8. In addition, during the CASP9 prediction season our prototype version of the server often had connectivity problems with the database. In order to become independent of querying the FireDB database, a list of biologically relevant ligands was obtained from the Sternberg group, which was originally used by the 3D LigandSite server . Using this list allows the FunFOLD standalone software and the current version of the FunFOLD server to be more reliable, independent on external servers and suitable for benchmarking on both of the CASP datasets.
The FunFOLD algorithm used the TM-align method  to superpose each of the template structures containing relevant ligands onto the 3D protein model. Each model-to-template superposition was saved if the resulting TM-score ≥0.4 (TM-scores ~0.4-0.6 have been shown to mark the transition phase of significantly related folds ). A simple PyMOL script allowed each of the superposition files to be used in order to orientate the original PDB structures with bound ligands correctly relative to the model. The resulting PyMOL superposition of all templates and models was then saved and parsed to leave only the coordinates of the model and relevant ligands.
Ligands were then assigned to clusters using an agglomerative hierarchical clustering algorithm that identified each continuous mass of contacting ligands, thereby indicating putative binding pockets. Ligands were considered to be part of a cluster if any of their atoms were in contact with the continuous mass. Thus, the linkage criteria for clustering were determined by contacts between ligands, which were defined as ≤ the Van der Waals radius of an atom plus 0.5 Ångströms. Once each continuous mass of contacting ligands was identified, the cluster with the largest number of ligands was selected as the location of the most likely binding pocket. The distances between all atoms within the mass of ligands and all atoms within the 3D protein model were then calculated. Again, residues were determined to be in contact with the ligand if the distance between a ligand atom and a residue atom was ≤ the Van der Waals radius plus 0.5 Ångströms.
In order to determine which residues were most likely to bind to the predicted ligand, a "residue voting" procedure was carried out. For a residue to be included in a prediction it must have had at least one contact with 2 or more ligands and at least 25% of the ligands in the cluster. This cut-off was determined during the CASP9 prediction season whilst the method was in development and was based on comparisons of the FunFOLD server output with that obtained from state-of-the-art servers, such as 3D LigandSite, that were publicly available at the time. However, no optimization of this cut-off was carried out prior to testing of the FunFOLD software on the CASP8 and CASP9 data sets.
The residue voting system used in FunFOLD can be illustrated in Figure 1. In Figure 1 we can see a cluster that contains a continuous mass of metals (1 ZN, 6 FEs, 1 NI and 1 MG), 1 GDP and 3 PO4s. The residues that are only contacting the GDP ligand are not considered as potential ligand binding residues, as the residues only receive 1/13 votes each (7.69%). Whilst the residues that are in contact with GDP and one PO4 molecule receive 2 votes, they only receive a 15% share of the vote (2/13) and so they are also excluded from the prediction. The final predicted ligand binding residues are therefore: HIS29 with 7/13 votes (53.85%), HIS58 with 11/13 votes (84.62%), ASP59 with 12/13 votes (92.31%) and ASP122 with 5/13 votes (38.46%); all are above the 25% threshold required to be included in the prediction.
Figure 1. The FunFOLD residue voting system. Ribbon diagram of CASP8 target T0470 (PDBID 3djb) which illustrates how the voting system works in FunFOLD. The green sticks represented predicted ligand binding residues (which are the same as the observed ligand binding residues) and the white spheres and sticks represent predicted ligands.
The FunFOLD method was benchmarked using only the information concerning templates and models for each target that could be obtained from the CASP8 and CASP9 server predictions. Thus all of the information used was available to predictors during both CASP prediction seasons. The FN prediction files for the 27 targets analysed for function prediction in CASP8 , the 30 targets analysed for function prediction in CASP9, and all associated 3D server models were downloaded from the CASP website .
For each CASP8 target, the ModFOLDclust2 method  was used to select the top 3D model from the server models submitted at CASP8. However, the top model from the IntFOLD-TS server (TS275)  was used for testing performance on each of the CASP9 targets. The IntFOLD-TS models were used so that the method is the same as the current FunFOLD server implementation. The top model for each target was then used as the starting model for predicting ligand binding residues. The parent records from each server model were examined in order to construct a list of template PDB IDs for each target that was available at the time of each CASP prediction season. The resulting template list was then filtered using FASTA  to ensure it was 70% non-redundant according to pairwise sequence identity. This type of filtering is in line with that carried out during the construction of the non-redundant fold libraries used by many fold recognition servers, such as IntFOLD-TS. Finally, a maximum of 40 templates were used in our analysis for efficiency.
The FunFOLD prediction results were compared against all of the function prediction groups participating in CASP8 and CASP9 using the MCC scores  as an indicator of performance. An analysis of the statistical significance between the differences in mean scores was also carried out, similar to that of the official CASP assessments [1,27]. In addition, a new metric, the Binding-site Distance Test (BDT) score  was used with the d0 threshold set to 1Ǻ, in order to stringently assess the accuracy of predictions. The BDT score ranges between 0 (random) and 1 (perfect) and relates to the actual 3D distance between the predicted residues and the observed residues. The BDT score appropriately penalizes both under and over predictions, whilst also considering the 3D distance of predicted residues from the observed binding site. Thus, predictions close to the binding site score higher than more distant predictions .
The top function prediction methods in CASP8 were methods by the Lee group (FN407 & FN293)  and the Sternberg group (FN202). In order to more stringently compare the performance of the FunFOLD method against that of the Lee and Sternberg methods, for each prediction we also used the top 3D models that the Lee and the Sternberg groups submitted for their CASP8 server and manual predictions (LEE-S_TS1, LEE_TS1 and Phyre-de-novo_TS1). This analysis was also carried out on the CASP9 data set, using models from both the IntFOLD-TS server (FN275) and the Zhang-server (FN428) methods. This allowed us to gauge how much of the difference in performance was due to the initial model selection. Finally, the analysis was repeated using native structures for each CASP target in order to evaluate performance using "perfect models".
The FunFOLD method for the prediction of ligand binding residues is benchmarked using the set of 27 CASP8 function prediction targets and the set of 30 CASP9 function prediction targets. The CASP sets are further subdivided according to the types of ligand bound, as below:
CASP8 targets: Metal - T0391, T0406, T0407, T0410, T0425, T0426, T0440, T0444, T0453, T0457, T0461, T0470, T0476, T0478, T0480, T0487; Non-metal - T0394, T0396, T0422, T0430, T0431, T0450, T0477, T0483, T0485, T0490, T0508
CASP9 targets: Metal - T0518, T0521, T0529, T0539, T0548, T0585, T0625, T0629, T0635; Non-metal - T0515, T0516, T0524, T0526, T0547, T0565, T0570, T0582, T0584, T0591, T0597, T0599, T0604, T0607, T0609, T0613, T0615, T0622, T0632, T0636, T0641
(In CASP8 some targets had both metals and non-metals, but these were defined by the assessors as non-metals. For consistency, we do the same with the CASP9 targets.)
The performance of FunFOLD is compared against that of groups that participated in the CASP8 and CASP9 function prediction categories.
The results of an assessment of binding site predictions, similar to the official CASP8 function prediction assessment carried out by Lopez et al. (2009) , are shown in Figure 2, 3, 4, 5, 6 and 7 and in all tables. The Matthews Correlation Coefficient (MCC) and the Binding-site Distance Test (BDT) method are used to measure prediction success and the resulting scores achieved by the different groups are compared with those from the FunFOLD method. The FunFOLD method is shown to outperform all other methods tested at CASP8 according to both the mean per-target MCC scores and BDT scores (Figure 2).
Figure 2. MCC scores and BDT scores for CASP8 benchmarking. Mean per-target MCC scores and BDT scores for each CASP8 function prediction group and the FunFOLD method (minimum of 15 predictions). * indicates server methods.
The FunFOLD method is also shown to be competitive with the methods tested at CASP9. The FunFOLD method outperformed the 3DLigandSite methods (FN017, FN415, FN057, FN072), but did not outperform the top server methods, I-TASSER-FUNCTION (FN339) and firestar (FN315), according to mean per-target MCC and BDT scores for both partial and extended binding site analysis (Figure 3 and Figure 4). (The CASP9 assessors defined partial binding site residues as binding site residues from CASP9 targets that contain ligands, with extended binding sites referring to CASP9 targets that bound ligands that were not the target's natural ligand, thus, the natural ligand was docked into the binding site and the binding site residues predicted.).
Figure 3. MCC scores and BDT scores for CASP9 benchmarking (partial binding site definition). Mean per-target MCC scores and BDT scores for each CASP9 function prediction group and the FunFOLD method (minimum of 15 predictions). * indicates server methods.
Figure 4. MCC scores and BDT scores for CASP9 benchmarking (extended binding site definition). Mean per-target MCC scores and BDT scores for each CASP9 function prediction group and the FunFOLD method (minimum of 15 predictions). * indicates server methods.
The mean MCC Z-scores and BDT Z-scores were also calculated for each group in order to normalize scores across targets and the results are shown in Figure 5, Figure 6 and Figure 7. Again, the predictions from the FunFOLD method are shown to outperform those from all of the other CASP8 groups (Figure 5) and outperformed all server methods tested at CASP9 with the exception of I-TASSER-FUNCTION (FN339) and firestar (FN315) (Figures 6 and 7).
Figure 5. MCC Z-scores and BDT Z-scores for CASP8 benchmarking. Mean per-target MCC Z-scores and BDT Z-scores for each CASP8 function prediction group and the FunFOLD method (minimum of 15 predictions; only groups with positive Z-scores are shown). * indicates server methods.
Figure 6. MCC Z-scores and BDT Z-scores for CASP9 benchmarking (partial binding site definition). Mean per-target MCC Z-scores and BDT Z-scores for each CASP9 function prediction group and the FunFOLD method (minimum of 15 predictions; only groups with positive Z-scores are shown). * indicates server methods.
Figure 7. MCC Z-scores and BDT Z-scores for CASP9 benchmarking (extended binding site definition). Mean per-target MCC Z-scores and BDT Z-scores for each CASP9 function prediction group and the FunFOLD method (minimum of 15 predictions; only groups with positive Z-scores are shown). * indicates server methods.
Whilst the results shown in Figures 2, 3, 4, 5, 6 and 7 indicate that the FunFOLD method is competitive, each group has made predictions for different numbers of targets. Therefore, a direct comparison of methods from CASP8 and CASP9 on common subsets of targets must be carried out along with an analysis of statistical significance of the differences in performance. The results in Table 1 show that the FunFOLD method does achieve higher mean scores and mean Z-scores than every other CASP8 method, when common subsets of predictions are directly compared.
Table 1. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by each CASP8 function prediction group.
The difference in mean MCC performance is >28% for each of the true server methods tested at CASP8. In addition, the FunFOLD method shows a 13% improvement over group FN202's CASP8 predictions, a 5% improvement over group FN293's CASP8 predictions and a 1% improvement over FN407's CASP8 predictions. The improvement is statistically significant for all CASP8 groups tested, except the methods by the Lee group - FN407 and FN293. A similar increase in performance is shown if the BDT score is used for the assessment (Table 1).
Tables 2 and 3 show the performance of methods on the subsets of targets; those containing only metal ligands (Table 2) and those containing non-metal ligands (Table 3). On the CASP8 data set the FunFOLD method performs slightly better on the subset of targets containing metal ligands than those containing non-metals. The FunFOLD method does improve upon the majority of other methods when tested on the non-metal subset; however there is no significant difference compared with the top four methods (Table 3).
Table 2. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by each CASP8 function prediction group - as in Table 1 except for CASP8 targets containing only metal ligands.
Table 3. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by each CASP8 function prediction group - as in Table 1 except for CASP8 targets containing non-metal ligands or metal and non-metal ligands in the same binding site.
In order to produce the FunFOLD results shown in Table 1, 2 and 3, the ModFOLDclust2 quality assessment program  was used to select a 3D model for each target and this model was then used as an input for the FunFOLD executable. However, the results in Table 4 are also shown in order to compare the performance of FunFOLD if the TS1 3D models submitted by each CASP8 comparison group are used instead. An improvement in MCC score of around 10% is shown compared to group FN202's predictions, which again is shown to be statistically significant. The improvement over group FN293's predictions is ~3%, whilst the improvement over group FN407 is increased to ~2%. Again, using the BDT score shows that a significant increase over group FN202 is maintained along with an increase in mean scores compared with those obtained by groups FN407 and FN293. In Table 5, the coordinates from the native structures are used in order to determine the difference in performance compared with the top 3 groups. Using native structures instead of models, the FunFOLD method shows a significant improvement over both groups FN293 and FN202.
Table 4. What is the effect on FunFOLD performance if different starting models are used?
Table 5. What is the effect on FunFOLD performance if the native structures are used for each target?
In order to produce the FunFOLD results shown in Tables 6, 7, 8, 9, 10 and 11, the top IntFOLD-TS model for each target was used as an input for the FunFOLD executable. This was done so that the method matched the current implementation of the FunFOLD server. The results indicate that there is no statistically significant difference between the top server methods tested at CASP9 and the FunFOLD method, according to p-values of the raw and z-scores, on the partial (Table 6, 7 and 8) and extended binding sites (Tables 9, 10 and 11). Again, the CASP9 dataset is divided into subsets of targets containing metal and non-metal ligands and the results are shown in Tables 7, 8, 10 and 11. However, there are no notable differences compared with using the full data (Tables 6 and 9).The results actually show fewer significant differences between methods, which is expected due to the reduced sizes of the subsets (Tables 7, 8, 10 and 11).
Table 6. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by the top CASP9 function prediction groups for the partial binding sites analysis.
Table 7. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by the top CASP9 function prediction groups - as in Table 6 except for CASP9 targets containing only metal ligands.
Table 8. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by the top CASP9 function prediction groups - as in Table 6 except for CASP9 targets containing non-metal ligands or metal and non-metal ligands in the same binding site.
Table 9. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by the top CASP9 function prediction groups for the extended binding sites analysis.
Table 10. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by the top CASP9 function prediction groups - as in Table 9 except for CASP9 targets containing only metal ligands, with a minimum of 6 predictions in common.
Table 11. The mean MCC and BDT raw scores and Z-scores obtained by FunFOLD are compared with those obtained by the top CASP9 function prediction groups - as in Table 9 except for CASP9 targets containing non-metal ligands or metal and non-metal ligands in the same binding site.
In line with the analysis performed using the CASP8 dataset, we also tested the FunFOLD method using the CASP9 TS1 3D models obtained from the Zhang-Server, but again we saw no significant difference in performance by using a different starting model (results not shown). The results in Tables 12 and 13 compare the FunFOLD performance using CASP9 native structures with partial and extended binding site definitions respectively. In Table 12, no improvement is seen from using native structures, whilst Table 13 shows that a slight performance increase in mean scores can be achieved, however this is not significant.
Table 12. What is the effect on FunFOLD performance if the native structures are used for each target? A repeat comparison of FunFOLD performance versus the top 3 groups for at CASP9 using partial binding site definitions.
Table 13. What is the effect on FunFOLD performance if the native structures are used for each target? A repeat comparison of FunFOLD performance versus the top 3 groups for at CASP9 using extended binding site definitions.
Examples of predictions
If the FunFOLD method is provided with a good quality model and a list of templates that are structurally similar containing biologically relevant ligands, then highly accurate predictions can be achieved (Figure 8A and 8B; Figure 9 A and 9B).
Figure 8. Examples of binding site predictions from CASP8 targets using FunFOLD. The green sticks represent residues in the model that FunFOLD has predicted as binding to the ligands. The red sticks represent residues that were not predicted or incorrectly predicted as potential ligand binding residues. The blue sticks represent the observed ligand binding site residues in the experimental structure. The white spheres and the thin white sticks represent ligands either predicted (A and C) or observed (B and D). A) An example of a good FunFOLD prediction for CASP8 target T0407 (3e38), with the predicted binding site residues and ligands shown. The FunFOLD method predicted both metals and SO4 in the ligand cluster (CL-3, ZN-6, SO4-2 and FE-2), with the centroid ligand predicted to be SO4. B): The predicted binding site for T0407 using the model superposed onto the experimental structure, with both the observed and predicted binding site residues shown. The observed binding site ligands are also shown (ZN - 3). The red sticks represent the under-prediction of HIS157. C) An example where FunFOLD under-predicts to a greater extent for CASP8 target T0483 (3dls). The predicted ligand binding site residues and ligands are shown. The FunFOLD method predicted the centroid ligand as a nucleotide (ANP), with the ligand cluster containing both nucleotides and metals (ANP-3, ATP-1, STU-1, ADP-1, MG-5 and MN-2). D) The predicted binding site of the top model for T0483 superposed onto the experimental structure, with both the observed and predicted binding site residues shown. The binding site ligands ADP and Mg-2 are also shown.
Figure 9. Examples of binding site predictions from CASP9 targets using FunFOLD. The green sticks represent residues in the model that FunFOLD has predicted as binding to the ligands. The red sticks represent residues that were not predicted or incorrectly predicted as potential ligand binding residues. The blue sticks represent the observed ligand binding site residues in the experimental structure. The white spheres represent ligands either predicted (A and C) or observed (B and D). A) An example of a good FunFOLD prediction for CASP9 target T0635 (3n1u). The predicted binding site residues and ligands are shown. The ligand cluster contains both metals and sulphates (CL-1, CA-2, SO4-2 and MG-4), with the centroid ligand predicted as MG. The red stick represents the over-prediction of residue THR69. B): The predicted binding site for T0635 using the model superposed onto the experimental structure. The observed and predicted binding site residues and the observed ligand (CA) are shown. The red sticks represent the over-prediction of THR69. C) An example where FunFOLD over-predicts to a greater extent for CASP9 target T0582 (3o14). The predicted binding site residues and ligands (ZN-1, CU-1, FE-3, FE2-1, MN-2) are shown. The centroid ligand in the ligand cluster is predicted to be ZN. D) The predicted binding site of the top model for T0582 superposed onto the experimental structure, with predicted and observer ligand binding site residues and the observed ligand (ZN) are shown.
Figure 8A and 8B represents accurate predictions for the CASP8 target T0407 (PDBID 3e38) with an MCC score of 0.941 and BDT score of 0.888. For comparison, the prediction by group FN202 was also very accurate with an MCC = 0.902 and BDT = 0.827, however the MCC and BDT scores for group FN407 were only 0.579 and 0.563, respectively. Group FN293 did not make a prediction for target T0407.
Analysing the prediction for T0407 in more detail, the FunFOLD method correctly predicted the binding site as being a metal binding site and the observed zinc ligands to be in the binding pocket. However, the method also under predicted one residue - HIS157. This under prediction occurred because the aromatic ring of the histidine residue was orientated away from the binding site in the model, and therefore HIS157 received a low vote of 1/13 (7.69%).
Figure 8C and 8D illustrate where FunFOLD did not succeed in predicting all of the correct residues within the binding site of CASP8 target T0483 producing many false negatives, as well as incorrectly predicting other residues which are not officially considered to be involved in binding to the ligand (false positives). Despite these errors the FunFOLD prediction for T0483 achieved an MCC = 0.616 and a BDT = 0.503, which were higher scores than those produced by FN202 (MCC = 0.453; BDT = 0.492). However, group FN293 achieved an MCC = 0.898 and a BDT = 0.876, and group FN407 achieved an MCC = 0.904 and a BDT = 0.887, for the same target. The FunFOLD method also over predicted 3 residues in target T0483 as ligand binding residues: 35, 37 and 112. The votes for residues 35 and 112 were just above the threshold of 25% at 30.77%, whilst residue 37 had 61.54% of the votes.
According to the ModFOLDclust2  predictions, the global model quality score for the model used in the prediction for target T0483 is ~0.6 (scores >0.4 indicate the fold has a good probability of being correct). However, the local model quality around the binding site is comparatively bad. Thus, FunFOLD did not predict the observed residues 33, 110, 111, 114, 116, 159 and 162. Residues 33, 110 and 159 simply did not receive enough ligand votes to be considered; with residue 33 falling below the cut-off with a vote of 15.39%, 110 getting a low vote of 7.69% and 159 getting a vote just below the cut-off of 23.08%. In addition FunFOLD did not identify any ligand contacts with residues 116 and 162, and residues 111 and 114 were identified in an alternative binding site cluster.
Figure 9A and 9B also represents accurate predictions for CASP9 target T0635 (PDB ID 3n1u), with an MCC score of 0.864 and a BDT score of 0.759. For comparison the Zhang group (FN096) and the I-TASSER-FUNCTION method (FN339) achieved and MCC = 0.603 and a BDT = 0.392, firestar (FN315) achieved an MCC = 0.770 and BDT = 0.617, whilst the Jones-UCL group (FN104), the Sternberg group (FN110), the LEE group (FN114) and the gws server (LEE server FN236) all achieved an MCC = 1.0 and BDT = 1.0.
When a more detailed analysis of the predictions for T0635 was carried out, the FunFOLD method correctly predicted the binding site as being a metal binding site, the correct location of the binding site and the three binding site residues. However, the method over-predicted one residue THR69. This over prediction occurred because the residue was in contact with three ligands (SO4-2 and CL-1), which were not well superposed onto the ligand cluster receiving a vote of 3/9 (33.33%).
Figure 9C and 9D, showing CASP9 target T0582 (PDB ID 3o14), again represents a case where FunFOLD did not succeed in predicting all of the binding site residues and incorrectly predicted other residues that were not part of the official binding site. Despite these errors FunFOLD achieved an MCC = 0.395 and BDT = 0.348. The FunFOLD method correctly predicted residues 60 and 64, but over-predicted residues 59, 115, 116, and 117 and failed to predict binding site residues 58 and 94. Residue 59 received 2/8 votes, residues 115 and 116 received 8/8 votes and residue 117 received 7/8 votes. The under-predicted residues 58 received 1/9 votes, while residue 94 was not predicted to be in contact with the ligand cluster.
According to ModFOLDclust2  predictions, the global model quality score for the model used in the prediction for CASP9 target T0582 = 0.466, which is a reasonable global model quality score. However yet again, the local model quality around the binding site for this model is comparatively bad. This resulted in both over and under-predictions, with residues 115, 116 and 117 being poorly modelled.
In this study we describe a novel method, FunFOLD, for the prediction of ligand binding site residues, which shows a significant improvement over all of the true server methods that were tested at CASP8, as well as the predictions from one of the top manual groups - FN202. In addition, the method was tested on the CASP9 set and was found to be competitive with the top server groups and statistically inseparable in performance from most of the top manual groups. Whilst a prototype version of a server was tested in the CASP9 function prediction category (IntFOLD-FN - FN425), we have since improved the reliability of the FunFOLD server and the current automated implementation more closely resembles the performance of our manual function predictions (McGuffin - FN094).
The performance of all methods was measured using the standard Matthews Correlation Coefficient (MCC) , on both the CASP8 and CASP9 function (FN) targets. The mean MCC Z-scores for FunFOLD, were shown to be higher than the mean MCC Z-scores for all other methods tested at CASP8 (Table 1). The FunFOLD predictions were an improvement upon those made by the Lee manual group (+1% MCC), the Lee server group (+4%) and the Sternberg group (+13%), which were the top groups tested during CASP8. The improvement over the Sternberg group on the CASP8 data set is statistically significant at the 99% level according to the Wilcoxon Signed rank sum test. The FunFOLD method is also shown to be competitive with all methods tested at CASP9 and shows no statistically significant difference with the top ranking server methods: I-TASSER-FUNCTION and firestar  (Tables 4, 5, 6 and 7). However, a statistically significant improvement was seen over the 3DLigandSite methods that were tested at CASP9, at the 99% level according to the Wilcoxon Signed rank sum test.
Intuitively, the quality of the starting 3D model that is used for the prediction of the ligand binding site will have an effect on the accuracy of results. Thus, in this paper for the CASP8 analysis we used ModFOLDclust2 model quality assessment method  to select high quality input models. However, we also tested FunFOLD using alternative 3D models that were submitted by the top function prediction groups and found that the improvement in MCC scores was maintained - the Lee manual method was improved upon by +2%, Lee server by +3% and the Sternberg group by +10%, which was again shown to be statistically significant. In addition, when native structures were used, as might be expected, the improvement in performance was maintained.
On the CASP9 data set we tested starting models from both our IntFOLD-TS server and from the better performing Zhang-Server, however again no significant difference in performance was seen. Furthermore, whilst using the native structures improved performance marginally in some cases, in the majority of cases the increases were not significant. According to these benchmarks, the results indicate that selecting an alternative starting model does not have much of an influence. Therefore, where there has been a significant improvement over other groups, this must have arisen from the FunFOLD algorithm itself, rather than from improved initial model selection.
One of the top ligand binding site prediction servers tested at CASP9 was the I-TASSER-FUNCTION server method (Zhang group), which also relies on 3D model-to-template superposition, for binding site residue prediction. However, in addition to global model to template superposition, the I-TASSER-FUNCTION method also carries out local alignment of the proposed binding site region, in order to improve local superposition. In light of the results shown in Figures 8 and 9, local superposition scores, such as those used in I-TASSER-FUNCTION, could be adopted which may help to improve future versions of FunFOLD. However, in our benchmarking on the CASP9 set we could measure no significant increase in performance of the I-TASSER-FUNCTION method over FunFOLD.
At the time of writing the I-TASSER-FUNCTION server is currently not publicly available. Therefore, arguably the top ranking publicly available server tested at CASP9 was firestar , which predicts residue conservation in target sequences based on PSI-BLAST alignments to the large catalogue of sites in PDB structures contained in the FireDB . However, the firestar method is not currently available as a standalone program and again we could measure no significant performance gain over the FunFOLD method.
The FunFOLD method is clearly also competitive with the top manual groups that were tested in CASP8, however no manual intervention is required for our approach. Furthermore, the method significantly outperforms each of the true server methods tested at CASP8. Whilst the Lee server group (FN293) was mostly automated, the group was counted in the extended deadline category and the authors reported a small amount of human intervention , hence in this study we have considered group FN293 in CASP8 as a non-server group.
The FunFOLD method also significantly outperformed one of the top manual groups (FN202) and since CASP8, the authors have developed a fully automated publicly available server, called 3DLigandSite , variations of which participated in CASP9. The authors reported that the predictions from the 3DLigandSite server were comparable in performance to their manual predictions at CASP8 , therefore in this study we can consider predictions from group FN202 to be the gold standard for fully automated ligand binding residue prediction for testing on the CASP8 data set.
The standalone FunFOLD software uses similar input data to the version of the FINDSITE  software that is currently available to download. For both programs, a 3D model and a list of templates is required, however, the methods differ in the output produced; the FunFOLD method outputs binding residue predictions in CASP FN format, where as the FINDSITE method outputs a list of putative locations for the centre of each binding pocket i.e. locations in 3D space rather than binding site residues. Thus, the FunFOLD software cannot be directly compared to the FINDSITE software as they both produce different output. Furthermore, the FINDSITE dataset  cannot be directly used in our analysis, as the location of the binding site residues for each template is not defined and would have to be predicted, adding potential for errors in methods comparison. However, the latest FINDSITE-DBDT method did compete in CASP9, but to our knowledge the server is not publicly available. The current implementation of FunFOLD, the prototype version of the server (FN425) and our manual prediction group (FN094) performed statistically significantly better than the FINDSITE-DBDT method on the CASP9 data set .
The FunFOLD method uses a similar procedure to that carried out by the most successful prediction groups participating CASP; the 3D input model is superposed onto structurally similar ligand containing PDB files and the putative binding residues are then determined. However, the FunFOLD method uses a novel, fully automated approach for both identifying clusters of ligands and determining putative binding site residues. The novel ligand residue voting method used in FunFOLD reduces the rate of over predictions, which appears to be one of the main problems with many structure based approaches.
There are several caveats to consider when benchmarking methods for the prediction of protein ligand binding residues. Firstly, uncertainties can arise if there are several ligand binding sites within a protein either predicted or observed. For this analysis we only considered the binding residues that were defined by the CASP assessors and for each target only one binding site was defined. Secondly, the inherent flexibility of proteins may make it difficult to determine which residues are actually in contact with the ligand. This is further exacerbated if the binding site is located in a disordered region of a protein. Thirdly, the definition of the distance cut-off for a residue that is in contact with a ligand is the Van der Waals radii + 0.5 Å, but this definition is subjective. Finally, there may be ambiguity about whether the ligand bound to the solved structure is the protein's ideal ligand. Thus a ligand used in a prediction may not necessarily be incorrect. Indeed one binding site may bind more than one ligand.
Each of these issues creates difficulties for the fair assessment of methods as defining a list of observed binding site residues may be subjective. Some of these issues were addressed by the exclusion of "neutral residues" in the CASP 8 analysis , which have also been excluded in this analysis for the CASP8 data (the CASP8 assessors defined neutral residues as those which would potentially bind to an alternative ligand, but which were not observed binding to the alternative ligand within the solved 3D structure). In CASP9 the assessors used two classifications for binding sites - partial and extended. From the official analysis there was not a significant difference in assessment, if methods were analyzed using either partial or extended binding site definitions. However, the use of the MCC statistic for assessment does compound some of these issues and the prediction of a binding residue that is defined as incorrect, but which is nevertheless close to the observed binding pocket will therefore obtain the same score as a random incorrect prediction.
Therefore we recently proposed a novel scoring method - the Binding-site Distance Test (BDT) score, which addresses some of the shortcomings of using MCC scores, whilst maintaining the advantages . Predicted residues that are close to the observed residues will obtain a higher BDT score than more distant predictions. The BDT score was used by the CASP9 assessors, in addition to the MCC score, to investigate if it caused a significant difference in the rankings of the methods but no significant changes in the grouping of top methods were reported . In this analysis, the list of top groups identified using BDT scoring is again roughly in agreement with the top groups obtained using the MCC scoring; however the ranking of some of the less accurate methods does appear to change. Using the mean BDT score, we also see higher scores for FunFOLD compared with all methods tested on CASP8 on equivalent subsets of data and the difference is again significant for all but the top two manual groups (Figure 2, Figure 6 and Table 1). When the CASP9 predictions are analysed using the mean BDT scores the FunFOLD method is ranked below the top two server methods I-TASSER-FUNCTION and firestar , however again the difference in performance is not significant.
An obvious way of improving future versions of the FunFOLD software would be to optimize the voting threshold for the inclusion of predicted residues. Furthermore, predicting the ligand binding site residues on multiple models and then pooling the results may help to increase accuracy. The specific physiochemical properties of the residues could be studied, such as charge and polarity, and residues exhibiting more favorable physiochemical properties for binding to a ligand could be weighted more heavily in the residue voting process. A prediction of the enzymes functional family could be performed, with the prediction then used to weight residues that bind to ligands that occur more often within these families, more heavily in the voting process. In addition to undertaking global model-to-template superpositions, local superposition of the binding site regions could also be carried out to increase accuracy. As previously mentioned this was carried out by one of the top server groups at CASP9 (I-TASSER-FUNCTION - FN339).
A general function prediction quality assessment tool could also be developed in order to weight predictions or provide probabilities scores for individual residues. Features of the quality assessment might include: the type of ligands within the cluster, with clusters containing a large number of similar ligands receiving a higher score; the distance of the superposed ligands from the centroid ligand within the cluster, with clusters containing ligands that are superposed perfectly receiving higher scores; the global and local model quality scores of the starting 3D model could also be factored into the analysis using our ModFOLD methods [28,37], with residues in poorly modeled regions down weighted; the probability of bound residues occurring in disordered regions could also be considered by integrating our DISOclust results , with residues in regions of high disorder receiving an appropriate weighting. In future, each of these features could be integrated into an automated quality assessment tool in order to produce more appropriate confidence scores which could be used for ranking binding residue predictions.
The FunFOLD software implements a competitive method for the prediction of protein binding site residues, which can also be used to determine the putative ligands interacting with a protein. The method is available as a standalone program, which can be used to predict binding residues and ligands based on user supplied 3D models and template lists. We also provide access to FunFOLD via a simple web server, which only requires users to supply an amino acid sequence.
Availability and requirements
• Project name: FunFOLD
• Project homepages: The standalone software can be downloaded from http://www.reading.ac.uk/bioinf/downloads/ webcite. The web server can be accessed at http://www.reading.ac.uk/bioinf/FunFOLD/ webcite. The FunFOLD method is also part of the IntFOLD server pipeline: http://www.reading.ac.uk/bioinf/IntFOLD/ webcite
• Operating system(s): Platform independent (Linux preferred)
• Other requirements: The standalone version requires a recent version of Java, a recent version of PyMOL (http://www.pymol.org webcite), and the TMalign program (http://zhanglab.ccmb.med.umich.edu/TM-align webcite).
• Licence: freely available
Conflict of interest
The authors declare that they have no competing interests.
DBR contributed code, carried out testing of the system, carried out benchmarking, contributed figures and drafted the manuscript. SJT carried out testing of the system, and contributed text and figures to the manuscript. LJM conceived the idea, produced the executable, carried out statistical data analysis, contributed text and figures to the manuscript and carried out final editing of the manuscript. All authors read and approved the final manuscript.
We would like to thank Dr Mark Wass from Imperial for supplying us with his list of biologically relevant ligands. Funding: This work was funded by an RCUK Academic Fellowship (LJM) and a University of Reading Faculty Studentship (DBR).
CASP9 function prediction assessment [http://www.predictioncenter.org/casp9/doc/presentations/CASP9_FN.pdf] webcite
Roche DB, Buenavista MT, Tetchner SJ, McGuffin LJ: The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction.
Biochim Biophys Acta 1975, 405(2):442-451. PubMed Abstract