  • Research article
  • Open access

Evaluation of 3D-Jury on CASP7 models

Abstract

Background

3D-Jury, the structure prediction consensus method publicly available in the Meta Server http://meta.bioinfo.pl/, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers.

Results

The performance of 3D-Jury was analysed in three respects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score correlates significantly with the number of correctly predicted residues, and the correlation is strong enough to be used for prediction. Second, 3D-Jury was found to improve on the competing servers' choice of the best structure model in most cases. Third, we examined the value of the 3D-Jury score as a generic reliability measure. The 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We also report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models.

Conclusion

The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, which is especially important for models that were not assigned one by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models with the new instant scoring feature of the Meta Server (http://meta.bioinfo.pl/compare_your_model_example.pl).

Background

The number of protein structure prediction servers has increased over the past years [1]. Using many different methods to predict the structure of a protein is now state-of-the-art in protein structure prediction [2]. However, the number of available servers, taken together with the number of models each returns, exceeds what a human researcher is likely to scan. Structure prediction meta-servers address this problem: they gather models from various other servers and apply automated versions of processes that human experts have used successfully to arrive at a correct prediction [1]. Because existing structure prediction servers are constantly upgraded and new servers keep appearing, these expert processes need to be re-evaluated periodically.

The latest, 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7) [3] has provided a substantial set of structure prediction server models. With the help of the Structure Prediction Meta Server [4], we evaluated the servers that returned these models using the same protocols as in previous Livebench experiments [5]; the results are available at [6].

Standard evaluation methods take into account only the first (top-ranked) model of each prediction server. The Meta Server assigns a new reliability score to each model using 3D-Jury [7]. This score can be used to re-rank the models and thus affect the evaluation results. The aim of the present work was to verify that this model ranking method remains applicable, focusing on the version available on-line. We set out to answer three questions: Can 3D-Jury be used to estimate model quality? Does 3D-Jury select a more accurate model than the generating server does? Can the 3D-Jury score be used as a generic model reliability score?

Results and Discussion

3D-Jury score correlates with the number of correctly predicted residues

The correlation of the 3D-Jury score (Jscore) with model quality is of fundamental importance to the operation of the Meta Server. We therefore first examined the correlation of the 3D-Jury score returned by the default on-line version of 3D-Jury, $3J_{1,A}$ (see Methods: 3D-Jury operating modes), with the number of correctly predicted residues ($N_{C_\alpha \le 3.5\,\text{Å}}$).

3D-Jury scores correlate with the number of correctly predicted residues ($N_{C_\alpha \le 3.5\,\text{Å}}$): the correlation coefficient is 0.95. A linear model (LM1) fitted to these data is shown in Figure 1. Its residual standard error, 20.15, is low enough to allow a meaningful estimate of the number of correctly positioned residues.
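The statistics quoted for LM1 (and for LM2 below) come from a linear regression of $N_{C_\alpha \le 3.5\,\text{Å}}$ on Jscore, presumably an ordinary least-squares fit, since statistics and figures were prepared in R [21]. The Python sketch below is not that code. It is a minimal illustration, assuming paired lists of scores and residue counts, of how the correlation coefficient and the residual standard error are obtained. The name `fit_lm` is hypothetical, and models with Jscore = 0 are dropped first, following Methods.

```python
import math


def fit_lm(jscores, n_correct, drop_zero_jscore=True):
    """Least-squares fit n_correct = intercept + slope * jscore.

    Returns (intercept, slope, pearson_r, residual_standard_error).
    """
    pairs = [
        (x, y) for x, y in zip(jscores, n_correct)
        if not (drop_zero_jscore and x == 0)  # Methods: Jscore = 0 models excluded
    ]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 models with non-zero Jscore")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    rss = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    rse = math.sqrt(rss / (n - 2))  # n - 2 degrees of freedom: two fitted parameters
    return intercept, slope, r, rse
```

With toy data, `fit_lm([0, 1, 2, 3, 4], [9, 0, 2, 2, 4])` ignores the Jscore = 0 point and returns an intercept of about −1.0, a slope of about 1.2, r ≈ 0.949 and a residual standard error ≈ 0.632.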

Figure 1

Correlation of 3D-Jury score with the number of correctly predicted $C_\alpha$ atoms. $N_{C_\alpha \le 3.5\,\text{Å}}$ – the number of $C_\alpha$ atoms predicted within 3.5 Å of their respective locations in the crystal structure; Jscore – $3J_{1,A}$ score; solid green line – prediction of linear model LM1; blue long-dashed lines – confidence interval at the 95% confidence level; blue dashed lines – prediction interval at the 90% confidence level; blue dot-dashed lines – prediction interval at the 95% confidence level; blue dotted lines – prediction interval at the 99% confidence level; x – slope; the colour bar indicates the approximate density of models. A linear model (LM1) was fitted to the 3D-Jury score vs. $N_{C_\alpha \le 3.5\,\text{Å}}$ of 19,558 models. The residual standard error is 20.15. The 95% confidence interval and the prediction intervals at the 90%, 95% and 99% confidence levels are indicated on the figure. The vertical and horizontal histograms show the distributions of $N_{C_\alpha \le 3.5\,\text{Å}}$ and 3D-Jury scores, respectively.

A better-fitting model (LM2) is obtained by restricting the fit to the [30, 100) 3D-Jury score range, which represents difficult targets. Figure 2 shows the resulting linear model. Its residual standard error is 13.37, which gives narrower prediction intervals for the number of correctly positioned residues.

Figure 2

Correlation of 3D-Jury score in the [30, 100) range with the number of correctly predicted $C_\alpha$ atoms. $N_{C_\alpha \le 3.5\,\text{Å}}$ – the number of $C_\alpha$ atoms predicted within 3.5 Å of their respective locations in the crystal structure; Jscore – $3J_{1,A}$ score; solid green line – prediction of linear model LM2; blue long-dashed lines – confidence interval at the 95% confidence level; blue dashed lines – prediction interval at the 90% confidence level; blue dot-dashed lines – prediction interval at the 95% confidence level; blue dotted lines – prediction interval at the 99% confidence level; x – slope; the colour bar indicates the approximate density of models. A linear model (LM2) was fitted to the 3D-Jury score vs. $N_{C_\alpha \le 3.5\,\text{Å}}$ of 6,710 models. The residual standard error is 13.37. The 95% confidence interval and the prediction intervals at the 90%, 95% and 99% confidence levels are indicated on the figure. The vertical and horizontal histograms show the distributions of $N_{C_\alpha \le 3.5\,\text{Å}}$ and 3D-Jury scores, respectively. The 30 to 100 3D-Jury score range was chosen to represent difficult targets.

As an example of the use of LM2, assume that a model has a 3D-Jury score of 44.5. We can then expect this model to contain 13 to 82 correctly positioned residues at the 99% confidence level, and 21 to 74 at the 95% confidence level. For a score of 59, the 99% prediction interval for the number of correct residues is 26–94, and the narrower 95% prediction interval is 34–86.
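These intervals have the shape of the standard prediction interval for simple linear regression, $\hat{y}_0 \pm q\, s \sqrt{1 + 1/n + (x_0 - \bar{x})^2 / S_{xx}}$, where $s$ is the residual standard error. As a rough check, with LM2's $s = 13.37$ the 95% half-width is about 1.96 × 13.37 ≈ 26 residues and the 99% half-width about 2.576 × 13.37 ≈ 34, in line with the half-widths of the quoted intervals (26.5 and 26 at 95%, 34.5 and 34 at 99%). The sketch below computes such an interval under two assumptions: the usual OLS formula, and a normal quantile in place of Student's t. The second is adequate for thousands of models (6,710 for LM2) but too narrow for small samples. `prediction_interval` is a hypothetical helper, not a Meta Server function.

```python
import math
from statistics import NormalDist


def prediction_interval(jscores, n_correct, x0, level=0.95):
    """Prediction interval for the residue count of a new model with Jscore x0.

    Uses y0 +/- q * s * sqrt(1 + 1/n + (x0 - mean_x)^2 / Sxx), where q is a
    standard-normal quantile standing in for the t quantile (large-n shortcut).
    """
    n = len(jscores)
    if n < 3 or n != len(n_correct):
        raise ValueError("need at least 3 paired observations")
    mx = sum(jscores) / n
    my = sum(n_correct) / n
    sxx = sum((x - mx) ** 2 for x in jscores)
    slope = sum((x - mx) * (y - my) for x, y in zip(jscores, n_correct)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(jscores, n_correct))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    q = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided quantile, e.g. 1.96 for 95%
    half_width = q * s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    y0 = intercept + slope * x0
    return y0 - half_width, y0 + half_width
```

Raising `level` from 0.95 to 0.99 widens the interval by the ratio of the two quantiles (about 1.31), which is the pattern seen in the 21–74 versus 13–82 intervals above.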

An indication of which residues are likely to be correctly positioned is given on the model-centred 3D-Jury page, reached by selecting a model in the Model column of the main 3D-Jury page. On this page, a residue that is likely to be correctly positioned has a grey background at the corresponding position in most of the other aligned models, so that such residues form grey columns.

3D-Jury improves overall server prediction results

We examined whether 3D-Jury can improve overall server performance by selecting a better model when a prediction server returns multiple models. We tested four operating modes of 3D-Jury: $3J_{1,A}$, which uses one model of each default server (the mode typical of on-line predictions); $3J_{a,A}$, all models of the default servers; $3J_{1,C}$, one model of each server; and $3J_{a,C}$, all models of all servers. For this analysis we computed the MaxSub score (MaxS) [8] of 25,215 models, together with four 3D-Jury scores (Jscore) per model, one for each operating mode. The servers' choice of the best model was evaluated by summing the MaxS of the first model returned for each target. The choice of each of the four 3D-Jury variants was evaluated by summing the MaxS of the model with the highest respective 3D-Jury score for each target. We also summed the highest MaxS available for each target, which gives an upper limit on possible improvements.

Results for $3J_{1,A}$ are presented in Table 1, column Q%. The grand totals of MaxS rank the five model selection approaches as follows: $3J_{a,C}$ (20,006) > $3J_{1,C}$ (19,983) > $3J_{1,A}$ (19,690) > $3J_{a,A}$ (19,655) > first server model (19,039); the sum of the highest MaxS per target is 20,718. Table 1, column $N_j$, shows the number of targets for which $3J_{1,A}$ chose a better model than the original server. For pmodeller6 [9] and 3dpro [10], $3J_{1,A}$ predicts more targets better, but its overall performance is slightly worse than that of the original servers: its more numerous better choices do not make up for the MaxSub score lost on its bad choices. For inub [11] and BasD [12] the situation is reversed: $3J_{1,A}$ improved fewer targets, but the net improvement is positive. For many servers the net improvement (or deterioration) is marginal (e.g. 0.6% for phyre-2). Even in these cases, however, there is room for a 4–5% improvement (Table 1, column Q%, values in parentheses). Moreover, it appears that for at least 14 targets every server fails to pick its best model.
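The bookkeeping behind these totals can be sketched as follows. The sketch assumes that each server's models for a target are stored as (MaxS, Jscore) pairs in the server's own rank order. The function name and data layout are hypothetical, and the numbers in the tests are toy values. Ties in Jscore are broken in favour of the model the server ranked higher, a rule the text does not specify.

```python
def compare_selection(models_by_target):
    """Compare a server's own first-model choice with a 3D-Jury re-ranking.

    models_by_target maps a target id to a list of (maxs, jscore) pairs in
    the server's own rank order (index 0 = the server's first model).
    """
    totals = {"first": 0.0, "jury": 0.0, "best": 0.0, "N_s": 0, "N_j": 0}
    for models in models_by_target.values():
        if not models:
            continue
        first = models[0][0]
        # max() keeps the earliest maximum, so Jscore ties favour the
        # model the server itself ranked higher.
        jury = max(models, key=lambda m: m[1])[0]
        best = max(maxs for maxs, _ in models)  # upper limit for this target
        totals["first"] += first
        totals["jury"] += jury
        totals["best"] += best
        if first > jury:
            totals["N_s"] += 1  # the server's own choice was better
        elif jury > first:
            totals["N_j"] += 1  # 3D-Jury's choice was better
    return totals
```

Summing `first`, `jury` and `best` over all servers would give grand totals of the kind compared above. `N_s` and `N_j` correspond to the per-server counts reported in Table 1.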

Table 1 Server prediction results improved by 3D-Jury. $3J_{1,A}$ – the default on-line version of 3D-Jury, which uses one model of each default server [7]; $N_s$ – number of targets better predicted (in terms of MaxSub score) by the server; $N_j$ – number of targets better predicted by $3J_{1,A}$ (in parentheses: number of improvable targets, i.e. those with a suboptimal choice of the first model); Q%, $Q\%_{max}$ – see Methods: Measures for comparing model selection methods. Servers are ordered by $N_j - N_s$, descending; three servers with $\sum \mathrm{MaxS}_s = 0$ are not shown. Servers not improved by the re-ranking of models ($N_s > N_j$) are shown in italics. Considering either Q% or the number of targets, $3J_{1,A}$ selects better models on the whole for 50 of the 56 servers shown; re-ranking of models by 3D-Jury does not improve the performance of the remaining 6 servers.

3D-Jury scores as generic model reliability scores

To assess the advantage of using 3D-Jury scores as generic reliability scores, we conducted a receiver operating characteristic (ROC) analysis adapted for CASP and Livebench [5] evaluation. The analysis shows how well a reliability score separates good models from bad ones, in terms of the average number of good models seen before encountering 1 to 11 bad models ($\overline{tp}$). We compared the 3D-Jury scores returned by the on-line version $3J_{1,A}$ with the reliability scores of the original servers, where available. Results are shown in Table 2. The 3D-Jury score ($\overline{tp}_J$) exceeds the original server score ($\overline{tp}_R$) in 27 cases and falls short of it in only 5 cases out of the 38 analysed. The exceptions are pmodeller6 [9], pcons6 [2], ffas03 [13], inub [11] and shub [11].

Table 2 3D-Jury receiver operating characteristic (ROC) analysis. $\overline{tp}_R$ – average number of true positive (tp) models in the [0–10] false positive (fp) range, using the reliability score provided by the server as the discrimination threshold; $\overline{tp}_J$ – average number of tp in the [0–10] fp range, using the 3D-Jury score as the discrimination threshold; $J_0$ – lowest 3D-Jury score before the first bad model is observed; $tp_{J_0}$ – number of good models at or above the $J_0$ score; $N_t$ – number of targets. The table shows results for the default on-line version of 3D-Jury, $3J_{1,A}$. Servers are ordered by $\overline{tp}_J$, descending. Missing $\overline{tp}_R$ values indicate servers that did not return reliability scores. The five servers with $\overline{tp}_R > \overline{tp}_J$ are shown in italics. To assess 3D-Jury scores (Jscore) as reliability scores, we performed a ROC analysis adapted for CASP and Livebench data, comparing Jscore with the reliability scores provided by the servers. In terms of the average number of true positive models ($\overline{tp}$), the 3D-Jury score exceeds the original server score in 27 cases and falls short of it in 5 cases out of the 38 analysed.

The $J_0$ scores listed in Table 2 are the lowest 3D-Jury score seen for the indicated server before a bad model was encountered. In other words, no bad model with a score above $J_0$ was seen in that server's test model set. $J_0$ scores are of practical value: they can serve as server-specific score thresholds, since a score above $J_0$ is likely to indicate a good model.
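A minimal sketch of how $J_0$ and $tp_{J_0}$ could be read off one server's scored models. It assumes (Jscore, is_good) pairs, with is_good meaning MaxS > 0. At equal scores it places good models ahead of bad ones, a tie rule the text does not state. The function name is hypothetical.

```python
def j0_threshold(models):
    """Return (J0, tp_J0) for one server's models.

    models: iterable of (jscore, is_good) pairs, is_good meaning MaxS > 0.
    J0 is the lowest Jscore reached, walking down from the top score,
    before the first bad model; tp_J0 counts good models scoring >= J0.
    Returns (None, 0) when the top-scoring model is already bad.
    """
    # Sort by score, descending; at equal scores good models come first
    # (an assumed tie rule).
    ranked = sorted(models, key=lambda m: (m[0], m[1]), reverse=True)
    j0 = None
    for score, good in ranked:
        if not good:
            break
        j0 = score
    if j0 is None:
        return None, 0
    tp = sum(1 for score, good in ranked if good and score >= j0)
    return j0, tp
```

For toy models scored 80, 70 and 65 (good), then 50 (bad) and 45 (good), this returns $J_0 = 65$ with three good models at or above it. The good model at 45 does not move the threshold, because a bad model has already been seen above it.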

3D-Jury scoring of user models

In order to encourage model selection and refinement using 3D-Jury, we introduced a new feature: instant 3D-Jury scoring of user models. This feature, available for any completed job by selecting the job in the Queue and uploading a model, enables the user to score a set of models and obtain a ranking based on the 3D-Jury score. Pop-up hints and an on-line tutorial [14], available from the job page, offer help with this new feature.

Conclusion

In this report we present an evaluation of 3D-Jury [7] on models gathered in CASP7. We found a good correlation between the 3D-Jury score and a model quality measure, the number of correctly predicted residues. This correlation can be used to predict model properties such as the number of correctly positioned residues: using Figure 2, 3D-Jury scores can be translated into an estimated number of correctly predicted residues. We plan to upgrade the on-line 3D-Jury to report the 90%, 95% and 99% prediction intervals for the number of correctly predicted residues automatically.

3D-Jury generally also appears to boost server predictions by identifying better models. Our results show that 3D-Jury performs best when all models of all servers are used to calculate the Jscore. This option, however, is not feasible in the Meta Server, since many of the servers participating in CASP7 are not currently available on-line. Nevertheless, $3J_{1,A}$, the on-line default, is a reasonable choice. We found that 3D-Jury scores can be used as generic reliability scores, an especially important feature for models that are not provided with such values. We have also extracted server-specific 3D-Jury score thresholds to help identify reliable models. Finally, we report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models.

3D-Jury remains a valuable tool for protein structure modellers. Its ability to pinpoint the best server models is supported by the results of our analysis.

Methods

Test model set

In order to assess 3D-Jury we downloaded the complete set of server structure predictions from the Protein Structure Prediction Center [15]. Predictions from our partner servers (BasD [12], ffas03 [13], inub [11], mgenthreader [16], ORFeus-2 [17], pdbblast [18] and 3D-PSSM [19]) were added if missing.

Servers that predicted fewer than two targets and/or returned only one model per target were excluded from the server model ranking tests (reported in Table 1). The resulting set contains 25,215 models for 85 targets from 59 servers, an average of about 5 models per server per target.

Models with Jscore = 0 were excluded from all correlation and regression analyses.

Server reliability scores (Rscore) that anti-correlate with model quality were multiplied by -1.

Model quality measures

The MaxSub [8] score and $N_{C_\alpha \le 3.5\,\text{Å}}$ (defined below) were used to measure model quality. MaxSub returns a score between 0.0 (incorrect prediction) and 1.0 (perfect prediction); in this study the score was multiplied by 10.0, as is customary on the 3D-Jury web pages [20]. Models with MaxS > 0 are called good, and models with MaxS = 0 are called bad.

$N_{C_\alpha \le 3.5\,\text{Å}}$ is the number of $C_\alpha$ atoms predicted within 3.5 Å of their respective locations in the solved structure, as reported by the MaxSub tool [8] operating on the $C_\alpha$ atoms of the compared structures. We take $N_{C_\alpha \le 3.5\,\text{Å}}$ to be the number of correctly predicted residues.
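Once a model has been superposed on the solved structure, the count reduces to a distance test on equivalent $C_\alpha$ atoms. The sketch below shows only that final counting step. It assumes the superposition has already been applied, that residue equivalence is given by list position, and that `None` marks residues missing from either structure. It does not reproduce MaxSub's search for the superposition that maximises the number of close atoms, and `count_close_calpha` is a hypothetical name.

```python
import math


def count_close_calpha(model_ca, native_ca, cutoff=3.5):
    """Count residues whose model C-alpha lies within `cutoff` angstroms of
    the equivalent native C-alpha.

    model_ca / native_ca: equal-length lists of (x, y, z) tuples, or None for
    residues absent from the model or structure. The model must already be
    superposed on the native structure; no superposition is searched here.
    """
    if len(model_ca) != len(native_ca):
        raise ValueError("coordinate lists must be residue-aligned")
    return sum(
        1
        for a, b in zip(model_ca, native_ca)
        if a is not None and b is not None and math.dist(a, b) <= cutoff
    )
```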

3D-Jury model scoring

The 3D-Jury score of a model M is calculated by first comparing M with a set of other models available to the system for the same target; how these other models are selected is a tunable parameter of 3D-Jury. M is compared with each selected model, and each pair i is assigned a pairwise similarity score $S_{M,i}$ equal to the number of corresponding $C_\alpha$ atoms that lie within 3.5 Å of each other after optimal superposition of the two structures' $C_\alpha$ atoms. MaxSub [8] is used to carry out this step. If a pairwise similarity score falls below a cutoff value, it is set to zero. The 3D-Jury score (Jscore) of model M is the sum of its pairwise similarity scores divided by the number of these scores, n, plus one [7]:

$$\mathrm{Jscore}_M = \frac{\sum_{i=1}^{n} S_{M,i}}{n+1}$$
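As a worked example of the formula, take four pairwise scores of 120, 85, 30 and 60, and the similarity cutoff of 40 used in this study (see 3D-Jury parameters). The score of 30 is set to zero, so Jscore = (120 + 85 + 0 + 60)/(4 + 1) = 53. The Python sketch below encodes the same arithmetic. It assumes that zeroed scores still count towards n, which is one reading of the description above. `jscore` is a hypothetical name, not the Meta Server's (Perl) implementation.

```python
def jscore(pairwise_scores, cutoff=40):
    """3D-Jury score of a model M from its pairwise similarity scores S_{M,i}.

    Scores below `cutoff` are set to zero but still counted in n
    (an assumption). Returns sum(S) / (n + 1).
    """
    kept = [s if s >= cutoff else 0 for s in pairwise_scores]
    return sum(kept) / (len(kept) + 1)
```

`jscore([120, 85, 30, 60])` returns 53.0. The "+ 1" in the denominator also keeps the score defined (0.0) when no other models are available.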

3D-Jury parameters

3D-Jury offers three tunable parameters [7]: the list of servers from which models are drawn for the pairwise score calculation; the method of server model selection when a server provides multiple models (first model, most similar model in terms of $S_{M,i}$, or all models); and the pairwise similarity score cutoff. In this analysis we used the publicly available BasD [12], ffas03 [13], inub [11], mgenthreader [16], ORFeus-2 [17], pdbblast [18] and 3D-PSSM [19] as default servers, and a constant similarity cutoff of 40, in order to simulate regular on-line use of the service.
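The server-list and model-selection parameters determine which pairwise scores enter the Jscore sum. The minimal sketch below assumes that the pairwise scores $S_{M,j}$ between model M and each model of the other servers have already been computed. They are stored per server in that server's rank order. The dictionary layout, method labels and function names are hypothetical. The operating modes differ in the server list and in whether one or all models per server are used. Which single-model method the on-line default applies is not stated here, so the sketch leaves it as a parameter.

```python
def select_scores(scores_by_server, method="first", servers=None):
    """Pick the pairwise scores S_{M,j} that enter model M's Jscore.

    scores_by_server: {server: [S_{M,j} for that server's models, in the
    server's rank order]}. method: "first", "most_similar" or "all".
    servers: optional collection restricting which servers are consulted.
    """
    selected = []
    for server, scores in scores_by_server.items():
        if servers is not None and server not in servers:
            continue
        if not scores:
            continue
        if method == "first":
            selected.append(scores[0])
        elif method == "most_similar":
            selected.append(max(scores))
        elif method == "all":
            selected.extend(scores)
        else:
            raise ValueError(f"unknown selection method: {method!r}")
    return selected


def jscore(pairwise_scores, cutoff=40):
    """Sum of cutoff-filtered scores divided by (n + 1)."""
    kept = [s if s >= cutoff else 0 for s in pairwise_scores]
    return sum(kept) / (len(kept) + 1)
```

With toy scores `{"ffas03": [50, 90], "inub": [70], "pdbblast": [20, 45]}`, the Jscore is 30.0 with "first", 51.25 with "most_similar" and 42.5 with "all". Restricting `servers` to ffas03 and inub with "first" gives 40.0.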

3D-Jury operating modes

The four operating modes of 3D-Jury used in this report are: $3J_{1,A}$ – one model of each default server (the mode typical of on-line predictions); $3J_{a,A}$ – all models of the default servers; $3J_{1,C}$ – one model of each server; $3J_{a,C}$ – all models of all servers.

Measures for comparing model selection methods

Q% – 3D-Jury vs. original server

$$Q\% = \left( \frac{\sum \mathrm{MaxS}_j}{\sum \mathrm{MaxS}_s} - 1 \right) \times 100$$

$\mathrm{MaxS}_j$ – MaxSub score of the model selected by $3J_{1,A}$ for a target; sums run over the targets predicted by the server

$\mathrm{MaxS}_s$ – MaxSub score of the server's first model for a target

$Q\%_{max}$ – 'best model' vs. original server

$$Q\%_{max} = \left( \frac{\sum \max(\mathrm{MaxS})}{\sum \mathrm{MaxS}_s} - 1 \right) \times 100$$

max(MaxS) – highest MaxSub score among the server's models for a target

$\mathrm{MaxS}_s$ – MaxSub score of the server's first model for a target
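Both measures are the same relative difference between two sums over targets, so a single helper covers them. In the minimal sketch below the function name is hypothetical and the values in the tests are toy MaxS scores. Servers whose first models sum to zero MaxS, which Table 1 leaves out, raise an error instead of being divided by.

```python
def q_percent(selected_maxs, first_model_maxs):
    """Relative gain (in %) of one model choice over the server's first models.

    Q%    : selected_maxs = MaxS of the model picked by 3D-Jury, per target
    Q%max : selected_maxs = highest MaxS among the server's models, per target
    first_model_maxs: MaxS of the server's first model for the same targets.
    """
    baseline = sum(first_model_maxs)
    if baseline == 0:
        raise ValueError("server's first models sum to zero MaxS")
    return (sum(selected_maxs) / baseline - 1.0) * 100.0
```

For first-model scores [2.0, 5.0, 1.0], 3D-Jury choices [4.0, 5.0, 0.5] and per-target maxima [4.0, 5.0, 1.0], Q% is 18.75 and $Q\%_{max}$ is 25.0.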

Receiver operating characteristic (ROC) analysis

We performed a ROC analysis adapted for CASP and Livebench [18] model evaluation for each server. Server models were ranked either by the original reliability score (Rscore, when available) or by the 3D-Jury score (Jscore). The highest-scoring model for each target was collected into the set $M_R$ or $M_J$, according to whether Rscore or Jscore was used for ranking. Models in both sets were then ordered by their respective scores. Good models (MaxS > 0) were labelled positive and bad models (MaxS = 0) negative. Using Rscore or Jscore as the discrimination threshold, we plotted the number of true positives (tp) against the number of false positives (fp) over the [0–10] fp range. This was to take into account the absolute number of targets predicted by the servers, focusing on the hardest targets. The number of true positives averaged over the [0–10] false positive range was used as a quality measure for the reliability scores, with higher values indicating better reliability scores.
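The minimal sketch below computes this measure under stated assumptions. Models are (score, is_good) pairs, with is_good meaning MaxS > 0. The [0–10] fp range is read as the eleven points fp = 0, 1, …, 10, where the tp value at fp = k is the number of good models seen before the (k + 1)-th bad one, matching "good models seen before encountering 1 to 11 bad models" in Results. If a server has fewer bad models, the remaining points take the total number of good models. Ties in score keep their input order. Both function names are hypothetical.

```python
def top_per_target(models_by_target):
    """Collect the highest-scoring (score, is_good) model of each target."""
    return [max(models, key=lambda m: m[0])
            for models in models_by_target.values() if models]


def mean_tp(models, max_fp=10):
    """Average true-positive count over fp = 0..max_fp for one score type.

    models: (score, is_good) pairs, one per target; is_good means MaxS > 0.
    """
    ranked = sorted(models, key=lambda m: m[0], reverse=True)
    tp_at_fp = []  # tp_at_fp[k]: good models seen before the (k + 1)-th bad one
    tp = 0
    for _, good in ranked:
        if good:
            tp += 1
        else:
            tp_at_fp.append(tp)
            if len(tp_at_fp) == max_fp + 1:
                break
    # Fewer than max_fp + 1 bad models: remaining points see all good models.
    tp_at_fp.extend([tp] * (max_fp + 1 - len(tp_at_fp)))
    return sum(tp_at_fp) / (max_fp + 1)
```

To compare the two scores for a server, `mean_tp(top_per_target(...))` would be evaluated once with Rscore and once with Jscore as the score. The larger value indicates the better-separating reliability score.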

Statistics and figures

Reported correlation coefficients are significant at the 5% significance level (95% confidence).

Statistics and figures were prepared using R [21].

Availability and requirements

Project name: Meta Server/3D-Jury

Project home page: http://meta.bioinfo.pl/

Operating system: Linux

Programming language: Perl

Other requirements: SQL server, web server, mail server, procmail

Licence: the web service is freely accessible to everybody

References

  1. Fischer D: Servers for protein structure prediction. Curr Opin Struct Biol 2006, 16(2):178–82.

  2. Wallner B, Elofsson A: Pcons5: combining consensus, structural evaluation and fold recognition scores. Bioinformatics 2005, 21(23):4248–54.

  3. 7th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction [http://www.predictioncenter.org/casp7/]

  4. Bujnicki JM, Elofsson A, Fischer D, Rychlewski L: Structure prediction meta server. Bioinformatics 2001, 17(8):750–1.

  5. Rychlewski L, Fischer D: LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci 2005, 14: 240–5.

  6. Livebench-style evaluation of CASP 7 predictions [http://metav1.bioinfo.pl/results.pl?B=CASP&V=7]

  7. Ginalski K, Elofsson A, Fischer D, Rychlewski L: 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 2003, 19(8):1015–8.

  8. Siew N, Elofsson A, Rychlewski L, Fischer D: MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 2000, 16(9):776–85.

  9. Wallner B, Fang H, Elofsson A: Automatic consensus-based fold recognition using Pcons, ProQ, and Pmodeller. Proteins 2003, 53(Suppl 6):534–41.

  10. Cheng J, Baldi P: A machine learning information retrieval approach to protein fold recognition. Bioinformatics 2006, 22(12):1456–63.

  11. Fischer D: 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins 2003, 51(3):434–41.

  12. Ginalski K, von Grotthuss M, Grishin NV, Rychlewski L: Detecting distant homology with Meta-BASIC. Nucleic Acids Res 2004, (32 Web Server):W576–81.

  13. Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A: FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Res 2005, (33 Web Server):W284–8.

  14. Guide to the BioInfoBank Meta Server 'Upload and score your model' feature [http://meta.bioinfo.pl/compare_your_model_example.pl]

  15. Protein Structure Prediction Center – CASP7 predictions [http://www.predictioncenter.org/casp7/SERVER_HTML/tarballs/]

  16. Bryson K, McGuffin LJ, Marsden RL, Ward JJ, Sodhi JS, Jones DT: Protein structure prediction servers at University College London. Nucleic Acids Res 2005, 33(Web Server issue):W36–8.

  17. Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L: ORFeus: Detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res 2003, 31(13):3804–7.

  18. Bujnicki JM, Elofsson A, Fischer D, Rychlewski L: LiveBench-1: continuous benchmarking of protein structure prediction servers. Protein Sci 2001, 10(2):352–61.

  19. Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000, 299(2):499–520.

  20. BioInfoBank Meta Server [http://meta.bioinfo.pl/]

  21. The R Project for Statistical Computing [http://www.r-project.org/]

  22. Hung LH, Ngan SC, Liu T, Samudrala R: PROTINFO: new algorithms for enhanced protein structure predictions. Nucleic Acids Res 2005, 33(Web Server issue):W77–80.

  23. Xu J, Li M, Kim D, Xu Y: RAPTOR: optimal protein threading by linear programming. J Bioinform Comput Biol 2003, 1: 95–117.

  24. Bates PA, Kelley LA, MacCallum RM, Sternberg MJ: Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 2001, (Suppl 5):39–46.

  25. Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310: 243–57.

  26. Yamaguchi A, Iwadate M, Suzuki E, Yura K, Kawakita S, Umeyama H, Go M: Enlarged FAMSBASE: protein 3D structure models of genome sequences for 41 species. Nucleic Acids Res 2003, 31: 463–8.

  27. Karplus K, Barrett C, Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998, 14(10):846–56.

  28. Heger A, Holm L: More for less in structural genomics. J Struct Funct Genomics 2003, 4(2–3):57–66.

  29. Tosatto SCE, Albrecht M, Cestaro A, Toppo S, Valle G: Secondary Structure Prediction by Consensus and Homology. [http://www.forcasp.org/modules.php?name=Papers&file=article&sid=1731]

  30. Torda AE, Procter JB, Huber T: Wurst: a protein threading server with a structural scoring function, sequence profiles and optimized substitution matrices. Nucleic Acids Res 2004, 32(Web Server issue):W532–5.

  31. Liu S, Zhang C, Liang S, Zhou Y: Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins 2007, 68(3):636–645.

  32. Teodorescu O, Galor T, Pillardy J, Elber R: Enriching the sequence substitution matrix by structural information. Proteins 2004, 54: 41–8.

  33. Kurowski MA, Bujnicki JM: GeneSilico protein structure prediction meta-server. Nucleic Acids Res 2003, 31(13):3305–7.

  34. Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, Diekhans M, Hughey R: Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins 2003, 53(Suppl 6):491–6.

  35. Kalisman N, Keasar C: Protein Structure Prediction with an Ant Lion Town Potential. [http://www.forcasp.org/modules.php?name=Papers&file=article&sid=1785]

  36. Tomii K, Akiyama Y: FORTE: a profile-profile comparison tool for protein fold recognition. Bioinformatics 2004, 20(4):594–5.

  37. Kim DE, Chivian D, Baker D: Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res 2004, 32(Web Server issue):W526–31.

  38. McGuffin LJ, Jones DT: Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 2003, 19(7):874–81.

  39. DeRonne KW, Karypis G: Effective optimization algorithms for fragment-assembly based protein structure prediction. Comput Syst Bioinformatics Conf 2006, 19–29.

  40. Karplus K, Karchin R, Barrett C, Tu S, Cline M, Diekhans M, Grate L, Casper J, Hughey R: What is the value added by human intervention in protein structure prediction? Proteins 2001, (Suppl 5):86–91.

  41. Zhang Y, Arakaki AK, Skolnick J: TASSER: an automated method for the prediction of protein tertiary structures in CASP6. Proteins 2005, 61(Suppl 7):91–8.

  42. Jin W, Furuta T, Park SJ, Koga N, Fujitsuka Y, Chikenji G, Takada S: ROKKY: structure prediction server that integrates PDB-BLAST, 3D-Jury, and the SimFold fragment assembly simulator. [http://www.forcasp.org/modules.php?name=Papers&file=article&sid=2195]

  43. Vullo A, Walsh I, Pollastri G: A two-stage approach for improved prediction of residue contact maps. BMC Bioinformatics 2006, 7: 180.

  44. Fischer D: Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac Symp Biocomput 2000, 119–30.

  45. Wu S, Skolnick J, Zhang Y: Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol 2007, 5: 17.

  46. Söding J, Biegert A, Lupas AN: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005, 33(Web Server issue):W244–8.

  47. Jaśkowski W, Blazewicz J, Lukasiak P, Milostan M, Krasnogor N: 3D-Judge – A Metaserver Approach to Protein Structure Prediction. Foundations of Computing and Decision Sciences 2007, 31 [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/jaskowski073djudge.pdf]

  48. Lund O, Hansen J, Brunak S, Bohr J: Relationship between protein structure and geometrical constraints. Protein Sci 1996, 5: 2217–25.

  49. Marin A, Pothier J, Zimmermann K, Gibrat JF: FROST: a filter-based fold recognition method. Proteins 2002, 49(4):493–509.

  50. Canutescu AA, Shelenkov AA, Dunbrack RL Jr: A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci 2003, 12: 2001–14.

Acknowledgements

The authors wish to thank the CASP organisers for their on-going efforts to maintain this important experiment and the developers of public protein structure prediction servers for providing their models for this analysis. This work was supported by the European Commission grants GeneFun (LSHG-CT-2004-503567) and BioSapiens (LSHG-CT-2003-503265).

Author information

Correspondence to László Kaján.

Additional information

Authors' contributions

LK carried out the statistical analysis of the data, programmed the user model scoring feature and prepared the first draft of the manuscript. LR conceived of the study, coordinated it and revised this manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Kaján, L., Rychlewski, L. Evaluation of 3D-Jury on CASP7 models. BMC Bioinformatics 8, 304 (2007). https://doi.org/10.1186/1471-2105-8-304

  • DOI: https://doi.org/10.1186/1471-2105-8-304

Keywords