Prediction results for binding sites in 62 proteins with different data sets used for generating PSSM.

| Reference data | Overall correct predictions (%) | Sensitivity (S1) (%) | Specificity (S2) (%) | Net prediction (S1+S2)/2 (%) |
|---|---|---|---|---|
| Sequence only (no PSSM) | 73.6 | 40.6 | 76.2 | 58.4 (2.5) |
| PDNA-NR90 (375 sequences) | 63.8 | 65.9 | 63.4 | 64.6 (2.1) |
| PDNA-RDN (1386 sequences) | 64.0 | 67.1 | 63.3 | 65.2 (2.1) |
| NCBI-NR (1,547,365 sequences) | 66.7 | 69.5 | 63.9 | 66.7 (1.4) |
| PDB-ALL (47,179 sequences) | 62.6 | 65.6 | 61.8 | 64.7 (1.8) |
| PIR (283,177 sequences) | 66.4 | 68.2 | 66.0 | 67.1 (2.7) |

PDNA refers to sequences from protein-DNA complexes in the Protein Data Bank; NR90 means non-redundant at 90% sequence identity; RDN means the data set is redundant because similar proteins have not been removed. Values in parentheses are the standard deviations over the six cross-validation sets. The sensitivity and specificity values shown are those that together give the best net prediction. The two scores can be traded off against each other by changing the cutoff threshold, as described in the text, so data sets should be compared only on the net prediction value (the last column), which is the score optimized during training.
Ahmad and Sarai, BMC Bioinformatics 2005, 6:33. doi:10.1186/1471-2105-6-33
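
The net prediction column is the mean of sensitivity and specificity, as its header states. The following minimal Python sketch shows that arithmetic. The function names and the confusion-matrix counts are hypothetical, and the definitions of sensitivity (TP/(TP+FN)) and specificity (TN/(TN+FP)) are the conventional ones, assumed here rather than taken from the table.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (S1, S2) in percent from confusion-matrix counts.

    Assumes the conventional definitions:
    S1 = TP / (TP + FN), S2 = TN / (TN + FP).
    """
    s1 = 100.0 * tp / (tp + fn)
    s2 = 100.0 * tn / (tn + fp)
    return s1, s2


def net_prediction(s1, s2):
    """Net prediction as defined in the table header: (S1 + S2) / 2."""
    return (s1 + s2) / 2.0


# Hypothetical counts, for illustration only (not data from the paper).
s1, s2 = sensitivity_specificity(tp=40, fn=60, tn=75, fp=25)
print(s1, s2, net_prediction(s1, s2))

# Applying the formula to the "Sequence only" row of the table.
print(round(net_prediction(40.6, 76.2), 1))
```

Because the reported net values are summarized over six cross-validation sets, a value recomputed from the tabulated S1 and S2 need not match the last column exactly in every row.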