Enzyme catalysis is involved in numerous biological processes, and the disruption of enzymatic activity has been implicated in human disease. Despite this functional importance, several aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. Computational prediction of catalytic residues therefore has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes, and predict the molecular basis of disease.
We proposed a new kernel-based algorithm for the prediction of catalytic residues, and of functional sites in general, in protein structures. The method relies on explicit modelling of the similarity between residue-centred neighbourhoods in protein structures. Specifically, we first construct oriented structural neighbourhoods and then partition the neighbourhood volume into small cells. The similarity between two structural neighbourhoods is the accumulation of their similarities in each cell. The kernel function is a product of three kernels, each addressing a separate aspect of protein function: (i) the geometric kernel addresses the shape similarity, (ii) the chemical kernel addresses the similarity in physicochemical properties, and (iii) the evolutionary kernel addresses the similarity of conservation patterns for the residues in the two structural neighbourhoods. Our approach compared favourably with two of the leading alternative approaches, FEATURE and GBT, as shown in Table 1. The new algorithm was then used to identify known mutations associated with inherited disease whose molecular mechanism might be predicted to operate specifically through the loss or gain of catalytic residues. It should therefore provide a viable approach to identifying the molecular basis of disease in cases where the loss or gain of function is not caused solely by the disruption of protein stability. Our analysis suggests that both loss and gain of catalytic residues are actively involved in human inherited disease.
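The cell-wise accumulation of a product of three kernels can be illustrated with a minimal sketch. This is not the published implementation: the data layout (a dictionary from cell index to per-cell feature vectors), the feature names `geom`, `chem` and `evo`, and the use of a Gaussian similarity for each component are all assumptions made purely for illustration.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian similarity between two equal-length feature vectors.

    Assumption: the abstract does not specify the form of the
    component kernels; a Gaussian is used here as a stand-in.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def neighbourhood_kernel(nb_a, nb_b):
    """Similarity between two structural neighbourhoods.

    Each neighbourhood is a (hypothetical) dict mapping a cell index
    to a dict with 'geom', 'chem' and 'evo' feature vectors. The
    result sums, over cells occupied in both neighbourhoods, the
    product of the geometric, chemical and evolutionary similarities.
    """
    total = 0.0
    for cell in nb_a.keys() & nb_b.keys():
        a, b = nb_a[cell], nb_b[cell]
        total += (
            rbf(a["geom"], b["geom"])
            * rbf(a["chem"], b["chem"])
            * rbf(a["evo"], b["evo"])
        )
    return total


# Toy neighbourhoods with made-up feature values.
site_1 = {
    (0, 0, 0): {"geom": [1.0], "chem": [0.5], "evo": [0.2]},
    (1, 0, 0): {"geom": [0.3], "chem": [0.1], "evo": [0.9]},
}
site_2 = {
    (0, 0, 0): {"geom": [0.8], "chem": [0.4], "evo": [0.3]},
    (0, 1, 0): {"geom": [0.6], "chem": [0.7], "evo": [0.1]},
}
similarity = neighbourhood_kernel(site_1, site_2)
```

In this sketch only cells occupied in both neighbourhoods contribute, so two neighbourhoods with no shared cells have similarity zero. Because each term is a product, a cell contributes strongly only when shape, chemistry and conservation all agree there.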
Table 1. Performance comparison of the three methods of catalytic residue prediction, with evaluation carried out by chain, family, superfamily and fold. All methods were evaluated on the same data set using 10-fold cross-validation. sn denotes the sensitivity at a specificity of 0.95.
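For readers unfamiliar with the sn measure, the following minimal sketch shows one way to compute sensitivity at a fixed specificity from predictor scores. The function name, the thresholding rule and the toy data are assumptions for illustration; the abstract does not state how the operating point was selected.

```python
def sensitivity_at_specificity(scores, labels, specificity=0.95):
    """Fraction of positives recovered while keeping the requested
    specificity on the negatives.

    scores: predictor outputs, higher meaning more likely catalytic.
    labels: 1 for catalytic residues, 0 for all others.
    """
    negatives = sorted(
        (s for s, y in zip(scores, labels) if y == 0), reverse=True
    )
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative")

    # Largest number of false positives the specificity target allows
    # (small epsilon guards against floating-point round-off).
    allowed_fp = int((1.0 - specificity) * len(negatives) + 1e-9)

    # Predict positive only for scores strictly above the threshold,
    # so at most `allowed_fp` negatives are called positive.
    if allowed_fp >= len(negatives):
        threshold = float("-inf")
    else:
        threshold = negatives[allowed_fp]

    true_positives = sum(1 for s in positives if s > threshold)
    return true_positives / len(positives)


# Toy data: 20 non-catalytic residues and 4 catalytic residues.
neg_scores = [i / 100 for i in range(20)]
pos_scores = [0.5, 0.185, 0.1, 0.3]
scores = neg_scores + pos_scores
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
sn = sensitivity_at_specificity(scores, labels, specificity=0.95)
```

With 20 negatives, a specificity of 0.95 permits one false positive, so the threshold falls at the second-highest negative score.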
Our kernel method for functional site prediction from protein structures compares favourably with established methods on the same data set under the same evaluation procedure. Applying our catalytic residue predictor to disease mutations indicates that both loss and gain of catalytic residues are actively involved in human inherited disease.