Extraction of consensus protein patterns in regions containing non-proline cis peptide bonds and their functional assessment
1 Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, GR 45110, Ioannina, Greece
2 Institute of Biomedical Technology, CERETETH, GR 38500, Larissa, Greece
3 Dept. of Medical Physics, Medical School, University of Ioannina, GR 45110, Ioannina, Greece
4 Biomedical Research Institute, Foundation for Research and Technology-Hellas, University of Ioannina, GR 45110, Ioannina, Greece
5 Dept. of Biological Applications and Technology, University of Ioannina, GR 45110, Ioannina, Greece
BMC Bioinformatics 2011, 12:142 doi:10.1186/1471-2105-12-142
Published: 10 May 2011
In peptides and proteins, only a small percentage of peptide bonds adopt the cis configuration. For amide peptide bonds in particular, cis conformations are so rare that systematic studies were hampered until recently. The growing number of 3D protein structures in public databases has now produced a considerable number of sequences containing non-proline cis formations (cis-nonPro).
In our work, we extract regular expression-type patterns that describe the regions surrounding cis-nonPro formations. For this purpose, three types of pattern discovery are performed: i) exact pattern discovery, ii) pattern discovery using a chemical equivalency set, and iii) pattern discovery using a structural equivalency set. Using each pattern as a query, we then search the Eukaryotic Linear Motif (ELM) resource to identify potential functional implications of regions with cis-nonPro peptide bonds. The patterns extracted from each type of pattern discovery are also used to build a pattern-based classifier that discriminates between cis-nonPro and trans-nonPro formations.
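To make the idea of an equivalency-set pattern concrete, the following minimal sketch generalizes a residue window into a regular expression and uses a set of such patterns as a naive classifier. The equivalence groups, the function names (`generalize`, `matches`, `classify`) and the "any pattern matches" decision rule are illustrative assumptions, not the sets or the classification scheme used in this work.

```python
import re

# Hypothetical chemical equivalence groups, for illustration only;
# these are NOT the equivalency sets used in the paper.
EQUIV_GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH"]


def generalize(window: str) -> str:
    """Replace each residue by a character class of its equivalence group.

    Residues that belong to no group (e.g. G, C, P here) stay literal.
    """
    parts = []
    for res in window:
        group = next((g for g in EQUIV_GROUPS if res in g), res)
        parts.append(f"[{group}]" if len(group) > 1 else re.escape(group))
    return "".join(parts)


def matches(pattern: str, sequence: str) -> bool:
    """True if the regular expression occurs anywhere in the sequence."""
    return re.search(pattern, sequence) is not None


def classify(sequence: str, cis_patterns: list[str]) -> str:
    """Naive rule (an assumption): label as cis-nonPro if any pattern matches."""
    if any(matches(p, sequence) for p in cis_patterns):
        return "cis-nonPro"
    return "trans-nonPro"


# A window observed around a hypothetical cis-nonPro bond
pattern = generalize("GDS")
label = classify("AAGETKK", [pattern])
```

With exact pattern discovery the window `GDS` would match only itself; with the equivalence groups above it also matches `GET`, which is how an equivalency set widens the coverage of a pattern at the cost of specificity.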
In terms of functional implications, we observe a significant association of cis-nonPro peptide bonds with ligand/binding functionalities. As for the pattern-based classification scheme, the best results were obtained using the structural equivalency set, which yielded 70% accuracy, 77% sensitivity and 63% specificity.
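As a reminder of how the three reported figures relate, the sketch below computes them from a confusion matrix. It assumes that cis-nonPro is the positive class and uses a hypothetical balanced test set of 100 cis and 100 trans cases. These counts are chosen only to reproduce the reported percentages and are not data from the study.

```python
def metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of cis-nonPro cases recovered
    specificity = tn / (tn + fp)   # fraction of trans-nonPro cases recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts: 100 cis (positive) and 100 trans (negative) examples
acc, sens, spec = metrics(tp=77, fn=23, tn=63, fp=37)
```

With equal class sizes, accuracy is the mean of sensitivity and specificity, so 77% and 63% give 70%. The reported numbers are therefore mutually consistent under a balanced-set assumption.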