Transmembrane proteins are divided into two main classes according to their conformation: β-barrels and α-helical bundles. Learning-based computational methods are of limited use for them because transmembrane structures are difficult to determine by standard experimental methods, so few known structures are available for training. In general, these structures are not simply a series of β-strands or α-helices in which each segment is bonded to the ones immediately before and after it in the primary sequence; they may also contain Greek key and, sometimes, jelly roll motifs. This level of structure may be described as a permutation of the order of the bonded segments.
The algorithm is implemented for general β-barrels or α-bundles in which Greek key motifs may occur as a topological signature. The protein folding problem is modeled as finding the longest closed path in a graph G(V, E, ν, ω) with respect to a given permutation, where:
• V is the set of vertices which represent segments (amino acid subsequences) satisfying given conformational constraints (e.g. strand length, propensity to be a β-strand). A simple Markov model is used to discard the segments unlikely to be a β-strand or an α-helix.
• E is the set of edges connecting every pair of segments that satisfies given adjacency constraints (e.g. length, flexibility, propensity of the turn or loop in between).
• ν, φ and λ are weight functions defined for every segment, every pair of segments, and every turn or loop, respectively, based on hydrophobicity, electrostatic interaction and environmental effects (e.g. extracellular and intracellular effects, membrane thickness). These functions are tuned according to the class of proteins studied.
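The graph construction described above can be illustrated with a short Python sketch that enumerates candidate segments and connects those separated by an admissible loop. This is a minimal sketch under stated assumptions, not the authors' implementation: a mean Kyte-Doolittle hydropathy threshold stands in for the Markov model, the weight functions are omitted, and the names `Segment`, `candidate_segments`, `build_edges` and all default parameter values are hypothetical.

```python
from dataclasses import dataclass

# Kyte-Doolittle hydropathy scale (one value per amino acid).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


@dataclass(frozen=True)
class Segment:
    start: int  # index of the first residue (inclusive)
    end: int    # index one past the last residue (exclusive)


def candidate_segments(seq, min_len=3, max_len=5, threshold=1.0):
    """Vertices: subsequences whose mean hydropathy reaches the threshold.

    ASSUMPTION: the threshold filter replaces the Markov model of the text.
    """
    segments = []
    for start in range(len(seq)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(seq):
                break
            mean = sum(KD[aa] for aa in seq[start:end]) / length
            if mean >= threshold:
                segments.append(Segment(start, end))
    return segments


def build_edges(segments, min_loop=2, max_loop=4):
    """Edges: index pairs (i, j) where segment j follows segment i after a
    loop of admissible length (the only adjacency constraint used here)."""
    edges = []
    for i, a in enumerate(segments):
        for j, b in enumerate(segments):
            if min_loop <= b.start - a.end <= max_loop:
                edges.append((i, j))
    return edges
```

Because segments of several lengths can start at every position, the number of vertices can exceed the sequence length considerably, which is consistent with a graph that is much larger than the sequence it is built from.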
By dynamic programming, the algorithm runs in O(n³) time for the identity permutation and in at most O(n⁵) time for Greek key motifs, where n is the number of amino acids. Finally, a three-dimensional structure is constructed using geometric criteria.
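To give a feel for the dynamic programming step, the Python sketch below finds a maximum-weight chain of segments taken in sequence order, which corresponds to the identity permutation. It is a simplified illustration, not the algorithm of this work: it does not close the path (no pairing between the last and the first segment), it does not handle Greek key permutations, and it is quadratic in the number of candidate segments rather than matching the bounds stated above. The function name `best_chain`, the callables `node_w` and `pair_w`, and the loop-length limits are hypothetical.

```python
def best_chain(segments, node_w, pair_w, min_loop=2, max_loop=4):
    """Maximum-weight open chain of segments in sequence order.

    segments: list of (start, end) tuples, end exclusive.
    node_w(seg): weight of a single segment.
    pair_w(a, b): weight of pairing consecutive segments a and b.
    ASSUMPTION: min_loop >= 0, so every predecessor starts earlier.
    """
    # Process segments by increasing start so predecessors are scored first.
    order = sorted(range(len(segments)), key=lambda k: segments[k][0])
    score = {}  # score[j]: best weight of a chain ending at segment j
    prev = {}   # prev[j]: predecessor of j on that chain, or None
    for pos, j in enumerate(order):
        score[j] = node_w(segments[j])
        prev[j] = None
        for i in order[:pos]:
            gap = segments[j][0] - segments[i][1]
            if min_loop <= gap <= max_loop:
                cand = (score[i]
                        + pair_w(segments[i], segments[j])
                        + node_w(segments[j]))
                if cand > score[j]:
                    score[j], prev[j] = cand, i
    if not score:
        return 0.0, []
    # Trace back from the best-scoring end segment.
    k = max(score, key=score.get)
    total = score[k]
    path = []
    while k is not None:
        path.append(segments[k])
        k = prev[k]
    return total, path[::-1]
```

A closed path could be approximated by fixing each possible first segment in turn and adding the pairing weight between the last and first segments, at the cost of an extra factor in running time.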
The prediction accuracy for the class of β-barrel transmembrane proteins, measured as the percentage of correctly predicted residues, reaches 70–85%. The number of strands is predicted correctly, whereas the shear number, the other main geometric characteristic of a β-barrel, is approximated reasonably well. The running time for a graph of 1400 vertices (potential segments in a sequence of 180 amino acids) is about 40 seconds. These results compare favorably with existing work such as [2-4].
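One common reading of "percentage of correctly predicted residues" is a per-residue comparison of predicted and observed state labels. The sketch below assumes that reading; the label alphabet ("S" for strand, "-" for anything else) and the function name `residue_accuracy` are hypothetical and are not taken from this work.

```python
def residue_accuracy(predicted, observed):
    """Percentage of positions where the predicted label equals the
    observed label. Both arguments are equal-length label strings."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```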
Our algorithm can be used to predict the structure of different families of proteins and has been tested on β-barrel transmembrane proteins. We plan to screen the genomes of certain species, such as Paramecium and Neisseria meningitidis.