Email updates

Keep up to date with the latest news and content from BMC Structural Biology and BioMed Central.

Open Access Research article

"Hot cores" in proteins: Comparative analysis of the apolar contact area in structures from hyper/thermophilic and mesophilic organisms

Alessandro Paiardini*, Riccardo Sali, Francesco Bossa and Stefano Pascarella

Author Affiliations

Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Università La Sapienza, P.le A. Moro 5, 00185 Roma, Italy

For all author emails, please log on.

BMC Structural Biology 2008, 8:14  doi:10.1186/1472-6807-8-14

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1472-6807/8/14


Received:28 June 2007
Accepted:29 February 2008
Published:29 February 2008

© 2008 Paiardini et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

A wide variety of stabilizing factors have been invoked so far to elucidate the structural basis of protein thermostability. These include, amongst the others, a higher number of ion-pairs interactions and hydrogen bonds, together with a better packing of hydrophobic residues. It has been frequently observed that packing of hydrophobic side chains is improved in hyperthermophilic proteins, when compared to their mesophilic counterparts. In this work, protein crystal structures from hyper/thermophilic organisms and their mesophilic homologs have been compared, in order to quantify the difference of apolar contact area and to assess the role played by the hydrophobic contacts in the stabilization of the protein core, at high temperatures.

Results

The construction of two datasets was carried out so as to satisfy several restrictive criteria, such as minimum redundancy, resolution and R-value thresholds and lack of any structural defect in the collected structures. This approach allowed to quantify with relatively high precision the apolar contact area between interacting residues, reducing the uncertainty due to the position of atoms in the crystal structures, the redundancy of data and the size of the dataset. To identify the common core regions of these proteins, the study was focused on segments that conserve a similar main chain conformation in the structures analyzed, excluding the intervening regions whose structure differs markedly. The results indicated that hyperthermophilic proteins underwent a significant increase of the hydrophobic contact area contributed by those residues composing the alpha-helices of the structurally conserved regions.

Conclusion

This study indicates the decreased flexibility of alpha-helices in proteins core as a major factor contributing to the enhanced termostability of a number of hyperthermophilic proteins. This effect, in turn, may be due to an increased number of buried methyl groups in the protein core and/or a better packing of alpha-helices with the rest of the structure, caused by the presence of hydrophobic beta-branched side chains.

Background

Earth's environments exhibit the most diverse physico-chemical conditions, including extremes of temperature, pressure, salinity and pH. Among these factors, temperature certainly exerts a deep selective pressure on cell biochemistry and physiology [1]. Indeed, temperatures approaching 100°C usually denature proteins and nucleic acids, and increase the fluidity of membranes to lethal levels [2]. It is therefore of great interest to study how organisms coped with the molecular adaptations required to thrive in extreme environments, particularly at high temperatures. Such organisms, which are distributed among the three domains of life, are called "thermophiles" or "hyperthermophiles", if they exhibit an optimal growth in either a 45°C – 80°C or a 80°C – 110°C temperature range, respectively [3].

To date, a number of studies has been carried out to understand how proteins found in hyper/thermophilic organisms are stabilized [1-6]. Thanks to the wealth of sequence and structural information available today on hyper/thermophilic proteins, it is becoming clear that there is not a general rule for the stabilization of proteins at high temperatures. Rather, an increased thermal stability seems to be achieved through a combination of different small structural modifications involving, amongst the others, ion-pairs interactions, hydrogen bonds and packing of hydrophobic residues [6].

Regarding the latter, one frequently invoked theory is that the packing of hydrophobic side chains is improved in thermophilic and hyperthermophilic proteins, when compared to their mesophilic counterparts [7]. Many studies on proteins adaptation to high temperatures focused on the differences in compactness between hyper/thermophilic and mesophilic proteins using accessible surface area [6] or cavity size [8] as judgment criteria. However, as discussed by Robinson-Rechavi and Godzik [9], and by Gromiha [10], these approaches present several drawbacks, e.g., the individual contribution to the enhanced thermostability of different structural environments and inter-residue contacts cannot be assessed. Hence, alternative ways to quantify protein compactness were adopted. For example, Gromiha [10] analyzed the long range and inter-residue contacts in mesophilic and thermophilic proteins of sixteen different protein families, and found that an increase in contacts between hydrogen-bond forming residues increases protein stability. Very recently, the contact order [11] is receiving increasing attention, thanks to the findings obtained by Godzik and his research group [9,12], who found that hyperthermophilic proteins from T. maritima have higher contact order than their mesophilic counterparts. Most importantly, contact order is correlated to the folding rate of proteins that fold with a two-states mechanism [11].

However, a severe limitation of this and other [10,13] studies is that two residues are considered to be in contact if the distance between their Cα atoms or between one atom and any other atom is below an arbitrary threshold. For example, Robinson-Rechavi et al. [12] considered two residues to be in contact if any of their atoms are closer than 4.5 Å, while Gromiha [10] made use of a sphere of 8.0 Å centered on Cα atoms to define long-range contacts. Furthermore, this approach bears another important drawback: it does not permit to quantify the hydrophobic contact area between two interacting residues. The hydrophobic contact area between buried residues represents in fact an indirect measure of both entropic (entropy change due to the rearrangement of the local water molecules as two hydrophobic residues interact [14]) and enthalpic (van der Waals forces in protein core, due to tight packing of neighboring residues [4]) effects (Figure 1).

thumbnailFigure 1. Computation of the apolar contact area. A-B) Initially, for each amino acid pair (in this case two sample residues, Phe and Lys, are considered), the Van der Walls surface is generated. C) Then, the solvent accessible surface is computed. D) The latter is used to compute the hydrophobic contact surface between the two interacting residues.

Therefore, despite a series of experimental and theoretical studies on the molecular mechanisms of protein folding [15,16] and stability [3,9,17] argued that the hydrophobic contacts play a role of paramount importance in such processes, the difference of apolar contact area between large datasets of proteins from hyper/thermophilic organisms and their mesophilic homologs, to our knowledge, has been never quantified.

Such consideration, along with the wealth of information provided very recently by structural genomics projects, prompted the comparison of a large number of protein crystal structures from hyper/thermophilic organisms and their mesophilic homologs, in order to assess the role played by the hydrophobic contacts in the stabilization of the protein core, at high temperatures.

Results

Analysis of the Apolar Contact Area

Two datasets were obtained from a collection of 1563 hyperthermophilic and thermophilic proteins, retrieved from structural databases using several keywords (see Methods section; Table 1 and 2). In the first case a choice criteria favouring quality over quantity of data yielded a non redundant dataset, which will be referred to as "A", including 38 crystal structures, lacking any structural defect and displaying a maximum resolution of 2.0 Å and a maximum R-value of 0.25. Dataset A represents a subset of a second dataset, which will be referred to as "B". Dataset B is composed of 59 crystal structures lacking any structural defect, displaying a maximum resolution of 3.0 Å and a maximum R-value of 0.30. For each structure composing the two datasets, a mesophilic homologous counterpart was collected, following the same above mentioned choice criteria. The computation of the total apolar contact area (ACA) between the residues of each structure pair composing dataset A and B was then carried out. The statistical significance of the observed differences of ACA between hyper/thermophilic proteins and their mesophilic counterparts was assessed with a paired t-test. The results are reported in Table 3 (see also Additional file 1 for additional information). T-test values are expressed as the associated probability P of acceptance of the null hypothesis, that is, there are no significant differences of ACA between hyper/thermophilic and mesophilic pairs. T-values scoring > 2.0 (P(t) < 0.05) are considered statistically significant. Figure 2 shows the difference of apolar contact area computed over the whole structures of the protein pairs composing the two analysed datasets. The obtained values were normalized by the sequence length of each protein. In dataset A, 22 (13 hyperthermophilic/mesophilic and 9 thermophilic/mesophilic protein pairs) of the 38 considered protein pairs showed an increase of the ACA (Figure 2A); the corresponding P(t) was ~0.086 (0.079 for hyperthermophiles and 0.690 for thermophiles). In dataset B, 38 (24 hyperthermophilic/mesophilic and 14 thermophilic/mesophilic protein pairs) of the 59 protein pairs showed an increase of the ACA (Figure 2B); the corresponding P(t) was ~0.012 (0.020 for hyperthermophiles and 0.474 for thermophiles). Although the obtained differences were not considered statistically significant, according to the t-test validation analysis, for both datasets (Table 3), nonetheless they indicated a general increase of the apolar contact area in hyperthermophilic proteins, compared to their mesophilic counterparts.

Additional file 1. Paired T-Test analysis, datasets and distribution of data. This file includes the datasets A and B, described in this paper, and the statistical analysis of the distribution of ACA.

Format: XLS Size: 776KB Download file

This file can be viewed with: Microsoft Excel ViewerOpen Data

thumbnailFigure 2. Differences in the apolar contact area (ΔACA) for each protein pair, composing dataset A and B, computed over the whole protein structure. Values for hyperthermophilic/mesophilic protein pairs and thermophilic/mesophilic pairs are expressed in Å2/residue and represented as light grey and dark grey bars, respectively. Numbers on X-axis refer to Table 1 (A) and Table 2 (B).

Table 1. Hyperthermophilic/Mesophilic (1–24) and Thermophilic/Mesophilic (25–38) pairs in dataset A*

Table 2. Hyperthermophilic/Mesophilic (1–38) and Thermophilic/Mesophilic (39–59) pairs in dataset B

Table 3. T-tests results for the ACA distributions, measured in different structural environments*

A more detailed analysis on the structurally conserved regions [18] (SCRs; see methods section) of the structures composing dataset A and B indicated that, in both datasets, a number of hyperthermophilic proteins underwent a highly significant (P(t) < 0.001) increase of the hydrophobic contact area of those residues composing the SCRs (Figure 3; Table 3). SCRs were defined as regions displaying a similar local conformation, lacking insertions and deletions and composed of at least three consecutive residues. SCRs are therefore protein segments that conserve the same main-chain conformation in each pair of structures analysed, excluding the intervening regions whose structure differs markedly amongst different proteins [19]. Considering the role of great importance played by the hydrophobic contacts in stabilizing and possibly driving the protein folding mechanism, it seemed interesting to analyse how, during evolution, the SCRs coped with the modifications of the hydrophobic contacts necessary to achieve the correct fold at high temperatures. In dataset A (Figure 3A), 22 (17 hyperthermophilic/mesophilic and 5 thermophilic/mesophilic protein pairs, respectively) of the 38 considered protein pairs showed an increase of the ACA (P(t) ~0.0029). The same trend was also observed for dataset B (Figure 3B), in which 37 of 59 protein pairs (27 hyperthermophilic/mesophilic and 10 thermophilic/mesophilic) displayed an increased ACA in the direction mesophile → hyper/thermophile (P(t) ~0.0001). The measured mean ΔACA was 0.39 Å2/residue and 0.37 Å2/residue for datasets A and B, respectively. However, if only the hyperthermophilic/mesophilic pairs were considered, the mean ΔACA was 0.74 Å2/residue and 0.63 Å2/residue for datasets A and B, respectively. The maximum measured difference was 2.92 Å2/residue for the pair 1V7R/1K7K (nucleotide triphosphate pyrophosphatase from P. horikoshii/E. coli). Since these quite high differences of ACA can be due to other factors than acquired thermostability (i.e., different overall conformations), the t-test validation analysis was repeated without these extreme pairs, obtaining again not significant results (see "Methods" section and supplementary material).

thumbnailFigure 3. Differences in the apolar contact area (ΔACA) for each protein pair, composing dataset A and B, computed over the SCRs. Values for hyperthermophilic/mesophilic protein pairs and thermophilic/mesophilic pairs are expressed in Å2/residue and represented as light grey and dark grey bars, respectively. Numbers on X-axis refer to Table 1 (A) and Table 2 (B).

To get a deeper insight into the statistically significant increase of the hydrophobic contact area of protein cores from hyperthermophilic organisms, the possible occurrence of a larger amount of hydrophobic contact area has been examined in different secondary structure elements. In dataset A (Figure 4A), 16 out of the 24 hyperthermophilic proteins considered showed an increase of ACA in the α-helices of the protein core, compared to their mesophilic counterparts, while in dataset B (Figure 4B) the same ratio was 25 out of 37 proteins, with a measured significance P(t) ~0.0524 and P(t) ~0.0113 for datasets A and B, respectively. Although in this latter case significant deviations from normality, as judged by the application of the Shapiro-Wilk normality test, were observed for the distribution of mesophilic values, nonetheless removing three outliers gave a Shapiro-Wilk P(t) ~0.62 and a t-test P(t) ~0.001. These results indicated that α-helices are mainly involved in the increased amount of hydrophobic contact area which was observed comparing hyperthermophilic/mesophilic proteins. Conversely, no statistically significant trends have been observed in the comparison of the ACA in the β-strands of the SCRs (Table 3). In dataset A, 21 (14 hyperthermophilic/mesophilic protein pairs) of the 38 considered protein pairs showed an increase of the ACA, while in dataset B, 34 (24 hyperthermophilic/mesophilic proteins) of the 59 pairs exhibited an increase of the ACA. The mean value of ΔACA is -0.02 Å2/residue and 0.34 Å2/residue for dataset A and B. Therefore, at least for the hyperthermophilic/mesophilic protein pairs, it can be concluded that the statistically significant increase of the hydrophobic contact area of protein cores involves mainly the α-helices and not the β-strands.

thumbnailFigure 4. Differences in the apolar contact area (ΔACA) for each protein pair, composing dataset A and B, computed over the α-helices of the SCRs. Values for hyperthermophilic/mesophilic protein pairs and thermophilic/mesophilic pairs are expressed in Å2/residue and represented as light grey and dark grey bars, respectively. Numbers on X-axis refer to Table 1 (A) and Table 2 (B).

Differences in the amino acid composition of the residues involved in conserved hydrophobic contacts

The differences of amino acid composition of the residues involved in conserved hydrophobic contacts (CHCs; Table 4) [19] between hyperthermophilic proteins and their mesophilic counterparts is expressed in units of standard deviation from the measured mean value, Raa. Raa values > 0 or < 0 indicate, respectively, a frequency of residue type aa higher or lower than the expected mean. Raa values ≥ 3.0 standard deviations (P ≤ 0.01) from the mean value (that approximates zero) were considered statistically significant. Compositional analysis shows no statistically significant differences between hyperthermophilic and mesophilic proteins, regarding the identity of the residues involved in the formation of hydrophobic contacts, except for isoleucine, that scored at ~3.6 standard deviations from the mean in both datasets A and B. It is important to emphasize that, in evaluating the differences of amino acid composition of the residues involved in conserved hydrophobic contacts, dataset B, containing 13 hyperthermophilic/mesophilic protein pairs more than dataset A, is probably more confident. In any case, since both datasets A and B gave very similar results, the role played by isoleucine is probably independent from the number and type of structures analysed.

Table 4. Amino acid composition of CHCs*

Preferred amino acid interactions in conserved hydrophobic contacts

In order to further investigate the statistically significant increase of isoleucine in CHCs of hyperthermophilic proteins, compared to their mesophilic counterparts, an analysis was carried out to infer which amino acid pairs are preferred in the formation of hydrophobic contacts. Preferred amino acid pairs forming hydrophobic contacts were identified by computing the number of times a particular pair of residues comprised in SCRs makes a hydrophobic contact, displaying an apolar contact area > 0.0 Å2. The results of this analysis are shown in Tables 5 and 6, where each element ij of the interaction matrix reports, in units of standard deviation from the mean value, the measured frequency of interaction between residue i and residue j. For dataset A, accounting for 17864 apolar contacts, five types of interactions (Ile/Ala, Ile/Val, Ile/Phe, Ile/Ile and Ile/Leu) showed a frequency ≥ 3.0 standard deviations from the mean value; in every case, isoleucine is involved in such interactions. Similar results were obtained for dataset B, where 33546 interactions were counted: of six types of interactions scoring at > 3.0 standard deviations, five (Ile/Ala, Ile/Val, Ile/Tyr, Ile/Ile and Ile/Leu) involved the amino acid isoleucine. The other statistically significant interaction is between glutamate and lysine, scoring at 3.28 standard deviations from the mean. The closeness between the apolar atoms composing Glu and Lys residues might be only a secondary effect in the generation of strong ion-pairs between these two residues.

Table 5. Preferred amino acid interactions in CHCs. Hyperthermophilic versus mesophilic proteins of dataset A are compared*

Table 6. Preferred amino acid interactions in CHCs. Thermophilic versus mesophilic proteins of dataset B are compared*

Preferred amino acid substitutions in conserved hydrophobic contacts

Favoured amino acid substitutions between the hyperthermophilic and mesophilic proteins were calculated from the results obtained by the CHC_FIND tool [19]. The residues exchange analysis was indeed limited to the identified conserved hydrophobic contacts. The obtained substitution matrices are shown in Tables 7 and 8. Values are expressed in units of standard deviation from the mean. Only values scoring at 3.0 standard deviations or more from the mean were considered statistically significant. Again, almost all of the most significant exchanges involve isoleucine in both datasets (dataset A: Val→Ile 6.32, Leu→Ile 6.36; dataset B: Val→Ile 6.39, Leu→Ile 6.84 and Phe→Ile 3.12). These exchanges are reflected in the variation of average amino acid composition of hyperthermophiles (Table 4), where a marked increase of isoleucine content can be detected. The only other exchange observed not involving isoleucine is Ala→Val, scoring at 3.20 standard deviations from the mean.

Table 7. Preferred amino acid substitutions in CHCs. Hyperthermophilic versus mesophilic proteins of dataset A are compared*

Table 8. Preferred amino acid substitutions in CHCs. Hyperthermophilic versus mesophilic proteins of dataset B are compared*

Discussion

The main goal of this study was to evaluate on a quantitative basis the relationship between hydrophobic contacts and proteins adaptation to high temperatures.

An essential prerequisite to carry out such a study is to assemble a large and minimally redundant set of very high resolution crystal structures. Indeed, despite the observation that each protein family seems to adopt different structural strategies to adapt to high temperatures [5], common trends may be outlined if a large number of structural data is available [8]. At the same time, since computed values of apolar contact area are mostly influenced by the relative position of the interacting residues, their precision is affected by the resolution of the crystal structures analysed. Therefore two datasets were culled from a set of 1563 crystal structures from thermophilic (optimal growth temperature between 50°C and 80°C) and hyperthermophilic (optimal growth temperature above 80°C) organisms, and their mesophilic counterparts. The rationale of this choice was to assure that the obtained results were not biased either by the paucity of data, or by the quality of the collected crystal structures.

As already discussed by Chen et al. [7], the increase of the apolar contact area in hyperthermophilic and thermophilic proteins may be achieved at least by two different mechanisms: an evenly distributed increase over all residues; a local increase over key residues. The latter mechanism, that has been shown to be a major contribute to the enhanced thermostability of proteins from T. maritima [9], seems to involve mainly residues already implied in the formation of hydrophobic contacts. This suggests that a better compactness may originate from an even better connectivity in those protein regions that already have a tendency to compactness and not by simply "tightening the loops" [9]. The results obtained in this work on the difference of apolar contact area (ΔACA) agree with this hypothesis: a significant increase of ACA was measured in both datasets only when the analysis was limited to the SCRs of the hyperthermophilic structures. The SCRs were presumably subject to similar constraints during the divergent evolution of a family of proteins from a common ancestor, and therefore they possibly contain most of the determinants necessary to maintain the fold. Considering the role played by hydrophobic contacts in this sense, it is not surprising that the residues composing the SCRs and engaging hydrophobic contacts were mostly involved in the structural modifications necessary to achieve and maintain a proper fold at high temperatures. Moreover, the finding that the measure of the difference of ACA resulted highly significant only when limited to the SCRs, could explain some apparently not significant results previously obtained by measuring accessible surface area [8] or cavity size [6].

The statistically significant increase of ~0.75 Å2/residue of apolar contact area was observed only in the SCRs of hyperthermophilic proteins. Therefore, it can be argued that proteins from thermophilic organisms usually adopt different strategies to enhance thermostability. Indeed, it has been demonstrated that moderately and extremely thermostable proteins rely on different mechanisms to achieve greater stability [8,20]. Ion-pairs interactions represent presumably a predominant force in thermophilic proteins, as well as in many hyperthermophilic proteins [8,21]. On the other hand, comparisons of mesophilic and hyperthermophilic protein structures indicate that the hydrophobic effect has a contribution to stability only at high temperatures, while only moderately thermophilic proteins show an increase in the polarity of their exposed surface [20]. Two factors could be responsible for this difference: the temperature dependence of the thermodynamic forces involved in protein stabilization, and/or the phylogenetic origin of the extremely thermophilic organisms, that belong to the domain Archaea, and are therefore distinct from moderately thermophilic organisms, which are mostly Bacteria. In any case, the obtained results strongly suggest that packing of hyperthermophilic proteins, in comparison with their mesophilic homologs, has improved significantly, and it is reasonable to deduce that this increased amount of apolar contact area contributes to the stabilization of the native state of the protein.

Our analysis revealed that α-helices were mainly involved in the increased amount of ACA. Surprisingly, no statistically significant trends have been observed in the comparison of the ACA in the β-strands of the SCRs. We cannot provide a clear explanation of this different behaviour between secondary structures. An intriguing possibility is that β-strands are, generally, already almost optimally packed, even in mesophilic proteins, resulting in a small margin of improvement. However, it is also possible that this observation is due to 'sample bias' e.g., the peculiarities of the available protein structures.

Structural stabilization of α-helices in protein cores may therefore represent a component of great importance for the enhanced termostability of hyperthermophilic proteins. A number of studies in the past has stressed the importance of the enhanced stability of α-helices as a general feature of many hyperthermophilic proteins. In order to investigate the role of α-helices in protein thermostability, Petukhov et al. [22] compared energy characteristics of α-helices from four families of hyperthermophilic and mesophilic proteins, using statistical mechanical theory for describing helix/coil transitions. They found that the magnitude of the observed decrease in intrinsic free energy on α-helix formation of the thermostable proteins was sufficient to explain the experimentally determined increase of their thermostability. Furthermore, protein engineering studies showed that a well-packed α-helix structure is related to large increase in thermostability [23,24]. It is well known that the flexibility of α-helices is often required to assure protein function, such as conformational transitions in substrate binding or protein-protein interactions [25]. However, an excessive flexibility of this secondary structure element, at high temperatures, could result in an insufficient stability to maintain its native conformation, causing the entire protein to unfold.

According to thermodynamic studies on model peptides in aqueous environments, two main factors appear to play a key role in the structural stability of the α-helices: the presence of amino acids with intrinsic helical propensity, and side chain-side chain interactions [26,27]. Therefore, we further investigated the nature of the increased stabilization of α-helices composing the SCRs of hyperthermostable proteins, determining the differences in amino acid composition of the residues involved in CHCs. The results of this analysis strongly suggest that isoleucine and, to a lesser extent valine, mostly to the detriment of leucine, are involved in the formation of more hydrophobic contacts in hyperthermophilic proteins, compared to their mesophilic counterparts. Likewise, the importance of isoleucine in the formation of CHCs of hyperthermophilic proteins was confirmed by the analysis of the preferred amino acid interactions in CHCs, where almost all types of interactions scoring at > 3.0 standard deviations involved the amino acid isoleucine, and by the favoured amino acid substitutions between the hyperthermophilic and mesophilic proteins in CHCs. A large amount of theoretical and experimental studies demonstrates the importance of isoleucine in the stabilization of protein structures from thermophilic organisms. Malakauskas and Mayo [24] reported the computer-aided engineering of a seven-fold mutant of the β1 domain of the Streptococcal protein G, exhibiting a melting temperature above 100°C and an enhancement in thermodynamic stability of 4.3 kcal mol-1 at 50°C over the wild-type protein. Of seven mutations, five were of type XXX→ Ile, and they improved side-chain packing in the interior of the protein. An increased content of isoleucine in thermophilic and hyperthermophilic proteins, to the detriment of leucine, was also noted by Haney et al. [28] and Kumar et al. [6]. More recently, a structural genomics based study carried out by Chakravarty and Varadarajan [29] reported that leucine is preferentially substituted by the β-branched residues valine and isoleucine, at buried sites.

Several studies have demonstrated in the past that leucine has a slightly higher α-helix propensity than isoleucine and, generally, β-branched residues [27,30]. This assumption, which is apparently in contrast with the results obtained by this work, derives from substitution experiments in short polyalanine α-helices-forming peptides in water [31]. This process is mainly associated with the loss of conformational entropy of residues during the folding of α-helices in an aqueous environment: freezing side chain with fewer internal rotational degrees in the α-helix conformation would be entropically less expensive. However, it must be noted that these experiments, and many derived propensity scales, do not take into account solvent entropy effects. As discussed by Creamer and Rose [30], neglect of solvent entropy appears justified for a peptide side chain because no significant differences in solvation energy are expected in the side chain of a solitary polyalanyl helix during a helix-coil transition. In either case, the side chain is highly solvent-exposed. The same situation would not be appropriate for a protein helix that, upon association with the remainder of the molecule, engages a solvent-shielded interaction surface. In this study, only the α-helices composing the SCRs and therefore mostly found in the protein core were considered for further investigation. Therefore, application of helix propensity scales might be not appropriate in this case. For example, Li and Deber [15] have shown that α-helices propensity scales are not appropriate for non aqueous environments and that β-branched amino acids, as valine and isoleucine, rank among the best helix promoters in an apolar environment, as a lipid bilayer.

On the other side, hydrophobic contacts deriving by side chain interactions could play a role of great importance in the stabilization of the α-helices composing the SCRs of hyperthermostable proteins. At temperatures above 80°C, the hydrophobic effect, that is considered to be a dominant force in protein folding [32,33], is mainly enthalpy driven [34]. In fact, while at high temperatures the entropy contribution to the protein stability tends to zero, the loss or gain of van der Waals interactions acquires increased importance. For example, constructing 15 Barnase mutants in which hydrophobic interactions were deleted, Serrano et al. [35] found a strong correlation between the degree of Barnase destabilization and the number of methyl side chain groups that were lost (r = 0.91). These data agree with the preferred substitutions (RAla→Val = 3.20; RVal→Ile = 6.31) observed in the CHCs of our datasets.

Conclusion

In conclusion, taken together the obtained results indicate the preference, in the hydrophobic contacts, for isoleucine and valine residues as an important feature contributing to the enhanced thermostability of α-helices in hyperthermophilic proteins, possibly occurring through a decreased flexibility of these elements of secondary structure. This effect, in turn, may be due to an increased number of buried methyl groups in protein core and/or a better packing of α-helices with the rest of the structure, caused by the presence of hydrophobic β-branched side chains.

Despite the advances in the design of hyperthermostable protein variants [17], a potential drawback of these approaches is still constituted by the time consumed by computer algorithms for exploring the whole sequence protein space. Other things being equal, focussing on the apolar contact area of the α-helices of the protein core through substitutions increasing the number of methyl side chain groups and/or resulting in a better packing of the secondary structure elements, will potentially give clues for the thermostabilization of the protein.

Methods

Data Collection

Hyperthermophilic and thermophilic protein structures were retrieved from Protein Data Bank (PDB)[36], by initially searching for the words "thermo", "thermophile" and "hyperthermophile". This search yielded about 300 proteins and their corresponding sources. An additional search was then performed using as query the name of such organisms, after having assessed that their optimal growth temperatures were between 50°C and 80°C for thermophiles, and above 80°C for hyperthermophiles [3]. Optimal growth temperatures for each organism were obtained from Entrez [37] and the "Prokaryotic Growth Temperature Database" [38]. As a first refinement step, the entries in which protein structures were determined by nuclear magnetic resonance (NMR) were discarded, yielding about 1563 crystal structures.

As a second refinement step, all the entries were examined by means of the PISCES tool [39], and culled from the original dataset by maximum percentage of identity (90%), maximum resolution (2.0 Å), maximum R-value (0.25) and minimum chain length (50 residues) criteria. Furthermore, a second dataset was collected following less stringent criteria (maximum resolution at 3.0 Å and maximum R-value at 0.30), in order to cull a greater number of structures. This second step yielded 458 and 767 proteins for dataset A and B, respectively. Each dataset was then further reduced by eliminating proteins displaying any structural defect, such as missing side-chains or chain breaks due to missing residues, using the MAXIT tool, available at [46]. At the end of this refinement step, 93 and 144 structures comprised dataset A and B, respectively.

Each structure of the two datasets was then exploited to check for the presence in PDB of a mesophilic counterpart. To this purpose, a search with the blast tool [40,41] was carried out, adopting the following criteria: 30% minimum sequence identity, that is usually accepted as a threshold value to assure a homology relationship between two proteins [42]; 90% maximum sequence identity, in order to avoid any redundancy of data; 40% maximum difference in length between the sequences, to avoid the presence of large indels between the two structures. Furthermore, the retrieved mesophilic proteins had to satisfy the same above described structural criteria to be accepted. In those cases yielding several mesophilic homologous structures available for each hyperthermophic/mesophilic protein, the one displaying the highest percent of sequence identity was collected. At the end of this search, 38 protein pairs for dataset A (14 thermophilic/mesophilic pairs and 24 hyperthermophilic/mesophilic pairs) and 59 protein pairs for dataset B (22 thermophilic/mesophilic pairs and 37 hyperthermophilic/mesophilic pairs) were collected (Table 1 and Table 2).

Computation of the Apolar Contact Area

Computation of the total apolar contact area between the residues of each structure composing dataset A and B was carried out by means of the pdb_np_cont tool [43], which computes pairwise atom contact areas between non-polar atoms from structural protein data in a standard PDB coordinate file. Briefly, this method is based on the classification of points located on a sphere of interaction radius, surrounding each non-polar atom. The interaction radius is the van der Waals radius of each atom type, plus the radius of a water molecule. The output of this program was utilized to calculate the pairwise residue contact areas for every possible pair of residues belonging to the structures analysed. Heteroatoms were ignored. The total apolar contact area was then normalized by sequence length of each protein structure.

In order to assess the role played by the hydrophobic contacts in the stabilization of the protein core, at high temperatures, each pair of homologous hyperthermophilic/mesophilic and thermophilic/mesophilic structures was initially superposed by means of the CE-MC tool [44]. The resulting alignment was then utilized to derive manually refined structural alignments. Every pair of structures was visually inspected and, where necessary, modified to optimise the matching of several structural features, including observed secondary elements, functionally conserved residues and hydrophobic regions, in order to give the most accurate structural alignment.

Each structural alignment obtained as described above was utilized to identify the common core and the structurally conserved regions between the pairs of proteins taken into consideration (SCRs). SCRs were defined as regions displaying a similar local conformation, with a mean positional RMSD of the equivalent α-carbon positions of the structures superposed ≤ 3.0 Å [18], lacking indels (insertions and deletions) and composed of at least three consecutive residues. For every structurally equivalent position of the pairwise structural alignment, the RMSD from the center of mass of the structurally equivalent Cα atoms was computed. To avoid the presence of SCRs with indels, positions with gaps were not considered. A window of size w = 3 positions was then scrolled through the alignment and used to define seed positions with a mean RMSD ≤ 3.0 Å. Each time a seed position was found, w was increased iteratively by one position until the mean score remained belove 3.0 Å, or until the window reached the end of the alignment. The obtained SCRs were then visually inspected to avoid the possible presence of regions with different conformations. Then, the hydrophobic contacts involving pairs of topologically equivalent residues in both of the structures analysed (Conserved Hydrophobic Contacts, CHCs) were extracted from the identified SCRs. The SCR_FIND and CHC_FIND tools [19] were utilized to this purpose.

The differences observed in the amount of apolar contact area between the SCRs of the hyperthermophilic/mesophilic and thermophilic/mesophilic protein pairs were further investigated through the analysis of such differences in the regular secondary structure elements: α-helices and β-strands. Secondary structures were determined by using the program DSSP [45].

The amount of apolar contact area measured in the SCRs and secondary structure elements of each structure were finally normalized by the number of residues belonging to SCRs, α-helices and β-strands, respectively.

Amino acid Composition of the residues involved in CHCs

Differences in amino acid composition were measured by:

<a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M1">View MathML</a>

(1)

where Daa is the difference in amino acid composition for residue aa, nT and nM are the number of residues of type aa in hyperthermophilic/thermophilic (T) and mesophilic (M) structures and naa is the total number of residues in hyperthermophilic, thermophilic (T) and mesophilic (M) structures.

The Daa values measured for each pair of the structures analysed were then used to calculate the difference in amino acid composition Caa over the k pairs composing dataset A and dataset B:

<a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M2">View MathML</a>

(2)

The mean and standard deviation for the Caa elements were determined; the significance Raa of the difference in amino acid composition for residue aa was then calculated by dividing the difference between Caa and the overall mean <a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M3">View MathML</a> by the standard deviation σ:

<a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M4">View MathML</a>

(3)

Raa values ≥ 3.0 standard deviations (corresponding to a probability P ≤ 0.01 that the observed difference was obtained by chance) from the mean value were considered statistically significant.

Preferred amino acid pairs in CHCs

Preferred amino acid pairs forming hydrophobic contacts were identified by computing the number of times a particular pair of residues comprised in SCRs makes a hydrophobic contact. The obtained counts were then normalized by the number of pairs of interacting residues present in the SCRs of the structure taken into consideration. An interaction matrix reporting the differences in the number of apolar contacts for each possible pair of residues, between hyperthermophilic/mesophilic and thermophilic/mesophilic structures, was derived:

<a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M5">View MathML</a>

(4)

where k represents the number of elements of dataset A or B, CXY is the element of the matrix reporting the differences in the number of apolar contacts for the pair XY of interacting residues, CT and CM are the normalized counts for the hyperthermophilic/thermophilic and the mesophilic proteins, respectively.

The mean and standard deviation for the non-zero elements of the overall interaction matrix were determined; the significance RXY of the interaction XY was then calculated by dividing the difference between CXY and the overall matrix mean <a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M3">View MathML</a> by the standard deviation σ:

<a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M6">View MathML</a>

(5)

RXY values ≥ 3.0 standard deviations (corresponding to a probability P ≤ 0.01 that the observed difference was obtained by chance) from the mean value were considered statistically significant.

Preferred amino acid substitutions in CHCs

Amino acid substitutions of residues involved in the formation of conserved hydrophobic contacts between hyperthermophilic and mesophilic proteins were determined by analysing the alignment of the SCRs of each pair. For each residue X, belonging to a mesophilic protein and involved in making CHCs, aaX→Y was defined as the number of times X is substituted by the residue Y of the hyperthermophilic sequence. Likewise, aaY→X is defined. Therefore, a substitution matrix can be obtained by computing the difference between aaX→Y and aaY→X over the whole dataset of protein pairs k, according to:

<a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M7">View MathML</a>

(6)

where CS is the element of the substitution matrix.

The mean and standard deviation for the non-zero elements of the overall exchange matrix were determined; the significance RXY of the exchange X → Y was then calculated by dividing the difference between CS, and the overall matrix mean <a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M3">View MathML</a> by the standard deviation σ:

<a onClick="popup('http://www.biomedcentral.com/1472-6807/8/14/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1472-6807/8/14/mathml/M8">View MathML</a>

(7)

RXY values ≥ 3.0 standard deviations (corresponding to a probability P ≤ 0.01 that the observed difference was obtained by chance) from the mean value were considered statistically significant.

Statistical significance

The statistical significance of the observed differences of ACA between hyper/thermophilic proteins and their mesophilic counterparts was assessed with a paired t-test (applied to every pair of structures composing dataset A and dataset B, respectively), to judge the rejection of the null hypothesis (t > 2.0; P(t) < 5%). The null hypothesis to be rejected with the paired t-test analysis is that there is not a significant difference between the measured values of ACA in the hyper/thermophilic and mesophilic proteins.

In order to ensure that the measured P(t) was not biased by the extreme values of the distributions, the t-test validation analyses were repeated, removing the highest and lowest values from the datasets.

The Shapiro-Wilk normality test was applied to judge the distribution of the obtained values for the two datasets. The null hypothesis of this test is that the analysed samples of data are taken from a Gaussian distribution; therefore, the returned P(t) of this test represents a criteria of acceptance or rejection of the null hypothesis. A P(t) < 0.05 was considered statistically significant to reject the supposition of normality.

Authors' contributions

AP conceived the study, interpreted the data and wrote the final manuscript. AP and RS both contributed source code. RS collected the structures, the datasets and implemented most of the various computational analyses. SP supervised the study and helped draft the manuscript. FB coordinated the study and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The authors are very grateful to Dr. Roberto Contestabile for support and helpful advice. This work was supported in part by grants of the Italian Ministero dell'Università e della Ricerca.

References

  1. Razvi A, Scholtz JM: Lessons in stability from thermophilic proteins.

    Protein Sci 2006, 15:1569-1578. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  2. Lynn JR, Mancinelli RL: Life in extreme environments.

    Nature 2001, 409:1092-1101. PubMed Abstract | Publisher Full Text OpenURL

  3. Vieille C, Zeikus GJ: Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability.

    Microbiol Mol Biol Rev 2001, 65:1-43. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  4. Pace CN, Shirley BA, McNutt M, Gajiwala K: Forces contributing to the conformational stability of proteins.

    FASEB J 1996, 10:75-83. PubMed Abstract | Publisher Full Text OpenURL

  5. Sadeghi M, Naderi-Manesh H, Zarrabi M, Ranjbar B: Effective factors in thermostability of thermophilic proteins.

    Biophys Chem 2006, 119:256-270. PubMed Abstract | Publisher Full Text OpenURL

  6. Kumar S, Tsai CJ, Nussinov R: Factors enhancing protein thermostability.

    Protein Engineering 2000, 13:179-191. PubMed Abstract | Publisher Full Text OpenURL

  7. Chen J, Lu Z, Sakon J, Stites WE: Increasing the thermostability of staphylococcal nuclease: implications for the origin of protein thermostability.

    J Mol Biol 2000, 303:125-130. PubMed Abstract | Publisher Full Text OpenURL

  8. Szilagyi A, Zavodszky P: Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey.

    Structure Fold Des 2000, 8:493-504. PubMed Abstract | Publisher Full Text OpenURL

  9. Robinson-Rechavi M, Godzik A: Structural genomics of thermotoga maritima proteins shows that contact order is a major determinant of protein thermostability.

    Structure 2005, 13:857-860. PubMed Abstract | Publisher Full Text OpenURL

  10. Gromiha MM: Important inter-residue contacts for enhancing the thermal stability of thermophilic proteins.

    Biophysical Chemistry 2001, 91:71-77. PubMed Abstract | Publisher Full Text OpenURL

  11. Plaxco KW, Simons KT, Baker D: Contact order, transition state placement and the refolding rates of single domain proteins.

    J Mol Biol 2001, 277:985-994. Publisher Full Text OpenURL

  12. Robinson-Rechavi M, Alibes A, Godzik A: Contribution of electrostatic interactions, compactness and quaternary structure to protein thermostability: lessons from structural genomics of Thermotoga maritima.

    J Mol Biol 2006, 356:547-557. PubMed Abstract | Publisher Full Text OpenURL

  13. Gunasekaran K, Hagler AT, Gierasch LM: Sequence and structural analysis of cellular retinoic acid-binding proteins reveals a network of conserved hydrophobic interactions.

    Proteins 2004, 54:179-194. PubMed Abstract | Publisher Full Text OpenURL

  14. Rigby M, Smith EB, Wakeham WA, Maitland GC: The forces between molecules. Oxford, Clarendon Press; 1986. OpenURL

  15. Li SC, Deber CM: A measure of helical propensity for amino acids in membrane environments.

    Nature Struct Biol 1994, 1:368-373. PubMed Abstract | Publisher Full Text OpenURL

  16. Gerstman BS, Chapagain PP: Self-organization in protein folding and the hydrophobic interaction.

    J Chem Phys 2005, 123:054901. PubMed Abstract | Publisher Full Text OpenURL

  17. Lilie H: Designer proteins in biotechnology.

    EMBO reports 2003, 4:346-351. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  18. Hill EE, Morea V, Chothia C: Sequence conservation in families whose members have little or no sequence similarity: the four-helical cytokines and cytochromes.

    J Mol Biol 2002, 322:205-233. PubMed Abstract | Publisher Full Text OpenURL

  19. Paiardini A, Bossa F, Pascarella S: CAMPO, SCR_FIND and CHC_FIND: a suite of web tools for computational structural biology.

    Nucleic Acids Res 2005, 33:50-55. Publisher Full Text OpenURL

  20. Berezovsky IN, Shakhnovich EI: Physics and evolution of thermophilic adaptation.

    Proc Natl Acad Sci USA 2005, 102:12742-12747. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  21. Elcock AH: The stability of salt bridges at high temperatures: implications for hyperthermophilic proteins.

    J Mol Biol 1998, 284:489-502. PubMed Abstract | Publisher Full Text OpenURL

  22. Petukhov M, Kil Y, Kuramitsu S, Lanzov V: Insights into thermal resistance of proteins from the intrinsic stability of their alpha-helices.

    Proteins 1997, 29:309-320. PubMed Abstract | Publisher Full Text OpenURL

  23. Dahiyat BI, Sarisky CA, Mayo SL: De novo protein design: towards fully automated sequence selection.

    Science 1997, 278:82-87. PubMed Abstract | Publisher Full Text OpenURL

  24. Malakauskas SM, Mayo SL: Design, structure and stability of a hyperthermophilic protein variant.

    Nat Struct Biol 1998, 5:470-475. PubMed Abstract | Publisher Full Text OpenURL

  25. Jarosch R: Interactions between hydrophobic side chains within alpha-helices.

    Protoplasma 2005, 227:37-46. PubMed Abstract | Publisher Full Text OpenURL

  26. Creamer TP, Rose GD: Side-chain entropy opposes alpha-helix formation but rationalizes experimentally determined helix-forming propensities.

    Proc Natl Acad Sci USA 1992, 89:5937-5941. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  27. Petukhov M, Munoz V, Yumoto N, Yoshikawa S, Serrano L: Position dependence of non-polar amino acid intrinsic helical propensities.

    J Mol Biol 1998, 278:279-289. PubMed Abstract | Publisher Full Text OpenURL

  28. Haney PJ, Badger JH, Buldak GL, Reich CI, Woese CR, Olsen GJ: Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species.

    Proc Natl Acad Sci USA 1999, 96:3578-3583. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  29. Chakravarty S, Varadarajan R: Elucidation of factors responsible for enhanced thermal stability of proteins: a structural genomics based study.

    Biochemistry 2002, 41:8152-8161. PubMed Abstract | Publisher Full Text OpenURL

  30. Creamer TP, Rose GD: Interactions between hydrophobic side chains within alpha-helices.

    Protein Sci 1995, 4:1305-1314. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  31. Pace CN, Scholtz JM: A helix propensity scale based on experimental studies of peptides and proteins.

    Biophys J 1998, 75:422-427. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  32. Kauzmann W: Some factors in the interpretation of protein denaturation.

    Adv Protein Chem 1959, 14:1-63. PubMed Abstract OpenURL

  33. Dill KA: Dominant forces in protein folding.

    Biochemistry 1990, 29:7133-7155. PubMed Abstract | Publisher Full Text OpenURL

  34. Lee B: Solvent reorganization contribution to the transfer thermodynamics of small nonpolar molecules.

    Biopolymers 1991, 31:993-1008. PubMed Abstract | Publisher Full Text OpenURL

  35. Serrano L, Kellis J, Cann P, Matouschek A, Fersht A: The folding of an enzyme. II. Substructure of barnase and the contribution of different interactions to protein stability.

    J Mol Biol 1992, 224:783-804. PubMed Abstract | Publisher Full Text OpenURL

  36. Kouranov A, Xie L, de la Cruz J, Chen L, Westbrook J, Bourne PE, Berman HM: The RCSB PDB information portal for structural genomics.

    Nucleic Acids Res 2006, 34:302-305. Publisher Full Text OpenURL

  37. Schuler GD, Epstein JA, Ohkawa H, Kans JA: Entrez: molecular biology database and retrieval system.

    Methods Enzymol 1996, 266:141-162. PubMed Abstract OpenURL

  38. Huang SL, Wu LC, Laing HK, Pan KT, Horng JT: PGTdb: a database providing growth temperatures of prokaryotes.

    Bioinformatics 2004, 20:276-278. PubMed Abstract | Publisher Full Text OpenURL

  39. Wang G, Dunbrack RL: PISCES: recent improvements to a PDB sequence culling server.

    Nucleic Acids Res 2005, 33:94-98. Publisher Full Text OpenURL

  40. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool.

    J Mol Biol 1990, 215:403-410. PubMed Abstract | Publisher Full Text OpenURL

  41. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

    Nucleic Acids Res 1997, 25:3389-3402. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  42. Vogt G, Etzold T, Argos P: An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited.

    J Mol Biol 1995, 249:816-831. PubMed Abstract | Publisher Full Text OpenURL

  43. Drabløs F: Clustering of non-polar contacts in proteins.

    Bioinformatics 1999, 15:501-509. PubMed Abstract | Publisher Full Text OpenURL

  44. Guda CE, Scheeff D, Bourne PE, Shindyalov IN: A new algorithm for the alignment of multiple protein structures using Monte Carlo optimization.

    Pac Symp Biocomput 2001, 275-86. PubMed Abstract OpenURL

  45. Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.

    Biopolymers 1983, 22:2577-2637. PubMed Abstract | Publisher Full Text OpenURL

  46. MAXIT [http://sw-tools.pdb.org/apps/MAXIT/] webcite