
Open Access Research article

Analysis of electric moments of RNA-binding proteins: implications for mechanism and prediction

Shandar Ahmad1 and Akinori Sarai2*

Author Affiliations

1 National Institute of Biomedical Innovation, 7-6-8, Saito-asagi, Ibaraki, Osaka, Japan

2 Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, 820-8502 Japan


BMC Structural Biology 2011, 11:8  doi:10.1186/1472-6807-11-8

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1472-6807/11/8


Received: 29 June 2010
Accepted: 1 February 2011
Published: 1 February 2011

© 2011 Ahmad and Sarai; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Protein-RNA interactions play an important role in many biological processes such as gene regulation, replication, protein synthesis and virus assembly. Although many structures of various types of protein-RNA complexes have been determined, the mechanism of protein-RNA recognition remains elusive. We have previously shown that the simplest electrostatic properties, viz. charge, dipole and quadrupole moments, calculated from the backbone atomic coordinates of proteins, are biased in DNA-binding proteins relative to other proteins, and that these quantities can be used to identify DNA-binding proteins. In this study, we investigate the closely related RNA-binding proteins. In particular, we examine discrimination between various types of RNA-binding proteins, the evolutionary conservation of these bulk electrostatic features and the effect of conformational changes upon complex formation. As a potential application of this study, we suggest a basic binding mechanism for a putative RNA-binding protein (HI1333 from Haemophilus influenzae).

Results

We found that, similar to DNA-binding proteins (DBPs), RNA-binding proteins (RBPs) also show significantly higher values of electric moments. However, the higher moments in RBPs strongly depend on their functional class: proteins binding to ribosomal RNA (rRNA) constitute the only class in which all three properties (charge, dipole and quadrupole moments) are higher than in control proteins. Neural networks were trained with leave-one-out cross-validation to discriminate RBPs from control proteins, and to discriminate pairwise between proteins binding to various RNA types. Discrimination between RBPs and control proteins reached up to 78%, measured by the area under the ROC curve (AUC). Proteins binding to rRNA were the best distinguished (AUC = 79%). Changes in dipole and quadrupole moments between unbound and bound structures were small, indicating that these properties are robust under complex formation.

Conclusions

The bulk electric moments of proteins considered here provide insights into target recognition by RNA-binding proteins, as well as the ability to distinguish one type of RBP from others. These results help in understanding the mechanism of protein-RNA recognition and in identifying RNA-binding proteins.

Background

Protein-RNA interactions have been identified as crucial for a number of cellular processes [1-7]. However, the mechanism of RNA recognition by proteins, or vice versa, remains poorly understood despite a recent surge in studies of protein-RNA interactions in specific systems, as well as in their statistical analysis and prediction [8-12]. Most computational studies on protein-RNA interactions have focused on classification, annotation and binding-site characterization [11,12]. A large number of features have often been employed for accurate prediction of RNA-binding proteins and their interface residues [12]. Structure-based predictions and analyses of RBPs have focused on high-resolution structures, using detailed structural parameters such as solvent accessibility and geometrical features such as clefts and patches. In these studies, basic electrostatic features such as dipole and quadrupole moments are typically combined with many other parameters (e.g., 40 parameters by Shazman and Gutfreund [12]). This makes it difficult to assess the role of individual physical properties of proteins in RNA recognition, and some of the obvious roles of electrostatic interactions may be lost in the effort to maximize prediction performance. We have previously shown that simple electrostatic properties, viz. net charge, dipole and quadrupole moments, carry significant information useful for predicting DNA-binding proteins, from both full atomic coordinates and main-chain atoms [13]. Subsequent studies confirmed that this method can be applied to low-resolution structures to predict nucleic-acid-binding function [14]. A Web-based tool to calculate dipole and quadrupole moments, together with a discussion of their relationship to functional protein classes, has also been published recently [15].

Here, we carry out a systematic analysis of three bulk electrostatic properties of RNA-binding proteins (RBPs), viz. net charge, dipole and quadrupole moments. All three are calculated from low-resolution protein structures with only main-chain coordinates, in order to estimate how far these simple properties can distinguish RBPs from control proteins. A simple statistical analysis of the electric moments in each category is supplemented by all-against-all pairwise discrimination between protein types using neural networks. Our results show that RBPs have a pattern of electric moments that differs from that of control proteins, and that this pattern also differs among proteins binding to various types of RNA. One type of RNA-binding protein can be distinguished from another on the basis of these properties with varying degrees of accuracy. In addition, we compile a data set of pairs of structures of the same RBPs solved in the unbound (monomeric) state and in the full protein-RNA complex. Using this data set, we show that the calculated moments are robust against conformational changes induced by complex formation. Finally, we discuss possible implications of the present results for the mechanism of protein-RNA interactions.

Methods

Data set of RNA-binding proteins

The primary source of RNA-binding proteins and their annotation into various categories is the SCOR database [16]. First, a list of all PDB codes present in SCOR was compiled, resulting in 569 entries. All 569 PDB entries were scanned for RNA (998 chains) and protein (1435 chains). Protein chains were then checked for direct contact with at least one RNA chain, and chains with at least 3 residues in contact were selected, resulting in 1242 chains. FASTA-formatted protein sequences were generated from the PDB files, and redundancy was removed by clustering them at 25% sequence identity using BLASTCLUST [17]. This resulted in the RBP_NR25 database of 160 protein chains, subsequently referred to simply as RBP. The SCOR functional classification was used to annotate them as binding to mRNA (13 chains), tRNA (20 chains), rRNA (84 chains) or viral RNA (17 chains). The final list of selected protein chains and their calculated moments, along with the other data sets, is provided in Additional File 1.
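The contact-based selection of protein chains can be sketched as below. This is a minimal illustration, not the authors' pipeline. The text does not state the distance that defines a contact, so the 3.5 Å cutoff is a hypothetical value. The function names and the input layout (residue number mapped to a list of atom coordinates) are also assumptions.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

# HYPOTHETICAL contact distance (Å); the text does not state the value used.
CONTACT_CUTOFF = 3.5
# From the text: a chain is kept if at least 3 of its residues contact RNA.
MIN_CONTACT_RESIDUES = 3


def residues_in_contact(protein: Dict[int, List[Coord]],
                        rna_atoms: List[Coord],
                        cutoff: float = CONTACT_CUTOFF) -> int:
    """Count residues with any atom within `cutoff` of any RNA atom.

    `protein` maps a residue number to the coordinates of that residue's atoms.
    """
    return sum(
        1
        for atoms in protein.values()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in rna_atoms)
    )


def is_rna_contacting_chain(protein: Dict[int, List[Coord]],
                            rna_atoms: List[Coord]) -> bool:
    return residues_in_contact(protein, rna_atoms) >= MIN_CONTACT_RESIDUES


# Toy example: residues 1-3 lie within 3.5 Å of the single RNA atom, residue 4 does not.
rna = [(0.0, 0.0, 0.0)]
chain = {1: [(1.0, 0.0, 0.0)], 2: [(0.0, 2.0, 0.0)],
         3: [(0.0, 0.0, 3.0)], 4: [(10.0, 0.0, 0.0)]}
print(residues_in_contact(chain, rna), is_rna_contacting_chain(chain, rna))  # prints: 3 True
```

The all-pairs distance check is quadratic in atom count. That is fine for a sketch, but a real scan over 1435 chains would typically use a spatial index.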

Additional file 1. Electric moments of RNA-binding proteins. Charge (q), dipole moment (p) and quadrupole moments (three eigenvalues, Q1, Q2, Q3) of RNA-binding proteins. DNA-binding proteins and control proteins are also included.

Format: XLS Size: 419KB


Development of control data sets

First, a non-redundant list of all protein chains in the PDB was obtained from PDBselect [18]. The latest PDBselect version (May 30, 2010; 25% sequence identity clusters) consisted of 4868 protein chains. Chains shorter than 50 residues were removed, leaving 4133 protein chains. Next, a keyword search for "Nucleic acid binding" was carried out in SWISSPROT, yielding 20595 protein sequences. The 4133 PDBselect chains were then aligned against all 20595 SWISSPROT sequences using BLAST with an e-value cutoff of 0.01, and chains with any hit were excluded from the PDBselect set. Finally, the PDB entry type was checked and nucleic-acid-binding chains were removed. This left 2441 chains with no detectable similarity to RNA-binding proteins of known or unknown structure. These 2441 protein chains were used as the control data set for all our analyses (see Additional File 1).
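The control-set filters can be expressed as a simple chain of predicates. In this hypothetical sketch, several details are assumptions: the `Chain` record, the `best_evalue` lookup (each chain's best BLAST e-value against the "Nucleic acid binding" SWISSPROT sequences), and the treatment of hits exactly at the cutoff as excluded. The BLAST search itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Chain:
    chain_id: str
    length: int               # number of residues
    nucleic_acid_entry: bool  # PDB entry type indicates nucleic-acid binding


MIN_LENGTH = 50        # chains shorter than 50 residues are removed
E_VALUE_CUTOFF = 0.01  # BLAST cutoff against the SWISSPROT keyword set


def control_chains(chains: Iterable[Chain], best_evalue: Dict[str, float]) -> List[str]:
    """Return IDs of chains that survive all control-set filters.

    `best_evalue` holds each chain's best BLAST e-value against the SWISSPROT
    keyword set; chains without any hit are simply absent from the dict.
    """
    kept = []
    for c in chains:
        if c.length < MIN_LENGTH:
            continue  # too short
        if best_evalue.get(c.chain_id, float("inf")) <= E_VALUE_CUTOFF:
            continue  # similar to a known nucleic-acid-binding sequence
        if c.nucleic_acid_entry:
            continue  # PDB entry itself is a nucleic-acid-binding chain
        kept.append(c.chain_id)
    return kept
```

The filter order does not change the result, because each check is independent. The text's order (length, then BLAST similarity, then entry type) is kept for readability.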

Complex versus monomeric structure pairs

Sequence homologues of the proteins in the above data set (RBP_NR25) were searched for in the PDB with at least 90% sequence identity, and the best match was selected. The minimum alignment coverage was also set at 90%, and only target sequences that occurred in monomeric PDB entries were selected.
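Selecting the unbound partner for each complexed chain means filtering homologue hits by the two 90% thresholds and the monomeric-entry requirement, then taking the best remaining match. The sketch below assumes that "best" means highest identity, with ties broken by coverage. The `Hit` record and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    pdb_chain: str
    identity: float  # % sequence identity to the complexed chain
    coverage: float  # % alignment coverage
    monomeric: bool  # entry contains no other protein or RNA chain


MIN_IDENTITY = 90.0
MIN_COVERAGE = 90.0


def best_unbound_partner(hits: List[Hit]) -> Optional[Hit]:
    """Pick the unbound counterpart of a complexed chain, or None if there is none."""
    candidates = [h for h in hits
                  if h.identity >= MIN_IDENTITY
                  and h.coverage >= MIN_COVERAGE
                  and h.monomeric]
    if not candidates:
        return None
    # ASSUMPTION: "best match" = highest identity, ties broken by coverage.
    return max(candidates, key=lambda h: (h.identity, h.coverage))
```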

Calculations of electric moments

Charge, dipole moment and quadrupole moments were calculated as described in our earlier study [13]. That study showed that using all-atom coordinates did not affect the overall results compared with a low-resolution model using only backbone coordinates. Thus, in this study, side-chain coordinates were ignored, and the electric moments were based on the main-chain conformation, represented by the Cα position of each residue. All Lys and Arg residues were assigned a positive charge, and Glu and Asp residues a negative charge. All other residues were treated as neutral. His was treated as neutral because considering its charged state had negligible effects (see Results section). All water molecules, metals and ligands were also ignored in these calculations.

Components of dipole moments were calculated using the expression

$$p_\alpha = \sum_i q_i \left( r_{i\alpha} - R_{o\alpha} \right), \qquad \alpha \in \{x, y, z\} \qquad (1)$$

where q_i is the charge assigned to residue i (placed at its Cα atom, at position r_i), and R_o is the reference point, taken as the geometric center of all residues (Cα positions) in the structure. The net dipole moment was calculated as the vector sum of these components.
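A minimal sketch of the charge assignment and Eq. (1), assuming unit charges and three-letter PDB residue names. The function names are hypothetical. The result is in e·Å, before the length normalization and Debye conversion described below.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

# Unit charges placed at Cα positions: Lys/Arg +1, Asp/Glu -1; every other
# residue (including His) is neutral, as described in the text.
RESIDUE_CHARGE = {"LYS": 1.0, "ARG": 1.0, "ASP": -1.0, "GLU": -1.0}


def geometric_center(ca: Sequence[Coord]) -> Coord:
    """Reference point R_o: mean of ALL Cα positions, charged or not."""
    n = len(ca)
    return (sum(c[0] for c in ca) / n,
            sum(c[1] for c in ca) / n,
            sum(c[2] for c in ca) / n)


def dipole_components(residues: Sequence[str], ca: Sequence[Coord]) -> Coord:
    """p_alpha = sum_i q_i (r_i,alpha - R_o,alpha), in e·Å."""
    ro = geometric_center(ca)
    p = [0.0, 0.0, 0.0]
    for name, r in zip(residues, ca):
        q = RESIDUE_CHARGE.get(name, 0.0)
        for k in range(3):
            p[k] += q * (r[k] - ro[k])
    return (p[0], p[1], p[2])


def dipole_magnitude(residues: Sequence[str], ca: Sequence[Coord]) -> float:
    """Net dipole moment as the length of the component vector (e·Å)."""
    return math.sqrt(sum(x * x for x in dipole_components(residues, ca)))


# Toy three-residue chain along x: Lys at -3.8 Å, Gly at 0, Asp at +3.8 Å.
toy_res = ["LYS", "GLY", "ASP"]
toy_ca = [(-3.8, 0.0, 0.0), (0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
print(dipole_magnitude(toy_res, toy_ca))
```

Because R_o includes neutral residues, a protein with a net charge still gets a well-defined dipole. Its value depends on this choice of origin, which is why the reference point matters.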

The quadrupole moment is a rank-2 tensor, and a direct calculation from the PDB coordinates gives nine components (Mxx, Mxy, Mxz, Myx, Myy, Myz, Mzx, Mzy and Mzz). Each of these components is calculated by the following expression


(2)

where r_i is the position vector of charge i relative to the reference point, and the summation is over all charges. The quadrupole moment matrix can be diagonalized, and its three eigenvalues are denoted Q1, Q2 and Q3 in decreasing order. We used the largest eigenvalue, Q1, as the single quadrupole moment value, and all three eigenvalues as inputs to the predictor.
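The explicit form of Eq. (2) is not reproduced here. The sketch below therefore assumes the common traceless Cartesian definition M_jk = Σ_i q_i (3 r_ij r_ik − |r_i|² δ_jk), with a flag to fall back to the plain second moment Σ_i q_i r_ij r_ik. Which convention the original study used is an assumption. The eigenvalues of the symmetric 3×3 matrix come from the standard closed-form (trigonometric) solution, so no linear-algebra library is needed. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Matrix3 = List[List[float]]


def quadrupole_tensor(charges: Sequence[float], coords: Sequence[Coord],
                      traceless: bool = True) -> Matrix3:
    """3x3 quadrupole tensor about the geometric center of all positions.

    ASSUMED convention: traceless M_jk = sum_i q_i (3 r_ij r_ik - |r_i|^2 delta_jk);
    with traceless=False, the plain second moment sum_i q_i r_ij r_ik is used.
    """
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    m = [[0.0] * 3 for _ in range(3)]
    for q, c in zip(charges, coords):
        if q == 0.0:
            continue  # neutral residues contribute nothing
        r = [c[k] - center[k] for k in range(3)]
        r2 = r[0] ** 2 + r[1] ** 2 + r[2] ** 2
        for j in range(3):
            for k in range(3):
                if traceless:
                    term = 3.0 * r[j] * r[k] - (r2 if j == k else 0.0)
                else:
                    term = r[j] * r[k]
                m[j][k] += q * term
    return m


def symmetric_eigenvalues(a: Matrix3) -> Tuple[float, float, float]:
    """Closed-form eigenvalues of a real symmetric 3x3 matrix, largest first."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        d = sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
        return (d[0], d[1], d[2])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp rounding error into acos domain
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return (e1, 3.0 * q - e1 - e3, e3)  # the middle one follows from the trace


def quadrupole_eigenvalues(charges: Sequence[float], coords: Sequence[Coord],
                           traceless: bool = True) -> Tuple[float, float, float]:
    """Q1 >= Q2 >= Q3 (signed; absolute values and length normalization come later)."""
    return symmetric_eigenvalues(quadrupole_tensor(charges, coords, traceless))
```

The closed form is accurate for well-separated eigenvalues. When two eigenvalues nearly coincide it loses a few digits, which is negligible for summary descriptors like Q1 to Q3.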

All electric moment values were taken as absolute values and normalized by the protein sequence length, as in our earlier study [13]. Units are omitted for quadrupole moments and net charge, which are measured in atomic units (the electron charge and Å as the units of charge and distance). Dipole moment values are quoted after conversion to Debye.
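The per-residue normalization and unit conversion can be sketched as follows. Several points are assumptions. The conversion factor is the standard one (1 e·Å ≈ 4.8032 D). Normalization is taken to be a simple division by the number of residues. Net charge is kept signed, because the Results quote a negative mean charge for control proteins (-0.020), even though the text describes moment values as absolute. The function name is hypothetical.

```python
from typing import Dict, Sequence

# 1 e·Å (elementary charge times ångström) ≈ 4.8032 Debye.
E_ANGSTROM_TO_DEBYE = 4.8032


def normalize_moments(net_charge: float, dipole_e_angstrom: float,
                      q_eigenvalues: Sequence[float], n_residues: int) -> Dict[str, object]:
    """Per-residue moments.

    Net charge keeps its sign (ASSUMPTION, see lead-in); dipole and quadrupole
    values are taken as absolute values, and the dipole is converted to Debye.
    """
    if n_residues <= 0:
        raise ValueError("sequence length must be positive")
    return {
        "q": net_charge / n_residues,
        "p_debye": abs(dipole_e_angstrom) * E_ANGSTROM_TO_DEBYE / n_residues,
        "Q": tuple(abs(x) / n_residues for x in q_eigenvalues),
    }
```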

Our method of computing electric moments differs somewhat from the approach adopted in a recently published dipole moment server [15]. First, we use only the Cα atoms for assigning charges, whereas [15] assigns charges to specific atomic positions. Second, we use the geometric center of all Cα atoms (including residues with zero charge) to compute the reference point and axes. Finally, we obtain quadrupole moments as eigenvalues, which are not provided in [15]. The dipole moments computed by the two methods show a moderate correlation (~0.5). Since our approach is more suitable for low-resolution structures (it does not require side-chain positions), we report only the results obtained by our procedure. For similar reasons, we did not try to predict the protonation states of residues, which is sometimes possible when side-chain coordinates are available [19].

Statistical significance of difference

Distributions of moments for control and RNA-binding proteins, as well as for the various classes of RNA-binding proteins, were compared by testing the statistical significance of the difference between their means. A two-tailed Student's t-test was conducted for all such comparisons using the open-source statistical programming language R (http://r-project.org). Histograms of the distributions were also plotted with the same package.
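For reference, the Student's (pooled-variance) two-sample t statistic can be sketched in plain Python. The two-tailed p-value is 2·P(T_df ≥ |t|). It can be obtained, for example, with `2 * scipy.stats.t.sf(abs(t), df)`, or in R with `t.test(x, y, var.equal = TRUE)`. Note that R's `t.test` defaults to Welch's unequal-variance test, so `var.equal = TRUE` is needed for the Student variant. The function name below is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, df
```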

Difference between bound and unbound pairs

For each protein chain in the RBP data set, the PDB was scanned for monomeric entries of the same protein. Proteins with more than 90% sequence identity and coverage were used as a pair of complexed and unbound forms. Electric moments were then computed for both members of each pair by the procedure described above. A total of 27 proteins were found to occur in both monomeric and RNA-complexed forms.

The difference between the electric moments of a protein in its complexed and unbound forms is measured by the Euclidean distance (ED):

$$\mathrm{ED} = \sqrt{\sum_{k=1}^{N} \left( X_k^{\mathrm{complex}} - X_k^{\mathrm{unbound}} \right)^2} \qquad (3)$$

where X refers to the dipole or quadrupole moment of a protein, and the summation is taken over all protein pairs considered in a category (for the 27 pairs used here, this is effectively a distance in 27-dimensional space).
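Eq. (3) can be sketched as follows. The sketch assumes that X is collected as one value per protein pair (for example, the per-residue dipole moment in Debye), in the same order for the complexed and unbound forms. The function name is hypothetical.

```python
import math
from typing import Sequence


def moment_distance(complexed: Sequence[float], unbound: Sequence[float]) -> float:
    """Euclidean distance between complexed and unbound moment vectors,
    with one coordinate per protein pair (27 pairs in the text)."""
    if len(complexed) != len(unbound):
        raise ValueError("every protein needs both a complexed and an unbound value")
    return math.sqrt(sum((c - u) ** 2 for c, u in zip(complexed, unbound)))
```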

Neural network for prediction

A neural-network-based predictor, similar to our earlier implementations (e.g. in [20]), was used to relate input vectors to the functional property of a protein chain, e.g., binding or non-binding (control). The input vectors were composed here of five descriptors: charge, dipole moment and the three eigenvalues of the quadrupole moment. To account for any cooperative and non-linear contributions of the moments, a single hidden layer with 3 nodes was used. To avoid overestimating performance, the neural network was trained in a jackknife (leave-one-out) fashion. The predictor was optimized on all but one protein, and once training was completed, the prediction for the left-out protein was evaluated. After cycling through all binding and control proteins, overall prediction performance on the left-out proteins was evaluated. The neural network returns a real value between 0 and 1 for the target outputs 0 (non-binding) or 1 (binding). An ROC curve (sensitivity versus 1 − specificity) was therefore calculated and converted to the area under the curve (AUC), which reflects performance over the entire range of cutoffs. Other measures of performance are as follows (T refers to true and F to false, whereas P is the positive class and N the negative class):

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

The F-measure is the geometric mean of precision and recall. It can be computed by converting the real-valued outputs of the neural network into binary class-label predictions at various cutoffs. The cutoff at which the F-measure is highest is used for reporting all class-wise performance measures, i.e. precision, recall, accuracy and F-measure.
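The jackknife evaluation and the performance measures can be sketched as below. This is not the authors' predictor. To keep the example short and dependency-free, a hypothetical nearest-centroid scorer stands in for the neural network (5 inputs, 3 hidden nodes); any trainer with the same call shape could be substituted. AUC is computed as the probability that a randomly chosen binding protein scores higher than a randomly chosen control protein, which equals the area under the ROC curve. The F-measure follows the text's description as the geometric mean of precision and recall. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]
Trainer = Callable[[List[Vector], List[int]], Scorer]


def nearest_centroid_trainer(xs: List[Vector], ys: List[int]) -> Scorer:
    """HYPOTHETICAL stand-in for the neural network: scores by relative
    distance to the two class centroids, squashed into (0, 1)."""
    def centroid(label: int) -> List[float]:
        pts = [x for x, y in zip(xs, ys) if y == label]
        return [sum(col) / len(pts) for col in zip(*pts)]

    c_bind, c_ctrl = centroid(1), centroid(0)

    def score(x: Vector) -> float:
        z = math.dist(x, c_bind) - math.dist(x, c_ctrl)
        z = max(-50.0, min(50.0, z))      # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(z))  # closer to binding centroid -> nearer 1
    return score


def leave_one_out_scores(xs: List[Vector], ys: List[int], trainer: Trainer) -> List[float]:
    """Jackknife: train on all but one protein, then score the left-out one."""
    scores = []
    for i in range(len(xs)):
        model = trainer(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        scores.append(model(xs[i]))
    return scores


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(random positive outscores random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at_cutoff(scores: Sequence[float], labels: Sequence[int],
                      cutoff: float) -> Dict[str, float]:
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_binding = s >= cutoff
        if predicted_binding and y == 1:
            tp += 1
        elif predicted_binding:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "cutoff": cutoff,
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / len(labels),
        # Geometric mean, as described in the text; the conventional F1 score
        # would instead be 2 * precision * recall / (precision + recall).
        "f_measure": math.sqrt(precision * recall),
    }


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Dict[str, float]:
    """Try every distinct score as a cutoff and keep the one with the highest F."""
    return max((metrics_at_cutoff(scores, labels, c) for c in sorted(set(scores))),
               key=lambda m: m["f_measure"])
```

Using every distinct score as a candidate cutoff covers every non-trivial binary labeling of the left-out predictions. The maximum found by `best_cutoff` is therefore the best F-measure over the whole cutoff range.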

Results

Statistics of electric moments

Figures 1, 2 and 3 show the frequency histograms of electric charge, dipole and quadrupole moments of RBPs, compared with control proteins as well as among the various RBP classes. Figure 4 shows detailed scatterplots of the most notable combinations. Table 1 summarizes the mean values. All calculated electric moments of RBPs are provided in Additional File 1. Observations from these results are summarized below.

Figure 1. Distribution of electric charges amongst RNA-binding proteins. Abbreviations: the prefix of each legend entry gives the property (q: charge, p: dipole moment, Q1: first eigenvalue of the quadrupole moment), followed by the type of proteins considered (bind: all RNA-binding, dbp: DNA-binding, and proteins binding to tRNA (trna), mRNA (mrna) etc.)

Figure 2. Distribution of electric dipole moments amongst RBPs.

Figure 3. Distribution of quadrupole moments amongst RBPs.

Figure 4. Scatterplot of net charge versus dipole moment of RBP, DBP and control data sets.

Table 1. Mean and standard deviation values of electric moments in each class of RNA-binding protein (for a pair-wise comparison, see Table 2).

Net charge

The average overall charge of control proteins is negative, which is consistent with the control data sets used in our earlier analysis of DBPs [21]. Similar to DBPs, RBPs have an average overall positive charge (0.075 per residue), compared with a negative value (-0.020) for control proteins. The difference is statistically significant, with a p-value of nearly zero (smaller than the precision limit of the software). However, the histogram in Figure 1 shows that a substantial fraction (~40%) of RNA-binding proteins have a negative or near-zero net charge, similar to control proteins. A closer look at the class-wise distributions shows that tRNA-binding proteins hardly differ from control proteins in their charge distribution (p-value ~0.24). mRNA-binding proteins differ only slightly from control proteins, although the p-values indicate that the difference is significant. The most positively charged proteins are rRNA-binding and viral RNA-binding proteins (mean charge 0.077 and 0.192, respectively), of which fewer than 20% have a negative or near-zero net charge. This is confirmed by the corresponding p-values (nearly zero) in comparison with control proteins (Table 2). Compared with DBPs (Table 1), RBPs have an even higher average charge. However, looking at the various RNA types, we observe that the higher average charge of RBPs comes predominantly from rRNA-binding proteins, as they are the most abundant class in the data set and are strongly positively charged. The tRNA- and mRNA-binding proteins have significantly lower charge per residue than either rRNA-binding proteins or DBPs.

Table 2. Pair-wise statistical significance (p-values) of difference in groups of RNA-binding proteins (for mean and standard deviation values in each group, see Table 1).

Dipole moment

Figure 2 shows that most RNA-binding proteins are distributed in the range of higher dipole moments. Overall, the mean dipole moment of all RBPs is 4.6 units, compared with 2.7 units for control proteins (with a highly significant p-value for the difference). As with the charge distribution, not all RBP types have higher dipole moments. Interestingly, however, the classes with higher dipole moments are slightly different from those with higher positive charge. Although rRNA-binding proteins appear at the top of both lists, viral RNA-binding proteins have among the lowest dipole moments of all studied classes. At the low end, tRNA-binding proteins, whose charge distribution is similar to that of control proteins (as observed above), also have a low dipole moment (2.1), even lower than that of control proteins. The highest dipole moment (average ~6.4 units), found for rRNA-binding proteins, suggests a predominant role for electrostatic interactions in ribosomal complexes.

Quadrupole moment

Histograms of quadrupole moments suggest a more subtle role for this property in RBPs. The average quadrupole moments of RBPs and DBPs are very similar, but again this comes mainly from rRNA-binding proteins, as all other types of RBPs have lower quadrupole moments than rRNA-binding or DNA-binding proteins.

Collective role of moments

Proteins that are not well separated by any single electric moment or by charge may be better separated by their combined values. The role of a pair of moments in such discrimination can be seen from their scatterplots. Figure 4 shows that a number of RBPs have a high charge but no significant dipole moment (no DBPs are observed in this category); most of these proteins bind viral RNA. Similarly, some RBPs have a high dipole moment but no positive charge, and again fewer DBPs fall in this category. In DBPs, an increased charge tends to be accompanied by an increased dipole moment, but this is not always the case for RBPs. In other words, positive charges in DBPs are likely to be more localized than in RBPs, which increases the dipole moments only in the former. A similar difference exists in the quadrupole moments of these two types of nucleic-acid-binding proteins (data not shown).
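The localization argument can be illustrated with a toy calculation. Two hypothetical five-residue chains carry the same net charge (+2), but in one the charges cluster at one end and in the other they sit at opposite ends. The straight-line geometry with 3.8 Å Cα spacing is made up for illustration and is not derived from the data.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def dipole_magnitude(charges: Sequence[float], coords: Sequence[Coord]) -> float:
    """|sum_i q_i (r_i - R_o)| with R_o the geometric center of all positions."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    p = [sum(q * (c[k] - center[k]) for q, c in zip(charges, coords)) for k in range(3)]
    return math.sqrt(sum(x * x for x in p))


# Toy chain: five Cα positions on a straight line, 3.8 Å apart.
coords: List[Coord] = [(3.8 * i, 0.0, 0.0) for i in range(5)]
localized = [1.0, 1.0, 0.0, 0.0, 0.0]  # net charge +2, clustered at one end
spread = [1.0, 0.0, 0.0, 0.0, 1.0]     # net charge +2, at opposite ends

print(dipole_magnitude(localized, coords))  # about 11.4 e·Å
print(dipole_magnitude(spread, coords))     # about 0
```

Identical net charge can thus give very different dipole moments. This is why charge and dipole moment carry partly independent information in Figure 4.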

Neural network based prediction

We observed above that the electric moments of RBPs differ from those of control proteins as well as among RBP subclasses. However, as shown in the scatterplots and Table 1, individual groups of proteins cannot always be identified by a single descriptor. For example, rRNA-binding proteins have higher values of all three moments, whereas viral RNA-binding proteins are better characterized by total charge alone. To determine the cumulative contribution of these features to protein-RNA recognition, we trained a neural network to use all of them. Neural network performance in distinguishing any two types of proteins is measured by the area under the ROC curve, and the results are shown in Table 3. RBPs can be distinguished from control proteins with an AUC of nearly 78%, and rRNA-binding proteins with an even higher AUC (~79%). This is expected because, as shown above, all three properties are significantly higher in rRNA-binding proteins than in any other category discussed, including DBPs. Some groups, such as tRNA-binding and mRNA-binding proteins, could not be distinguished from control proteins at all. For these groups, the trained network overfitted the training data and had almost no generalization value. Likewise, DBPs and RBPs show limited difference when all factors are taken into account, despite subtle differences in their distributions. This suggests that the diversity of their moments is too large for the available data and that many more features would be needed for such fine-grained classification. These classification results are consistent with more detailed predictions obtained with up to 40 descriptors [12]. The authors of that study report nearly 81% AUC for identifying RBPs from control data using 10 electrostatic features.
Our results, based on a much larger data set and just three electric properties (five descriptors), reached 78% AUC. This is comparable, given that we do not apply the method to a specific patch but use the whole protein, and it shows that the method can be used in a more general framework without much loss of performance. We have also developed pairwise prediction models for various protein classes, rather than only comparisons with control data sets, which has not been attempted before. With regard to discrimination between DBPs and RBPs, we reach the same conclusion as [12], i.e. these two classes cannot be distinguished from each other with much confidence.

Table 3. Neural network performance to discriminate between proteins binding to different types of RNA based on charge, dipole and quadrupole moments*.

Bound versus unbound states

Most solved and modeled protein structures come from monomeric forms. Formation of a complex from isolated RBPs may lead to structural changes, which could call into question predictions made from complex-derived structures. Prima facie, calculations of moments using low-resolution structural information (Cα atoms only) should be more robust than existing methods that use side-chain coordinates from high-resolution complexes. To assess this intuitive argument, we compiled a list of RBP structure pairs. One member of each pair came from the complex, and the other from a monomer with no other protein or RNA, so differences within a pair may reflect conformational changes due to protein-protein or protein-RNA interactions. The procedure for selecting pairs is described in Methods. Table 4 gives a detailed comparison of the dipole and quadrupole moments of these structure pairs (charge is identical in the two cases). Figures 5 and 6 show scatterplots of the dipole and quadrupole moments observed in monomeric and complexed structures. The correlation coefficients are close to 1 for both dipole and quadrupole moments. This indicates that moments calculated from Cα atoms alone are robust to complex formation, and likely more robust than electrostatic properties calculated from full atomic coordinates. Figure 7 shows a typical pair of structures, 30S ribosomal protein S16 (complex PDB ID 1hnw_P, unbound PDB ID 1emw_A). It also shows an exceptional case with a very large conformational change on complex formation, a zinc finger protein in complex (1un6_B) and unbound (2j7j_A) forms. The dipole and quadrupole moments of ribosomal protein S16 remain almost unchanged despite conformational changes (Table 4), whereas the zinc finger pair shows a significant difference between the two forms.
However, this protein has been shown to have two modes of binding via changes in domain orientation. When the moments of each domain were calculated separately, the bound and unbound conformations had very similar moments (see Table 5). This confirms that dipole and quadrupole moments are fairly robust against small conformational changes induced by complex formation.

Table 4. Electric moments of RNA-binding proteins as pairs of RNA-complexed and monomeric structures*.

Table 5. Electric moments of three domains in Zinc finger, which undergoes very large conformational change.

Figure 5. Dipole moments of RNA-binding proteins in complexed structure compared with their independently solved monomeric form.

Figure 6. Quadrupole moments of RNA-binding proteins in complexed structure compared with their independently solved monomeric form.

Figure 7. Superimposed structures of pairs of RBPs in RNA-complexed structures and their unbound monomeric forms. The left panel shows the complexed and unbound monomeric pair of 30S ribosomal protein S16 (complex PDB ID 1hnw_P in red, unbound PDB ID 1emw_A in blue). The right panel shows a pair of zinc finger structures in complex (1un6_B, blue) and unbound (2j7j_A, red) forms. The dipole and quadrupole moments of ribosomal protein S16 remain almost unchanged despite conformational changes (Table 4), whereas the zinc finger pair shows a significant difference between the two forms. However, this zinc finger protein is a rare example of very large conformational change in RBPs. Among the pairs compared in Table 4, it is the only exception; all other pairs have similar moments in complexed and unbound structures. Further analysis of this exception revealed that the moments of the individual domains remain largely unchanged.

Evolutionary conservation

We also examined whether the electric moments calculated above are conserved among homologous or similar RNA-binding proteins. To evaluate this, we returned to the full data set of RNA-binding proteins (before redundancy removal) obtained while selecting the representative non-redundant set of 160 proteins. Of the 160 clusters obtained above, 57 contained at least 10 members each. For these clusters, we plotted the noise-to-signal (N/S) ratio, i.e. the standard deviation within each cluster relative to the mean, for each of the three electric moments. The N/S ratios (data not shown) indicate that charge and dipole moment are highly conserved within families. The N/S ratio of net charge is below 2% for most clusters, and for the dipole moment most values are within 5% of the mean. Quadrupole moments vary slightly more, suggesting that they may be less strictly conserved owing to structural flexibility. However, even quadrupole moments show a fairly conserved distribution. Their variations caused by evolutionary or structural differences are relatively small and unlikely to affect the predictability of such proteins from structure.
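The N/S ratio is a coefficient of variation computed per cluster. The sketch below assumes the sample standard deviation (n − 1 denominator) and divides by the magnitude of the mean so that negatively charged clusters give a positive ratio. The text does not specify either choice, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def noise_to_signal(values: List[float]) -> float:
    """Within-cluster standard deviation relative to the magnitude of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("N/S ratio is undefined when the mean is zero")
    return stdev(values) / abs(m)


def cluster_ns_ratios(clusters: Dict[str, List[float]],
                      min_members: int = 10) -> Dict[str, float]:
    """N/S ratio for every cluster with at least `min_members` members (10 in the text)."""
    return {cid: noise_to_signal(v) for cid, v in clusters.items() if len(v) >= min_members}
```

Net charge per residue can be close to zero for some families, where a small absolute spread still produces a large N/S ratio. This is worth keeping in mind when comparing ratios across moments.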

Protonation state of Histidine residues

Although some methods to predict the protonation state of His residues are available, we adopted a more straightforward approach that does not require side-chain atomic coordinates, i.e. treating all His residues as neutral. To estimate how far this affects the conclusions of this study, we also considered the other extreme case, in which all His residues carry a positive charge. The correlations between the charge, dipole moments and quadrupole moments in these two extreme cases are 0.99, 0.98 and 0.96, respectively (for the quadrupole moment, we quote the best-correlated of the three eigenvalues). Together with the good prediction performance obtained with our charge assignment, this justifies ignoring the protonation of His residues.
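The His test can be sketched for net charge. Per-residue charge is computed with His neutral and with His fixed at +1, and the two are then correlated across proteins. The example sequences in the test are made up for illustration. The same correlation would be applied to dipole and quadrupole values computed under the two charge assignments. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def charge_per_residue(sequence: str, his_charge: float) -> float:
    """Net charge per residue from a one-letter sequence (K/R +1, D/E -1, H = his_charge)."""
    charges = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0, "H": his_charge}
    return sum(charges.get(aa, 0.0) for aa in sequence) / len(sequence)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def his_sensitivity(sequences: List[str]) -> float:
    """Correlation of per-residue charge between the two His extremes."""
    neutral = [charge_per_residue(s, 0.0) for s in sequences]
    protonated = [charge_per_residue(s, 1.0) for s in sequences]
    return pearson(neutral, protonated)
```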

A practical example

The analysis presented above shows that electric moments are a useful indicator for annotating proteins as RNA-binding. To illustrate the use of this study, we computed the electric moments of HI1333 (PDB ID 1JO0), a hypothetical protein from Haemophilus influenzae that has been marked as a candidate RNA-binding protein [22]. The net charge and dipole moment of this protein are 0.026 and 4.22 Debyes, respectively. Both values are higher than the control-protein averages (-0.020 and 2.7) and support the view that this could be an RNA-binding protein. To examine its charge distribution further, we plotted the distribution of Arg and Lys (positively charged) and Glu and Asp (negatively charged) residues, shown in blue and red respectively in Figure 8. There is a clear separation of positively and negatively charged regions along the horizontal axis of the figure. The positively charged residues protrude to the left, giving rise to a high dipole moment. This analysis supports the annotation of HI1333 as an RNA-binding protein and also suggests a possible mechanism of interaction. Its dipole moment may steer the protein toward the negatively charged scaffold of the target RNA molecule.

Figure 8. Distribution of positively charged (Lys and Arg) residues (blue, surface-filled) and negatively charged (Asp and Glu) residues (red, surface-filled) in HI1333, a hypothetical protein from Haemophilus influenzae (PDB ID 1JO0) from the Protein Data Bank. The protein has a significantly high dipole moment, and its RNA-binding region seems to be clearly separated from the negatively charged region by a vertical plane.

Discussion

The main results presented above show that, as with DNA-binding proteins, RNA-binding proteins show a bias in the distribution of their basic electrostatic features. However, the dipole and quadrupole moments of proteins that bind to ribosomal RNA stand out in comparison with all other classes. This suggests that the main driving force for the formation and functioning of the ribosomal assembly has a strong electrostatic character, revealed not only by the overall charge of these proteins but also by the orientation and spherical asymmetry captured by their higher-order moments. The interaction of proteins with the transcribed RNA is highly order-specific, as some proteins bind only after others are already bound to the partially transcribed RNA [23]. Presenting proteins to the RNA in the correct order may require rapid recognition, and dipole-dipole and quadrupole interactions may facilitate this process through their long-range steering effect. High-resolution studies of specific electrostatic interactions, such as those formed by non-bridging phosphate oxygens, have also been reported [24]. We therefore speculate that the requirements of orderly assembly and of stabilizing electrostatic interactions, which are specific to rRNA-protein interactions, are reflected in the higher electric moments of the proteins that interact with rRNA. Such an ordered sequence of events is not required for other types of RNA molecules, which do not undergo a similar assembly process (for example, tRNA interactions with proteins have been reported to follow a clearly distinct mechanism, although they are also guided by electrostatic forces [25]).
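The spherical asymmetry referred to here is conventionally quantified by the three eigenvalues of the traceless quadrupole tensor, Q_ab = Σ_i q_i (3 x_ia x_ib - |x_i|² δ_ab), with positions x_i measured from a chosen centre. The NumPy sketch below illustrates this standard definition under stated assumptions (point charges in units of e, coordinates in Å, geometric centre as origin); it is not necessarily the exact normalization used in this study, and `quadrupole_eigenvalues` is a hypothetical helper name.

```python
import numpy as np


def quadrupole_eigenvalues(charges, coords):
    """Eigenvalues (ascending, e*Angstrom^2) of the traceless quadrupole tensor.

    Q_ab = sum_i q_i * (3 x_ia x_ib - |x_i|^2 delta_ab), with x_i measured
    from the geometric centre of the supplied points (an assumed origin).
    """
    q = np.asarray(charges, dtype=float)
    x = np.asarray(coords, dtype=float)
    x = x - x.mean(axis=0)                  # positions relative to the centre
    r2 = np.sum(x * x, axis=1)              # squared distances |x_i|^2
    outer = np.einsum("i,ia,ib->ab", q, x, x)  # sum_i q_i x_ia x_ib
    Q = 3.0 * outer - np.sum(q * r2) * np.eye(3)
    return np.linalg.eigvalsh(Q)            # symmetric tensor: real eigenvalues


# Toy example: two +1 charges placed on the x axis at +/-1 Angstrom.
print(quadrupole_eigenvalues([1.0, 1.0], [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

For this toy arrangement the eigenvalues are -2, -2 and 4 e·Å², whereas six equal charges placed symmetrically on the three axes give three zero eigenvalues, i.e. no quadrupolar asymmetry.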

Although studies that specifically aim to optimize prediction performance using structure-based electrostatic properties have been reported, they largely focus on charged patches and their geometry in RBPs. We, on the other hand, analyzed only three bulk electrostatic properties in more detail and used the whole protein as the input for the prediction model. This allowed us to examine how the charge distribution may characterize the mode of action of these proteins. For example, the predominant role of charge and dipole moment in ribosomal proteins stands out, as explained above. Another group of RBPs that emerges as distinct in this study is viral RNA-binding proteins, which carry a large net charge but not a large dipole moment; this remarkably symmetric distribution of (unbalanced) charge over their surface sets them apart from the other groups. Thus, in their use of charge and its asymmetric distribution on the surface, rRNA-binding proteins form an extreme group, whereas other proteins use one or more of the three measures considered here. Even with just these properties, the prediction model performs comparably with more detailed methods, and it additionally provides insight into the mechanism of RNA-protein recognition in various functional groups.

Another key observation in this work is that complex formation does not change the electric moments much, unless it is accompanied by very large conformational changes, as in the case of domain movements and of proteins with multiple binding modes. Even in these cases, however, the constituent domains likely maintain their overall multipolar electrostatic properties. It may be noted that the pairwise data set of bound/unbound proteins in this work is somewhat biased, as proteins whose structures change considerably are more likely to be reported. This is reflected in their RMSD after superposition (the average RMSD over all pairs is 2.1 Å, which is quite large; data not shown). Thus, despite these large conformational changes, the characteristic electric moments are largely preserved, probably assisting long-range steering interactions that produce an energy landscape appropriate for recognition.
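The RMSD after superposition mentioned above is commonly computed with the Kabsch algorithm, which finds the rotation minimizing the RMSD between two centred coordinate sets. The NumPy sketch below is a generic implementation of that algorithm, assuming the bound and unbound structures have already been reduced to matched atom pairs (for example, Cα atoms of aligned residues); `kabsch_rmsd` is a hypothetical helper name, and this is not necessarily the procedure used to obtain the 2.1 Å figure.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimal superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if the optimal orthogonal matrix would be a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and compute the root-mean-square deviation.
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff * diff, axis=1))))


# Toy check: Q is P rotated by 90 degrees about z and translated.
P = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Q = [[5.0, 5.0, 5.0], [5.0, 6.0, 5.0], [4.0, 5.0, 5.0], [5.0, 5.0, 6.0]]
print(kabsch_rmsd(P, Q))
```

Because Q differs from P only by a rotation and a translation, the RMSD in this toy check is zero up to floating-point error; any residual value for real bound/unbound pairs reflects genuine conformational change.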

We observe that the three electric moments are fairly well conserved in evolution: even when sequence similarity is as low as 25%, RNA-binding proteins within the same cluster appear to have very similar electric moments, suggesting that these three properties may be universally employed in protein-RNA recognition.

Finally, this method has been rigorously cross-validated on known structures of RNA-binding proteins. However, its most useful application would be to annotate proteins from their modeled structures. Unfortunately, no public data set of modeled structures with RNA-binding annotations was available at the time of this study. Thus, all performance measures presented here correspond to experimentally determined structures (albeit with very lenient resolution requirements). Benchmarking performance on high-throughput modeled structures remains an area for further investigation.

Conclusions

RNA-binding proteins have distinct patterns of net charge, dipole and quadrupole moments, which can be used to identify them rapidly and, to some degree, to determine their structural class. This information is present even at low resolution, as moments calculated from main-chain coordinates alone can be used for prediction. The method is also robust against conformational changes and against evolutionary variation in protein structures.

Authors' contributions

This project was jointly conceived by both authors (SA and AS) as part of their ongoing collaboration. Detailed experimental design and implementation were carried out by SA. SA prepared the manuscript and analyzed the results in consultation with AS, who provided suggestions. Both authors read and approved the manuscript.

Acknowledgements

A.S. acknowledges support from Grants-in-Aid for Scientific Research (20016022, 21310131) from the Ministry of Education, Culture, Sports, Science and Technology in Japan. S.A. acknowledges support from a Grant-in-Aid for Scientific Research (Kaken-hi #22500277) from JSPS, Japan.

References

  1. Draper D: Protein-RNA recognition.

    Annu Rev Biochem 1995, 64:593-620.

  2. de Guzman R, Turner R, Summers M: Protein-RNA recognition.

    Biopolymers 1998, 48:181-195.

  3. Jones S, Daley D, Luscombe N, Berman H, Thornton J: Protein-RNA interactions: a structural analysis.

    Nucleic Acids Research 2001, 29:943-954.

  4. Chen Y, Varani G: Protein families and RNA recognition.

    FEBS J 2005, 272:2088-2097.

  5. Sanchez-Diaz P, Penalva L: Post-transcription meets post-genomic: the saga of RNA binding proteins in a new era.

    RNA Biol 2006, 3:101-109.

  6. Keene J: RNA regulons: coordination of post-transcriptional events.

    Nature Rev Genet 2007, 8:533-543.

  7. Lunde B, Moore C, Varani G: RNA-binding proteins: modular design for efficient function.

    Nature Rev Mol Cell Biol 2007, 8:479-490.

  8. Yu X, Cao J, Cai Y, Shi T, Li Y: Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines.

    J Theor Biol 2006, 240:175-184.

  9. Terribilini M, Sander JD, Lee JH, Zaback P, Jernigan RL, Honavar V, Dobbs D: RNABindR: a server for analyzing and predicting RNA-binding sites in proteins.

    Nucleic Acids Research 2007, 35:W578-W584.

  10. Chen YC, Lim C: Predicting RNA-binding sites from the protein structure based on electrostatics, evolution and geometry.

    Nucleic Acids Research 2008, 36(5):e29.

  11. Ellis JJ, Broom M, Jones S: Protein-RNA interactions: structural analysis and functional classes.

    Proteins 2007, 66:903-911.

  12. Shazman S, Mandel-Gutfreund Y: Classifying RNA-binding proteins based on electrostatic properties.

    PLoS Comput Biol 2008, 4(8):e1000146.

  13. Ahmad S, Sarai A: Moment-based prediction of DNA-binding proteins.

    J Mol Biol 2004, 341:65-71.

  14. Szilagyi A, Skolnick J: Efficient prediction of nucleic acid binding function from low-resolution protein structures.

    J Mol Biol 2006, 358:922-933.

  15. Felder CE, Prilusky J, Silman I, Sussman JL: A server and database for dipole moments of proteins.

    Nucleic Acids Research 2007, 35:W512-W521.

  16. Klosterman PS, Tamura M, Holbrook SR, Brenner SE: SCOR: a structural classification of RNA database.

    Nucleic Acids Research 2002, 30:392-394.

  17. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool.

    J Mol Biol 1990, 215:403-410.

  18. Berman HM, Henrick K, Nakamura H: Announcing the worldwide Protein Data Bank.

    Nature Structural Biology 2003, 10(12):980.

  19. Gordon JC, Myers JB, Folta T, Shoja V, Heath LS, Onufriev A: H++: a server for estimating pKas and adding missing hydrogens to macromolecules.

    Nucleic Acids Research 2005, 33:W368-W371.

  20. Ahmad S, Gromiha MM, Sarai A: Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information.

    Bioinformatics 2004, 20:477-486.

  21. Ahmad S, Keskin O, Sarai A, Nussinov R: Protein-DNA interactions: structural, thermodynamic and clustering patterns of conserved residues in DNA-binding proteins.

    Nucleic Acids Research 2008, 36(18):5922-5932.

  22. Willis M, Krajewski W, Chalamasetty V, Reddy P, Howard A, Herzberg O: Structure of HI1333 (YhbY), a putative RNA-binding protein from Haemophilus influenzae.

    Proteins 2002, 49(3):423-426.

  23. Williamson JR: After the ribosome structures: how are the subunits assembled?

    RNA 2003, 9:165-167.

  24. Ghosh S, Joseph S: Non-bridging phosphate oxygen in 16S rRNA important for 30S subunit assembly and association with the 50S ribosomal subunit.

    RNA 2005, 11:657-667.

  25. Tworowski D, Feldman AV, Safro MG: Electrostatic potential of aminoacyl-tRNA synthetase navigates tRNA on its pathway to the binding site.

    J Mol Biol 2005, 350(5):866-882.