Institute of Biochemistry and Biophysics of the Polish Academy of Sciences, Pawińskiego 5a, 02-106, Warszawa, Poland

Laboratory of Plant Molecular Biology, Warsaw University, Pawińskiego 5a, 02-106, Warszawa, Poland

Abstract

Background

Known protein interaction networks have very particular properties. Old proteins tend to have more interactions than new ones. One of the best statistical representatives of this property is the node degree distribution (distribution of proteins having a given number of interactions). It has previously been shown that this distribution is very close to the sum of two distinct exponential components. In this paper, we asked: What are the possible mechanisms of evolution for such types of networks? To answer this question, we tested a kinetic model for simplified evolution of a protein interactome. Our proposed model considers the emergence of new genes and interactions and the loss of old ones. We assumed that there are generally two coexisting classes of proteins. Proteins constituting the first class are essential only for ecological adaptations and are easily lost when ecological conditions change. Proteins of the second class are essential for basic life processes and, hence, are always effectively protected against deletion. All proteins can transit between the above classes in both directions. We also assumed that the phenomenon of gene duplication is always related to ecological adaptation and that a new copy of a duplicated gene is not essential. According to this model, all proteins gain new interactions with a rate that preferentially increases with the number of interactions (the rich get richer). Proteins can also gain interactions because of duplication. Proteins lose their interactions both with and without the loss of partner genes.

Results

The proposed model reproduces the main properties of protein-protein interaction networks very well. The connectivity of the oldest part of the interaction network is densest, and the node degree distribution follows the sum of two shifted power-law functions, which is a theoretical generalization of the previous finding. The above distribution covers the wide range of values of node degrees very well, much better than a power law or generalized power law supplemented with an exponential cut-off. The presented model also relates the total number of interactome links to the total number of interacting proteins. The theoretical results were for the interactomes of

Conclusions

Using these approaches, the kinetic parameters could be estimated. Finally, the model revealed the evolutionary kinetics of proteome formation, the phenomenon of protein differentiation and the process of gaining new interactions.

Background

Although an evolutionary viewpoint in network studies is not a new concept

Consequently, one may expect that the current network architecture may provide quantitative information about the network history. Comparing the presented kinetic model for the evolution of the protein interaction network with the data for

To address variations in functional significance, a transition between the following two coexisting classes was postulated for the proteins: the class of optional proteins that are essential for ecological adaptations, which naturally emerge and are eliminated during evolution, and the class of proteins essential for basic life processes, which are protected from immediate loss (Figure

**Two hypothetical classes of coexisting proteins, X and Y.** Proteins of class X are essential for ecological adaptations, and proteins of class Y are essential for basic biological processes. This schematic picture shows the evolutionary importance of the proteins of each class for ecological adaptation and survival. Organisms having proteins Y can live only in the water environment (upper fig.). Organisms possessing proteins Y and X can exist in the water, in the terrestrial environment and in the mud (central fig.). An organism that loses its Y proteins in the terrestrial environment is evolutionarily eliminated (bottom fig.).

Two sources of new interactions were considered: newly emerging proteins and proteins within the existing interactome. The overall preference for gaining new interactions was assumed to be related to the node degree. In addition, two mechanisms for losing interactions were considered: the first was related to protein deactivation, and the second was spontaneous. Because there is evidence that more important proteins evolve similarly to others

The described model predicts a double-shifted power-law distribution for the node degree. Therefore, it confirms the earlier proposal of a double exponential distribution for the node degree

Results

Kinetic model of the evolution of a protein interaction network

Proteome formation

Let us consider two classes of proteins, X and Y, which are evolving according to the following rules (for details, see the Methods). New proteins of class X originate at rate _{0} and are inactivated at rate _{i} (Figure _{XY}. Proteins of class Y are transferred back to class X at rate _{YX} _{2}, but the duplicates of the Y class belong to class X. These rules are included in the set of eqs. 1 and 2, describing the variation in the size of population

**The kinetic model of the protein interactome evolution. a**. Schematic representation of proteome formation. **b**. Schematic representation of interactome formation. The symbols _{0} - protein origination, _{i} - protein inactivation, _{XY} and _{YX} - protein transition between classes, _{2} - protein duplication, _{0}_{0}/

All parameters of the model describing the rates are treated as fixed.
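The rules above can be illustrated with a small numerical sketch. The rate equations below are one possible reading of the verbal description (origination, inactivation and duplication feeding class X, and transitions in both directions); they are an assumption of this sketch, not a transcription of eqs. 1 and 2. The constant names (`K_ORIG`, `K_INACT`, `K_DUP`, `K_XY`, `K_YX`) are hypothetical labels, and the numerical values are the best estimates reported in the table of kinetic parameters.

```python
# Best-fit kinetic parameters in model time units (values from the table of estimates).
# The symbol names are hypothetical labels chosen for this sketch.
K_ORIG = 34376.8    # origination rate of new class-X proteins
K_INACT = 8.61692   # inactivation rate of class-X proteins
K_DUP = 0.122669    # duplication rate (duplicates assumed to enter class X)
K_XY = 0.0611168    # transition rate X -> Y
K_YX = 2.8542       # transition rate Y -> X


def rates(x, y):
    """Assumed rate equations for the class sizes X and Y."""
    dx = K_ORIG - (K_INACT + K_XY) * x + K_YX * y + K_DUP * (x + y)
    dy = K_XY * x - K_YX * y
    return dx, dy


def simulate(x, y, t_end=10.0, dt=1e-3):
    """Integrate the assumed equations with the classical Runge-Kutta scheme."""
    for _ in range(round(t_end / dt)):
        a = rates(x, y)
        b = rates(x + 0.5 * dt * a[0], y + 0.5 * dt * a[1])
        c = rates(x + 0.5 * dt * b[0], y + 0.5 * dt * b[1])
        d = rates(x + dt * c[0], y + dt * c[1])
        x += dt * (a[0] + 2 * b[0] + 2 * c[0] + d[0]) / 6
        y += dt * (a[1] + 2 * b[1] + 2 * c[1] + d[1]) / 6
    return x, y


def steady_state():
    """Closed-form steady state of the assumed equations."""
    ratio = K_XY / K_YX  # Y/X at steady state, from dY/dt = 0
    x = K_ORIG / (K_INACT - K_DUP * (1 + ratio))
    return x, ratio * x


if __name__ == "__main__":
    # Two illustrative starting points: an empty proteome and one made only of class Y.
    for label, start in (("empty start", (0.0, 0.0)), ("class Y only", (0.0, 1000.0))):
        x, y = simulate(*start)
        print(label, "-> N =", round(x + y, 1), " Y/N =", round(y / (x + y), 4))
```

Under these assumptions both starting points relax to the same steady state, with a total size close to 4135 proteins and a class-Y fraction near 0.02, which is in line with the figures quoted elsewhere in the text.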

Interactome formation

By definition, the node degree _{0} _{0}/_{0} is the degree of an entirely new protein and **ε** per unit increase in

where τ is the protein age.

For simplicity, only the proteins that emerged in the steady state of proteome formation (

Then, the solution of eq. 3, which describes the evolution of a protein’s node degree, is

where _{r} is defined in the Methods.

Node degree distribution

As mentioned above, the degree of a node (protein) in a network (interactome) is the number of links (interactions) to other nodes, or simply the number of contacts. Its statistical variation may be described by the node degree distribution, i.e., the mathematical function giving the number of nodes with a given degree. In a continuous approach, this function is denoted as

The amplitudes _{i} and the powers _{i} (i = 1,2) are defined in the Methods.
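To make the phrase "sum of two shifted power-law functions" more concrete, the sketch below evaluates a generic two-term shifted power law and compares it with a two-term exponential. The functional form `amplitude * (1 + shift*xi)**(-rate/shift)` is an assumption chosen for illustration and is not claimed to be eq. 5; the names `shift`, `rate` and `amplitude` and the parameter values are hypothetical. The point is only that such a term approaches an exponential when the shift parameter becomes small and develops a heavier tail when it is large.

```python
import math


def shifted_power_law(xi, amplitude, rate, shift):
    """Generic shifted power-law term: amplitude * (1 + shift*xi)**(-rate/shift)."""
    return amplitude * (1.0 + shift * xi) ** (-rate / shift)


def double_shifted_power_law(xi, a1, g1, a2, g2, shift):
    """Sum of two shifted power-law terms sharing one shift parameter."""
    return shifted_power_law(xi, a1, g1, shift) + shifted_power_law(xi, a2, g2, shift)


def double_exponential(xi, a1, g1, a2, g2):
    """Sum of two exponential terms, the small-shift limit of the function above."""
    return a1 * math.exp(-g1 * xi) + a2 * math.exp(-g2 * xi)


if __name__ == "__main__":
    params = (100.0, 1.0, 5.0, 0.2)  # illustrative values only, not fitted
    for shift in (1.0, 0.1, 1e-6):
        ratio = double_shifted_power_law(20, *params, shift) / double_exponential(20, *params)
        print("shift =", shift, " power-law / exponential at xi = 20:", ratio)
```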

Total number of links

Integrating _{∞}.

The probabilities _{i} (i = 1,2) are defined in the Methods.

The cited quantities _{r}, _{i}, _{i}, _{i} can be estimated by fitting eqs. 5 and 6 to experimental data. They depend on the kinetic parameters of the processes considered in the model (see the Methods), so this approach may lead to a quantitative estimation of these parameters.

Computer simulations

Experimental data

The values of _{∞} and

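| **Interactome** | **N** | **L** | **Database** |
|---|---|---|---|
| | 487 | 959 | BIND |
| | 129 | 107 | DIP |
| | 3227 | 5026 | BIND |
| | 7910 | 23128 | BIND |
| | 399 | 312 | BIND |
| | 724 | 1403 | COSIN |
| | 2529 | 3376 | DIP |
| | 1003 | 994 | DIP |
| | 349 | 304 | DIP |
| | 4135 | 7839 | COSIN |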

Fitting the model of the node degree distribution to the experimental data

The _{i}, _{i} and _{r}, are presented in Table ^{-c} (0.34), and the generalized power law with exponential cut-off (PL-EC), _{1})^{-c2} e^{-ξ/c3} (0.24). A detailed comparison of the different fits is shown in Figure

**Fitting of the kinetic model to the experimental data. a**. The result of fitting the model of the node degree distribution to the **b**. Comparison of the fit using the proposed model (eq. 5) and the fits using other models: PL - power-law model and PL-EC – generalized power-law with exponential cut-off model. The continuous line indicates a linear trend for the factual values and the values predicted by our model. The parameters of the trend line are shown in the inset. **c**. Comparison of the fit using the current model (eq. 5) and the fits using our previous double exponential model, 2E **d**. The result of fitting the model of the dependence of _{∞} and

| **Quantity** | **Estimate** | **SE (%)** |
|---|---|---|
| _{1} | 3184.82 | 32 |
| _{2} | 49.8628 | 77 |
| _{1} | 4.80485 | 29 |
| _{2} | 2.1242 | 20 |
| _{r} | 6.30779 | 51 |
| _{1} | 0.73526 | 13 |
| _{2} | 0.000552383 | 5 |

Fitting the model of the dependence of N_{∞} and L to the experimental data

The _{∞}, _{i}, are listed in Table
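As a rough illustration of a parabolic dependence of the number of links on the number of proteins, the sketch below fits a generic quadratic through the origin, L = a·N + b·N², to the ten (N, L) pairs of the interactome table by ordinary least squares. This is a simplification introduced here: it is not eq. 6, the fit is unweighted, and the coefficients `a` and `b` are hypothetical names that need not coincide with the estimates reported in the paper.

```python
# (N, L) pairs copied from the interactome table: proteins and links per data set.
N_PROTEINS = [487, 129, 3227, 7910, 399, 724, 2529, 1003, 349, 4135]
N_LINKS = [959, 107, 5026, 23128, 312, 1403, 3376, 994, 304, 7839]


def fit_quadratic_through_origin(n_values, l_values):
    """Least-squares fit of L = a*N + b*N**2 via the 2x2 normal equations."""
    s11 = sum(n ** 2 for n in n_values)
    s12 = sum(n ** 3 for n in n_values)
    s22 = sum(n ** 4 for n in n_values)
    t1 = sum(n * l for n, l in zip(n_values, l_values))
    t2 = sum(n * n * l for n, l in zip(n_values, l_values))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


if __name__ == "__main__":
    a, b = fit_quadratic_through_origin(N_PROTEINS, N_LINKS)
    print("linear coefficient a =", a)
    print("quadratic coefficient b =", b)
    for n, l in zip(N_PROTEINS, N_LINKS):
        print(n, l, round(a * n + b * n * n))
```

Because the sums are dominated by the largest interactomes, an unweighted fit of this kind mostly reflects those data sets; a weighted or log-scale fit would be a reasonable alternative.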

Finding the values of the kinetic parameters of the model

The general parameters of both the node degree distribution (_{i}, _{i} and _{r}) and the total number of links (_{i}) can be related to the parameters of the kinetic model. Using both sets of parameters increases the universality and the credibility of the final estimated parameters of the model.

A random-walk-type algorithm was developed to estimate the values of the kinetic parameters _{i,} _{2,} _{XY,} _{YX,} _{0,} _{,} _{i}, _{i}, _{r} and _{i} (see Methods). The results of both former simulations were joined, and the error measure ^{2}, defined below, was minimized:

In the above equation, the singly primed values (‘) were taken from Table _{0}, which was used in the equation:

with the assumption that _{∞} = 4135, as for

During minimization, a few additional simple constraints were added to exclude kinetic parameter values that had no physical meaning. Several attempts were made, and the results of the best minimization runs are presented in Table

| **Kinetic parameter** | **Best estimation** (^{2} = 0.14) | **Average** (the 10 best) | **SE (%)** (the 10 best) |
|---|---|---|---|
| _{i} | 8.61692 | 12.683567 | 10.1 |
| _{2} | 0.122669 | 0.052556819 | 27.7 |
| _{XY} | 0.0611168 | 0.09274057 | 10.4 |
| _{YX} | 2.8542 | 4.249004 | 10.1 |
| _{0} | 0.426372 | 0.4256051 | 0.1 |
| | 0.00237779 | 0.003526257 | 10.1 |
| **ε** | 10.6696 | 15.91212 | 9.9 |
| | 0.107993 | 0.204924912 | 45.9 |
| _{0} | 34376.8 | 51108.95 | 10.1 |
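The estimation procedure described above can be pictured with a generic greedy random-walk minimizer. The sketch below is not the authors' algorithm: the error measure (a sum of squared relative deviations), the multiplicative step that keeps every parameter positive, the toy mapping `toy_model` and all names are assumptions made for illustration. It only shows the general idea of perturbing the parameters at random and keeping a trial point whenever the error decreases.

```python
import math
import random


def relative_sq_error(predicted, target):
    """Sum of squared relative deviations between predicted and target values."""
    return sum(((p - t) / t) ** 2 for p, t in zip(predicted, target))


def random_walk_minimize(model, target, start, steps=20000, sigma=0.05, seed=0):
    """Greedy random walk in log-parameter space; parameters stay positive."""
    rng = random.Random(seed)
    best = list(start)
    best_err = relative_sq_error(model(best), target)
    for _ in range(steps):
        # Multiplicative log-normal step: a simple positivity constraint.
        trial = [p * math.exp(rng.gauss(0.0, sigma)) for p in best]
        err = relative_sq_error(model(trial), target)
        if err < best_err:  # accept only improvements
            best, best_err = trial, err
    return best, best_err


def toy_model(params):
    """Hypothetical mapping from two 'kinetic' parameters to two observables."""
    a, b = params
    return [a, a * b]


if __name__ == "__main__":
    estimate, error = random_walk_minimize(toy_model, [2.0, 6.0], [1.0, 1.0])
    print("estimated parameters:", estimate)
    print("final error:", error)
```

Repeating such a run with different seeds and averaging the best outcomes would mirror the "10 best" columns of the table, although the exact protocol used for those columns is not specified here.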

Simulations of the kinetics of the protein interactome evolution

Using eqs. A.1 and A.2, A.6 and A.7 and eq. A.27 with the best fit parameters from Table

**Simulations of the evolution of the proteome. a**. It was also assumed that at the beginning of the evolution, all proteins were essential for life processes (**b**. It was also assumed that at the beginning of the evolution, there were no proteins essential for life processes (Y = 0). Axis

**Simulation of the evolution of a small sample of synchronized proteins.** The hypothetical kinetics of the proteome evolution are shown. Axis

**Simulation of the evolution of a single node (protein) degree.** Axis

Summary of the most important results

The proposed kinetic model (Figure _{∞} and _{1} = 0.12, 1/_{2} = 0.35 and 1/_{∞}/_{∞} equals 0.02.

Discussion

The presented kinetic model of the evolution of a protein interactome is an extension of the previous two-class model

From a cognitive point of view, the proposed model led to a satisfactory fit to the node degree histogram (Figure _{∞} and

The model and its estimated kinetic parameters allow a sketch of a hypothetical picture of proteome evolution, indicating that class Y of proteins that are functionally essential for basic processes of

In this picture, the origination of new species may be related to variations in the value of the parameters governing the kinetics of evolution (e.g., _{0}, which directly determines the value of _{∞}), resulting in the origination of a new steady state of proteome organization. In addition, the results indicate that entering the important class Y is approximately 50-fold slower than leaving it. This finding illustrates how difficult it is to become a member of a protein “gentlemen's club” and how easy it is to lose this position. Mechanisms of selection and adaptation certainly play an important role in this type of arrangement, ensuring stability in the composition of backbone biochemical reactions. Stability is one of the most important factors supporting organisms’ survival. During evolution, organisms explore optimal paths of growth and replication, which is possible if and only if the organisms preserve certain optimal and stable biochemical machinery

The obtained results also show how large dynamic changes involving new protein emergence and inactivation may occur in class X proteins without disturbing the steady state of the entire system. The results also revealed an essential preference for gaining new interactions. Within the interactome of

To relate these findings to the timescale of real evolution, it is reasonable to arbitrarily assume that a unit of time in the model corresponds to 10^{9} years. Then, an _{0} of 34376.8 means approximately 30 new proteins per 10^{6} years. Consequently, the characteristic times of proteome evolution can be estimated to equal 1.2·10^{8} and 3.5·10^{8} years. The shorter time describes the timescale of entering the “higher” class, and the longer time describes the timescale of protein deactivation. The characteristic time of gaining a new interaction is 4.5·10^{8} years.
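The conversions in the paragraph above are simple rescalings, sketched below. The calibration of one model time unit to 10^{9} years is the arbitrary assumption stated in the text; the function and constant names are hypothetical. The inputs are the origination rate and the two characteristic times (0.12 and 0.35 model units) quoted in the text.

```python
YEARS_PER_UNIT = 1e9  # assumed calibration: one model time unit = 10^9 years


def time_in_years(model_time):
    """Convert a duration from model units to years."""
    return model_time * YEARS_PER_UNIT


def rate_per_million_years(model_rate):
    """Convert a rate from 'per model unit' to 'per 10^6 years'."""
    return model_rate / YEARS_PER_UNIT * 1e6


if __name__ == "__main__":
    print("new proteins per 10^6 years:", rate_per_million_years(34376.8))
    print("characteristic times in years:", time_in_years(0.12), time_in_years(0.35))
```

With this calibration the origination rate corresponds to roughly 34 new proteins per 10^{6} years, and the two characteristic times to 1.2·10^{8} and 3.5·10^{8} years.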

From the perspective of describing the current distribution of protein degree or the dependence of the total number of links on the size of the interactome, a steady-state approximation for proteome evolution appears to be a correct simplification. Most of the observed proteins most likely originated during the “steady state era”. For a more precise description of the connectivity of older proteins, e.g., those from the pre-eukaryotic radiation era, the model should also take into account the variations with time in both the proteome size and the values of kinetic parameters.

One of the main predictions of the proposed model (Figure

Finally, the proposed model relates the static observables, such as the node degree distribution, to many dynamic evolutionary processes. The discussed dynamics are not a trivial consequence of the birth and death of proteins. The dynamics also involve the transition of proteins between classes, which leads to a dynamic balance, in which a given protein may change its importance class several times depending on the environmental conditions. Thus, the amplitudes in the derived formula for node degree distribution describe an effective dynamic content of each protein class but not the number of specific proteins.

As shown above, the presented kinetic model of the evolution of a protein interaction network offers a solid foundation for future development and provides a productive research approach to protein interaction networks.

In future studies, it would be valuable to have a more definitive evaluation of how the model’s simplifications affect its accuracy. Standard errors of the estimation (Table _{2}, is of less statistical significance. Possibly, these parameters could be omitted in simplifications that neglect parameters of the second order without considerable loss in the accuracy of the model.

Despite the good fits, we are aware that the cited experimental methods have enormous potential for false data. The PPI data are full of false positives and false negatives, which, when unquestioningly included, tend to generate false conclusions. Necessarily, the model was applied to the data that exist. High-throughput data tend to be worse than low-throughput data _{i}, probabilities _{i}). Test simulations that were performed indicate that a 10% increase in the value of those parameters may result in a change of the final estimated kinetic parameters of the model reaching up to 70%. Thus, the results may change in the face of future data.

The presented and applied model of the evolution of the protein interactome by its nature contains some abstraction, which does not invalidate the results (see Hamilton

Conclusions

The current model leads to a number of predictions that we can hope to test in the not-so-distant future. The most interesting findings are the following:

– A small sample of synchronized proteins decreases and differentiates; the degree of a single protein node increases.

– The evolution of a node degree is slower than the evolution of the proteome.

– The evolution of the total proteome stabilizes.

– Entering the class of proteins that are essential for basic biological processes is approximately 50-fold slower than leaving it.

– Large dynamic changes, involving new protein emergence and inactivation in class X, do not disturb the steady state of the entire system.

– There is a parabolic relationship between the total number of interactions and the total number of interacting proteins.

– The connectivity of the oldest part of the interaction network is dense; the node degree distribution follows the sum of the two shifted power-law functions.

We hope that this paper presents a helpful advance in this interesting area.

Methods

Mathematical formulation of a kinetic model of the evolution of a protein interaction network

Proteome formation

The set of eqs. 1 and 2 (see main text) describing the rate of variation in the size of protein classes X and Y can be rewritten using a more convenient pair of variables, i.e., the total number of evolving proteins,

where _{0} is the rate of origination of entirely new proteins of class X, _{i} is the rate of protein inactivation, _{XY} and _{YX} are the rates of protein migration between classes X and Y, _{2} is the protein duplication rate and

The steady-state (_{∞}, and the number of essential proteins, _{∞}, can be estimated according to eqs. A.1 and A.2 as

where

Consequently, the evolution of a small sample of proteins originating within the short time period

with the initial conditions

The eigenvalues, _{1} and _{2}, obtained from the determinant requirement

describe the characteristic rates of change in sample size

where

When ^{2}/4 >

where

The right side of eq. A.14 describes the number of proteins aged between _{0} and _{0}
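The characteristic rates discussed in this section can be illustrated numerically. The sketch below computes the two eigenvalues of a 2×2 linear system for the class sizes X and Y; the matrix is an assumed reading of the verbal model (inactivation and X→Y transition removing proteins from X, Y→X transition and duplication adding to X) and is not claimed to reproduce eqs. A.8 to A.14. The constant names are hypothetical, and the values are the best estimates from the table of kinetic parameters.

```python
import math

# Best-fit values from the table of kinetic parameters (hypothetical symbol names).
K_INACT = 8.61692   # inactivation rate of class-X proteins
K_DUP = 0.122669    # duplication rate (duplicates assumed to enter class X)
K_XY = 0.0611168    # transition rate X -> Y
K_YX = 2.8542       # transition rate Y -> X


def characteristic_rates():
    """Return the two decay rates (fast, slow) of the assumed linear system."""
    # Matrix of the homogeneous part: d(X, Y)/dt = A (X, Y).
    a11 = -(K_INACT + K_XY) + K_DUP
    a12 = K_YX + K_DUP
    a21 = K_XY
    a22 = -K_YX
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace / 4 - det
    if disc <= 0:
        raise ValueError("eigenvalues are not real and distinct for these parameters")
    root = math.sqrt(disc)
    # Eigenvalues are trace/2 +/- root (both negative); report positive decay rates.
    return -(trace / 2 - root), -(trace / 2 + root)


if __name__ == "__main__":
    fast, slow = characteristic_rates()
    print("decay rates:", fast, slow)
    print("characteristic times:", 1 / fast, 1 / slow)
```

Under these assumptions the characteristic times come out near 0.12 and 0.35 model time units, in line with the values quoted in the summary of results.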

Interactome formation

Considering the protein degree

where

where _{0} is the degree of an entirely new protein, **ε** is the increase in the rate per link resulting from the preference effect,

Then,

where

Combined with A.23, it is easy to show that

where

Node degree distribution

The degree distribution of a protein node,

one can obtain

where

For _{r} ≪ 1, the distribution A.28 can be approximated by a double exponential formula

where

Total number of links

For a protein emerging in the steady state (

Using distribution A.28 and equations A.3, A.21, A.24, A.26 and A.29 one can obtain

where

and

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

PHP proposed the model and performed model-based analysis. PHP, SK, and PZ participated in the design of the study. PP and SK drafted the manuscript. PP and PZ revised the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work has been supported by grant 772/N-COST/2010.