Open Access Highly Accessed Research article

Difference in gene duplicability may explain the difference in overall structure of protein-protein interaction networks among eukaryotes

Takeshi Hase1, Yoshihito Niimura1* and Hiroshi Tanaka1,2

Author Affiliations

1 Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, Yushima, Bunkyo-ku, Tokyo 113-8510, Japan

2 Department of Bioinformatics, Graduate School of Biomedical Science, Tokyo Medical and Dental University, Yushima, Bunkyo-ku, Tokyo 113-8510, Japan

BMC Evolutionary Biology 2010, 10:358  doi:10.1186/1471-2148-10-358


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2148/10/358


Received: 5 March 2010
Accepted: 18 November 2010
Published: 18 November 2010

© 2010 Hase et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

A protein-protein interaction network (PIN) was suggested to be a disassortative network, in which interactions between high- and low-degree nodes are favored while hub-hub interactions are suppressed. It was postulated that a disassortative structure minimizes unfavorable cross-talk between different hub-centric functional modules and was positively selected in evolution. However, by re-examining yeast PIN data, several researchers reported that the disassortative structure observed in a PIN might be an experimental artifact. Therefore, the existence of a disassortative structure and its possible evolutionary mechanism remain unclear.

Results

In this study, we investigated PINs from the yeast, worm, fly, human, and malaria parasite including four different yeast PIN datasets. The analyses showed that the yeast, worm, fly, and human PINs are disassortative while the malaria parasite PIN is not. By conducting simulation studies on the basis of a duplication-divergence model, we demonstrated that a preferential duplication of low- and high-degree nodes can generate disassortative and non-disassortative networks, respectively. From this observation, we hypothesized that the difference in degree dependence on gene duplications accounts for the difference in assortativity of PINs among species. Comparison of 55 proteomes in eukaryotes revealed that genes with lower degrees showed higher gene duplicabilities in the yeast, worm, and fly, while high-degree genes tend to have high duplicabilities in the malaria parasite, supporting the above hypothesis.

Conclusions

These results suggest that disassortative structures observed in PINs are merely a byproduct of preferential duplications of low-degree genes, which might be caused by an organism's living environment.

Background

Large-scale data of protein-protein interactions have become available from several organisms, including Saccharomyces cerevisiae (yeast; [1-4]), Caenorhabditis elegans (worm; [5]), Drosophila melanogaster (fly; [6]), Homo sapiens (human; [7,8]), and Plasmodium falciparum (malaria parasite; [9]). In a protein-protein interaction network (PIN), a protein and an interaction between two proteins are represented as a node and a link, respectively. The number of links connected to a node is called its degree. The degree distribution P(k) represents the fraction of nodes with degree k in a network and characterizes the structure of the network. It is well known that various biological, technological, and social networks are scale-free networks, in which P(k) follows a power law, i.e., P(k) ~ k^(-γ) [10-12]. In a scale-free network, therefore, most of the nodes have low degrees, but a small number of high-degree nodes (hubs) also exist. In the case of PINs, P(k) better fits a power law with an exponential cut-off (equation M1 in the online version, http://www.biomedcentral.com/1471-2148/10/358/mathml/M1) [13,14].
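The definitions of degree and P(k) given above can be illustrated with a minimal sketch. The function name and the toy edge list are hypothetical and are not taken from the datasets analyzed in this study; the sketch also assumes that each interaction appears only once in the edge list.

```python
from collections import Counter

def degree_distribution(edges):
    """Return (degree, P): degree maps node -> k, and P maps k -> fraction of nodes with degree k."""
    degree = Counter()
    for u, v in edges:
        if u == v:
            continue  # assumption: self-interactions are ignored in this sketch
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return dict(degree), {k: c / n_nodes for k, c in sorted(counts.items())}

# Hypothetical toy network: a hub "C" with four low-degree partners
edges = [("C", "A"), ("C", "B"), ("C", "D"), ("C", "E")]
degree, P = degree_distribution(edges)
```

In this toy network most nodes have degree 1 and a single node acts as a hub, which is the qualitative pattern described for scale-free networks.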

The correlation between the degrees of two nodes connected by a link is another characteristic feature of network architecture. A simple way to see the degree correlation is to consider the Pearson correlation coefficient r of the degrees at both ends of a link [12,15,16]. A network is called assortative when r > 0 and disassortative when r < 0. In an assortative network, hubs are preferentially connected to other hubs, whereas in a disassortative network, hubs tend to attach to low-degree nodes. It was reported that social networks such as coauthorships of scientific papers or film actor collaborations are assortative, whereas technological and biological networks including the Internet, food webs, neural networks, and PINs are disassortative [16].
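As a rough illustration of this coefficient, the following sketch computes the Pearson correlation of the degrees at the two ends of every link. Counting each link in both directions (so that r does not depend on the order in which the two proteins are listed) is an assumption of this sketch, as are the function name and the toy edge list.

```python
from collections import Counter
from math import sqrt

def degree_correlation(edges):
    """Pearson correlation r of the degrees at both ends of a link."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        # count each link in both directions so that r is symmetric
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        return float("nan")  # all nodes have the same degree; r is undefined
    return cov / sqrt(var_x * var_y)

# Hypothetical toy network: one hub linked only to degree-1 nodes
star = [("C", "A"), ("C", "B"), ("C", "D"), ("C", "E")]
r = degree_correlation(star)
```

A hub connected only to low-degree nodes is the extreme disassortative case, so r for this toy network takes the most negative value possible.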

Assortativity of a network can also be evaluated by <Knn(k)>, the mean degree among the neighbors of all k-degree nodes ("nn" in <Knn(k)> represents "nearest neighbors"; [12,14,17,18]). In assortative and disassortative networks, <Knn(k)> is an increasing and a decreasing function of k, respectively. If there are no degree correlations, <Knn(k)> is independent of k, with <Knn(k)> = <k^2>/<k> [12]. Several studies reported that the yeast PIN is a disassortative network showing <Knn(k)> ~ k^(-ν) [12,14,17], where ν represents the extent of disassortative structure. In the yeast PIN, therefore, links between a hub and a low-degree node are favored, but those between hubs are suppressed. From this observation, Maslov and Sneppen [17] suggested a picture in which a hub in the yeast PIN forms a functional module of the cell together with many low-degree neighbors. They hypothesized that the suppression of interactions between hubs minimizes unfavorable cross-talk between different functional modules and increases the robustness of a network against perturbations. Therefore, it is postulated that the disassortative structure in the yeast PIN has been favored by natural selection. Note that, if this hypothesis is true, a disassortative structure should be a general feature that is commonly observed among PINs in any organism.
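A minimal sketch of how <Knn(k)> and the exponent ν could be estimated from an edge list is given below. The fitting procedure used here (ordinary least squares on log-transformed values, without binning) is an assumption of this sketch and may differ from the procedure used in the study; the function names and the toy edge list are likewise hypothetical.

```python
from collections import defaultdict
from math import log

def knn_by_degree(edges):
    """Return {k: <Knn(k)>}, the mean neighbor degree averaged over all nodes of degree k."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # assumption: self-interactions are ignored
            neighbors[u].add(v)
            neighbors[v].add(u)
    degree = {node: len(nb) for node, nb in neighbors.items()}
    per_k = defaultdict(list)
    for node, nb in neighbors.items():
        per_k[degree[node]].append(sum(degree[m] for m in nb) / degree[node])
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}

def fit_nu(knn):
    """Least-squares slope of log <Knn(k)> against log k; returns nu = -slope.

    Requires at least two distinct degrees, otherwise the slope is undefined.
    """
    xs = [log(k) for k in knn]
    ys = [log(v) for v in knn.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Hypothetical toy network: one hub linked only to degree-1 nodes
star = [("C", "A"), ("C", "B"), ("C", "D"), ("C", "E")]
knn = knn_by_degree(star)
nu = fit_nu(knn)
```

A positive ν indicates a decreasing <Knn(k)>, i.e., a disassortative structure, whereas ν close to zero indicates no degree correlation.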

To understand the evolutionary mechanisms shaping PIN architectures, several network growth models have been proposed. Many of them are based on gene duplication and divergence, in which a randomly selected node is duplicated to generate a new node having the same links as the original node, and some links are added or eliminated in a divergence process [19-23]. We have recently proposed a non-uniform heterodimerization (NHD) model [14]. In this model, a new link is preferentially attached between two duplicated nodes to create a cross-interaction when they share many common neighbors. We showed that this model best reproduces structural features of the yeast PIN, including scale-freeness, a small number of cross-interactions, and a skewed distribution of triangles composed of three nodes and three links. However, this model as well as other duplication-divergence models [21,22] failed to explain the presence of a disassortative structure in the yeast PIN. Simulation studies showed that these models could generate a decreasing function of <Knn(k)>, yet the value of ν (0.18) in <Knn(k)> ~ k^(-ν) is much smaller than the actual value (0.47; see Tables 1 and 2). Therefore, the origin of a disassortative structure still remains unexplained. We should again note that most of these simulation studies were carried out by using the yeast PIN only, because it is currently the best characterized.

It is well-known that large-scale PIN data contain many false positive interactions [24]. Maslov and Sneppen [17] used a dataset obtained by high-throughput yeast two-hybrid (Y2H) screens [2] to show suppression of interactions between high-degree nodes. Aloy and Russell [25], however, argued that the observed suppression of hub-hub interactions is probably an artifact caused by a systematic error in the Y2H data due to prey-bait asymmetry (see also Maslov and Sneppen [26]). To circumvent the problem of high false positive rates in high-throughput datasets, Batada et al. [27] used only interactions that were independently reported at least twice in different datasets, and they found that hub-hub interactions were not suppressed in the multi-validated yeast PIN data. However, Hakes et al. [28] pointed out that multiple validation introduces another problem: interactions observed at least twice will be biased towards well-studied proteins, such as those from particular cellular environments or highly expressed ones. They showed that assortativity of a PIN drastically changes depending on datasets [28]. A literature-curated yeast PIN dataset [29], which is expected to be reliable because each of the interaction data was derived from small-scale experiments, showed a disassortative structure; however, when they retained only interactions observed twice or three times, it became rather assortative [28]. Therefore, the presence of a disassortative structure in a PIN itself has now become controversial. These studies suggest that a global structure of a PIN has to be investigated by using various datasets obtained from different methods.

The purpose of this paper is to investigate the presence of disassortative structures in PINs and an evolutionary mechanism shaping disassortative structures, if any. For this purpose, we examined eukaryotic PINs from the yeast, worm, fly, human, and malaria parasite. We analyzed four large-scale yeast PIN datasets (MIPS [3]; Yu et al. [4]; Reguly et al. [29]; Batada et al. [30]). The datasets include Batada et al.'s updated version of a multi-validated dataset, Reguly et al.'s comprehensive literature-curated dataset, and MIPS [3], which has been called a "gold standard" yeast protein interaction dataset generated by expert manual curation. We also used recently published high-quality protein interaction data by Yu et al. [4], which were obtained by compiling several Y2H datasets. In addition, we examined two independent human PIN datasets (Rual et al. [7]; Stelzl et al. [8]). As a result, we show that the yeast, worm, fly, and human PINs have disassortative structures, while the malaria parasite PIN is not disassortative. We then propose a possible evolutionary mechanism causing the difference in assortativity among species.

Results

In this study, we examined nine PIN datasets from yeast, worm, fly, human, and malaria parasite (Table 1). Although the numbers of nodes and links are quite different among the five species, their degree distributions P(k) follow nearly the same curve (Figure 1 and additional file 1: Figure S1). All of the PINs examined are scale-free, suggesting that scale-freeness is a general feature of PINs. These observations are consistent with Suthram et al. [31].

Table 1. Statistics of the PINs from five eukaryote species

Figure 1. Degree distribution of PINs in five eukaryote species. Degree distribution P(k) in the PINs of yeast (black square), worm (magenta plus), fly (blue triangle), human (green cross), and malaria parasite (red diamond). For yeast and human PINs, P(k) for MIPS and Rual et al. datasets, respectively, are shown, because they contain the largest numbers of genes among the PINs for each species. The results for the other yeast and human datasets are provided in Additional file 1: Figure S1. A dashed line represents the fitted curve (equation M2 in the online version, http://www.biomedcentral.com/1471-2148/10/358/mathml/M2) with γ = 2.7, k0 = 3.4, and kC = 50.

Additional file 1. Figure S1: Degree distribution in the yeast and human PINs. (A) Degree distribution P(k) in the yeast PIN for four different datasets. A dashed line is the same as Figure 1. (B) Degree distribution P(k) in the human PIN for two datasets. A dashed line is the same as Figure 1.

On the other hand, a disassortative structure was not commonly observed among PINs. Although <Knn(k)> for the yeast, worm, fly, or human PIN is a decreasing function following k^(-ν), the malaria parasite PIN is not disassortative (Figure 2A and additional file 2: Figure S2). Note that all of the four yeast PIN datasets showed a disassortative structure regardless of the controversy on the presence of hub-hub suppression (see additional file 2: Figure S2; see Discussion). The values of ν for the eight PINs in yeast, worm, fly, and human examined are significantly non-zero (P < 3×10^-4), while the value of ν for the malaria parasite PIN is not significantly different from zero (P ~ 0.27). The difference in ν between the malaria parasite PIN and each of the other eight PINs is also significant (P < 1×10^-3; analysis of covariance). In agreement with these observations, the correlation coefficient r between degrees of connected nodes in the yeast, worm, fly, or human PIN is negative, while that in the malaria parasite PIN is nearly zero (Table 1).

Figure 2. Difference in assortativity among eukaryote PINs. (A) <Knn(k)>, the mean of the degrees among the neighbors of k-degree nodes, in the PINs of yeast (black square), worm (magenta plus), fly (blue triangle), human (green cross), and malaria parasite (red diamond). For yeast and human PINs, <Knn(k)> for MIPS and Rual et al. datasets, respectively, are shown, and the results for the other yeast and human datasets are provided in Additional file 2: Figure S2. Dashed lines in black, magenta, blue, green, and red represent k^(-0.47), k^(-0.29), k^(-0.35), k^(-0.26), and k^(-0.02), respectively. (B) Duplication of a node changes the value of ν in <Knn(k)> ~ k^(-ν). A diagram below each network indicates the distribution of <Knn(k)> and the value of ν. (C) The distribution of <Knn(k)> in the networks generated by the DDD model with the asymmetric divergence (DDD+A; left) and the symmetric divergence (DDD+S; right). Blue diamonds, green crosses, and red diamonds indicate the results with σ = -0.05 (-0.05), -0.03 (-0.03), and 0 (0), respectively, for DDD+A (DDD+S). These results were obtained by taking the mean among 100 networks generated by simulations. Black squares indicate <Knn(k)> in the yeast PIN for MIPS. Dashed lines in black, blue, green, and red represent k^(-0.47) (k^(-0.47)), k^(-0.51) (k^(-0.48)), k^(-0.37) (k^(-0.38)), and k^(-0.18) (k^(-0.26)), respectively, for DDD+A (DDD+S).

Additional file 2. Figure S2: <Knn(k)> in the yeast and human PINs. (A) <Knn(k)> in the yeast PIN for four different datasets. Dashed lines in black, blue, green, and red represent k^(-0.47), k^(-0.33), k^(-0.33), and k^(-0.25), respectively. (B) <Knn(k)> in the human PIN for two datasets. Dashed lines in black and red represent k^(-0.26) and k^(-0.27), respectively.

We next examined a possible evolutionary scenario generating the difference in assortativity of PINs among species on the basis of a duplication-divergence model. Figure 2B (middle) illustrates a simple network containing a low-degree node (e.g., A) and a high-degree node (e.g., C) that are connected to each other. In a duplication process, a randomly selected node is duplicated to generate a new node having the same links as the original node, followed by a divergence process in which some links are eliminated. If a low-degree node A is duplicated to generate a new node A' (Figure 2B, right), the value of ν in the network increases, because the degree of a node (C) connected to a low-degree node increases. On the other hand, duplication of a high-degree node (C) causes the value of ν to decrease, because the degree of a node (A) connected to a high-degree node increases (Figure 2B, left). Therefore, we can hypothesize that duplications of low- and high-degree nodes in a disassortative network tend to increase and decrease the value of ν, respectively.

To examine this issue in more detail, we developed a new duplication-divergence model named the degree-dependent duplication (DDD) model by modifying the NHD model that we proposed previously [14]. In the DDD model, a duplication of a node occurs depending on its degree. In a duplication process, a randomly selected node is duplicated with a probability proportional to 1 + σk, where k is the degree of the node, and σ is a parameter determining the duplicability of the node (see Methods for details).
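The degree-dependent choice of the node to duplicate could be sketched as follows. The exact procedure is given in the Methods; here, drawing a node with weights proportional to 1 + σk and clipping negative weights to zero are assumptions of this sketch, and the function names and toy degrees are hypothetical.

```python
import random

def duplication_weights(degree, sigma):
    """Weight of each node, proportional to 1 + sigma * k (clipped at zero; an assumption)."""
    return {node: max(0.0, 1.0 + sigma * k) for node, k in degree.items()}

def pick_node_to_duplicate(degree, sigma, rng=random):
    """Draw one node with probability proportional to its duplication weight."""
    weights = duplication_weights(degree, sigma)
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]

# Hypothetical degrees: a hub "C" (k = 4) and a low-degree node "A" (k = 1)
degree = {"A": 1, "C": 4}
w_negative = duplication_weights(degree, sigma=-0.05)  # low-degree nodes favored
w_positive = duplication_weights(degree, sigma=0.05)   # high-degree nodes favored
chosen = pick_node_to_duplicate(degree, sigma=-0.05, rng=random.Random(0))
```

With a negative σ the low-degree node receives the larger weight, with a positive σ the hub does, and with σ = 0 all nodes are equally likely to be duplicated.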

As for the divergence process, we examined two different models, the asymmetric divergence and the symmetric divergence (Figure 3). In the former, the removal of links occurs in only one of the duplicated nodes, while in the latter, links are lost from both of the duplicates with an equal probability. In this study, we conducted simulations using four different models: NHD with the asymmetric and symmetric divergence, which are referred to as NHD+A and NHD+S, respectively, and DDD with the asymmetric and symmetric divergence (DDD+A and DDD+S, respectively) (Table 2).

Figure 3. Degree-dependent duplication (DDD) model. In the DDD model, the probability of a duplication of a node is dependent on the degree of the node. In the network at the left, node A is duplicated to generate node A' with the probability of (1 + 4σ)/1,000, because the degree of node A is four (see Methods). In the asymmetric divergence, each of the links to node A' is removed with a uniform probability α in the divergence process (top, second column). In the symmetric divergence, one of the two duplicated links (e.g. either A-B link or A'-B link) to each node connecting to A and A' (nodes B-E) is eliminated with a probability α (bottom, second column). A new link between nodes A and A' is attached with the probability proportional to the number of common neighbors (nN) shared by these nodes (third column). In this case, the probability is 2β, because these nodes share two common neighbors (nodes C and D).
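The duplication, divergence, and heterodimerization steps described in the Figure 3 legend could be sketched as a single update of an adjacency map. This is a simplified reading of the legend, not the simulation code used in the study: the function name, the `new` label argument, the equal choice between the two duplicates in the symmetric case, and the capping of nN × β at 1 are all assumptions of this sketch.

```python
import random

def duplicate_and_diverge(adj, node, new, alpha, beta, symmetric, rng=random):
    """Duplicate `node` as `new` in adj (dict: node -> set of neighbors), then diverge.

    Assumes `node` has no self-interaction and `new` is not yet in adj.
    """
    partners = sorted(adj[node], key=str)
    # Duplication: the new node inherits every link of the original node
    adj[new] = set(partners)
    for p in partners:
        adj[p].add(new)
    # Divergence: each duplicated link pair loses one link with probability alpha
    for p in partners:
        if rng.random() < alpha:
            # asymmetric: only the copy loses links; symmetric: either duplicate
            loser = new if (not symmetric or rng.random() < 0.5) else node
            adj[loser].discard(p)
            adj[p].discard(loser)
    # Heterodimerization: link the duplicates with probability nN * beta
    shared = len(adj[node] & adj[new])
    if rng.random() < min(1.0, shared * beta):
        adj[node].add(new)
        adj[new].add(node)
    return adj

# Hypothetical toy network: node "A" interacting with "B" and "C"
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
duplicate_and_diverge(adj, "A", "A2", alpha=0.5, beta=0.02, symmetric=False,
                      rng=random.Random(0))
```

Setting `symmetric=True` lets either duplicate lose a given link, which corresponds to the symmetric divergence described above.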

Table 2. Statistics of the networks generated by the NHD and DDD models

Simulation studies showed that the value of ν increases (the slope becomes steeper) as σ decreases for both DDD+A and DDD+S (Figure 2C). We found that the disassortative structures of the yeast (MIPS), worm, and fly PINs were successfully reproduced by DDD+A and DDD+S when the values of σ are negative (Table 2, additional file 3: Figure S3). The human (Rual et al.) PIN was best regenerated by DDD+S with σ = 0. Note that, although σ = 0 means no degree-dependency of duplicability, where the DDD model becomes identical to the NHD model, the resultant network is still disassortative (Figure 2C). Therefore, in order to generate a network similar to the malaria parasite PIN, the value of σ has to be positive, i.e., high-degree nodes should be duplicated more preferentially than low-degree nodes. In fact, our analysis showed that the assortativity of the malaria parasite PIN was reproduced by the DDD model with a positive σ (see Table 2 and additional file 3: Figure S3E).

Additional file 3. Figure S3: Distribution of <Knn(k)> in the PINs and the networks generated by the NHD and DDD models. Distribution of <Knn(k)> in the PIN (black square) and the networks by DDD+A (red diamond), DDD+S (blue triangle), NHD+A (green cross), and NHD+S (purple plus) for (A) yeast, (B) worm, (C) fly, (D) human, and (E) malaria parasite. The results for the NHD and DDD models were obtained by taking the mean among 100 networks generated by simulations. A dashed line represents a regression line. The slope (ν) of each regression line is shown in Table 2.

The effect of link gains after gene duplication was also investigated. However, random attachments of links to duplicated nodes do not essentially affect the assortativity of resultant networks (additional file 4: Figure S4).

Additional file 4. Figure S4: Distribution of <Knn(k)> in the networks generated by simulations with link gains for (A) the DDD+A and (B) DDD+S models. ε is the probability of a link gain (see Methods). The results were obtained by taking the mean among 100 networks generated by simulations. A dashed line represents a regression line (ν = 0.51 and 0.48 for the asymmetric and symmetric divergence, respectively).

We also examined the average shortest path length, <L>, and the extent of modularity, M, in PINs (Table 1) and simulation-generated networks (Table 2). In agreement with our previous study [14], the values of <L> in the networks generated by NHD+A are larger than the actual values in PINs for all species. DDD+A gave <L> values that are slightly closer to the actual values than those of NHD+A. On the other hand, for both NHD and DDD models, the symmetric divergence generated networks having larger values of <L>. It was reported that PINs are highly modular [32], but simulation-generated networks showed even higher values of M than the PINs (Table 2). Moreover, when we compare four networks generated by different models for each species, the value of M is positively correlated with that of <L>, which is consistent with Zhang and Zhang [33].

To see whether the difference in duplicability dependent on degrees accounts for the difference in assortativity, we analyzed orthologous relationships using proteomes in 55 eukaryote species. Wapinski et al. [34] provided data of orthologous relationships among 19 Ascomycota fungi including S. cerevisiae. In their dataset, all proteins in these 19 species are classified into ortholog groups, each of which consists of the proteins descended from a single ancestral protein in their most recent common ancestor. To evaluate the duplicability of a given gene in S. cerevisiae, we examined orthologous relationships between S. cerevisiae and each of the other 18 Ascomycota fungi. A phylogenetic tree was constructed using orthologous genes from the two species, and the number of gene duplication events observed in the phylogenetic tree was regarded as a duplicability of the gene (see Methods). In the same manner, we also evaluated gene duplicability in C. elegans, D. melanogaster, H. sapiens, and P. falciparum using other databases (see Methods).

Figure 4 and additional file 5: Figure S5 indicate the relationships between the degree and the duplicability. We classified all proteins in each PIN into three categories containing similar numbers of proteins: low- (k = 1), middle- (k = 2 - 6), and high- (k > 6) degree proteins. The results showed that the duplicability of low- and middle-degree proteins is significantly higher than that of high-degree proteins in the yeast and worm PINs (Figure 4 and additional file 5: Figure S5). The same trend was also observed in the fly PIN. In contrast, the duplicability of low- and middle-degree proteins is significantly lower than that of high-degree proteins in the malaria parasite PIN, while no clear trends were observed in the human PIN (Figure 4). These observations are consistent with the above hypothesis; i.e., the differences in degree-dependent duplicability of genes account for the difference in assortativity among species.
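The grouping of proteins into the three degree categories and the comparison of their mean duplicabilities could be sketched as follows. The protein names and values are hypothetical, and the statistical comparison between categories (the Wilcoxon rank-sum test with the Bonferroni correction mentioned in the Figure 4 legend) is not implemented in this sketch.

```python
from collections import defaultdict

def degree_category(k):
    """Assign a degree to the low (k = 1), middle (k = 2-6), or high (k > 6) category."""
    if k == 1:
        return "L"
    if 2 <= k <= 6:
        return "M"
    if k > 6:
        return "H"
    raise ValueError("proteins in a PIN are expected to have degree >= 1")

def mean_duplicability(degree, duplicability):
    """Mean duplicability per degree category, for proteins present in both maps."""
    groups = defaultdict(list)
    for protein, k in degree.items():
        if protein in duplicability:
            groups[degree_category(k)].append(duplicability[protein])
    return {cat: sum(vals) / len(vals) for cat, vals in groups.items()}

# Hypothetical proteins: degree in the PIN and number of duplication events
degree = {"p1": 1, "p2": 1, "p3": 4, "p4": 9}
duplicability = {"p1": 2, "p2": 0, "p3": 1, "p4": 0}
means = mean_duplicability(degree, duplicability)
```

A pattern in which the mean for "H" is lower than those for "L" and "M" would correspond to the trend reported for the yeast and worm PINs, and the reverse pattern to that reported for the malaria parasite PIN.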

Figure 4. Gene duplicability dependent on degrees. Correlation between the degree and the duplicability of proteins in the (A) yeast, (B) worm, (C) fly, (D) human, and (E) malaria parasite PINs. L, M, and H represent low- (k = 1), middle- (k = 2-6), and high-degree (k > 6) proteins, respectively. A vertical axis indicates the mean duplicability in each category. A species name above each diagram denotes the species with which the orthologous relationships were examined. For example, in the top left diagram in (A), gene duplicabilities were investigated using a phylogenetic tree containing S. cerevisiae and S. paradoxus genes. In (A) and (D), the results for MIPS and Rual et al. datasets, respectively, are shown, and those for other yeast and human datasets are provided in Additional file 5: Figure S5. In each diagram, the duplicabilities of proteins in the three categories are compared with one another by using the Wilcoxon rank-sum test with the Bonferroni correction. *, P < 0.05; **, P < 0.01; ***, P < 0.001.

Additional file 5. Figure S5: Gene duplicability dependent on degree in the yeast and human PINs. Duplicability of genes in the yeast and human PINs for (A) Batada et al., (B) Reguly et al., (C) Yu et al., and (D) Stelzl et al.

We also investigated the differences in degrees and duplicabilities among different functional categories in yeast and malaria parasite proteins. Table 3 shows the mean degree and the mean duplicability of yeast proteins belonging to each category obtained from the GO (gene ontology) slim database in the Saccharomyces Genome Database [3]. Interestingly, genes in several categories with significantly higher (lower) degrees on average showed significantly lower (higher) duplicabilities. A similar analysis was conducted for malaria parasite proteins using the GO in the PlasmoDraft database [35] (Table 4). In this case, functional categories with high (low) degrees tend to show high (low) duplicabilities (additional file 6: Figure S6), which is an opposite trend to that observed in yeast proteins. The slopes in the degree-duplicability relationships are significantly different between the yeast and malaria parasite PINs (P < 0.01; analysis of covariance).

Table 3. Degrees and duplicabilities of the genes in the yeast PIN belonging to each functional category

Table 4. Degrees and duplicabilities of the genes in the malaria parasite PIN belonging to each functional category

Additional file 6. Figure S6: Relationships between mean degrees and mean duplicabilities for different functional categories in (A) yeast and (B) malaria parasite. A dot indicates each functional category, and its size represents the number of proteins in the category. A dashed line indicates a regression line.

Discussion

Disassortative structures in PINs

In this paper, we showed that the yeast, worm, fly, and human PINs are disassortative, while the malaria parasite PIN is not. Therefore, a disassortative structure is not a common feature of PINs. By comparing proteomes and conducting simulations, we demonstrated that the difference in assortativity can be well explained by assuming that the duplicability of a protein depends on its degree and that this dependency differs among species. If low-degree proteins have preferentially duplicated in evolution as in yeast, worm, and fly, or there is no trend in the duplicability between low- and high-degree proteins as in human, the PIN becomes disassortative. On the other hand, a PIN without a disassortative structure could be generated if high-degree proteins have preferentially duplicated as in the malaria parasite. Therefore, the "selectionist view" proposed by Maslov and Sneppen [17] is not necessary to explain the presence of a disassortative structure in PINs. It is rather likely that a disassortative structure observed in PINs is merely a byproduct of preferential duplications of low-degree proteins.

Although several authors [25,27] claimed that the suppression of hub-hub interactions may be an artifact, our analyses using four recently published high-quality yeast PIN datasets demonstrated that all of the four PINs are in fact disassortative. Batada et al. [27] reported that the interactions between hubs are not suppressed, where a hub was defined as a node with k > 21 (top 10% of the nodes). However, the same data showed that the interactions between nodes with relatively high degrees (20 < k < 30) and those with very high degrees (k > 50) are suppressed and interactions between low-degree nodes (k < 3) and high-degree nodes (k > 50) are favored. Therefore, Batada et al.'s data [27] are not inconsistent with the presence of a disassortative structure. Moreover, the updated version [30] of their multi-validated yeast PIN data clearly showed disassortativity (see additional file 2: Figure S2A). These results suggest that a disassortative structure in the yeast PIN is not an artifact.

Fernández [36] classified yeast proteins into several categories on the basis of the existence of orthologous proteins in other genomes, e.g., the proteins that are present in eukaryotes, eubacteria, and archaebacteria, or those present in other fungi. He found that an "ancient" network consisting of proteins that are present in diverse organisms tends to be assortative and the assortative ancient network evolved into the disassortative PIN in yeast at the present time. To explain this evolutionary trend, Fernández [36] hypothesized a model in which an attachment of new links between similar-degree nodes is disfavored. Note that our DDD model is also consistent with the evolutionary trend toward higher disassortativity (see additional file 7: Figure S7).

Additional file 7. Figure S7: Evolutionary trend toward higher disassortativity in the networks generated by the DDD model. Fernández [36] categorized yeast proteins into five classes: proteins that are present in all organisms (3.5% of the yeast proteome), in eubacteria (9.5%), in archaebacteria but not in eubacteria (8%), in eukaryotes diverging earlier than fungi (19%), in other fungi (36%), and exclusively in yeast (24%). By using these fractions, we calculated the numbers of nodes contained in ancient networks as 136, 505, 1,556, and 3,268. We generated networks by the DDD model (asymmetric divergence) with σ = -0.05, α = 0.50, and β = 0.019, which were used for regenerating the yeast PIN (see Table 1). For each ancient network, we calculated the mean value of ν from 100 simulation-generated networks.

PIN data include binary interaction information that is directly obtained from experiments such as Y2H and indirectly inferred from protein complex data. Wang and Zhang pointed out that these two types of data may give quite different images of PINs [32]. We therefore excluded protein complex data from the MIPS database and reexamined the yeast PIN. The result, however, showed no significant differences in disassortativity between PINs with and without complex data (additional file 8: Figure S8). We should also note that PINs are a collection of potential interactions that occur at different times in different cells or subcellular locations, but we treated all interactions simultaneously. To see how such treatment affects our results, we examined yeast subnetworks constructed from the proteins in each subcellular localization separately. As shown in additional file 9: Figure S9, although the extent of disassortativity varies among different subcellular locations due to smaller sample sizes, in general such subnetworks also show disassortative structures.

Additional file 8. Figure S8: Disassortative structure in the yeast PIN with and without protein complex data. Distribution of <Knn(k)> in the yeast PIN with (black square) and without protein complex data (red triangle).

Additional file 9. Figure S9: Disassortative structures of the yeast sub-PINs constructed from proteins in different subcellular localizations. ν = 0.40, 0.48, 0.29, 0.17, and 0.10 for cytoplasm, cell periphery, punctate composite, nucleolus, and nucleus, respectively. The subcellular localization data were downloaded from http://www.umich.edu/~zhanglab/download/Wang_PLoSCB_Suppl/description.htm. Subcellular localizations containing >100 proteins and >30 interactions are shown.

Format: TIFF Size: 141KB

Neofunctionalization and subfunctionalization

It is generally thought that gene duplication is a primary source of organismal complexity. Neofunctionalization and subfunctionalization have been proposed as possible fates of duplicated genes. Neofunctionalization hypothesizes that the presence of redundant copies of a gene frees one duplicate from selective pressure, so that this duplicate can accumulate random mutations and potentially acquire novel functions [37]. Subfunctionalization argues that each of the duplicates accumulates degenerative mutations, resulting in the division of ancestral functions into complementary subsets [38]. Both neofunctionalization and subfunctionalization contribute to protein evolution [39-42].

In the duplication-divergence model, neofunctionalization and subfunctionalization are modeled as a random attachment of new links [20] and a random loss of links to duplicated nodes [22], respectively. Our simulation studies showed a high rate of link losses (α > 0.5; see Table 2), suggesting the importance of subfunctionalization. On the other hand, link gains were shown to have only minor effects on the structure of PINs (additional file 4: Figure S4). Altogether, our study supports the view that subfunctionalization plays a significant role in shaping the structures of PINs, which is consistent with a recent study by Gibson and Goldberg [43].

As for subfunctionalization, it has been reported that the number of links retained after gene duplication is considerably different between two duplicates [44]. For this reason, several previous studies used the asymmetric divergence model [14,45-48]. However, "complete" asymmetric divergence in which links are eliminated from only one of the duplicates is unrealistic, and the actual situation should be between asymmetric divergence and symmetric divergence. We have therefore conducted simulation studies using both symmetric and asymmetric divergence models. The results, however, did not show essential differences (Table 2).

Degree-duplicability correlations

In this study, we found that lower-degree proteins tend to duplicate more frequently in the yeast, worm, and fly PINs (Figure 4). One caveat is that the degrees of proteins used in these analyses are present-day degrees and thus might differ from those prior to duplication. Because the number of interactions often changes greatly after duplication [19,41], the observed degree-duplicability correlation could also be interpreted as a decrease in degree after duplication through divergence, rather than as a dependence of duplicability itself on degree. However, under this interpretation, it is difficult to explain why the trend of the degree-duplicability correlation differs among species (Figure 4). Moreover, as mentioned above, the duplication-divergence model without degree-dependent duplicability is insufficient to explain the extent of disassortativity in the yeast, worm, and fly PINs.

Prachumwat and Li [49] found a positive correlation between degree and the proportion of unduplicated proteins in the yeast proteome, which is consistent with our results. Liang et al. [50] showed that the extent of protein under-wrapping, which indicates the solvent accessibility of backbone hydrogen bonds, is negatively correlated with gene duplicability in Escherichia coli, yeast, worm, fly, human, and Arabidopsis thaliana. They also found that the correlation becomes weaker for more complex organisms. It was reported that the extent of protein under-wrapping is strongly positively correlated with the degree of proteins in yeast [51]; therefore, their results are also consistent with ours (Figure 4). In Liang et al. [50], gene duplicability was defined as protein family size. In this study, we evaluated gene duplicability by directly counting the number of gene duplication events using orthologous genes in closely related species. Therefore, we considered more recent gene duplications than Prachumwat and Li [49] and Liang et al. [50]. He and Zhang showed that low-degree nodes are less important [52] and that less important genes tend to duplicate more frequently [53]. Their results are also consistent with ours.

Why do low-degree proteins tend to be duplicated frequently in the evolution of the yeast PIN? The actual reason is currently unclear. Yet, as indicated in Table 3, some functional categories showed low degrees but high duplicabilities on average, while others showed high degrees and low duplicabilities. The former include metabolic processes for carbohydrates or vitamins. Marland et al. [54] reported that the duplicability of genes involved in metabolism, especially in central metabolism, is significantly higher than that of non-metabolic genes in both yeast and E. coli. Moreover, most of the enzymes involved in these metabolic processes bind only to a specific substrate, and probably for this reason, their degrees are relatively low. The categories showing a high degree and a low duplicability are exemplified by organelle organization and biogenesis, RNA metabolic process, and transcription (see Table 3). The category "organelle organization and biogenesis" contains many proteins involved in the organization of actin filaments or cytoskeletons. Actin and actin-related proteins are known to bind many partner proteins [55]. At the same time, they are highly conserved from yeasts to humans [56], and therefore duplications of these genes are apparently rare.

Why, then, are high-degree proteins duplicated preferentially in the evolution of the malaria parasite PIN? Table 4 indicates that genes belonging to the categories pathogenesis and interaction with host tend to have high degrees and high duplicability, though the numbers of genes in these categories are not large. These categories include many proteins of the P. falciparum erythrocyte membrane protein 1 (PfEMP1) family. PfEMP1 proteins interact with receptors in the host and change the morphology of the host cell [57]; therefore, the duplications of these genes would be beneficial to malaria parasites. Moreover, a PfEMP1 protein has the features of an adhesive molecule [58] and can bind many partner proteins. However, the actual reason why the trend of gene duplicability in the entire PIN of the malaria parasite is opposite to that of other eukaryotes is currently unclear. It would be intriguing to investigate the PINs of other parasitic organisms.

These observations suggest that the duplicability of proteins having a given function can differ among organisms and is determined by each organism's living environment. The duplicability of genes for each species would, in turn, determine the overall structure of a PIN. The availability of high-quality interaction data from various species including parasitic organisms will help us to clarify, in greater detail, the relationships between the environments that organisms inhabit and the evolution of their PINs.

Conclusions

In this study, we showed that disassortative structures are not common features among eukaryotes by examining nine different PINs from five eukaryote species. We found that low-degree proteins tend to show high duplicabilities for the PIN with a disassortative structure (e.g. yeast), while an opposite trend was observed for the PIN without disassortativity (e.g. malaria parasite). Simulation studies on the basis of gene duplication and divergence also supported these observations. Therefore, no selective force acting on the entire structure of PINs is needed to explain the presence of a disassortative structure. Our results indicate that the overall structure of PINs is primarily determined by local processes in the course of evolution.

Methods

PIN and GO data

The datasets of the yeast PIN were obtained from the MIPS (Munich Information Center for Protein Sequences) database http://mips.gsf.de (18 May 2006) [3], Batada et al. [30], Reguly et al. [29], and Yu et al. [4]. Worm and fly PIN data were obtained from Li et al. [5] and IM Browser http://proteome.wayne.edu/PIMdb.html [59], respectively. The datasets of the human PIN were from Rual et al. [7] and Stelzl et al. [8], and the malaria parasite PIN was from LaCount et al. [9]. Some of these datasets contain components that are not connected to each other. In these cases, we used the largest component for the analysis. All self-interactions were removed. The yeast GO slim dataset was downloaded from the ftp site of the Saccharomyces Genome Database ftp://genome-ftp.stanford.edu/pub/yeast/literature_curation/. The GO dataset for P. falciparum was obtained from PlasmoDraft [35]. The yeast PIN excluding protein complex data was obtained from http://www.umich.edu/~zhanglab/download.htm [32].
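The two preprocessing steps described above (removal of self-interactions and restriction to the largest connected component) can be sketched in Python as follows. This is a minimal illustration only: the function name `largest_component` and the edge-list input format are assumptions made here and do not reflect the scripts actually used in the study.

```python
from collections import defaultdict, deque

def largest_component(edges):
    """Return the links of the largest connected component as a set of
    frozenset pairs, after dropping self-interactions.

    `edges` is an iterable of (protein_a, protein_b) pairs (assumed format).
    """
    adj = defaultdict(set)
    for a, b in edges:
        if a == b:              # remove self-interactions
            continue
        adj[a].add(b)
        adj[b].add(a)

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # breadth-first search collects one connected component
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp

    # each undirected link is stored once as an unordered pair
    return {frozenset((a, b)) for a in best for b in adj[a]}
```

Proteins whose only recorded interaction is with themselves are dropped together with the self-interaction in this sketch.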

Modularity

PINs have a modular structure, in which interactions between proteins are much denser within a module than between modules [32]. The modularity m for a particular separation of a network is calculated by $m = \sum_{s=1}^{N} \left[ \frac{l_s}{L} - \left( \frac{k_s}{2L} \right)^2 \right]$, where N is the number of modules, L is the number of links in the network, l_s is the number of links within module s, and k_s is the sum of the degrees of the nodes in module s [60]. The separation that maximizes m is considered to be optimal. The maximum m among all possible separations of a given network is referred to as the modularity of the network and denoted as M. We used the method by Vincent et al. [61] for searching the optimal separation, since the method gives excellent accuracy for module separation and outperforms other methods in terms of computation time [61].
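For a given separation, the modularity can be evaluated directly from the quantities defined above. The following Python sketch assumes the standard form in which each module s contributes l_s/L - (k_s/2L)^2. The function name `modularity` and the `module_of` mapping are hypothetical, and the search for the optimal separation [61] is not shown.

```python
from collections import defaultdict

def modularity(edges, module_of):
    """Modularity m of one separation of an undirected network.

    `edges` is a list of (a, b) links without duplicates or self-loops;
    `module_of` maps each node to its module label (assumed inputs).
    """
    L = len(edges)
    links_within = defaultdict(int)   # l_s: links with both ends in module s
    degree_sum = defaultdict(int)     # k_s: sum of degrees of nodes in module s
    for a, b in edges:
        degree_sum[module_of[a]] += 1
        degree_sum[module_of[b]] += 1
        if module_of[a] == module_of[b]:
            links_within[module_of[a]] += 1
    return sum(links_within[s] / L - (degree_sum[s] / (2 * L)) ** 2
               for s in degree_sum)
```

As a check, two triangles joined by a single link and separated into the two triangles give m = 2 × (3/7 - (7/14)^2) = 5/14, whereas placing all nodes in one module gives m = 0.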

Simulation

The simulation studies were conducted using a duplication-divergence model in a similar manner to Hase et al. [14] with a modification. In the DDD model, a new node and new links are added to the network according to the following rules at each time step of a simulation. (1) A node in a network is randomly selected (A). Node A is duplicated to generate a new node (A') with a probability (1 + σk)/1,000 (when 1 + σk > 0), where k is the degree of node A, and σ is a parameter determining the duplicability of a node for each species. The probability is defined to be 0 when 1 + σk is lower than 0. The interacting pattern of node A' is identical to that of node A. (2) For a divergence process, two different models were examined: the asymmetric divergence [14] and the symmetric divergence (Figure 3). In the former, each link to node A' is removed with a uniform probability α. In the latter, for each of the nodes connecting to A and A' (e.g. node B), one of the two links (either the A-B link or the A'-B link) is randomly chosen and is removed with a probability α (Figure 3). (3) A new link between node A and node A' is created with a probability βnN (when βnN ≤ 1), where nN is the number of common neighbors shared by these two nodes. The probability is defined to be one when βnN is greater than 1. If there are no links to node A' after these processes (all links to node A' were removed and no links were generated), node A' is not added to the network.
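Rules (1)-(3) can be sketched in Python for the asymmetric divergence case as follows. This is an illustrative sketch under stated assumptions, not the simulation code used in the study: the names `ddd_step` and `grow`, the two-node seed network, the integer node labels, and the choice to count the common neighbors nN after divergence are all assumptions made here.

```python
import random

def ddd_step(adj, sigma, alpha, beta, rng):
    """Attempt one DDD time step (asymmetric divergence) on `adj` in place.

    `adj` maps integer node labels 0..n-1 to sets of neighbours.
    Returns True if a new node A' was added, False otherwise.
    """
    a = rng.choice(list(adj))                  # (1) randomly selected node A
    k = len(adj[a])
    p_dup = max(0.0, 1.0 + sigma * k) / 1000   # 0 when 1 + sigma*k is below 0
    if rng.random() >= p_dup:
        return False
    # (2) A' copies the links of A; each link to A' is lost with probability alpha
    kept = {b for b in adj[a] if rng.random() >= alpha}
    # (3) A-A' link with probability min(1, beta * n_N). Under asymmetric
    # divergence the common neighbours are the links A' kept (assumption:
    # n_N is counted after divergence).
    linked = rng.random() < min(1.0, beta * len(kept))
    if not kept and not linked:
        return False                           # A' has no links: not added
    new = len(adj)                             # fresh label, given labels 0..n-1
    adj[new] = set(kept)
    for b in kept:
        adj[b].add(new)
    if linked:
        adj[a].add(new)
        adj[new].add(a)
    return True

def grow(n_nodes, sigma, alpha, beta, seed=0, max_steps=10_000_000):
    """Repeat time steps until the network has n_nodes nodes (or max_steps)."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}                     # hypothetical seed network
    for _ in range(max_steps):
        if len(adj) >= n_nodes:
            break
        ddd_step(adj, sigma, alpha, beta, rng)
    return adj
```

For example, `grow(100, -0.05, 0.5, 0.019)` would grow a small network with the parameter values quoted for the yeast PIN. The `max_steps` cap is a safeguard added here because a negative σ can drive the duplication probability of every node to 0.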

The processes (1)-(3) were repeated until the number of nodes in a network became equal to that in the PIN for a given species. We performed simulations with various values of σ, α, and β. The value of σ was changed from -0.05 to 0 by 0.01 and from 0 to 10.0 by 0.1, and the values of α and β were changed from 0 to 1 by 0.01 and 0.001, respectively. For a given set of σ, α, and β, we conducted simulations 100 times. We then calculated the mean of <k> and the mean of <C> from the 100 networks. Moreover, we calculated the mean of <Knn(k)> from the 100 networks. The value of ν represents the slope of the regression line of the mean of <Knn(k)>. In Table 2, the values of σ, α, and β that could reproduce <k>, <C>, and ν in each PIN are shown.
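The calculation of <Knn(k)> and ν for a single network can be sketched in Python as follows. Two assumptions are made here because the definitions lie outside this section: <Knn(k)> is taken to be the mean degree of the neighbors of nodes with degree k, and ν is taken to be minus the slope of a least-squares line fitted to log <Knn(k)> against log k, so that a positive ν corresponds to a disassortative network. The function names are hypothetical.

```python
import math
from collections import defaultdict

def knn_by_degree(adj):
    """Map each degree k to <Knn(k)>, the mean neighbour degree of degree-k nodes.

    `adj` maps each node to the set of its neighbours.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue
        sums[k] += sum(len(adj[n]) for n in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

def nu_exponent(knn):
    """Minus the log-log regression slope of <Knn(k)> versus k (assumed sign).

    Requires at least two distinct degrees in `knn`.
    """
    xs = [math.log(k) for k in knn]
    ys = [math.log(v) for v in knn.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

For a star network with one hub and four leaves, <Knn(1)> = 4 and <Knn(4)> = 1, which gives ν = 1 under this convention. Averaging <Knn(k)> over the 100 simulated networks before fitting, as described above, is left to the caller.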

We also examined a model considering link gains. In this model, the following process was added after the process (3) in the DDD model: A link is attached between each of the two duplicated nodes (A and A') and a randomly selected node with a probability ε. The value of ε was changed from 0.01 to 0.1. σ = -0.05 was used for both asymmetric and symmetric divergence. The values of α and β were determined in the same way as the DDD model.

Gene duplicability

We examined the duplicability of genes in yeast, worm, fly, human, and malaria parasite by using orthologous relationships among closely related species. For yeast genes, we used the dataset of ortholog groups for 19 Ascomycota fungi including S. cerevisiae downloaded from the Fungal Orthogroups Repository http://www.broad.mit.edu/regev/orthogroups/ [34]. This dataset provides ortholog groups, each of which consists of genes descended from a gene in the last common ancestor of the 19 Ascomycota fungi. Duplicability of genes in the yeast PIN was evaluated by considering orthologous relationships between S. cerevisiae and each of the other 18 fungal species. Consider the comparison between S. cerevisiae and S. paradoxus, for instance. Because some ortholog groups do not contain any genes from some of the 19 species, we considered only ortholog groups containing at least one gene from both S. cerevisiae and S. paradoxus. Suppose that a given ortholog group contains two genes from S. cerevisiae and three genes from S. paradoxus (and more from other species). We constructed a phylogenetic tree from these five genes by the neighbor-joining (NJ) method [62] using ClustalW [63]. We then counted the number of duplication events from the tree using Notung (ver. 2.5) [64]. This number was regarded as the duplicability of both of the two S. cerevisiae genes. In this way, a value of duplicability was assigned to each protein in the yeast PIN. Similarly, we calculated the duplicability of genes contained in the worm, fly, human, and malaria parasite PINs. For worm and malaria parasite genes, we used OrthoMCL-DB version 2 http://orthomcl.cbil.upenn.edu [65], which contains ortholog groups of three nematode species including C. elegans and those of six Haemosporidian species including P. falciparum. For fly and human genes, we used ortholog groups of 12 Drosophila species and those of 11 vertebrate species including seven mammals, respectively, downloaded from OrthoDB http://cegg.unige.ch/orthodb [66].

List of abbreviations

DDD: degree-dependent duplication; NHD: Non-uniform heterodimerization; PIN: protein-protein interaction network

Authors' contributions

TH, YN, and HT designed the study; TH analyzed data and performed simulation studies; TH and YN wrote the paper. All authors read and approved the final manuscript.

Acknowledgements

The authors thank T. Masuda, Y. Fukuoka, T. Kaminuma, K. Mogushi, S. Nagaie, and S. Nakagawa for their useful comments and discussion. This study was supported by the Ministry of Education, Culture, Sports, Science and Technology, Japan, grant 20770192 to YN.

References

  1. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.

    Nature 2000, 403:623-627.

  2. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome.

    Proc Natl Acad Sci USA 2001, 98:4569-4574.

  3. Guldener U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW, Stumpflen V: Mpact: The MIPS protein interaction resource on yeast.

    Nucleic Acids Res 2006, 34:436-441.

  4. Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, Hao T, Rual JF, Dricot A, Vazquez A, Murray RR, Simon C, Tardivo L, Tam S, Svrzikapa N, Fan C, de Smet AS, Motyl A, Hudson ME, Park J, Xin X, Cusick ME, Moore T, Boone C, Snyder M, Roth FP, Barabási AL, Tavernier J, Hill DE, Vidal M: High-quality binary protein interaction map of the yeast interactome network.

    Science 2008, 322:104-110.

  5. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, Hirozane-Kishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, Van Den Heuvel S, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus KC, Harper JW, Cusick ME, Roth FP, Hill DE, Vidal M: A map of the interactome network of the metazoan C. elegans.

    Science 2004, 303:540-543.

  6. Giot L, Bader JD, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley RL Jr, White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J, Rothberg JM: A protein interaction map of Drosophila melanogaster.

    Science 2003, 302:1727-1736.

  7. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, Doucette-Stamm L, Cusick ME, Hill DE, Roth FP, Vidal M: Towards a proteome-scale map of the human protein-protein interaction network.

    Nature 2005, 437:1173-1178.

  8. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann S, Goedde A, Toksöz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H, Wanker EE: A human protein-protein interaction network: a resource for annotating the proteome.

    Cell 2005, 122:957-968.

  9. LaCount DJ, Vignali M, Chettier R, Phansalkar A, Bell R, Hesselberth JR, Schoenfeld LW, Ota I, Sahasrabudhe S, Kurschner C, Fields S, Hughes RE: A protein interaction network of the malaria parasite Plasmodium falciparum.

    Nature 2005, 438:103-107.

  10. Barabási AL, Albert R: Emergence of scaling in random networks.

    Science 1999, 286:509-512.

  11. Barabási AL, Oltvai ZN: Network biology: understanding the cell's functional organization.

    Nat Rev Genet 2004, 5:101-113.

  12. Costa LF, Rodrigues FA, Travieso G, Boas V: Characterization of complex networks: A survey of measurements.

    Adv Phys 2007, 56:167-242.

  13. Jeong H, Mason SP, Barabási AL, Oltvai ZN: Lethality and centrality in protein networks.

    Nature 2001, 411:41-42.

  14. Hase T, Niimura Y, Kaminuma T, Tanaka H: Non-uniform survival rate of heterodimerization links in the evolution of the yeast protein-protein interaction network.

    PLoS ONE 2008, 3:e1667.

  15. Callaway DS, Hopcroft JE, Kleinberg JM, Newman MEJ, Strogatz SH: Are randomly grown graphs really random?

    Phys Rev E 2001, 64:041902.

  16. Newman ME: Assortative mixing in networks.

    Phys Rev Lett 2002, 89:208701.

  17. Maslov S, Sneppen K: Specificity and stability in topology of protein networks.

    Science 2002, 296:910-913.

  18. Pastor-Satorras R, Vazquez A, Vespignani A: Dynamical and correlation properties of the internet.

    Phys Rev Lett 2001, 87:258701.

  19. Wagner A: The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes.

    Mol Biol Evol 2001, 18:1283-1292.

  20. Solé RV, Pastor-Satorras R, Smith ED, Kepler T: A model of large-scale proteome evolution.

    Adv Comp Syst 2002, 5:43-54.

  21. Pastor-Satorras R, Smith E, Solé RV: Evolving protein interaction networks through gene duplication.

    J Theor Biol 2003, 222:199-210.

  22. Vazquez A: Growing networks with local rules: preferential attachment, clustering hierarchy and degree correlations.

    Phys Rev E 2003, 67:056104.

  23. Ispolatov I, Krapivsky PL, Mazo I, Yuryev A: Cliques and duplication-divergence network growth.

    New Journal of Physics 2005, 7:145.

  24. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions.

    Nature 2002, 417:399-403.

  25. Aloy P, Russell RB: Potential artefacts in protein-interaction networks.

    FEBS Lett 2002, 530:253-254.

  26. Maslov S, Sneppen K: Protein interaction networks beyond artifacts.

    FEBS Lett 2002, 530:255-256.

  27. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hurst LD, Tyers M: Stratus not altocumulus: a new view of the yeast protein interaction network.

    PLoS Biol 2006, 4:e317.

  28. Hakes L, Pinney JW, Robertson DL, Lovell SC: Protein-protein interaction networks and biology-what's the connection?

    Nature Biotechnol 2008, 26:69-72.

  29. Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskaya OG, Ideker T, Dolinski K, Batada NN, Tyers M: Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae.

    J Biol 2006, 5:11.

  30. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hurst LD, Tyers M: Still stratus not altocumulus: further evidence against the Date/Party hub distinction.

    PLoS Biol 2007, 5:e154.

  31. Suthram S, Sittler T, Ideker T: The Plasmodium protein network diverges from those of other eukaryotes.

    Nature 2005, 438:108-112.

  32. Wang Z, Zhang J: In search of the biological significance of modular structures in protein networks.

    PLoS Comput Biol 2007, 3:e107.

  33. Zhang Z, Zhang J: A big world inside small-world networks.

    PLoS ONE 2009, 4:e5686.

  34. Wapinski I, Pfeffer A, Friedman N, Regev A: Natural history and evolutionary principles of gene duplication in fungi.

    Nature 2007, 449:54-61.

  35. Brehelin L, Dufayard JF, Gascuel O: PlasmoDraft: a database of Plasmodium falciparum gene function prediction based on postgenomic data.

    BMC Bioinformatics 2008, 16:440.

  36. Fernández A: Molecular basis for evolving modularity in the yeast protein interaction network.

    PLoS Comput Biol 2007, 3:e226.

  37. Ohno S: Evolution by gene duplication. New York: Springer; 1970.

  38. Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J: Preservation of duplicate genes by complementary, degenerative mutations.

    Genetics 1999, 151:1531-1545.

  39. Lynch M, Force A: The probability of duplicate gene preservation by subfunctionalization.

    Genetics 2000, 154:459-473.

  40. Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution.

    Plant Cell 2004, 16:1679-1691.

  41. He X, Zhang J: Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution.

    Genetics 2005, 169:1157-1164.

  42. Freilich S, Massingham T, Blanc E, Goldovsky L, Thornton JM: Relating tissue specialization to the differentiation of expression of singleton and duplicate mouse protein.

    Genome Biol 2006, 7:R89.

  43. Gibson TA, Goldberg DS: Questioning the ubiquity of neofunctionalization.

    PLoS Comput Biol 2009, 5:e1000252.

  44. Wagner A: Asymmetric functional divergence of duplicate genes in yeast.

    Mol Biol Evol 2002, 19:1760-1768.

  45. Wagner A: How the global structure of protein interaction networks evolves.

    Proc R Soc Lond B 2003, 270:457-466.

  46. Kim J, Krapivsky PL, Kahng B, Redner S: Infinite-order percolation and giant fluctuations in a protein interaction network.

    Phys Rev E 2002, 66:055101.

  47. Chung F, Lu L, Dewey TG, Galas DJ: Duplication models for biological networks.

    J Comput Biol 2003, 10:677-687.

  48. Ispolatov I, Krapivsky PL, Yuryev A: Duplication-divergence model of protein interaction network.

    Phys Rev E 2005, 71:061911.

  49. Prachumwat A, Li WH: Protein function, connectivity, and duplicability in yeast.

    Mol Biol Evol 2006, 23:30-39.

  50. Liang H, Plazonic KR, Chen J, Li WH, Fernández A: Protein under-wrapping causes dosage sensitivity and decreases gene duplicability.

    PLoS Genet 2008, 4:e11.

  51. Fernández A, Scott R, Berry RS: The nonconserved wrapping of conserved protein folds reveals a trend toward increasing connectivity in proteomic networks.

    Proc Natl Acad Sci USA 2004, 101:2823-2827.

  52. He X, Zhang J: Why hubs tend to be essential in protein networks?

    PLoS Genet 2006, 2:e88.

  53. He X, Zhang J: Higher duplicability of less important genes in yeast genomes.

    Mol Biol Evol 2006, 23:144-151.

  54. Marland E, Prachumwat A, Maltsev N, Gu Z, Li WH: Higher gene duplicabilities for metabolic proteins than for nonmetabolic proteins in yeast and E. coli.

    J Mol Evol 2004, 59:806-814.

  55. Remedios CGD, Chhabra D, Kekic M, Dedova IV, Tsubakihara M, Berry DA, Nosworthy NJ: Actin binding proteins: regulation of cytoskeletal microfilaments.

    Physiol Rev 2003, 83:433-473.

  56. Goodson HV, Hawse WF: Molecular evolution of the actin family.

    J Cell Science 2002, 115:2619-2622.

  57. Pasternak ND, Dzikowski R: PfEMP1: An antigen that plays a key role in the pathogenicity and immune evasion of the malaria parasite Plasmodium falciparum.

    Int J Biochem Cell Biol 2009, 41:1463-1466.

  58. Chen BQ, Barragan A, Fernández V, Sundstrom A, Schlichtherle M, Sahlen A, Carlson J, Datta S, Wahlgren M: Identification of Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) as the rosetting ligand of the malaria parasite P. falciparum.

    J Exp Med 1998, 187:15-23.

  59. Pacifico S, Liu G, Guest S, Parrish JR, Fotouhi F, Finley RL Jr: A database and tool, IM Browser, for exploring and integrating emerging gene and protein interaction data for Drosophila.

    BMC Bioinformatics 2006, 7:195.

  60. Guimerà R, Amaral LAN: Functional cartography of complex metabolic networks.

    Nature 2005, 433:895-900.

  61. Vincent DB, Guillaume JL, Lambiotte R, Lefebvre E: Fast unfolding of communities in large networks.

    J Stat Mech 2008, 10:P10008.

  62. Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees.

    Mol Biol Evol 1987, 4:406-425.

  63. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: ClustalW and ClustalX version 2.

    Bioinformatics 2007, 23:2947-2948.

  64. Chen K, Durand D, Farach-Colton M: NOTUNG: a program for dating gene duplications and optimizing gene family trees.

    J Comput Biol 2000, 7:429-447.

  65. Chen F, Mackey AJ, Stoeckert CJ Jr, Roos DS: OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups.

    Nucleic Acids Res 2006, 1:363-368.

  66. Kriventseva EV, Rahman N, Espinosa O, Zdobnov EM: OrthoDB: the hierarchical catalog of eukaryotic orthologs.

    Nucleic Acids Res 2008, 36:271-275.

  67. Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks.

    Nature 1998, 393:440-442.