Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA, 02138

Department of Mathematics, Harvard University, Cambridge, MA, USA, 02138

Abstract

Background

At a time when genomes are being sequenced by the hundreds, much attention has shifted from identifying genes and phenotypes to understanding the networks of interactions among genes. We developed a gene network developmental model expanding on previous models of transcription regulatory networks. In our model, each network is described by a matrix representing the interactions between transcription factors, and a vector of continuous values representing the transcription factor expression in an individual.

Results

In this work we used the gene network model to examine the impact of mating, as well as of gene insertions and deletions, on the evolution of complexity in these networks. We found that the natural process of diploid mating increases the likelihood of maintaining complexity, especially in higher-order networks (more than 10 genes). We also show that gene insertion is a very efficient way to add more genes to a network, as it provides a much higher chance of developmental stability.

Conclusions

The continuous model affords a more complete view of the evolution of interacting genes. The notion of a continuous output vector also incorporates the reality of gene networks and graded concentrations of gene products.

Background

In the approximately ten years since the completion of the draft sequence of the human genome, researchers have become increasingly attuned to the many layers of complexity that underlie the mechanisms of life

Understanding the organization and evolution of these networks has been a challenge because of their complexity. Experimental studies have been able to identify important roles of interacting regulatory networks, such as the ability of yeast to respond to environmental changes

A mathematical model of mutually interacting transcription factors was first developed by Wagner_{ij }_{ij }_{t }_{t+τ }_{t}_{t}_{i}_{i }_{t }_{t+τ }^{= }_{t}

The gene expression patterns generated by this model were able to mimic certain aspects of

This work was followed by additional studies focusing on such issues as network robustness to mutation

Sexual and asexual reproduction as well as the coevolution of reproductive method and genetic architecture were analyzed by Azevedo

The network structure of the Wagner model has been studied in many ways

Other types of network models have also been developed. A system of coupled ordinary differential equations derived from the principle of chemical kinetics was used to describe genes and the concentration of their products

A continuous network model (that is, one in which the elements _{ij }_{j }

The discrete version of the Siegal and Bergman_{t+1 }= _{t}

**Mathematical Background**. Formal mathematical background for the computations described in the body of this article, and a comparison of the discrete model (step function) with the continuous model (ramp function).

In this work, we describe another approach to a network model with continuous output generalizing that in Wagner (1996). The continuous variation in the abundance of the gene products creates additional complexity that allows a more complete description of the evolution of these networks. The model is intuitively appealing because different concentrations of transcription factors should affect gene expression quantitatively, resulting in different levels of activation and repression. The output vector would therefore be expected to be composed of elements that are continuous rather than discrete. We build on Wagner's model

Methods

Our model is similar to that of Siegal and Bergman_{ij }_{0 }_{t+1 }= _{t}

The concept of stability is based on the development of the individual represented at time _{t}_{t}_{t+1}_{t+1 }

The multiplication of each row

By multiplying

where the function _{t+1 }_{t }

where _{max }^{-4}. This model yields viable individuals in which the levels of the transcription factors are continuous, and is a straightforward extension of Wagner's model of transcription regulatory networks

If one were to define _{t+1 }merely as _{t+1 }= _{t}_{t }

which is a sigmoidal function centered at

We analyzed different values of

The continuous model gives us a more complete view of the evolution of interacting genes. It allows the addition of more genes to the networks and is more efficient in maintaining stability. The notion of a continuous output vector also creates a closer relationship with the reality of gene networks and gene products, where it is not sufficient to ascertain merely whether a gene is on or off. Gene product concentrations play an important role in determining the viability of individuals, and aid in the evolution and maintenance of complexity.

The different mechanisms tested for generating network complexity had a strong impact on the probability of yielding a viable individual, even in networks with many genes, and especially for diploid mating. By "diploid mating", we mean that each element of the matrix W of the progeny network equals the arithmetic average of the corresponding elements in the parental matrices. The fact that the probability that a random network of 15 genes yields a viable individual is far smaller than that obtained by diploid mating between two viable networks suggests that sexual reproduction may be a key component in the evolution of complexity. Insertion and deletion probably also play an important role in the evolution of complexity, given the high probability that a viable individual remains viable after undergoing an insertion or a deletion.

The continuous model also gives insight into the mechanisms that regulate the evolution of complexity in a general setting that represents the concentration of the products of gene networks. The inclusion of gene product concentration brings the model closer to actual transcriptional networks, and perhaps gives us a better idea of the difficulty of obtaining complex networks that yield viable individuals in the real world.

Results and Discussion

Intraclass Correlation

The final stable output vector describes the gene expression levels of a viable individual. The distribution of output values on (-1, +1) across individuals was indistinguishable from a uniform distribution, as might be expected. We also tested whether the output vectors were correlated across individuals. To test for correlation, we examined the intra-class correlation coefficient (ICC) of the elements of

The ICC tests whether the final output vectors of the ^{2 }is the variance of the elements among the

From our data for both the discrete and continuous models, for any number of genes, we found the ICC value very close to zero. Even with a small number of genes where there is little room for variation, the ICC was still extremely low, as shown in Figure

**The intraclass correlation coefficient (ICC) indicates how closely gene product levels are clustered in viable individuals**. An ICC of 0 means uncorrelated.

Generating Viable Individuals

Individuals were generated at random by drawing networks and initial vectors of gene expression levels from a uniform distribution on [-1, +1]. One possible interpretation of the initial vector is that it is the level of gene products passed by maternal inheritance to the zygote, where development of the embryo would begin.
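The random draw described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name `random_individual` and the use of NumPy are assumptions, and the sketch covers only the sampling step, not the subsequent test for viability.

```python
import numpy as np

def random_individual(n_genes, rng):
    """Draw a random network and a random initial expression vector.

    Both are sampled uniformly between -1 and +1. NumPy samples the
    half-open interval [-1, 1), which differs from the closed interval
    only on a set of probability zero.
    """
    W = rng.uniform(-1.0, 1.0, size=(n_genes, n_genes))  # interaction matrix
    s0 = rng.uniform(-1.0, 1.0, size=n_genes)            # initial gene-product levels
    return W, s0

rng = np.random.default_rng(0)
W, s0 = random_individual(5, rng)
```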

We generated 12 populations of 1000 viable individuals each (one population for each network from 3-15 genes). Figure

**Likelihood of finding a viable individual with a random input vector in the discrete and continuous models**. The Discrete Output Vector model (DOV) is an adaptation of Wagner's original model with continuous values in the network and discrete values in the output vector, resulting from a choice of

The model depends on a set of initial conditions to start the developmental stage of the simulation using a randomly generated initial state vector. It is therefore unclear whether the viability of these individuals is determined by the choice of the initial output vector or by the wiring of the gene network.

To test the impact of the choice of an initial output vector, we selected viable individuals and replaced their networks with randomly generated ones while retaining the original initial state vector. We repeated this process 1000 times for each individual, generating a different network each time, while tallying the number of random

As shown in Figure

**Frequency of viable individuals generated from a viable network and from a viable initial state vector**.

Evolution of Complexity

Given the very low likelihood that a random

Mating was performed by defining a population of 1000 viable networks and mating two randomly drawn networks at a time. The "offspring" network was then tested for stability by iterating random initial vectors according to Equation 1. This process was repeated 1000 times to generate 1000 new progeny networks.
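The sampling loop of this procedure can be sketched as below, under several stated assumptions. The stability test is not implemented: `is_viable` is a hypothetical callback standing in for the development-and-stability check of Equation 1, and `mate` is any function that combines two parental networks. Drawing the two parents without replacement is also an assumption; the text says only that two networks are drawn at random.

```python
import numpy as np

def viable_offspring_rate(population, mate, is_viable, n_offspring, rng):
    """Fraction of progeny networks that pass the supplied viability test.

    population  : list of parental interaction matrices (assumed viable)
    mate        : function (W_a, W_b) -> offspring matrix
    is_viable   : hypothetical callback, W -> bool (stand-in for Equation 1)
    n_offspring : number of matings to perform
    """
    viable = 0
    for _ in range(n_offspring):
        # Pick two distinct parents at random (assumption: no selfing).
        i, j = rng.choice(len(population), size=2, replace=False)
        child = mate(population[i], population[j])
        if is_viable(child):
            viable += 1
    return viable / n_offspring

# Toy run with placeholder callbacks, for illustration only.
rng = np.random.default_rng(0)
parents = [rng.uniform(-1.0, 1.0, size=(4, 4)) for _ in range(10)]
rate = viable_offspring_rate(
    parents,
    mate=lambda a, b: (a + b) / 2.0,  # placeholder mating rule
    is_viable=lambda W: True,         # placeholder, NOT the real stability test
    n_offspring=20,
    rng=rng,
)
```

Passing the mating rule and the viability test as arguments keeps the loop identical for haploid and diploid mating.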

Haploid Mating

In the process of "haploid mating", a given gene is inherited at random from the network of either parent with equal probability. Accordingly, in the haploid mating process, we randomly selected individual rows from within the paternal or maternal network and copied them to create an offspring network. This process passes on parental genes without modification from one generation to the next. Repeating the selection process for each row yields a new offspring network with a random set of both parents' genes.

The initial state vector of the new offspring is chosen at random to equal a stable state of one of the parents. This procedure reflects the assumption that one of the parents would be passing on the general stable gene-product concentrations to its offspring, analogous to the interaction between an oocyte and its mother during the earliest stages of development.
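A minimal sketch of the haploid mating step follows. The function and argument names are illustrative, and the sketch assumes that row i of the matrix holds the regulatory inputs of gene i, so that copying a row corresponds to inheriting that gene. The arguments `s_mother` and `s_father` are taken to be stable states of the two parents.

```python
import numpy as np

def haploid_mate(W_mother, W_father, s_mother, s_father, rng):
    """Build an offspring by inheriting each row (gene) from one parent.

    Each row is taken unmodified from the mother or the father with equal
    probability; the initial state is a stable state of one parent, chosen
    at random.
    """
    n = W_mother.shape[0]
    from_mother = rng.random(n) < 0.5  # True: take this row from the mother
    W_child = np.where(from_mother[:, None], W_mother, W_father)
    s_child = s_mother.copy() if rng.random() < 0.5 else s_father.copy()
    return W_child, s_child

rng = np.random.default_rng(0)
W_m = rng.uniform(-1.0, 1.0, size=(6, 6))
W_f = rng.uniform(-1.0, 1.0, size=(6, 6))
s_m = rng.uniform(-1.0, 1.0, size=6)
s_f = rng.uniform(-1.0, 1.0, size=6)
W_child, s_child = haploid_mate(W_m, W_f, s_m, s_f, rng)
```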

When applied to a population of 1000 viable networks (see Figure

**Haploid and diploid mating stability**. When compared with the DOV model, diploid mating displays a very different behavior. As with the haploid model, the discrete model excels in smaller networks, with rates as high as 60%, but falls sharply and starts oscillating between 10-40% in individuals with more than 10 genes. The continuous model shows the opposite pattern starting at values between 20-40% for small networks and consistently increasing as the number of genes grows, achieving values of 43-47% for the rate of viability with networks greater than 10 interacting genes. This result suggests that diploid mating has a greater impact on viability in the continuous model.

It is interesting to draw a parallel between haploid mating in the discrete model and in the continuous one. Haploid mating displays the same behavior in both models: it is highly efficient in generating viable networks with a small number of interacting genes, but its efficiency falls off sharply as the number of genes increases. In the discrete model, the efficiency drops to almost zero with 8 or more genes. In contrast, the continuous model shows a slower and more consistent decline with increasing number of genes, never reaching 0 even for networks of size 15.

Diploid Mating

Diploid individuals benefit from heterozygosity to modulate the effects of damage or deleterious mutations as well as from increasing diversity through the recombination events between the parental chromosomes. In the process of "diploid mating," each row in the

When applied to a set of 1000 viable networks, the diploid mating model generated viable progeny networks of up to 10 interacting genes in 19-32% of the iterations (Figure

A randomly generated network with 15 interacting genes has an 8.9% chance of being viable. When two viable individuals mate following the haploid-mating model, the likelihood of generating a viable network rises to 22%, whereas diploid mating increases it to 47%. This increase may be due to the fact that these original two networks were already selected from a small pool of viable networks with 15 genes, and diploid mating maintains network stability better than haploid mating. We conclude that, while for any level of complexity (number of genes in the network) it is difficult to generate viable complex individuals at random, mating is relatively efficient in producing viable networks of the same level of complexity as those in the parents.
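The averaging rule that defines diploid mating, in which each element of the progeny matrix is the arithmetic average of the corresponding parental elements, can be sketched in a few lines. The function name `diploid_mate` is illustrative, and the sketch covers only the construction of the offspring network, not its test for stability.

```python
import numpy as np

def diploid_mate(W_mother, W_father):
    """Offspring network: element-wise arithmetic mean of the parental matrices."""
    return (W_mother + W_father) / 2.0

rng = np.random.default_rng(1)
W_a = rng.uniform(-1.0, 1.0, size=(15, 15))
W_b = rng.uniform(-1.0, 1.0, size=(15, 15))
W_child = diploid_mate(W_a, W_b)
```

Because each offspring element is an average, it always lies between the two corresponding parental values and therefore stays within [-1, +1].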

Random Insertion

The difficulty in finding a viable network with more than 10 interacting genes prompted the question of whether increasing the number of genes of a viable network is more successful than generating a viable network at random. To answer this question we randomly inserted a gene into a viable network and developed stable state vectors to test whether stability was retained.

A gene insertion represents the phenomenon of a new gene being fully incorporated by the genome and interacting with the other genes in the network. For the inserted gene, all interaction values are chosen at random from the uniform distribution on [-1, +1], and all pre-existing genes receive new randomly generated values for their interaction with the newly inserted gene. The stable vector also receives a new randomly generated value, representing the initial concentration of the product of the new gene. The result is a new individual with an extra transcription factor that may or may not be viable when developed with the augmented network.
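The construction of the augmented network can be sketched as follows. The function name `insert_gene` is illustrative, and the sketch assumes the new gene is placed at a uniformly random position; it builds the augmented individual only and does not test it for stability.

```python
import numpy as np

def insert_gene(W, s, rng):
    """Insert one random gene into network W and state vector s.

    A new row and a new column of uniform random interaction values are
    added at position k, and a random entry is added to the state vector
    at the same position.
    """
    n = W.shape[0]
    k = rng.integers(0, n + 1)  # insertion position, 0..n inclusive
    new_row = rng.uniform(-1.0, 1.0, size=n)
    W_aug = np.insert(W, k, new_row, axis=0)       # shape (n+1, n)
    new_col = rng.uniform(-1.0, 1.0, size=n + 1)   # includes the self-interaction
    W_aug = np.insert(W_aug, k, new_col, axis=1)   # shape (n+1, n+1)
    s_aug = np.insert(s, k, rng.uniform(-1.0, 1.0))
    return W_aug, s_aug, k

rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(4, 4))
s = rng.uniform(-1.0, 1.0, size=4)
W_aug, s_aug, k = insert_gene(W, s, rng)
```

The interactions among the pre-existing genes are left unchanged; only the row and column of the new gene are random.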

From a population of 1000 viable networks we selected each in turn and tried 100 different random insertions and tested for stability. Each insertion adds a new gene at a random place in the network. The graph in Figure

**Stability of new individuals generated through an insertion or a deletion**. The y-axis represents how many viable individuals a viable network with as many genes as represented by the x-axis generates after undergoing an insertion or a deletion.

Figure

**As a special scenario of insertions, gene duplication consists of a full copy of a randomly selected row/column pair inserted into the individual's network resulting in an individual with an extra copy of a gene**. Gene duplication generates viable individuals around 50% of the time independent of the number of genes in the network. This independence of the number of genes in the network is a unique feature of gene duplication that allows creation of complex viable networks.
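Gene duplication as described in this caption can be sketched as below. Several details are assumptions not stated in the text: the copy is appended at the end of the network rather than at a random position, the interaction between the original gene and its copy is set equal to the original gene's self-interaction (which follows from copying the row first and then the column), and the state vector receives a copy of the duplicated gene's value.

```python
import numpy as np

def duplicate_gene(W, s, rng):
    """Append a full copy of a randomly chosen gene (row/column pair)."""
    n = W.shape[0]
    k = rng.integers(0, n)                          # gene to duplicate
    W_dup = np.vstack([W, W[k:k + 1, :]])           # copy row k    -> (n+1, n)
    W_dup = np.hstack([W_dup, W_dup[:, k:k + 1]])   # copy column k -> (n+1, n+1)
    s_dup = np.append(s, s[k])                      # assumption: copy the state too
    return W_dup, s_dup, k

rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(5, 5))
s = rng.uniform(-1.0, 1.0, size=5)
W_dup, s_dup, k = duplicate_gene(W, s, rng)
```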

Random Deletion

As in the test with random insertions, the likelihood of obtaining a viable network after removing a gene was tested by deleting one gene at random from a viable network and developing viable individual state vectors to assess whether it would remain viable. We performed 100 random deletions in each of the 1000 previously generated viable networks. A gene deletion comprises a row and column deletion in the network, plus an entry deletion for the corresponding gene product in the initial output vector.
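The deletion operation itself can be sketched as follows. The function name `delete_gene` is illustrative; the sketch removes the gene only and does not perform the subsequent stability test.

```python
import numpy as np

def delete_gene(W, s, rng):
    """Delete one randomly chosen gene from network W and state vector s.

    Removes row k and column k of W, and entry k of s.
    """
    k = rng.integers(0, W.shape[0])  # gene to remove
    W_red = np.delete(np.delete(W, k, axis=0), k, axis=1)
    s_red = np.delete(s, k)
    return W_red, s_red, k

rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(10, 10))
s = rng.uniform(-1.0, 1.0, size=10)
W_red, s_red, k = delete_gene(W, s, rng)
```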

For networks with few interacting genes, loss of a gene is critical, with very few networks remaining viable after a deletion. This result is compatible with the difficulty in finding viable networks when there are few interacting genes. With more complex networks the proportion remaining viable is high: for example, 67.9% for networks with originally 10 interacting genes, which is significantly greater than the 14% rate for randomly generated networks with 9 interacting genes. Deletion maintains 66.6% of the viable networks with 15 interacting genes.

Conclusion

We presented an alternative model to describe the development and evolution of gene transcription factors that allows for a continuous distribution of expression levels. This version of the model allows the study of more complex networks (both in number of genes and degree of connectivity), given the additional classes of networks that yield viable individuals. The continuous model, however, makes it more difficult to define the concept of "neighboring networks"; this may be addressed by defining a threshold below which differences between networks define them as neighbors. Another limitation of our model is computing time, as the matrix multiplication in development and the tests for viability are more time-consuming than in the discrete model.

Authors' contributions

MC participated in the conception and design of the study and developed the computer simulations to perform the statistical analyses. CHT participated in the design and coordination and helped to draft the manuscript. DLH conceived the study and participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We are grateful to Trevor Bedford and Bernardo Lemos for their helpful comments on the continuous output model. This work was supported by NIH grant GM07953 to DLH.