Department of Biology, University of Pennsylvania, Philadelphia, PA, USA

CoMPLEX, University College London, London, UK

Department of Mathematics, University College London, London, UK

Department of Genetics, Evolution and Environment, University College London, London, UK

Abstract

Background

Changes in gene regulatory networks drive the evolution of phenotypic diversity both within and between species. Rewiring of transcriptional networks is achieved either by changes to transcription factor binding sites or by changes to the physical interactions among transcription factor proteins. It has been suggested that the evolution of cooperative binding among factors can facilitate the adaptive rewiring of a regulatory network.

Results

We use a population-genetic model to explore when cooperative binding of transcription factors is favoured by evolution, and what effects cooperativity then has on the adaptive rewiring of regulatory networks. We consider a pair of transcription factors that regulate multiple targets and overlap in the sets of target genes they regulate. We show that, under stabilising selection, cooperative binding between the transcription factors is favoured provided the amount of overlap between their target genes exceeds a threshold. The value of this threshold depends on several population-genetic factors: strength of selection on binding sites, cost of pleiotropy associated with protein-protein interactions, rates of mutation and population size. Once it is established, we find that cooperative binding of transcription factors significantly accelerates the adaptive rewiring of transcriptional networks under positive selection. We compare our qualitative predictions to systematic data on

Conclusions

Our study reveals a rich set of evolutionary dynamics driven by a tradeoff between the beneficial effects of cooperative binding at targets shared by a pair of factors, and the detrimental effects of cooperative binding for non-shared targets. We find that cooperative regulation will evolve when transcription factors share a sufficient proportion of their target genes. These findings help to explain empirical patterns in datasets of transcription factors in

Background

It is often difficult for a population to acquire an adaptive phenotype that requires simultaneous changes in the co-expression of multiple genes

Given the potential adaptive benefit of cooperative regulation, it makes sense to ask: when will cooperative binding between a pair of transcription factors be able to invade a population that lacks such cooperativity? To answer this we must understand the following tradeoff: although cooperative binding between a pair of factors may result in improved regulation at the target genes shared by both factors, any mutation that results in a physical interaction between the transcription factors will affect

**Schematic of the population-genetic model.** A schematic cartoon of our population-genetic model. (top) When cooperativity is absent, different transcription factors (gray and red) must bind to sites at each of their targets independently. Each factor has a number of targets, K_{1} and K_{2}, and a number

A number of previous studies have explored the mechanistic details of cooperative transcription factor binding at a given target gene

We use a mathematical model to study the conditions under which cooperative binding between pairs of transcription factors is favoured. We first determine the evolutionary conditions that favour cooperative binding under stabilising selection, in terms of the basic evolutionary parameters of the population: the strength of selection on binding sites, the rate of mutation, and the population size. We then study the influence of cooperative regulation on the capacity for a transcriptional circuit to adapt under positive selection. We calculate the time required for a target gene to gain a new, adaptive transcription factor binding site, in the presence or absence of cooperative interactions among its regulators. We confirm our analytical results on the evolution of cooperative regulation by comparison to Monte-Carlo simulations of the Wright-Fisher process associated with our system, and we compare our qualitative conclusions to systematic empirical data.
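The kind of Monte-Carlo simulation of the Wright-Fisher process mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used in this study: the function name, the parameter names (`n_pop`, `n_sites`, `s`, `mu_l`, `mu_g`) and the choice to summarise each individual by its number of non-functional binding sites are all ours, and fitness is taken to be multiplicative in the number of non-functional sites.

```python
import random


def simulate_wright_fisher(n_pop, n_sites, s, mu_l, mu_g, generations, seed=0):
    """Return the fraction of non-functional binding sites after `generations`.

    Hypothetical sketch: individuals are haploid and asexual, and each one is
    summarised by the number of required binding sites it currently lacks.
    """
    rng = random.Random(seed)
    population = [0] * n_pop  # start with every binding site functional
    for _ in range(generations):
        # Selection: parents are sampled in proportion to fitness (1 - s)^i.
        weights = [(1.0 - s) ** i for i in population]
        parents = rng.choices(population, weights=weights, k=n_pop)
        # Mutation: functional sites are lost with probability mu_l and
        # non-functional sites are regained with probability mu_g.
        offspring = []
        for i in parents:
            losses = sum(rng.random() < mu_l for _ in range(n_sites - i))
            gains = sum(rng.random() < mu_g for _ in range(i))
            offspring.append(i + losses - gains)
        population = offspring
    return sum(population) / (n_pop * n_sites)


if __name__ == "__main__":
    # Illustrative parameter values only, chosen so that the loop runs quickly.
    print(simulate_wright_fisher(n_pop=100, n_sites=20, s=0.05,
                                 mu_l=0.002, mu_g=0.001, generations=200))
```

Averaging the returned fraction over many replicate runs gives a simulation estimate of the probability that a binding site is non-functional at equilibrium, which can then be compared with an analytical approximation.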

Our population-genetic model describes a pair of transcription factors, each with its own set of target genes, with some degree of overlap between these sets (Figure

Results and discussion

Stabilising selection without cooperative binding

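We consider a pair of transcription factors, labelled 1 and 2, that have K_{1} and K_{2} targets, respectively. A fraction f of these K_{1} + K_{2} targets is shared by both factors, as illustrated in the schematic of the model. Mutations that result in the loss of a functional binding site at a target occur at rate μ_{l}, and back mutations, which result in a functional binding site being gained at a target, occur at rate μ_{g}. An individual incurs a fitness penalty s for each required binding site that it lacks, and fitness is multiplicative, so that the fitness of an individual lacking i of its K_{1} + K_{2} required binding sites is w_{i}=(1−s)^{i}. The fitness landscape associated with our model thus has a single peak at i=0.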

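We consider a population of N individuals, each assigned to a Hamming class i according to the number of required binding sites that it lacks. In an infinitely large population, the evolution of Hamming class i is given by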

where the term indexed by ij is the probability that a genotype lacking one number of binding sites mutates into a genotype lacking another. In the special case of equal mutation rates (μ_{l}=μ_{g}), the solution to Equation 1 is a binomial distribution. In the more general case of a finite population, with μ_{l}≠μ_{g}, we find that the equilibrium continues to be well approximated by a binomial distribution, with mean (K_{1} + K_{2})p_{s}. The term p_{s} is the probability that a binding site will be non-functional in a randomly chosen individual at equilibrium. The probability p_{s} depends on the strength of selection against non-functional binding sites, s, on the population size, N, and on the mutation rates μ_{l} and μ_{g} (see Methods).
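To make the binomial approximation concrete, the sketch below tabulates the distribution of the number of non-functional binding sites for a given per-site probability. The function name and the numerical value of the per-site probability are illustrative assumptions; only the binomial form and the total of 100 binding sites are taken from the text.

```python
from math import comb


def hamming_class_distribution(n_sites, p):
    """Binomial probability that an individual lacks i of n_sites binding sites.

    `p` is the per-site probability of being non-functional at equilibrium.
    """
    return [comb(n_sites, i) * p ** i * (1.0 - p) ** (n_sites - i)
            for i in range(n_sites + 1)]


if __name__ == "__main__":
    n_sites = 100   # total number of binding sites, as in the figure captions
    p = 0.01        # hypothetical per-site probability, for illustration only
    dist = hamming_class_distribution(n_sites, p)
    mean = sum(i * x for i, x in enumerate(dist))
    print(f"mean number of non-functional sites: {mean:.3f}")
```

The mean of the tabulated distribution equals the number of sites multiplied by the per-site probability, which is the quantity referred to in the text.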

The equilibrium distribution above describes how stabilising selection determines the frequencies of functional binding sites in a population. The associated mean fitness for a pair of transcription factors that do not bind cooperatively is determined by p_{s}, which, provided μ_{l},μ_{g} ≪ 1, can be approximated by

and otherwise by

(see Methods). These equations have an intuitive interpretation: one term is the neutral equilibrium, p_{0}=μ_{l}/(μ_{l} + μ_{g}), and the second term describes the effect of selection. In the limit, p_{s} equals μ_{l}/_{0}.
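The neutral equilibrium, the ratio of the loss rate to the sum of the loss and gain rates, can be checked numerically. The sketch below assumes a simple per-generation mutation step at a single site (our simplification, ignoring selection and drift) and verifies that the neutral equilibrium is its fixed point. Function names are hypothetical.

```python
def neutral_equilibrium(mu_l, mu_g):
    """Equilibrium probability that a site is non-functional under mutation alone."""
    return mu_l / (mu_l + mu_g)


def mutation_step(p, mu_l, mu_g):
    """One generation of mutation: non-functional sites are regained at rate mu_g,
    functional sites are lost at rate mu_l."""
    return p * (1.0 - mu_g) + (1.0 - p) * mu_l


if __name__ == "__main__":
    # Mutation rates quoted in the figure captions.
    p0 = neutral_equilibrium(2e-7, 1e-7)
    print(p0, mutation_step(p0, 2e-7, 1e-7))

    # Faster (illustrative) rates, so that convergence is visible in few steps.
    p = 0.0
    for _ in range(2000):
        p = mutation_step(p, 0.02, 0.01)
    print(p, neutral_equilibrium(0.02, 0.01))
```

Because the loss rate is twice the gain rate for the quoted values, the neutral expectation is that two thirds of sites are non-functional; selection against non-functional sites pushes the realised probability below this value.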

Stabilising selection with cooperative binding

Here we modify our model to account for cooperative regulation by a pair of factors. This allows us to ask when cooperative regulation is favoured by evolution. A mutation that results in cooperative binding between a pair of transcription factors has two effects on the fitness of a transcriptional circuit. For a target that is regulated by both transcription factors, we assume that cooperative binding mitigates the effects of deleterious mutations at transcription factor binding sites, reducing the fitness penalty of a non-functional binding site from s to hs, with h<1. This applies to the f(K_{1} + K_{2}) shared targets, so that (1−f)(K_{1} + K_{2}) targets remain that are regulated by only one or the other of the transcription factors. We assume that the cooperative binding of the transcription factors causes pleiotropic mis-regulation at these targets (since the other transcription factor, which does not have a binding site at such targets, now binds to the first transcription factor through a physical interaction). This results in a fitness penalty c at each of the (1−f)(K_{1} + K_{2}) targets that are not co-regulated. Fitness is again assumed to be multiplicative, so that the cost of pleiotropy associated with cooperative binding is (1−c)^{(1−f)(K_{1} + K_{2})}.
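The multiplicative cost of pleiotropy described above can be illustrated with a short calculation. This is a sketch under stated assumptions: the symbol names `c` (penalty per non-co-regulated target) and `shared_fraction` are ours, and the parameter values are illustrative rather than taken from the analysis.

```python
def pleiotropy_fitness_factor(c, n_unshared):
    """Multiplicative fitness factor from mis-regulation at non-shared targets.

    Each of the `n_unshared` targets contributes an independent factor (1 - c).
    """
    return (1.0 - c) ** n_unshared


if __name__ == "__main__":
    total_targets = 100      # total number of targets, as in the figure captions
    c = 1e-4                 # hypothetical pleiotropic penalty per target
    for shared_fraction in (0.1, 0.5, 0.9):
        n_unshared = round((1.0 - shared_fraction) * total_targets)
        factor = pleiotropy_fitness_factor(c, n_unshared)
        print(f"shared fraction {shared_fraction}: fitness factor {factor:.6f}")
```

The factor approaches 1 as the shared fraction grows, which is the intuition behind the existence of a threshold overlap: the more targets are shared, the fewer targets pay the pleiotropic penalty.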

Provided μ_{l},μ_{g} ≪ 1, genes that are co-regulated and genes that are not co-regulated have equilibrium distributions described by independent binomial distributions governed by p_{hs} and p_{s} respectively, which are approximated by Equation 2 (substituting hs for s at co-regulated genes). When the resulting condition on p_{s} is satisfied, a mutation that results in cooperative binding can invade a population at equilibrium.

Similarly, a mutation that results in the loss of cooperative binding in a population where it is present will be favoured when the analogous condition on p_{hs} is satisfied, in which case a mutation that results in loss of cooperative binding can invade a population at equilibrium.

Since the first expression in Equation 2 is monotonically decreasing in s, and hs ≤ s, we have p_{hs} ≥ p_{s}, i.e. populations that have cooperative binding accumulate more deleterious mutations, which result in weaker transcription factor binding sites, than populations that lack it. As a result there is a range of

Using the expression for p_{s} given in Equation 2, and recalling that p_{0} = μ_{l}/(μ_{l}+μ_{g}) is the neutral equilibrium in a system dominated by drift, the threshold value of

Similarly, the threshold value of

These equations allow us to make a number of observations about the evolution of cooperative gene regulation (Figure

**Evolutionary parameters that permit cooperative regulation.** Evolutionary parameters that permit the evolution of gene regulation by cooperative transcription factors. Threshold number of shared targets for gain (black) and loss (red) of cooperative binding to be advantageous in a population at equilibrium under stabilising selection. Analytical thresholds are compared with 10^{5} replicate Monte-Carlo simulations. Parameter values (unless stated otherwise) are μ_{l}=2×10^{−7}, μ_{g}=10^{−7}, K_{1} + K_{2}=100, s=10^{−3}, h=10^{−1}, c=10^{−4} and N=10^{4}.

Similarly, from Equation 4 for a population with cooperative binding, we see that when

Adaptation of transcriptional circuits under positive selection

When cooperative binding is present, under stabilising selection, transcription factor binding sites at co-regulated genes are better able to tolerate mutations (i.e. p_{hs}> p_{s}). Under positive selection for a novel expression phenotype, this may speed adaptation, since greater mutational robustness generates greater genetic diversity (Figure

**A schematic cartoon of rewiring.** A schematic cartoon of rewiring with (left) and without (right) cooperative binding. Selection favours a change in the regulation of target genes from the red TF to the green TF. Rewiring requires an initially deleterious mutation at the red binding site before a green binding site can be acquired. The fitness of the different states is shown on the left hand side for each case. The fitness reduction of the intermediate state is smaller when cooperative binding is present than when it is absent.

We study adaptive change that involves replacement of an existing transcription factor by a new one that confers higher fitness. We assume that the target gene must first suffer an initially deleterious mutation at its existing binding site before a newly adaptive binding site can be acquired, and that rewiring mutations occur at such sites at rate μ_{r}. The expected waiting time for such a gene to produce a newly adaptive binding site therefore depends on the number of binding sites in the population that harbor a deleterious mutation, which is proportional to p_{s} when cooperativity is absent and p_{hs} when it is present. Since p_{hs}> p_{s}, this number is greater when cooperative binding is present than when it is absent.

The ratio of the expected waiting time before a newly adaptive binding site arises in populations without cooperative binding to that in populations with cooperative binding quantifies the degree to which cooperative binding of transcription factors accelerates adaptation under positive selection. This ratio is given by p_{hs}/p_{s} (Figure

**Cooperative binding accelerates adaptation.** Cooperative binding accelerates adaptation under positive selection. The ratio of waiting times before the arrival of novel adaptive binding sites for populations without and with cooperative binding. Analytical results are compared with 10^{5} replicate Monte-Carlo simulations. Parameter values μ_{l}=2×10^{−7}, μ_{g}=10^{−7}, K_{1} + K_{2}=100, h=10^{−1}, c=10^{−4}, N=10^{4}, μ_{r}=10^{−7}.

Cooperative binding and the fraction of shared targets in yeast

Our model predicts that, under stabilising selection, cooperative binding will be favoured when the fraction of targets shared by a pair of transcription factors exceeds a certain threshold. In order to test this prediction, and to get some idea of the degree of overlap that is required for cooperative binding to arise in natural systems, we inspected pairs of transcription factors in yeast. Pairs that bind cooperatively share a significantly larger fraction of their targets than pairs that do not (p-value of order 10^{−16}, Wilcoxon test). This supports the prediction of our population-genetic analysis, and it suggests that a sizeable overlap in targets is required before cooperative binding becomes advantageous.
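A comparison of this kind can be sketched with a two-sample rank-sum test. The example below assumes the Wilcoxon test is the rank-sum (Mann-Whitney) form, uses a normal approximation without a tie correction, and runs on made-up placeholder fractions. None of the numbers are data from this study, and the function name is hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(sample_a, sample_b):
    """Return (U statistic for sample_a, two-sided p-value by normal approximation)."""
    pooled = sorted((value, group)
                    for group, sample in enumerate((sample_a, sample_b))
                    for value in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank of the run.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0
        rank_sum_a += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)  # no tie correction
    z = (u_a - mean_u) / sd_u
    return u_a, erfc(abs(z) / sqrt(2.0))


if __name__ == "__main__":
    # Placeholder fractions of shared targets, invented purely for illustration.
    cooperative_pairs = [0.42, 0.55, 0.61, 0.48, 0.70]
    other_pairs = [0.05, 0.12, 0.20, 0.08, 0.31, 0.15]
    print(rank_sum_test(cooperative_pairs, other_pairs))
```

For a real analysis an established statistics library would normally be preferable, since it handles ties and small samples more carefully than this approximation.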

**Number of shared targets.** Fraction of targets that are shared between pairs of transcription factors in yeast (p-value of order 10^{−16}, Wilcoxon test).

Cooperative binding in the yeast sex determination network

The ability of cooperative transcription factors to facilitate adaptation also has empirical support, from observations in the sex determination networks of different yeast species

Conclusions

We have shown that cooperative binding between a pair of transcription factors is favoured under stabilising selection, provided the overlap between their targets is sufficiently large. The threshold fraction of shared targets depends upon the strength of selection on binding sites, the cost of pleiotropy associated with protein-protein interactions, the rates of mutation, and the population size. Just as in models that consider the evolution of redundancy

This study shows that, even when the deleterious effects of pleiotropy are taken into account, mutations that change transcription factor function can play an important role in the evolution of gene expression. Taking account of mutations both to regulatory binding sites and to the transcription factors themselves reveals a rich set of evolutionary dynamics that helps explain how complex transcriptional networks can rapidly rewire large sets of genes in order to adapt to new environments.

Methods

Equilibrium distribution

To find the equilibrium relative abundances of the Hamming classes, x_{i}, that give the solution to Equation 1, we follow a previously described approach. Imposing the normalisation Σ_{i} x_{i}=1 it is easy to show that

To compute p_{s} we follow _{V}(_{V}(μ_{l} and μ_{g}, this is given by _{V}(_{i} _{i} ^{i}, where _{i} results in the infinite population equilibrium distribution:

When cooperative binding is present a subset f(K_{1} + K_{2})=K_{hs} of the target genes have selective coefficient hs and the remaining (1−f)(K_{1} + K_{2})=K_{s} have selective coefficient s. The index pair ij refers to an individual with

and at equilibrium
The fitness w_{ij} is just the product of the two independent fitness landscapes associated with the different selective coefficients, i.e. w_{ij}=(1−hs)^{i}(1−s)^{j}. Using our assumed form of w_{ij} results in values of p_{s} and p_{hs} as given by Equation 5 for the independent distributions with the appropriate selective coefficients.

The finite-population case is treated assuming that μ_{l}, μ_{g} and the inverse population size are all ≪ 1. This gives

Assuming μ_{l},μ_{g} ≪ 1, p_{s} is given to first order, in terms of 1/(2

Using the above distributions, we obtain the equilibrium mean fitness in the absence of cooperative binding, W_{ind}, and in its presence, W_{coop}. Cooperative binding is advantageous when W_{coop} > W_{ind}. When

According to this inequality, cooperative binding is advantageous only when the fraction of targets shared by the pair of transcription factors is greater than a threshold. Since by definition

Rewiring time

For a given binding site the waiting time

where _{s}, therefore _{s}, and we are able to write

where μ_{r} gives the rate at which rewiring mutations occur at sites that have already undergone an initially deleterious mutation. The expected waiting time for a single gene is thus

If the gene to be rewired is coregulated by a pair of transcription factors that bind cooperatively, we similarly have

and the ratio of waiting times without and with cooperative binding is therefore simply p_{hs}/p_{s}. Finally, if more than one gene is to be rewired,

Therefore the ratio of expected waiting times with and without cooperative binding is independent of the number of genes to be rewired, and depends only on the ratio p_{hs}/p_{s}.
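The dependence of the waiting-time ratio on the two per-site probabilities alone can be illustrated with a small Monte-Carlo sketch. It assumes, as our simplification, that adaptive binding sites arrive as a Poisson process whose rate is proportional to the population size, the rewiring mutation rate, and the probability that a site already carries a deleterious mutation. The numerical values of the two probabilities are hypothetical.

```python
import random


def mean_waiting_time(rate, n_samples, rng):
    """Average of exponentially distributed waiting times with the given rate."""
    return sum(rng.expovariate(rate) for _ in range(n_samples)) / n_samples


def waiting_time_ratio(p_s, p_hs, mu_r, n_pop, n_samples=20000, seed=0):
    """Estimate (waiting time without cooperativity) / (waiting time with it).

    Assumed arrival rates: n_pop * mu_r * p_s without cooperative binding and
    n_pop * mu_r * p_hs with it.
    """
    rng = random.Random(seed)
    t_without = mean_waiting_time(n_pop * mu_r * p_s, n_samples, rng)
    t_with = mean_waiting_time(n_pop * mu_r * p_hs, n_samples, rng)
    return t_without / t_with


if __name__ == "__main__":
    # p_s and p_hs are illustrative; mu_r and n_pop follow the figure caption.
    ratio = waiting_time_ratio(p_s=0.001, p_hs=0.004, mu_r=1e-7, n_pop=10**4)
    print(f"simulated ratio: {ratio:.2f}, expected p_hs/p_s: {0.004 / 0.001:.2f}")
```

In this sketch the population size and the rewiring rate cancel in the ratio, so the simulated value should lie close to the ratio of the two probabilities regardless of their absolute scale.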

Variation in selection strength across sites

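Up to this point we have assumed that the selective coefficients are identical across binding sites. Here we allow the selective coefficient s_{i} of each binding site i to vary, assuming s_{i} ≪ 1, such that the fitness landscape is approximately additive.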

We assume that there is a finite set of selective coefficients, s^{α} and s^{γ}, where the superscripts α or γ label subclasses of binding sites and the coefficients are distributed according to some function P. Each binding site has a selective coefficient s_{i} associated with it, and each s_{i} is drawn independently from the distribution P. The expected number of sites in subclass α is given by K_{α}=P(s^{α})(K_{1} + K_{2}). The number of mutations in each subclass is then given by a binomial distribution.

When the fitness landscape is close to additive, the method of

The expected number of mutations in each subclass is simply

From these, the invasion probabilities and threshold values of the shared fraction can be calculated in terms of p_{s} and p_{hs}.

Authors’ contributions

AJS, AP, RMS, and JBP designed the research. AJS and RMS performed the analysis. AJS, AP, and JBP wrote the paper. All authors read and approved the final manuscript.

Acknowledgements

JBP and AJS acknowledge funding from the Burroughs Wellcome Fund, the David and Lucile Packard Foundation, the James S. McDonnell Foundation, the Alfred P. Sloan Foundation, grant #D12AP00025 from the U.S. Department of the Interior and Defense Advanced Research Projects Agency, and grant RFP-12-16 from the Foundational Questions in Evolutionary Biology Fund. AP acknowledges grants from the Natural Environment Research Council (NE/G00563X/1) and the Engineering and Physical Sciences Research Council (EP/F500351/1, EP/I017909/1). AJS also acknowledges an EPSRC PhD Plus fellowship.