Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138, USA

Department of Mathematics, University of California, Berkeley, CA 94720, USA

Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-UPV, 46022 València, Spain

Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA

Abstract

Background

Understanding interactions between mutations and how they affect fitness is a central problem in evolutionary biology that bears on such fundamental issues as the structure of fitness landscapes and the evolution of sex. To date, analyses of fitness landscapes have focused either on the overall directional curvature of the fitness landscape or on the distribution of pairwise interactions. In this paper, we propose and employ a new mathematical approach that allows a more complete description of multi-way interactions and provides new insights into the structure of fitness landscapes.

Results

We apply the mathematical theory of gene interactions developed by Beerenwinkel et al. to a fitness landscape for

Conclusion

A full description of complex fitness landscapes requires more information than the average curvature or the distribution of independent pairwise interactions. We have proposed a mathematical approach that, in principle, allows a complete description and, in practice, can suggest new insights into the structure of real fitness landscapes. Our analysis emphasizes the value of non-independent genotypes for these inferences.

Background

Understanding the structures of fitness landscapes is central to evolutionary biology. The image of populations evolving on fitness landscapes traces to Sewall Wright's seminal work in the 1930s.

If all mutations were strictly additive or multiplicative in their effects on fitness, then it would be rather easy to describe the structure of fitness landscapes and understand the resulting dynamics of adaptation by natural selection. However, many mutations interact with one another in complex ways. For example, two or more mutations may interact such that their combined effect on fitness is much greater or much less than predicted from their individual effects; their combined effect may even be opposite in sign to the expectation based on their individual effects. These deviations from simple expectations are called epistasis.

Over the last decade, several studies have sought to examine the form and prevalence of epistatic interactions by measuring the fitness effects of numerous mutations alone and in combination in viruses, bacteria, fungi, and animals

We organized the 37 genotypes in the symmetric 10-by-10 matrix with missing values as shown in Table

Genotype space.

All 37

Fitness data.

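| 1.000 | 0.976 | 0.708 | 0.975 | 0.981 | 0.984 | 0.995 | 0.978 | 0.564 | 0.593 |
| 0.976 |       |       |       | 0.990 | 0.973 | 0.990 | 0.982 | 0.718 | 0.500 |
| 0.708 |       |       |       | 0.964 | 0.684 | 0.694 | 0.782 | 0.664 | 0.510 |
| 0.975 |       |       |       | 0.983 | 0.975 | 0.974 | 0.977 | 0.650 | 0.482 |
| 0.981 | 0.990 | 0.964 | 0.983 |       |       |       | 0.718 | 0.988 | 0.524 |
| 0.984 | 0.973 | 0.684 | 0.975 |       |       |       | 0.724 | 0.986 | 0.490 |
| 0.995 | 0.990 | 0.694 | 0.974 |       |       |       | 0.982 | 0.679 | 0.508 |
| 0.978 | 0.982 | 0.782 | 0.977 | 0.718 | 0.724 | 0.982 |       |       |       |
| 0.564 | 0.718 | 0.664 | 0.650 | 0.988 | 0.986 | 0.679 |       |       |       |
| 0.593 | 0.500 | 0.510 | 0.482 | 0.524 | 0.490 | 0.508 |       |       |       |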

Each entry shows the fitness value of the corresponding genotype in Table 1. Fitness is reported as the median of ten replicate assays performed to estimate the growth-rate advantage of the respective mutant genotype relative to the wild-type strain [2]. Values above and below the diagonal are the same. Genotypes along the diagonal do not exist; other empty cells correspond to genotypes that are missing owing to the specific experimental design.
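As a rough illustration of how the values in the fitness table can be used, the sketch below computes one pairwise interaction term for every double mutant. Several things here are assumptions rather than statements from the text: the labels 1 to 9 for the mutations are hypothetical and purely positional (row order of the table), the grouping of mutations into 1-3, 4-6, and 7-9 is read off the pattern of empty cells, and the additive form u = w_wt + w_ij - w_i - w_j is one common way to write a pairwise test, not necessarily the exact form used in the analysis.

```python
# Fitness values copied from the fitness table; labels 1..9 are hypothetical
# positional labels for the nine mutations (row order of the table).
W_WT = 1.000
SINGLE = {1: 0.976, 2: 0.708, 3: 0.975, 4: 0.981, 5: 0.984,
          6: 0.995, 7: 0.978, 8: 0.564, 9: 0.593}
DOUBLE = {
    (1, 4): 0.990, (1, 5): 0.973, (1, 6): 0.990,
    (1, 7): 0.982, (1, 8): 0.718, (1, 9): 0.500,
    (2, 4): 0.964, (2, 5): 0.684, (2, 6): 0.694,
    (2, 7): 0.782, (2, 8): 0.664, (2, 9): 0.510,
    (3, 4): 0.983, (3, 5): 0.975, (3, 6): 0.974,
    (3, 7): 0.977, (3, 8): 0.650, (3, 9): 0.482,
    (4, 7): 0.718, (4, 8): 0.988, (4, 9): 0.524,
    (5, 7): 0.724, (5, 8): 0.986, (5, 9): 0.490,
    (6, 7): 0.982, (6, 8): 0.679, (6, 9): 0.508,
}


def pairwise_tests(w_wt, single, double):
    """Return u = w_wt + w_ij - w_i - w_j for every measured double mutant.

    The additive form is an assumption made for this sketch.
    """
    return {(i, j): w_wt + w_ij - single[i] - single[j]
            for (i, j), w_ij in double.items()}


tests = pairwise_tests(W_WT, SINGLE, DOUBLE)
n_positive = sum(u > 0 for u in tests.values())
n_negative = sum(u < 0 for u in tests.values())
print(len(tests), "pairwise terms;", n_positive, "positive,", n_negative, "negative")
```

Under these assumptions there is exactly one term per double mutant, so the dictionary has as many entries as there are measured double mutants.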

The experimental design is illustrated geometrically in Figure

**Three-dimensional genotopes**. The genotope of the complete bi-allelic three-locus system with eight genotypes is the regular cube, depicted in (a). The three-locus systems that arise from data structured as in Table 1 are displayed in (b), (c), and (d). The genotope (b) lacks the triple mutant, genotope (c) lacks the triple and one double mutant, and genotope (d) contains only the wild-type and the single mutants.

The goal of our analysis is to describe the geometry of the

Our analysis of this landscape is based on the approach developed by Beerenwinkel et al.

Results

Markov basis of the interaction space

Our first point is that the genotype space in Table

Experimental biologists (including two authors of this paper) are likely to raise three concerns about these non-standard tests. What biological insights can non-standard tests provide beyond those obtained using standard tests? Are these additional tests independent of the standard tests? What computational tools are available to perform such tests on other datasets?

As we show in the sections that follow, the non-standard tests are potentially useful in at least three respects. First, they allow one to focus attention on features of epistasis that are not quantifiable by the standard tests. For example, we perform non-standard tests of the "single-double" type to explore whether some mutations are better overall mixers than others. Second, non-standard tests span greater genetic distances than do pairwise tests, allowing more powerful analyses of the structure of fitness landscapes. For example, we use the "double-double" tests to test curvature at genetic distances of four, whereas standard tests allow curvature to be examined only at distances of two. Third, non-standard tests are an integral part of the complete geometric description of a fitness landscape. While this high-dimensional geometry is abstract and even foreign, we describe how biological features of gene interactions, such as mixing ability, are embedded in the geometric shape of the landscape.

The non-standard tests are not independent, in a statistical sense, of the pairwise tests or of one another because all the tests are calculated from the same underlying data. Nonetheless, this complication can be addressed by employing appropriate statistical methods (Tukey's jackknife, Bonferroni correction, etc.) to ensure that significance levels reflect the data structure. Regarding the availability of computational tools, we provide references to programs that automate the calculations of the Markov basis and perform the triangulations necessary to describe the geometry of landscapes, and these tools can be applied to other datasets. In supplementary material, we illustrate the use of these computational tools and show their output (see Additional files

Instructions for calculating the Markov basis and triangulation of a dataset with Macaulay 2.

Macaulay 2 file for computing Markov bases of interaction spaces and triangulations of genotopes. (This program requires Macaulay 2 version 0.0.95 to be installed. See additional file 1 for instructions.)

Examples of fitness landscape files to be processed with the program fitness.m2 (additional file 2).

Minimal Markov basis of the interaction space.

Geometry of the fitness landscape.

Comparing standard and non-standard epistasis terms overall

An important feature of the standard tests is that the sign of epistasis, either positive or negative, is always expressed in reference to the same wild-type strain. The key result reported by Elena and Lenski, based on the standard tests, was that there were many significant epistatic deviations in both directions, in contrast to one hypothesis that predicted negative epistasis should be the general rule

Figure

**Standard and non-standard gene interactions**. Displayed are density estimates of gene interactions as measured by the 27 standard tests (solid curve), for example

To test this prediction for the fitness peak at the wild-type, we calculated for each standard test of epistasis (such as

Figure ^{-11}), if each of the deviations is viewed as independent. However, these values all rest on 37 genotypes, whose fitness values were estimated with error (albeit with replication), and hence the errors are not independent for those epistatic terms that share a genotype. To take this complication into account, we performed Tukey's jackknife test _{s }= 3.761, 26 d.f., one-tailed _{s }= 3.672, 26 d.f., one-tailed
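For readers unfamiliar with Tukey's jackknife, the following is a minimal generic sketch, not the exact procedure used for the values reported here. It recomputes a statistic with each observation left out in turn, forms pseudovalues, and returns their mean, standard error, and degrees of freedom (n - 1, which would give 26 for 27 terms). The function name `jackknife`, the choice of the Pearson correlation as the statistic, and the example numbers are all assumptions made for illustration.

```python
import numpy as np


def jackknife(statistic, *arrays):
    """Tukey's jackknife: return (estimate, standard error, degrees of freedom).

    `statistic` is called on the full data and on each leave-one-out subset.
    """
    arrays = [np.asarray(a, dtype=float) for a in arrays]
    n = len(arrays[0])
    theta_full = statistic(*arrays)
    pseudovalues = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        theta_minus_i = statistic(*(a[keep] for a in arrays))
        pseudovalues[i] = n * theta_full - (n - 1) * theta_minus_i
    estimate = pseudovalues.mean()
    std_err = pseudovalues.std(ddof=1) / np.sqrt(n)
    return estimate, std_err, n - 1


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long arrays."""
    return np.corrcoef(x, y)[0, 1]


# Made-up numbers, only to show the call; they are not data from this study.
x = [0.10, 0.40, 0.35, 0.80, 0.60, 0.25]
y = [0.02, 0.05, 0.07, 0.12, 0.06, 0.04]
r_hat, r_se, df = jackknife(pearson_r, x, y)
t_stat = r_hat / r_se                      # compare with a t distribution on df
print(f"jackknifed r = {r_hat:.3f}, SE = {r_se:.3f}, t = {t_stat:.3f}, d.f. = {df}")
```

A useful sanity check of the construction is that, when the statistic is the sample mean, the pseudovalues reduce to the observations themselves.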

**Epistasis correlates with relative fitness loss**. For each standard test (filled black circles), for example

If we want to make the same type of inference about the complete genotype space, rather than the specific subset sampled by Elena and Lenski, we can apply a similar, but not identical, test. More precisely, we want to investigate the correlation between average fitness decrease and epistasis among any single and double mutants of the _{s }= 1.363, 8 d.f., one-tailed _{s }= 1.780, 8 d.f., one-tailed

We present these alternative analyses to emphasize the subtly different hypotheses that can be addressed by using our mathematical approach. Comparing the last two analyses suggests that individual mutations might have pervasive effects on the shape of the local landscape. While pervasive effects of certain mutations can make it more difficult to test broader generalizations, the precise nature of these pervasive effects is of biological interest. In the next section, we follow the same general approach but focus on a different set of epistatic interactions to examine differences between individual mutations in greater detail.

Some non-standard tests reveal differences in mixing ability

In this section, we use non-standard tests of the "single-double" type to explore a particular aspect of the fitness landscape, specifically whether certain mutations are better mixers than others. The mixing ability of any particular mutation indicates whether its epistatic interactions with other mutations tend to be positive or negative. We can then measure the relative mixing ability of two mutations by holding constant the identity of other mutations with which the two of interest are mixed. Consider the polynomial

Any pair of mutations belonging to the same set of three ({

In our analysis, we focus on the nine comparisons of mixing ability that each involve six tester mutations, because these provide more statistical power to reveal differences between focal mutations. Figure

**Mixing ability of mutations**. For each focal pair of mutations (

Geometry of the fitness landscape

Our final set of results is concerned with the geometric shape of the fitness landscape. Since fitness landscapes are high-dimensional and complicated objects, it is desirable to classify them into a finite set of distinct shapes, with the general idea that fitness landscapes with the same shape are likely to share biological properties. This approach generalizes the classification of bi-allelic two-locus landscapes into those with positive epistasis versus those with negative epistasis. This appealing binary classification has been linked, for example, to the advantage of sex, but it does not extend to higher-dimensional genotype spaces. We present here a notion of the shape of a fitness landscape for any genotype space. This concept is intimately related to the interaction tests discussed so far, because the shape is determined by a certain subset of the gene interactions that includes the Markov basis. Thus, the proposed classification of landscapes into shapes can be regarded as a formal summary of all the various standard and non-standard tests.
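The two-locus case mentioned above can be sketched in a few lines. The function below, with the hypothetical name `two_locus_shape`, evaluates u = w00 + w11 - w01 - w10 and reports which diagonal of the genotype square is selected. The convention assumed here is that the shape is read from the fittest populations with given allele frequencies, so that positive u selects the diagonal joining 00 and 11; the fitness values in the example are made up for illustration.

```python
def two_locus_shape(w00, w01, w10, w11, tol=1e-12):
    """Classify a bi-allelic two-locus landscape by the sign of epistasis.

    Returns (u, cells), where cells lists the pieces into which the square
    with corners 00, 01, 10, 11 is divided. Assumed convention: a mixture of
    genotypes g and h is preferred when it has higher mean fitness, so u > 0
    favours the diagonal 00-11 and u < 0 favours the diagonal 01-10.
    """
    u = w00 + w11 - w01 - w10
    if u > tol:
        return u, [("00", "01", "11"), ("00", "10", "11")]
    if u < -tol:
        return u, [("00", "01", "10"), ("01", "10", "11")]
    return u, [("00", "01", "10", "11")]   # no epistasis: square is not subdivided


# Made-up fitness values for illustration only.
u, cells = two_locus_shape(w00=1.00, w01=0.90, w10=0.90, w11=0.95)
print("u =", round(u, 3), "cells:", cells)
```

With more loci the same idea applies, but the pieces are higher-dimensional simplices and several interaction tests jointly determine the subdivision.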

The fitness landscape studied in this paper consists of the 37

Consider the bi-allelic two-locus system with genotypes

For larger genetic systems, the role of the triangles is played by simplices. The shape of the present fitness landscape on 37 genotypes is a triangulation of the genotope into 362 nine-dimensional simplices (see Additional file

The three-locus subsystems that occur in this dataset are represented in Figure

**The 16 shapes of fitness landscapes of genotope (b)**. In the graph, each vertex represents one of the 16 possible geometric shapes of a fitness landscape on the truncated three-locus system that corresponds to genotope (b) in Figure 1. The shapes are determined by a collection of different tests for gene interactions. Two shapes are connected by an edge if they differ only by the sign of a single test. The labelling of the vertices follows [1, Table 5.1].

The analogous graph of possible shapes for the three-locus subsystem corresponding to the triangular prism (Figure

The shape of fitness landscapes on type (c) spaces, such as {

Discussion and Conclusion

Epistasis occurs whenever mutations interact non-linearly with one another, and it represents a major challenge in describing the mathematical structure of real fitness landscapes. With epistatic interactions, the combined effect of two or more mutations on fitness may be greater than, less than, or opposite in sign to expectations obtained by combining their separate effects. A growing body of empirical research indicates that epistasis is very common in nature.

To date, two different aspects of epistasis have served as summary statistics of these interactions. First, studies have used the overall directional curvature of mean fitness as a function of the number of random mutations introduced into the genome of some wild-type organism

The objective of this paper is to introduce biologists to a new mathematical framework for characterizing epistatic interactions between mutations, which goes beyond both overall directional curvature and pairwise interactions by providing a complete geometrical description of the epistatic interactions that define an empirically determined fitness landscape. To that end, we have re-analyzed the dataset from the pairwise experimental design performed by Elena and Lenski

The fitness landscape that we analyzed comprises 37 genotypes of

Third, we show that individual mutations contribute in different ways to the complex admixture of epistatic interactions. In particular, we found that some mutations are better mixers than other mutations (Figure

In closing, we would like to raise an issue related to experimental design, one that requires attention when planning studies that might employ these new approaches to testing epistatic interactions and describing fitness landscapes. In their paper, Elena and Lenski

Methods

Bacterial experiments

Our analyses use the data obtained from the second of two experiments reported by Elena and Lenski

Mathematical framework

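Our analysis is based on the mathematical framework presented by Beerenwinkel et al. The genotype space G is a subset of {0, 1}^{9} of size 37 consisting of the wild-type strain, the nine single mutants, and the 27 double mutants. We denote by Δ_{G} ⊂ **R**^{37} the set of all populations (genotype frequencies) on G. The genotope Π_{G} ⊂ **R**^{9} is the set of all allele frequencies that can be realized by populations on G. A population p in Δ_{G} has coordinates p_{g} ≥ 0, one for each genotype g in G, and these coordinates sum to one.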

There is a natural mapping ρ : Δ_{G} → Π_{G} that assigns to each population its allele frequency spectrum. The kernel of this map defines the interaction space (see Section 3 in
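The mapping from populations to allele frequencies can be made concrete with a small numerical sketch. The code below builds a genotype space of the kind described here (wild type, nine single mutants, and the double mutants that combine mutations from different sets of three), writes the allele-frequency map as a matrix, and computes the dimension of its kernel. The locus indices 0 to 8, the assignment of loci to sets, and the inclusion of a row of ones (so that kernel vectors also preserve total frequency) are assumptions of this sketch, not details taken from the text.

```python
import numpy as np

# Hypothetical indexing: loci 0..8, partitioned into three sets of three.
SETS = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
N_LOCI = 9


def genotype_space():
    """Return the genotypes as rows of a 0/1 matrix, wild type first."""
    set_of = {locus: k for k, members in enumerate(SETS) for locus in members}
    rows = [[0] * N_LOCI]                                   # wild type
    for i in range(N_LOCI):                                 # single mutants
        rows.append([int(k == i) for k in range(N_LOCI)])
    for i in range(N_LOCI):                                 # doubles across sets
        for j in range(i + 1, N_LOCI):
            if set_of[i] != set_of[j]:
                rows.append([int(k in (i, j)) for k in range(N_LOCI)])
    return np.array(rows)


def row_index(G, loci):
    """Index of the genotype carrying exactly the mutations in `loci`."""
    target = np.array([int(k in loci) for k in range(G.shape[1])])
    return int(np.flatnonzero((G == target).all(axis=1))[0])


G = genotype_space()
# Allele frequencies of a population p are G.T @ p; the extra first row of
# ones records the total frequency (an assumption of this sketch).
A = np.vstack([np.ones(len(G), dtype=int), G.T])
rank = np.linalg.matrix_rank(A)
interaction_dim = len(G) - rank

# A pairwise test written as a vector: +1 on wild type and on the double
# mutant, -1 on the two corresponding single mutants.
u = np.zeros(len(G), dtype=int)
u[row_index(G, ())] += 1
u[row_index(G, (0, 3))] += 1
u[row_index(G, (0,))] -= 1
u[row_index(G, (3,))] -= 1

print(G.shape, rank, interaction_dim, bool((A @ u == 0).all()))
```

Under these assumptions the kernel has dimension 27 for the 37 genotypes, and the pairwise test vector lies in it, which is the sense in which such tests measure interactions rather than allele frequencies.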

The specific experimental design that was used to generate the present fitness dataset allows us to represent the genotypes in the matrix displayed in Table

A simplex is an **R**^{n }_{G }(see Section 4 in

Computational methods

The Markov basis can always be derived from the genotype space (the set of measured genotypes) by algebraic computations. However, there is no simple recipe for writing out the Markov basis. We computed the Markov basis independently with the computer algebra systems Macaulay 2

Authors' contributions

N.B., L.P., and B.S. are responsible for the development of the mathematical theory. S.F.E. and R.E.L. designed and performed the experiments with bacteria. N.B. and R.E.L. formulated the specific analyses reported here and wrote the paper. All the authors contributed to editing the paper and approve of its final form.

Acknowledgements

This work was supported by the "FunBio" grant from DARPA to Simon Levin (Princeton University). We thank Simon Levin and Ben Mann (DARPA) for facilitating this math-bio collaboration, Peter Bates and Charles Ofria for helpful discussions, Peter Malkin for an improved program for computing circuits, Mike Stillman for writing and improving the Macaulay 2 code, and three anonymous reviewers for suggestions. Collection of the dataset used in this study was funded by a fellowship from the Spanish MEC to S.F.E. and a grant from the NSF to R.E.L. N.B. was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health Initiative.