How did the complex metabolic systems we observe today evolve through adaptive evolution? The fitness landscape is the theoretical framework for answering this question. Since experimental data on natural fitness landscapes are scarce, computational models are a valuable tool for predicting landscape topologies and evolutionary trajectories. Careful assumptions about the genetic and phenotypic features of the system under study can simplify the design of such models significantly. The analysis of C4 photosynthesis evolution provides an example of accurate predictions based on the phenotypic fitness landscape of a complex metabolic trait. The C4 pathway evolved multiple times from the ancestral C3 pathway, and, consistent with this, models predict a smooth ‘Mount Fuji’ landscape. The modelled phenotypic landscape implies evolutionary trajectories that agree with data on modern intermediate species, indicating that evolution can be predicted from the phenotypic fitness landscape. Future work will have to address how the structure of metabolic fitness landscapes changes with the environment. This will not only answer important evolutionary questions about the reversibility of metabolic traits, but also suggest strategies to increase crop yields by engineering the C4 pathway into C3 plants.
Evolution of complex traits and underlying fitness landscapes
The metabolism of multicellular organisms exhibits remarkable complexity, both in the number and in the spatial arrangement of its interacting components. Since many of these components appear to be essential for the adaptive function of the system as a whole, the question arises how such complex systems could have been formed through adaptive evolution. The theory of fitness landscapes provides a framework for answering such questions.
Genetic fitness landscapes map genetic states to corresponding fitness values, whereas phenotypic landscapes map a phenotypic domain to fitness (Figure 1). In the theory of genetic fitness landscapes, the equivalent of the above problem is the search for accessible evolutionary trajectories through the landscape that connect an ancestral state with a derived state. Whether such a trajectory exists depends on the availability of beneficial mutations and on the epistatic interactions between them. The topology of the underlying fitness landscape thus determines which evolutionary paths are accessible. Smooth, single-peaked, ‘Mount Fuji’-type landscapes allow adaptive changes from every state in the landscape, whereas rugged topographies contain local optima from which any further change is deleterious, resulting in ‘dead-end’ trajectories. A rugged topology in a genetic fitness landscape implies the existence of reciprocal sign epistasis between genetic loci: the simultaneous genetic changes result in a fitness change whose sign is opposite to that of the individual changes.
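These distinctions can be made concrete on the smallest possible genetic landscape, two biallelic loci. The following Python sketch classifies the interaction between the two loci from four fitness values. The function name `classify_epistasis` and all fitness values are hypothetical illustrations, not data from the studies discussed here.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]  # (allele at locus A, allele at locus B), 0 = ancestral


def classify_epistasis(w: Dict[Genotype, float]) -> str:
    """Classify epistasis between two biallelic loci from four fitness values."""
    base = w[(0, 0)]
    # Fitness effect of the mutation at locus A on each background of locus B.
    a_on_b0 = w[(1, 0)] - base
    a_on_b1 = w[(1, 1)] - w[(0, 1)]
    # Fitness effect of the mutation at locus B on each background of locus A.
    b_on_a0 = w[(0, 1)] - base
    b_on_a1 = w[(1, 1)] - w[(1, 0)]

    # Sign epistasis: the sign of a mutation's effect depends on the background.
    flip_a = a_on_b0 * a_on_b1 < 0
    flip_b = b_on_a0 * b_on_a1 < 0
    if flip_a and flip_b:
        return "reciprocal sign epistasis"
    if flip_a or flip_b:
        return "sign epistasis"
    # Magnitude epistasis: same signs, but the double mutant is not additive.
    if abs((w[(1, 1)] - base) - (a_on_b0 + b_on_a0)) > 1e-12:
        return "magnitude epistasis"
    return "no epistasis (additive)"


# Hypothetical example: each single mutation is deleterious, the double mutant is fitter.
rugged = {(0, 0): 1.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 1.2}
print(classify_epistasis(rugged))  # reciprocal sign epistasis
```

In the `rugged` example both the ancestral genotype (0, 0) and the double mutant (1, 1) are local peaks, which is exactly the kind of topology that produces dead-end trajectories.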
Figure 1. Concepts of fitness landscapes
Current techniques for the experimental exploration of fitness landscapes are severely limited by the combinatorial explosion of possible states. The most common approach is to select a small set of mutations that are either known a priori to be relevant for organismal fitness or that connect two phenotypic states of interest. Using this approach, subsets of the intragenic landscapes of bacterial β-lactamase and isopropylmalate dehydrogenase were mapped experimentally. Both of these single-gene studies yielded smooth ‘Mount Fuji’-type landscapes. Khan et al. mapped a multigene landscape that arose in an evolution experiment and also found a smooth genetic landscape. A complementary approach to the exhaustive evaluation of small landscapes is the random sampling of accessible trajectories through experimental evolution [8,9].
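For a small set of selected mutations, accessibility can be checked exhaustively by enumerating every order in which the mutations can accumulate. A minimal Python sketch, assuming biallelic sites and the common criterion that a path is accessible only if fitness strictly increases at every step; the function `accessible_paths` and the example landscapes are hypothetical.

```python
from itertools import permutations
from typing import Callable, Tuple

Genotype = Tuple[int, ...]


def accessible_paths(n_loci: int, fitness: Callable[[Genotype], float]) -> Tuple[int, int]:
    """Return (accessible, total) shortest paths from the all-0 to the all-1 genotype."""
    total = 0
    accessible = 0
    for order in permutations(range(n_loci)):
        genotype = [0] * n_loci
        previous = fitness(tuple(genotype))
        ok = True
        for locus in order:
            genotype[locus] = 1
            current = fitness(tuple(genotype))
            if current <= previous:  # neutral or deleterious step blocks the path
                ok = False
                break
            previous = current
        total += 1
        if ok:
            accessible += 1
    return accessible, total


def additive(g: Genotype) -> float:
    # Hypothetical smooth landscape: every mutation is beneficial on any background.
    return float(sum(g))


def sign_epistatic(g: Genotype) -> float:
    # Hypothetical landscape: mutation 0 is deleterious unless mutation 1 is present.
    return float(sum(g)) - (2.0 if (g[0], g[1]) == (1, 0) else 0.0)


print(accessible_paths(3, additive))        # (6, 6)
print(accessible_paths(3, sign_epistatic))  # (3, 6)
```

A single sign-epistatic interaction already closes half of the six possible orders, which is why the fraction of accessible trajectories is a common summary of landscape ruggedness.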
The fitness landscape of a complex trait is unlikely to be mapped using current experimental techniques because even genetic landscapes that are spanned by few mutations quickly become infeasible to evaluate experimentally. Systems models of complex metabolic traits allow us to investigate landscapes of high dimension, provided that we understand how genotypic or phenotypic parameters map to organismal fitness.
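A short calculation shows how quickly exhaustive evaluation becomes infeasible. For L biallelic sites, the number of genotypes and the number of shortest mutational paths from the genotype carrying none of the mutations to the genotype carrying all of them are:

```latex
N_{\mathrm{genotypes}} = 2^{L}, \qquad N_{\mathrm{paths}} = L!
\begin{aligned}
L = 5  &: \quad 2^{5} = 32, \qquad 5! = 120\\
L = 20 &: \quad 2^{20} = 1\,048\,576, \qquad 20! \approx 2.43 \times 10^{18}
\end{aligned}
```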
Theoretical studies of fitness landscapes are often restricted to the analysis of ‘artificial’ landscapes with given statistical properties [3,10]. Although such investigations are important for studying the general implications of landscape topologies and of assumptions about the nature of adaptive walks, they are not based on real biological systems. Theoretical studies of actual genetic fitness landscapes are scarce because models for accurate genotype–phenotype mapping are lacking. One exception is the prediction of RNA secondary structure from the primary sequence. Fontana and Schuster simulated the folding of RNA towards an arbitrarily defined target structure of maximum fitness and found discontinuous changes in phenotype space. Even though this RNA system provides a unique opportunity to study a genotype–phenotype map, its phenotype–fitness map is not based on a real biological mechanism.
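One widely used family of ‘artificial’ landscapes with tunable statistical properties is Kauffman's NK model, in which the fitness contribution of each of N loci depends on its own allele and on K other loci. Whether the cited studies use exactly this model is not stated here; the sketch below, with hypothetical function names and a cyclic choice of interacting loci, simply shows how increasing K turns a single-peaked landscape into a rugged one.

```python
import itertools
import random
from typing import Callable, Tuple

Genotype = Tuple[int, ...]


def make_nk_landscape(n: int, k: int, seed: int = 0) -> Callable[[Genotype], float]:
    """Build a random NK landscape with a cyclic interaction neighbourhood (requires k < n)."""
    rng = random.Random(seed)
    # Locus i interacts with itself and the next k loci (wrapping around).
    neighbourhoods = [[(i + j) % n for j in range(k + 1)] for i in range(n)]
    # One uniform random contribution per locus and per allele combination.
    tables = [
        {combo: rng.random() for combo in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype: Genotype) -> float:
        # Mean of the per-locus contributions.
        return sum(
            tables[i][tuple(genotype[j] for j in neighbourhoods[i])] for i in range(n)
        ) / n

    return fitness


def count_local_optima(n: int, k: int, seed: int = 0) -> int:
    """Count genotypes that are fitter than all of their single-mutant neighbours."""
    fitness = make_nk_landscape(n, k, seed)
    count = 0
    for genotype in itertools.product((0, 1), repeat=n):
        f = fitness(genotype)
        mutants = (genotype[:i] + (1 - genotype[i],) + genotype[i + 1:] for i in range(n))
        if all(f > fitness(m) for m in mutants):
            count += 1
    return count


print(count_local_optima(8, 0))  # additive case: a single 'Mount Fuji' peak
print(count_local_optima(8, 7))  # maximal interaction: typically many local optima
```

With K = 0 the contributions are independent, so the landscape is additive and has exactly one peak; larger K introduces epistasis and typically many local optima.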
Although feasible for the simple structural phenotypes of RNA molecules, computational prediction of genotype–fitness maps for complex traits is currently not possible. The closest we can get to making reliable predictions of complex trait evolution is through models of the phenotypic fitness landscape.
Modelling evolutionary trajectories on the phenotypic fitness landscape rests on important assumptions about the nature of the underlying genotype–phenotype map. First, we have to ensure that phenotypic traits are able to evolve independently. Second, and more importantly, we have to assume that the genotype–phenotype map is a continuous function: the image of the neighbourhood of a point in sequence space should lie within the neighbourhood of the corresponding point in phenotype space, so that small genetic changes cause only small phenotypic changes. Whereas this is not the case for RNA secondary structure, the additivity of thermodynamic contributions in protein catalysis provides scenarios in which this assumption is justified [6,12].
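A crude way to probe continuity numerically is to ask how far a single point mutation can move the phenotype, relative to the full phenotypic range. The Python sketch below contrasts an additive map (in the spirit of additive thermodynamic contributions) with a switch-like threshold map. The per-site values in `DDG`, the cut-off of 1.4 and the jump measure itself are hypothetical choices for illustration, not a formal test of topological continuity.

```python
import itertools
from typing import Callable, Tuple

Genotype = Tuple[int, ...]

# Hypothetical per-site contributions (e.g. free-energy changes), for illustration only.
DDG = [0.5, 0.3, 0.4, 0.2, 0.6, 0.1, 0.3, 0.4]


def additive(g: Genotype) -> float:
    """Phenotype as a sum of independent per-site contributions."""
    return sum(d for d, allele in zip(DDG, g) if allele)


def threshold(g: Genotype) -> float:
    """Switch-like phenotype: 1 below an arbitrary cut-off, 0 above it."""
    return 1.0 if additive(g) < 1.4 else 0.0


def max_relative_jump(phenotype: Callable[[Genotype], float], n: int) -> float:
    """Largest single-mutation phenotypic change, relative to the phenotypic range."""
    genotypes = list(itertools.product((0, 1), repeat=n))
    values = {g: phenotype(g) for g in genotypes}
    span = max(values.values()) - min(values.values())
    jump = 0.0
    for g in genotypes:
        for i in range(n):
            mutant = g[:i] + (1 - g[i],) + g[i + 1:]
            jump = max(jump, abs(values[mutant] - values[g]))
    return jump / span if span > 0 else 0.0


print(round(max_relative_jump(additive, len(DDG)), 3))  # about 0.214: small steps
print(max_relative_jump(threshold, len(DDG)))           # 1.0: one mutation spans the range
```

Under the additive map no single mutation moves the phenotype by more than about a fifth of its range, whereas under the threshold map one mutation can jump across the entire range.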
In cases where these assumptions hold, modelling evolution on the phenotypic fitness landscape might even be preferable to a genotype-based approach. Genotype–phenotype maps are often many-to-one, so focusing on the phenotypic domain can substantially reduce the dimensionality of the space that has to be explored.
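The size of this reduction is easy to see for an extreme, hypothetical many-to-one map in which every site contributes the same additive effect, so that the phenotype depends only on how many sites carry the derived allele. A minimal Python sketch under that assumption:

```python
import itertools
from typing import Tuple


def genotype_phenotype_counts(n_sites: int) -> Tuple[int, int]:
    """Count genotypes and distinct phenotypes under a hypothetical equal-effect map."""
    genotypes = list(itertools.product((0, 1), repeat=n_sites))
    # Assumption: phenotype = number of derived alleles (identical additive effects).
    phenotypes = {sum(g) for g in genotypes}
    return len(genotypes), len(phenotypes)


print(genotype_phenotype_counts(10))  # (1024, 11)
```

Here 2^10 genotypes collapse onto 11 phenotypic states; real maps are less degenerate, but the same principle underlies the dimensional reduction.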
For simple metabolic systems, models have been used successfully to explain experimental fitness landscapes in terms of phenotype–fitness mapping [6,13]. The use of optimality criteria to predict metabolic states suggests that phenotype–fitness relationships can also be predicted for complex metabolic systems. This makes metabolic systems promising candidates for attempts to predict evolutionary trajectories in complex systems.
C4 photosynthesis: a complex metabolic solution for the oxygenase problem of Rubisco
C4 photosynthesis is a complex metabolic add-on to the ancestral C3 pathway that involves changes in metabolic gene expression, enzyme kinetics, leaf anatomy and interaction of cell types. A clear connection between phenotype and fitness and the existence of evolutionary intermediates render C4 photosynthesis an ideal case study for the evolution of a complex metabolic trait.
The most abundant enzyme on Earth, Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase), fixes atmospheric CO2 into three-carbon compounds in the mesophyll cells of C3 plants. Because Rubisco evolved in low-O2 environments and because CO2 and O2 are chemically similar, molecular oxygen is a competing substrate for Rubisco. Each oxygenation reaction catalysed by Rubisco produces one molecule of 2-phosphoglycolate, which has to be recycled in the photorespiratory cycle. Apart from its substantial energy cost, this pathway releases CO2 through the glycine decarboxylase (GDC) complex, decreasing photosynthetic capacity by up to 30% [15–17].
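The dependence of this loss on the CO2 concentration at Rubisco can be sketched with the relation used in Farquhar–von Caemmerer–Berry-type photosynthesis models, vo/vc = 2Γ*/C, where Γ* is the CO2 compensation point in the absence of day respiration, together with the release of 0.5 CO2 per oxygenation through glycine decarboxylation. The default Γ* of 40 µmol mol⁻¹ and the two CO2 levels below are illustrative assumptions, not values taken from the cited studies.

```python
def photorespiratory_loss(co2: float, gamma_star: float = 40.0) -> float:
    """Fraction of carboxylated CO2 that is re-released by photorespiration.

    Assumes vo/vc = 2 * gamma_star / co2 and 0.5 CO2 released per oxygenation.
    co2 and gamma_star must share units (here µmol mol^-1, illustrative values).
    """
    vo_over_vc = 2.0 * gamma_star / co2  # oxygenations per carboxylation
    return 0.5 * vo_over_vc


# Illustrative CO2 levels: a C3 chloroplast versus a CO2-enriched C4 bundle sheath.
for label, co2 in (("C3 chloroplast", 200.0), ("C4 bundle sheath", 2000.0)):
    print(label, round(photorespiratory_loss(co2), 3))
```

Under these assumptions the loss is inversely proportional to CO2 at the site of Rubisco: about 20% of the fixed carbon at 200 µmol mol⁻¹ but only 2% at a tenfold higher concentration, which is the lever that a CO2-concentrating mechanism exploits.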
C4 plants suppress photorespiration by concentrating CO2 around Rubisco (Figure 2). To be operational, this energy-dependent CO2 pump requires changes in gene expression, leaf anatomy and biochemistry. Rubisco expression is shifted from the mesophyll to the bundle sheath, whose cells possess thick cell walls that are assumed to reduce CO2 leakage. Phosphoenolpyruvate carboxylase (PEPC), an enzyme lacking an oxygenase function, catalyses the primary fixation of CO2, in the form of bicarbonate, in the mesophyll. The resulting C4 acids diffuse to the bundle sheath cells, where they are decarboxylated, releasing CO2 in proximity to Rubisco. The result is a high ratio of CO2 to O2 in the bundle sheath cells that efficiently suppresses photorespiration. Genes encoding all enzymes required for the C4 cycle are found in C3 plant genomes, where they fulfil diverse metabolic functions.
Figure 2. Schematic view of C4 photosynthesis
Although only 3% of all land plant species use the C4 cycle, they contribute about a quarter of primary biomass production on land [20,21]. Under tropical conditions, the ability to suppress photorespiration results in higher rates of photosynthesis and in increased photosynthetic water- and nitrogen-use efficiency [22–24]. The most productive crop species are C4 plants, which achieve high rates of biomass production even in temperate regions. Because of this high productivity, engineering the C4 trait into major C3 crops is a highly promising route towards meeting the growing demand for food [26,27]. Since major features of the genetic blueprint of C4 photosynthesis are still unknown, combining directed breeding with genetic engineering is considered a promising way to achieve this goal. Understanding the evolutionary trajectories that produced C4 photosynthesis will support such strategies [28,29].
Conceptual models of the evolution of C4 photosynthesis are based on physiological, biochemical and genetic studies of C3, C4 and C3–C4 intermediate species. Such data on intermediate species are valuable for constructing a fitness landscape model, especially when deciding on the independence of parameters and the continuity of the genotype–phenotype map. However, caution is needed to avoid circular arguments when the same intermediate species are also used for model validation.
Despite the complexity of the trait, C4 photosynthesis has evolved independently in more than 60 angiosperm lineages. This high level of polyphyly constitutes a striking example of convergent evolution and points towards a smooth genotypic fitness landscape between C3 and C4 photosynthesis.
Environmental factors strongly influence the probability of C4 evolution. The atmospheric CO2–O2 ratio, heat, aridity and high light are discussed as important abiotic factors promoting C4 evolution [15,16,31]. Conversely, the high energy requirement of C4 metabolism puts C4 plants at a disadvantage under low-light conditions, such as those in shaded forest areas. Because of these dependencies, the topology of the fitness landscape will change with the environment, possibly inverting the direction of selection. The concept of the ‘fitness seascape’ emphasizes this dynamic property of fitness landscapes.
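How an environmental variable can invert the direction of selection is illustrated by the following deliberately simple, hypothetical Python toy, not a physiological model. A phenotype x in [0, 1] measures how complete the C4 cycle is; its benefit is assumed to scale with light, while the energetic cost of the CO2 pump is a fixed penalty. The coefficients 0.4 and 0.1 are arbitrary.

```python
def c4_fitness(x: float, light: float, cost: float = 0.1) -> float:
    """Hypothetical fitness of a phenotype with C4 'completeness' x in [0, 1].

    Assumption: the benefit of suppressing photorespiration scales with light
    (0 = deep shade, 1 = full sun); the CO2 pump costs a fixed amount per unit x.
    """
    benefit = 0.4 * light
    return 1.0 + x * (benefit - cost)


def optimal_phenotype(light: float, grid: int = 101) -> float:
    """Return the grid value of x with the highest fitness in this environment."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(xs, key=lambda x: c4_fitness(x, light))


print(optimal_phenotype(0.1))  # 0.0: shade favours the C3 end of the landscape
print(optimal_phenotype(1.0))  # 1.0: high light favours full C4
```

In a realistic seascape the landscape would change shape, not just tilt, but even this toy shows the optimum switching ends as the environment changes.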
A general pre-adaptation in the form of gene duplications is assumed to be a prerequisite for C4 evolution [34,35]. This expanded genetic repertoire allows the neofunctionalization of genes and the distinct expression patterns found in C4 plants. Consequently, phenotypic parameters of the duplicated genes can change without affecting their original metabolic function, and both enzyme activity and kinetics can be treated as independent parameters in a landscape model.
Simulating C4 photosynthesis evolution
Modelling evolutionary trajectories of C4 photosynthesis was first suggested by Peisker, but these early models lacked a fitness-relevant output. Heckmann et al. extended the steady-state gas exchange models of C3–C4 intermediates developed by von Caemmerer to map the phenotypic fitness landscape connecting C3 and C4 photosynthesis in a C4-promoting environment. The main assumption underlying this study is that fitness increases proportionally with the CO2 assimilation rate for a given nitrogen budget. Although difficult to verify, this assumption is justified by the advantages of C4 photosynthesis discussed above. Similar assumptions are made when applying flux balance analysis to bacteria, although there the link between the modelled flux (biomass production) and fitness (growth rate) is more direct than in the case of a plant.
The evolving phenotypic parameters have to be chosen such that they are independent. For example, the kinetic parameters of Rubisco cannot evolve independently, and such interplay has to be included in the model. In the case of C4 photosynthesis, the underlying genotype–phenotype map is not well understood. As discussed above, continuity of the genotype–phenotype map has to be assumed in order to predict evolutionary trajectories solely in phenotype space. Although recent studies on individual parameters support this assumption, large phenotypic jumps in response to small genetic changes cannot be ruled out.
Under these assumptions, the phenotypic landscape of C4 photosynthesis has a Mount Fuji-type topology, similar to some results on small bacterial landscapes [5,6,37]. This result might explain the high level of convergence in C4 evolution. In the in silico equivalent of an evolution experiment, Markov chain Monte Carlo simulations were used to predict evolutionary trajectories on this landscape. These predicted trajectories lie close to all C3–C4 intermediate species for which parameters were available. This example shows the importance of experimentally accessible intermediate phenotypes for validating evolutionary predictions for complex traits; because of the large number of assumptions that enter these models, such validation is essential for studying fitness landscapes. The predicted trajectories show no signs of diminishing returns, which are often observed in bacterial evolution experiments [8,9]. In the C4 model, this is due to epistatic effects between landscape parameters: the fitness benefit of late mutations increases when early changes are already present. Furthermore, the predicted trajectories were similar to those inferred by a purely statistical approach to reconstructing trajectories towards C4 photosynthesis.
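The general idea of such a simulation can be sketched as a Markov chain in phenotype space: small phenotypic changes are proposed (the continuity assumption), and each is accepted with a probability of fixation that depends on its selection coefficient. The Python sketch below is not the published model or its implementation. The two-trait landscape, the step size and the strong-selection, weak-mutation acceptance rule 1 − exp(−2s) are all illustrative assumptions.

```python
import math
import random
from typing import List, Tuple


def fitness(x: float, y: float) -> float:
    """Hypothetical single-peaked landscape over two phenotypic traits in [0, 1].

    The x * y term is positive epistasis: the gain from trait y grows as trait x
    increases, so late changes pay off more than they would on their own.
    """
    return 1.0 + 0.5 * x + 0.5 * y + x * y


def fixation_probability(s: float) -> float:
    """Strong-selection, weak-mutation approximation: only beneficial mutants
    fix, with probability about 1 - exp(-2s) in a large haploid population."""
    return 1.0 - math.exp(-2.0 * s) if s > 0 else 0.0


def simulate(steps: int = 5000, step_sd: float = 0.05, seed: int = 1) -> List[Tuple[float, float]]:
    """Walk from the (0, 0) 'ancestor'; return the sequence of fixed phenotypes."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(steps):
        # Propose a small phenotypic change, clipped to the unit square.
        nx = min(1.0, max(0.0, x + rng.gauss(0.0, step_sd)))
        ny = min(1.0, max(0.0, y + rng.gauss(0.0, step_sd)))
        s = fitness(nx, ny) / fitness(x, y) - 1.0  # selection coefficient
        if rng.random() < fixation_probability(s):
            x, y = nx, ny
            path.append((x, y))
    return path


path = simulate()
print(len(path), path[-1], fitness(*path[-1]))
```

Because only beneficial changes are accepted, fitness rises monotonically along each trajectory. On this toy landscape the marginal benefit of trait y is 0.5 + x, so it grows rather than shrinks as the walk proceeds, mirroring the absence of diminishing returns described above. Repeating the walk with different seeds yields an ensemble of trajectories that could be compared with measured intermediate phenotypes.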
Further details of the early steps of C4 evolution were elucidated by combining kinetic and genome-scale metabolic models. C2 photosynthesis is a weak carbon-concentrating mechanism that operates by releasing photorespiratory CO2 close to Rubisco in a specific cell type, the bundle sheath. The fact that many plants in the phylogenetic neighbourhood of C4 species use this mechanism suggests that C2 photosynthesis increases the probability of C4 evolution. Only when C2 photosynthesis was modelled with a detailed genome-scale metabolic model did a variety of nitrogen-recycling pathways become apparent. These pathways use components of the C4 cycle, indicating that C2 photosynthesis acts as a metabolic pre-adaptation to C4 photosynthesis.
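The need for nitrogen recycling follows from the stoichiometry of the glycine decarboxylase and serine hydroxymethyltransferase reactions. Assuming, as in the conceptual C2 scheme, that glycine moves from the mesophyll to the bundle sheath and serine returns, two nitrogen atoms enter the bundle sheath per CO2 released but only one leaves as serine:

```latex
2\,\mathrm{glycine} + \mathrm{NAD^{+}} + \mathrm{H_2O}
\longrightarrow
\mathrm{serine} + \mathrm{CO_2} + \mathrm{NH_3} + \mathrm{NADH} + \mathrm{H^{+}}
```

One NH3 per released CO2 is therefore left in the bundle sheath and has to be returned to the mesophyll, which is the imbalance that the nitrogen-recycling pathways address.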
Outlook: modelling environments and engineering strategies
Future studies of the fitness landscape of C4 evolution will have to integrate knowledge of environmental responses. Conceptual models predict C3 photosynthesis to be advantageous in shaded environments, whereas hot, high-light environments result in a Mount Fuji landscape with strong selection towards the C4 pathway. The shape of the fitness landscape can thus be expected to change drastically when the environment changes. Integrating quantitative physiological knowledge into fitness models across a space of environments will make it possible to map this ‘seascape’ of C4 evolution. Such an advanced model will allow predictions on the evolvability and reversibility of C4 photosynthesis across relevant environments. Furthermore, it will permit the prediction of optimal strategies for engineering C4 photosynthesis into C3 plants, possibly through the combined use of metabolic engineering and experimental evolution.
The preferred way of validating predictions on the fitness landscape is the comparison with intermediate species. Additional data on their biochemistry, gene expression, anatomy and growth environments will allow the assumptions of current models of C4 fitness landscapes to be validated or falsified.
In the broader context of predicting the evolution of complex metabolic traits, the mapping from genotype to phenotype will be the main limitation to the development of robust models. Where the genotype–phenotype map is continuous, this constraint can be relaxed. Where it is not, predicting evolution solely on the basis of phenotype–fitness models is likely to fail.
Understanding and predicting the evolution of metabolic systems remains a challenging task, but integration of the available biochemical, physiological and evolutionary data on well-chosen systems will continue to produce robust quantitative fitness models.
I would like to thank Professor Martin Lercher for his insightful comments on this manuscript.
This work was supported by the Deutsche Forschungsgemeinschaft [grant number IRTG 1525] to D.H.
Metabolic Pathways Analysis 2015: Held at Bom Jesus, Braga, Portugal, 8–12 June 2015.