Plants are complex organisms that adapt to changes in their environment using an array of regulatory mechanisms that span multiple levels of biological organization. Due to this complexity, it is difficult to predict emergent properties using conventional approaches that focus on single levels of biology such as the genome, transcriptome, or metabolome. Mathematical models of biological systems have emerged as useful tools for exploring pathways and identifying gaps in our current knowledge of biological processes. Identification of emergent properties, however, requires the vertical integration of these models across biological scales through multiscale modeling. Multiscale models that capture and predict these emergent properties will allow us to predict how plants will respond to a changing climate and explore strategies for plant engineering. In this review, we (1) summarize the recent developments in plant multiscale modeling; (2) examine multiscale models of microbial systems that offer insight into potential future directions for the modeling of plant systems; (3) discuss computational tools and resources for developing multiscale models; and (4) examine future directions of the field.
Plants have developed complex regulatory strategies, across all levels of biological organization, to adapt to changes in their environment. With the development of next-generation sequencing techniques, scientists have access to large-scale molecular phenotypes, providing information to help connect plant genomes to their phenomes. While the flow of biological information along the Central Dogma of biology (DNA → RNA → Protein) seems simple, it is exceedingly complex, making it difficult to predict plant phenotypes based solely on the genome. Advances in multiscale modeling are needed to account for this inherent complexity and to capture dynamic biological system responses to perturbations.
Plant modeling efforts range from empirical to mechanistic and vary in their predictive ability. Highly parameterized empirical models provide accurate representations of living systems but are rarely able to predict emergent properties. An exception is the OpenSimRoot model, which predicted maize yield as an emergent property of root architecture and water uptake. Alternatively, data-driven models, such as machine learning models, are entirely predictive, detecting correlations and extracting unseen patterns in the data. However, they do not reveal the underlying causal mechanisms driving the predicted phenotypes. Further, these models are unable to predict beyond the scope of the data used to train them [3,4]. Finally, mechanistic models are mathematical representations of observed phenomena that attempt to identify causal relationships that result in an emergent phenotype, enabling the extrapolation of predictions about behaviors not present in the original data [3,4]. However, like empirical models, many mechanistic models are only predictive within the scope of the observed data, failing to predict phenotypes when new parameters are introduced.
Mechanistic plant modeling has often focused on biochemical or metabolic processes like photosynthesis [5–7], other primary or secondary metabolic pathways [8–10], and genome-scale metabolic models, which aim to capture all of the metabolic fluxes in an organism [11,12]. However, as with connecting plant genomes directly to phenomes, predicting emergent phenotypic responses based solely on metabolic models is difficult. Integrating empirical and mechanistic modeling approaches [3,4,13–15] and creating models that span biological levels through integrative multiscale modeling (Figure 1) are necessary to understand how plants respond to a changing climate or to explore how plants can be engineered to achieve specific phenotypic goals.
In this review, we discuss the current state of the art in plant multiscale modeling, as well as recent advances in microbial multiscale modeling that offer insight into potential future directions for the multiscale modeling of plant systems. We also discuss recently developed computational tools and resources for plant multiscale modeling, and other future directions of the field.
Multiscale plant models
Many of the current plant multiscale models have been developed for the model plant Arabidopsis, which has been extensively studied over the last few decades [18,19]. One of the most complete multiscale plant models to date is a family of Arabidopsis framework models. The first version of this model integrated (1) carbon dynamic, (2) functional-structural, (3) photothermal, and (4) photoperiodism models to describe Arabidopsis rosette growth. Together, these four models link genetic regulation and biochemical dynamics to organ and plant growth. Chew et al. used their multiscale model to show that an increased leaf production rate in a developmentally misregulated transgenic Arabidopsis line was sufficient to explain the smaller leaf size of that line. The authors then added a clock sub-model that explicitly represented key pathways in the Arabidopsis clock gene circuit to create version 2 of the Arabidopsis framework model (FMv2). The outputs of the clock sub-model were used to regulate tissue elongation and starch metabolism. With these updates, Arabidopsis FMv2 was able to predict phenotypic responses to altered circadian timing in clock-mutant plants.
Kinmonth-Schultz et al. further expanded on the Arabidopsis FMv1 model by adding a mechanistic description of the influence of temperature on the expression of FLOWERING LOCUS T (FT), a photoperiodic flowering regulator. Their model also highlighted some areas for further improvement, such as incorporating a more mechanistic description of the relationship between leaf developmental age and FT expression. Zardilis et al. further extended the framework models [20,21] to include reproductive growth and developed a whole life cycle model of Arabidopsis, FM-life. They used FM-life to explore different life cycle strategies for two genetic inputs (low seed dormancy with low floral repression and high seed dormancy with high floral repression) in two different environments (Valencia, Spain and Oulu, Finland). Their model helps explain how genetic variation results in different plant fitness responses in different environments.
Genome-scale metabolic models
Liu et al. integrated a genome-scale metabolic flux model with condition-specific transcriptome data to explore the metabolic response of Arabidopsis under low and high CO2 conditions. However, in many cases using only transcriptomic data has not resulted in the expected improvement in model predictions. Possible reasons for this include that key regulation could be occurring after transcription at either proteomic or metabolic levels. The authors also found this to be true with their Arabidopsis model, and they used their model to explore potential areas of post-transcriptional regulation. Their findings suggested that Arabidopsis adapts to low or high CO2 environments by regulating the metabolic activity after transcription.
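To make the idea of constraining a flux model with transcript data concrete, the following minimal Python sketch uses an unbranched three-step pathway rather than a genome-scale network. It is not the method of Liu et al.; the reaction names, expression values, and the linear mapping from transcript abundance to flux capacity are all illustrative assumptions. For an unbranched pathway at steady state, mass balance forces every reaction to carry the same flux, so the flux balance optimization reduces to finding the tightest upper bound.

```python
# Toy illustration only: reaction names, expression values, and the linear
# expression-to-capacity scaling are hypothetical assumptions.

def expression_bounds(expression, vmax_per_unit=1.0):
    """Map relative transcript abundance to a flux upper bound (assumed linear)."""
    return {rxn: vmax_per_unit * level for rxn, level in expression.items()}


def max_chain_flux(upper_bounds):
    """Return the maximal steady-state flux through an unbranched pathway.

    Mass balance on each intermediate forces all reactions to carry the same
    flux, so the optimum equals the tightest upper bound.
    """
    limiting = min(upper_bounds, key=upper_bounds.get)
    return upper_bounds[limiting], limiting


# Hypothetical relative transcript levels under two conditions.
low_co2 = {"uptake": 4.0, "fixation": 10.0, "export": 8.0}
high_co2 = {"uptake": 12.0, "fixation": 6.0, "export": 8.0}

for label, expression in (("low CO2", low_co2), ("high CO2", high_co2)):
    flux, limiting = max_chain_flux(expression_bounds(expression))
    print(f"{label}: flux = {flux:.1f}, limited by {limiting}")
```

In this toy case the limiting step shifts between conditions. If measured fluxes disagreed with such transcript-derived limits, the mismatch would point toward regulation after transcription, in line with the interpretation above.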
Multi-tissue genome-scale metabolic models have recently been used to study the balance of resources between plant tissues and across growth stages. Scheunemann et al. used a genome-scale metabolic model to develop a model of Arabidopsis root metabolism that consisted of several coupled cell-type specific models. They used this model to predict the flux of the growth hormone indole-3-acetate through the root and its developmental stages. De Oliveira Dal'Molin et al. created a multi-tissue genome-scale reconstruction of Arabidopsis leaf, stem, and roots, which they used to explore carbon (C) and nitrogen (N) resource allocation between the sink and source tissues with diurnal cycle constraints. Likewise, a multi-tissue dynamic genome-scale model of Arabidopsis was developed and used to explore C/N balance over different growth stages. Using this model, the authors identified sets of growth-stage specific reactions and reactions that were present during all of the growth stages. This approach has also been successful in crop plants, as demonstrated by Moreira et al., who developed a soybean multi-tissue genome-scale metabolic model of cotyledon and hypocotyl/root axis tissues and used it to study how stored reserves in seeds are remobilized during seedling growth. In barley, a multiscale metabolic model was built to couple organ-level static FBA models of metabolism with a dynamic functional plant model to simulate the spatiotemporal metabolic behavior of the plant. This model was used to study the source-sink interactions during the seed developmental phase of barley plants.
Multiscale models for plant engineering
A third area of plant multiscale modeling comprises models developed to explore strategies for engineering plants to achieve specific objectives. For example, a gene regulatory network model, a protein translation model, a mechanistic photosynthesis model, and a leaf-level physiological model were coupled to explore the impacts of genetic modifications to soybean photosynthesis in ambient and elevated CO2. The integrated model was used to identify gene regulatory controls of the allocation of resources from Rubisco to RuBP regeneration, which has been shown to improve photosynthesis under elevated CO2. Similarly, a multiscale model of lignin biosynthesis in the model tree Populus trichocarpa was created by coupling a simple monolignol gene protein translation model, a mechanistic model of the monolignol biosynthetic pathway, and a multiple regression model to predict lignin and other wood traits under transgenic knockdowns of the monolignol genes. This multiscale model was used to explore potential gene engineering strategies for producing trees with improved bioenergy traits while mitigating negative impacts on tree growth. This model was later expanded by incorporating the impact of cross-regulatory influences between the monolignol gene transcripts and proteins, capturing the effect of regulatory mechanisms that occur after transcription, such as potential post-transcriptional and post-translational modifications, on predicted monolignol protein abundances [34,35].
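The general shape of such a one-way coupled engineering model can be sketched in a few lines of Python, with the output of each layer passed as input to the next: transcripts to proteins, proteins to pathway flux, and flux to a predicted trait. This is a simplified illustration, not the published soybean or lignin models; the gene names, translation efficiencies, kinetic constants, the rule that the slowest enzymatic step sets the pathway flux, and the regression coefficients are all hypothetical assumptions.

```python
# Illustrative one-way coupling across scales. All gene names and parameter
# values are hypothetical and chosen only to make the example readable.

def translate(transcripts, efficiency):
    """Protein abundance as transcript abundance times an assumed efficiency."""
    return {gene: efficiency[gene] * level for gene, level in transcripts.items()}


def pathway_flux(proteins, kcat, substrate=1.0, km=1.0):
    """Michaelis-Menten rate per enzyme; the slowest step is assumed to set the flux."""
    saturation = substrate / (km + substrate)
    rates = {gene: kcat[gene] * amount * saturation for gene, amount in proteins.items()}
    return min(rates.values())


def predict_trait(flux, intercept=10.0, slope=2.0):
    """Linear regression from pathway flux to a trait value (assumed coefficients)."""
    return intercept + slope * flux


def run_pipeline(transcripts, efficiency, kcat):
    """Pass information one way: transcripts -> proteins -> flux -> trait."""
    proteins = translate(transcripts, efficiency)
    return predict_trait(pathway_flux(proteins, kcat))


efficiency = {"geneA": 0.5, "geneB": 1.0}
kcat = {"geneA": 2.0, "geneB": 1.0}

wild_type = {"geneA": 10.0, "geneB": 8.0}
knockdown = dict(wild_type, geneB=2.0)  # simulated knockdown of geneB to 25%

print("wild type trait:", run_pipeline(wild_type, efficiency, kcat))
print("knockdown trait:", run_pipeline(knockdown, efficiency, kcat))
```

Because information only flows downward here, a change in the predicted trait cannot alter transcript levels. Adding that feedback is the step that separates semi-integrated from fully-integrated models.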
Most current work in plant multiscale modeling has produced semi-integrated models, in which information is passed from one layer to another but little feedback regulation is represented. The Arabidopsis FM models [20–23] are close to a fully-integrated framework but also model few regulatory components. As plant models become more fully-integrated, by communicating information across biological scales, they will better represent dynamic responses to perturbation, which may reveal insights about system responses to untested conditions. Recent work in microbial multiscale modeling has demonstrated that emergent properties can be predicted using fully-integrated models.
Learning from multiscale microbial models
Twenty-eight submodels of diverse cellular processes were integrated to develop a whole-cell multiscale model that describes the life cycle of a single cell of the human pathogen M. genitalium at the molecular level. The model provided insights into previously unobserved cellular behaviors. By comparing model predictions with experimental measurements, the authors detected new biological functions, including a novel component of the pyruvate dehydrogenase chemical reactions.
A multiscale model of E. coli integrated transcriptional regulation, signal transduction, and metabolic pathways, and was used to predict E. coli growth rates in adverse conditions for single gene knockouts. A multiscale model of chemotactic E. coli was also developed, which models cross-compartment mechanisms linking E. coli to its environment. Further, researchers have used a large-scale integrated model of E. coli to assess the cross-consistency of millions of heterogeneous E. coli parameters collected from the literature, and used the identified inconsistencies to further improve their model.
In S. cerevisiae, Ye et al. followed the whole-cell modeling principles used to model M. genitalium [36,42] to develop a whole-cell multiscale model. This model bridged the gap between genotype and phenotype predictions by developing and integrating models, on one-second timescales, of 26 cellular processes that span five areas of cell biology. Using this model, they were able to study the dynamic allocation of resources during the cell cycle in real-time simulations. Ma et al. took a different approach by developing an interpretable neural network that incorporated 2526 intracellular components, processes, and functions to model the hierarchical structure and function of S. cerevisiae. Their model was able to capture nearly all phenotypic variation in cellular growth, including variation due to genetic interactions under single and double gene deletions. Further, by using an interpretable neural network framework, they were able to identify previously undocumented processes related to cell growth, DNA repair, and UV sensitivity.
Plants are orders of magnitude more complex than microbes, so modeling whole plant systems at the same level of granularity as these microbial models may never be completely feasible. However, recently developed computational and experimental tools have the potential to make more fully-integrated plant multiscale models possible.
The future of multiscale plant modeling
Improving and developing models of metabolic regulation across biological scales, including transcriptional, post-transcriptional, translational, and post-translational regulation, is one direction for the plant multiscale modeling community. Due to the advances in next-generation sequencing techniques, genomic and transcriptomic research has progressed at a much faster pace than its proteomic and metabolomic counterparts, which primarily rely on mass spectrometry technologies. However, the importance of these omics levels when trying to scale from genotype to phenotype has become increasingly clear. Novel approaches that couple in vivo proteins with their corresponding mRNA sequences are expected to leverage next-generation sequencing capabilities for proteomics, improving the scalability, standardization, and cost of large-scale proteomic experiments. Another challenge to overcome is obtaining greater resolution of gene and gene product expression over time and space, which would improve multiscale molecular models. Specifically, such spatial and temporal information would inform modeling efforts aimed at understanding development at the cellular, tissue, organ, and whole organism scales. New technologies that provide better spatial and temporal resolution include fliFISH, which allows quantification of mRNA transcripts within organs and cell types, and FRET sensors, which are used to track dynamic protein-protein interactions, nutrient signaling, and subcellular visualization of metabolites.
In addition to multiscale models of metabolic regulation in plants, future directions of plant multiscale modeling will involve connecting these microscale cell-type specific, gene regulation, and metabolic models to macroscale models such as multi-tissue and organ models, plant growth models (e.g. APSIM, BioCro, DSSAT), physiological models, and ecosystem models (e.g. Community Land Model). In the last few years, computational tools have been developed to help achieve this goal. The yggdrasil framework is an open-source Python package that can couple models written in a variety of programming languages including C/C++, Python, R, Matlab, and Fortran. Kannan et al. used the yggdrasil framework to one-way couple a gene regulatory network model (R), a protein translation model (Python), and a leaf photosynthesis model (Matlab). yggdrasil is also capable of two-way coupling between models with different timescales throughout a simulation. Another tool, Vivarium, is an engine that allows different models to be assembled within a hierarchy of embedded compartments and then run as integrated multiscale simulations with multiple timescales and distributed computation. Agmon et al. used this engine to develop their multiscale model of E. coli chemotaxis. The Chromar modeling language, which was used in the FM-life Arabidopsis model, is a declarative, agent-based language that allows for the compact definition of complex biological models that can evolve with new biological reactions and components throughout a simulation. Additionally, advances in high performance computing (HPC) have been applied to biological problems such as parameterizing multiscale models through statistical inference, modeling biochemical reactions in 3D space, and modeling the relationships between microscale mechanisms and emergent macroscopic behavior of a gliding motility assay.
Multiscale models can have high computational costs, especially when scaling from the microscales to the macroscales or when creating fully-integrated models that communicate across scales. As such, HPC could be a key component in creating fully-integrated plant models similar to the fully-integrated microbial models.
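The two-way coupling of models that run on different timescales can be illustrated with a short, plain-Python sketch. It deliberately does not use the yggdrasil or Vivarium APIs; it only shows the underlying pattern, in which a fast model takes many small steps between each step of a slow model and each model reads the state that the other updates. The hourly carbon balance, the daily growth rule, and all parameter values are hypothetical assumptions.

```python
# Generic two-timescale, two-way coupling pattern. This is NOT the yggdrasil
# or Vivarium API; both sub-models and all parameters are hypothetical.

def fast_metabolism_step(carbon_pool, leaf_area, dt_hours,
                         assimilation_rate=0.1, respiration_rate=0.02):
    """Hourly carbon balance: gain scales with leaf area, loss with pool size."""
    gain = assimilation_rate * leaf_area * dt_hours
    loss = respiration_rate * carbon_pool * dt_hours
    return carbon_pool + gain - loss


def slow_growth_step(leaf_area, carbon_pool, fraction=0.5, conversion=0.05):
    """Daily growth: invest a fraction of the carbon pool in new leaf area."""
    invested = fraction * carbon_pool
    return leaf_area + conversion * invested, carbon_pool - invested


def simulate(days, leaf_area=1.0, carbon_pool=0.0, hours_per_day=24):
    """Run the fast model 24 times per slow step, exchanging state both ways."""
    history = []
    for day in range(1, days + 1):
        for _ in range(hours_per_day):
            # Fast model reads leaf area, which the slow model last updated.
            carbon_pool = fast_metabolism_step(carbon_pool, leaf_area, 1.0)
        # Slow model reads the carbon pool, which the fast model accumulated.
        leaf_area, carbon_pool = slow_growth_step(leaf_area, carbon_pool)
        history.append((day, leaf_area, carbon_pool))
    return history


for day, area, pool in simulate(days=5):
    print(f"day {day}: leaf area = {area:.3f}, carbon pool = {pool:.3f}")
```

Even in this small example the fast model is evaluated 24 times for every slow step. That ratio grows quickly when molecular processes are coupled to whole-plant or ecosystem models, which is where the computational cost discussed above arises.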
There are still many processes in plants that are unknown or not well understood, and as such are difficult to model. Machine and deep learning approaches, which detect statistical correlations in data, have become popular tools to develop predictive models in many fields. These models, however, are black boxes and do not provide information about the mechanisms behind those predictions, limiting their applicability to the data used to train them and providing little new biological insight [3,4,41,55,56]. Interpretable or knowledge-primed neural networks are a recent advancement in the deep learning field that seeks to address this problem. By forcing the neural networks to have biologically relevant structure and hierarchy, interpretable neural networks have been used to develop models that are both predictive and biologically informative. Interpretable neural networks have been developed to model transcriptional control in fruit fly embryos, to predict cancer and immune cell states from single-cell RNAseq data, and to model S. cerevisiae cellular growth.
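One common way to give a network biologically relevant structure is to mask its connections so that each hidden unit corresponds to a named process and receives input only from the genes annotated to that process. The sketch below shows a single forward pass through such a masked layer in plain Python. It is an illustration of the general idea rather than the architecture of any of the published models; the gene names, process annotations, and untrained weights are hypothetical, and no training is performed.

```python
import math

# Hypothetical annotation: which genes belong to which process.
GENES = ["g1", "g2", "g3", "g4"]
PROCESSES = {"dna_repair": ["g1", "g2"], "cell_growth": ["g2", "g3", "g4"]}


def build_mask(genes, processes):
    """1.0 where a gene is annotated to a process, 0.0 elsewhere."""
    return {name: [1.0 if gene in members else 0.0 for gene in genes]
            for name, members in processes.items()}


def forward(gene_state, weights, mask, out_weights):
    """Masked hidden layer followed by a linear output.

    Each hidden unit is a named process, so its activation can be inspected
    and interpreted directly.
    """
    activations = {}
    for name, w in weights.items():
        z = sum(wi * mi * xi for wi, mi, xi in zip(w, mask[name], gene_state))
        activations[name] = math.tanh(z)
    output = sum(out_weights[name] * a for name, a in activations.items())
    return output, activations


mask = build_mask(GENES, PROCESSES)
weights = {name: [0.5] * len(GENES) for name in PROCESSES}  # untrained placeholders
out_weights = {"dna_repair": 1.0, "cell_growth": 1.0}

wild_type = [1.0, 1.0, 1.0, 1.0]
g1_deletion = [0.0, 1.0, 1.0, 1.0]  # gene g1 removed

for label, state in (("wild type", wild_type), ("g1 deletion", g1_deletion)):
    output, activations = forward(state, weights, mask, out_weights)
    print(label, round(output, 3), {k: round(v, 3) for k, v in activations.items()})
```

Deleting g1 changes only the dna_repair unit, because the mask prevents g1 from reaching cell_growth. This traceability from prediction back to a named process is what makes such a model biologically informative.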
Developing models that communicate across these levels of biological organization will enable us to do the following: (1) Improve sink/source dynamics in the source-driven growth models. As many plant responses to stress are sink limited [57–60], this will improve our understanding of plant responses to stress and help us to identify engineering or management strategies for mitigating yield losses under stress; (2) Study carbon and nutrient allocation and remobilization between plant organs. This would allow us to explore methods of achieving optimal allocation strategies and could lead to improved yields and the development of seeds and grains with designed nutritional profiles; and (3) Explore how different plant engineering strategies respond in differing climates. This could have possible ecological impacts through the development of plants that are engineered for higher yields or improved water and nitrogen use efficiencies.
M.L.M. and A.M.C. were both responsible for drafting and revising this article.
Research reported in this publication was supported by the Foundation for Food and Agriculture Research under Grant ID 602757. The content of this publication is solely the responsibility of the authors and does not necessarily represent the official views of the Foundation for Food and Agriculture Research.
The authors declare that there are no competing interests associated with the manuscript.