Abstract
Drug research, therapy development, and other areas of pharmacology and medicine can benefit from simulations and optimization of mathematical models that contain a mathematical description of interactions between systems elements at the cellular, tissue, organ, body, and population level. This approach is the foundation of systems medicine and precision medicine. Here, simulated experiments are performed with computers (in silico) first, and they are then replicated through lab experiments (in vivo or in vitro) or clinical studies. In turn, these experiments and studies can be used to validate or improve the models. This iterative loop of dry and wet lab work is successful when biomedical researchers tightly collaborate with data scientists and modelers. From an educational point of view, the interdisciplinary research in systems biology can be sustained most effectively when specialists have been trained to have both a strong background in the disciplines of biology or modeling and strong communication skills, which make them able to communicate with other specialists. This overview addresses possible interdisciplinary communication gaps. Focusing our attention on biomedical researchers, we describe the reasons for using modeling and ways to collaborate with modelers, including their needs for specific biological expertise and data. This review includes an introduction to the principles of several widely used mechanistic modeling methods, focusing on their areas of applicability as well as their limitations. A potential complementary role of machine-learning methods in the development of mechanistic models is also discussed. The descriptions of the methods also include links to corresponding modeling software tools as well as practical examples of their application. Finally, we also explicitly address different aspects of multiscale modeling approaches that allow a more complete and holistic perspective of the human body.
Introduction
In the medical field, the goal is to normalize the functionality of biological systems, and an accurate, personalized design of interventions is necessary to improve therapeutic efficiency. It is reported that 10 of the bestselling drugs in the United States help only between 1 in 25 and 1 in 4 of the people who take them. 1 Among the possible routes to a predictive and patient-tailored medicine, mathematical modeling is being recognized as one of the most promising, given its ability to predict the effects of drugs without having to resort to in vivo or in vitro experiments, with the additional advantage of increasing effectiveness while reducing costs.
Looking at the future of medicine, and given the recent discussions stating that “precision medicine” comprises all the approaches based on a person's genetics, microbiome, environment, and lifestyle, we use the term “precision medicine” when referring to “personalized medicine.”2,3 Mechanistic mathematical modeling is a tool of precision medicine that is hard to replace because it implements the interactions of system elements mathematically. It can help to find new indications or patient subgroups as effectively as a 10-year-long drug research study, and it can contribute toward changing biology from a qualitative and descriptive discipline to a quantitative and explanatory one. 4
Mathematical modeling, also seen as the translation of beliefs about the functioning of the world into the language of mathematics, has been widely used for developing complex technical systems. 5 We can define a mechanistic mathematical model as the mathematical description of the elements forming a system, their mutual interactions and the interaction with the environment. Such models are used in technical systems to enable the extrapolation of systems behavior relying on the mathematically described features of elements and mechanisms of their interaction.
Systems requiring high reliability, such as buildings, bridges, and aircraft, are designed with the help of mechanistic mathematical modeling approaches with the aim of reducing costs while ensuring the necessary reliability. In the case of biological systems and processes, the building tasks are currently addressed by synthetic biology, which alters existing biologically and chemically relevant compounds and designs genetically modified simple organisms. 6 Just as in the building of technical systems, an engineering component that applies mathematical modeling and optimization should also be present in synthetic biology.7,8
Mathematical modeling (development of mathematical description of processes, parameter fitting, and model validation), simulation (prediction of different behaviors using the validated model), and optimization (search for the most appropriate action to reach a desired behavior) are already used in systems medicine, and mechanistic modeling is currently the main focus.4,9 In contrast to engineering fields, the features of elements and the mechanisms of their interactions are mostly unknown. Therefore, the complexity of human biology and the lack of detailed information about the interactions between biological system elements limit the accuracy and applicability of a mathematical description of bioprocesses. 4 However, it is useful to exploit the available, although incomplete, knowledge through different modeling approaches. “It is better to be almost right than exactly wrong” as John Maynard Keynes said. Even considering only the interactions between the cell's elements (small molecules, RNA, DNA, proteins), it is possible to extract useful information thanks to network biology 10 methods, which belong to the mechanistic modeling approaches.
Mechanistic and non-mechanistic (machine learning, “black box”) modeling are two different ways to approach a subject under study. Mechanistic modeling treats system properties (the phenotype) as emerging from the interactions of the system's elements at the cellular level in response to the environment (Fig. 1). Machine learning predicts the behavior of a system by relying on knowledge acquired from the relationships between inputs and outputs, without explaining them through the interplay between system elements. 11 Mechanistic modeling, in contrast, makes it possible to understand how a system functions through knowledge of the interactions of its elements. This also means being able to predict behavior when the elements, their amounts, or the interaction rules change. Systems biology, 12 just as systems medicine, 13 aims at understanding a system of interest at a mechanistic level. Therefore, this review focuses mainly on mechanistic modeling, while also pointing out the applicability of machine-learning methods for building inputs to mechanistic models.

Fig. 1. Mechanistic representation of system inputs (Ix), external perturbations influencing the system (Px), system outputs (Ox), system-forming elements (Ex), and interactions (Iax) between elements. Mechanistic modeling aims at predicting system behavior and outputs by describing the interactions between system-forming elements.
Mechanistic modeling can make predictions of systemic effects, such as those of defects in a gene that result in different amounts of gene-coded products, of the application of drugs or their combinations, or of the impact of a therapy on different genotypes. Multiscale modeling refers to a modeling approach in which multiple models at different levels or scales are used simultaneously to describe a system. These models can be more mechanistic in nature or more empirically oriented.
The holistic understanding of the functionality of a system is generally addressed by systems biology, 12 whereas a specific application of systems biology in medicine is called systems medicine.9,13,14 The personalization of the systems biology approach in medicine, which considers genetics and all the other peculiarities of an individual, has led to the development of the precision medicine branch.15–17 In contrast to the assumption currently made in clinical trials that patients will respond in a similar way, modeling can help physicians practice precision medicine. 18 Models can take into account important genetic, environmental, and even gut microbiome peculiarities. 19
There are several medical branches that stepped into precision medicine using systems biology and mathematical modeling approaches: cancer research,20,21 liver research,22,23 antibody research, 24 cardiovascular research, 25 blood research, 26 drug discovery,27–29 and more. Many new branches are joining, but the complicated educational process in systems biology 30 may delay new applications. The research community should actively promote opportunities and new approaches to systems medicine.18,31
There are initiatives that have been promoting mathematical modeling applications in medicine: Avicenna Alliance, Virtual Physiological Human Institute, Virtual Metabolic Human, and others.
This review is devoted to the analysis of the applicability and appropriateness of different mechanistic modeling formalisms, approaches, and their combinations, depending on the type and amount of available data and on the task setting. Several popular mechanistic modeling methods are introduced to give insight into their versatility, together with examples of their applicability depending on the available knowledge, data, and task setting. Application examples of mathematical modeling of bioprocesses in medicine are named to facilitate similar applications. Multiscale modeling approaches, which combine modeling approaches across different layers of biological complexity, are exemplified and analyzed. Moreover, machine learning-based modeling approaches are mentioned as a synergistic activity.
Starting Point of Modeling Approach Selection: Available Information About the System
The available data usually limit the choice of modeling approaches. Data about system behavior are necessary for model building and validation. Information about system behavior can be described by different types of data, but a relationship between system inputs (e.g., metabolites, signals, liquid flow, microbiome) and outputs (other metabolites, signals, liquid flow) connected to the state of a system (inflammation, blood vessel blockage) is necessary. With this kind of data, combined with a modeling task setting (e.g., determination of element interaction, finding the most effective drug target with the least changes to the system), modeling can be initiated.
The large amount of available clinical data can be considered for the analysis and localization of various issues. Automatic detection and classification using clinical data is still a challenge. Only low sensitivity with a high rate of false positives has been achieved with currently available techniques, which are usually not patient specific. This is because the data observed in a clinical setting are noisy, contain artifacts, and are more heterogeneous than data obtained under controlled laboratory conditions. The reasons for this lie in the actual biological characteristics of the system and in the fact that biomedical researchers are largely unaware of the requirements of the modeling field.
Mechanistic Modeling Approaches Tackling the Single Level of Organization
Mathematical models can be categorized in several ways. One way is to consider the type of outcome that the model returns. In a deterministic model, the random variation in the outcome is ignored. This is different from a statistical model or a stochastic model where the aim is to model the outcome distribution. 32 An entire spectrum of models lies in between strictly deterministic and strictly statistical. 33 There is another way, complementary to the previous one, to categorize the landscape of modeling approaches, and it is defined by how a model copes with hierarchies or scales in the data. For instance, a model may focus on one level of information and merely quantitatively consider other levels (e.g., empirical models including stochastic hierarchical models) or may explicitly account for mechanisms through which changes to the entire system or components can occur (mechanistic model).
A crucial factor in the applicability and appropriateness of modeling methods is the availability of the data about the system's elements that are the interacting entities of a system of interest and form or influence its behavior. The knowledge of the elements of a system and their interactions enables the building of a mechanistic34,35 or cause
When information about system-forming elements is missing, it is still possible to use machine-learning approaches to develop a model that can be validated just by knowing the relationship between the input and output of the system. This is a “black box” (non-mechanistic) model, mostly used when artificial intelligence aims at predicting the reaction of a system knowing its input and, possibly, the state of the system. 36 A typical feature of this type of model is the need to train it on a number of cases, as in the case of artificial neural networks. After training, the model can be used to predict how systems react to a change in the inputs or to perturbations, whereas the system-forming elements and their role are not addressed, leaving the system as a black box.
Choosing an inappropriate modeling method may lead to inadequate estimations in case of poor data, ignorance of available data, or pursuit of a false research track due to wrong and/or biologically irrelevant hypotheses. In this review, we classify bioprocess modeling by using mechanistic and machine-learning approaches, discussing and underlining their strengths and limitations as well as their applicability on the available data. Possible “hybrid” approaches (gray box), where part of the process is well defined while another part is vague, are addressed as well.
Mechanistic modeling approaches
Mechanistic modeling requires at least some information about the elements forming the process of interest. Different mechanistic modeling approaches (Table 1) can help to estimate the parameters of interactions during the parameter estimation process and to validate model performance against experimental data. 4 Although modeling methods can also be applied in atypical ways, we focus on the description of their classical applications.
Application Examples and Tools for Mechanistic Modeling Approaches
Network modeling
There are many biomedical studies where the main interest is not the characterization of the elements composing a system, but instead the modeling of their interactions. To illustrate, a disease may be caused by a modified interaction pattern between a set of genes even when these are not mutated, 100 and the connectivity patterns between brain regions are known to change in pathology. 101 In these situations, one can resort to complex network theory,102,103 a statistical physics understanding of the classical graph theory. Networks are objects composed of a set of nodes, representing the elements composing the system of interest, connected by links, representing the interactions in the real system. Physicists usually divide networks into two groups, depending on how they are reconstructed: physical networks, where the links are explicitly known (e.g., the co-expression between pairs of genes is validated in vitro), and functional networks, where the links are inferred from the dynamics of the elements. 104 Note that, in this latter case, the word functional does not imply a common function, but instead that the dynamics of nodes is a function of their connectivity, such that the latter can be derived from the former. From a biological perspective, it is usually more intuitive to classify networks as physical/chemical (e.g., protein-protein interaction networks, metabolic networks), functional (in this case denoting a similar function, e.g., genetic interaction networks), and others, when the nature of a link is not easily definable (as is the case of correlation in gene correlation networks). In both cases, the resulting structure can be analyzed by different topological metrics, 105 that is, metrics assessing specific structural properties such as the abundance of specific connectivity patterns (also known as motifs 106 ), the identification of the most important nodes, the presence of communities, 107 or even the similarity of multiple networks. 108
As can be inferred from the earlier description, the applicability of network modeling is extremely wide: A network can, indeed, be applied to any problem involving a complex system, that is, a system composed of a large number of elements, and where the interest resides in the interactions between these elements.
Once one or several complex networks have been reconstructed, the researcher can nowadays rely on several software tools to make the analysis process easier. For the sake of simplicity, we group them into two families: libraries for the numerical analysis of networks on the one hand, and stand-alone software on the other. The former can easily be integrated into a more complex analysis workflow and are usually more efficient at analyzing large batches of networks, whereas the latter offer a more user-friendly experience. Among the libraries, the two best known are NetworkX 43 and iGraph 44 for the programming languages Python and R, respectively. On the software side, the options include Cytoscape, 45 Pajek, 46 and Gephi. 47
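To make the notion of a topological metric concrete, the following minimal sketch computes node degrees, identifies the most connected node, and counts triangles (the simplest three-node motif) in a small undirected network. It uses only the Python standard library; the edge list and node names are hypothetical and chosen purely for illustration. Dedicated libraries such as those named above offer far richer and faster implementations.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical undirected interaction network (node names are illustrative only).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def degrees(adjacency):
    """Degree of a node: the number of links attached to it."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def count_triangles(adjacency):
    """Count fully connected node triples, the simplest three-node motif."""
    triangles = 0
    for a, b, c in combinations(sorted(adjacency), 3):
        if b in adjacency[a] and c in adjacency[a] and c in adjacency[b]:
            triangles += 1
    return triangles


adjacency = build_adjacency(EDGES)
node_degrees = degrees(adjacency)
# The node with the highest degree is one simple notion of "most important".
hub = max(sorted(node_degrees), key=node_degrees.get)
n_triangles = count_triangles(adjacency)
print(node_degrees, hub, n_triangles)
```

Degree is only one of many possible importance measures; which metric is appropriate depends on the biological question.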
De novo network enrichment
Information about interactions between biomolecules has been collected in various pathway databases such as Reactome. 109 These pathways describe interactions of a few genes, proteins, or metabolites that have been implicated in a specific biological process. Commonly, a hypergeometric test is employed to identify pathways enriched with genes of interest that have been identified in an experiment, for example, via differential expression analysis. A limitation of this approach is that it can only be used to consider experimental results in the light of what is already known and captured in databases. New biological processes or pathways that have not been previously described will thus be neglected, limiting the potential for true discovery.
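The hypergeometric enrichment test mentioned above can be sketched in a few lines. The function below returns the one-sided probability of observing at least the given overlap between a pathway and a list of genes of interest, assuming genes are drawn without replacement from a fixed background. The function name, argument names, and the numbers in the usage line are hypothetical; in practice, multiple-testing correction across pathways would also be needed and is not shown.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, pathway_size, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population   -- number of genes in the background
    pathway_size -- number of background genes annotated to the pathway
    selected     -- number of genes of interest (e.g., differentially expressed)
    overlap      -- number of genes of interest that fall in the pathway
    """
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    # Sum the probability mass of all outcomes at least as extreme as observed.
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 background genes, a 50-gene pathway,
# 200 genes of interest, 5 of which belong to the pathway.
p_value = hypergeom_enrichment_pvalue(20000, 50, 200, 5)
print(p_value)
```

Exact integer arithmetic is used for the binomial coefficients, so the only rounding happens in the final division.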
Alternatively, general molecular interaction networks such as BioGrid 110 constructed from experimental evidence for protein
BioNet is an R package and popular tool for network enrichment analysis. 54 In Cytoscape, KeyPathwayMiner is a popular app, 55 which is also available directly in the web browser. 56 KeyPathwayMiner supports the use of multiple omics datasets that can be combined by using customizable logic expression (and/or).
Bayesian modeling
In biomedical research and health care, typically only a subset of all factors involved in a given process can be observed. In addition, such processes include individual variation as well as random effects, resulting in uncertainties. Thus, the overall understanding of such processes as well as predictions regarding progression and outcome remain a challenging task. Bayesian network models utilize probability theory in combination with graph theory,113,114 making them a useful approach to describing and reasoning in problems dealing with uncertainties.115,116
Unlike many other machine-learning models, Bayesian network models are not designed as a black box. The network nodes have a semantic interpretation and are thus human readable, facilitating intuitive understanding and communication of the network structure. Nodes usually represent observed or latent variables, but they can also represent unknown parameters or hypotheses. Bayesian network models are constructed as modular directed acyclic graphs, in which knowledge is represented as relationships between variables, and it is assumed that each node is directly related (linked) only to a subset of the other nodes. These related nodes are assumed to be conditionally dependent, whereas the absence of links implies conditional independence. Each node is assigned a probability function, which computes the probability of the feature represented by the node conditional on the node's parent nodes.
The networks can be either constructed from expert knowledge 117 or learned from data, 118 often in combination with feature selection algorithms. A Bayesian network model can be used to compute the state of a subset of variables given other observed variables (termed evidence variables), with a process called probabilistic inference. For example, a Bayesian network could be constructed from the probabilistic associations between certain blood parameters and certain diseases. Measured blood parameters could then be used to compute the probability of the presence of the respective diseases.
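The blood-parameter example can be illustrated with a minimal sketch of probabilistic inference by enumeration. It assumes a hypothetical three-node network in which a binary disease node is the only parent of two binary blood markers; all node names and probabilities are invented for illustration and carry no clinical meaning.

```python
# Hypothetical network: Disease -> MarkerA, Disease -> MarkerB.
# Prior probability of the disease node (no parents).
P_DISEASE = {True: 0.01, False: 0.99}

# Conditional probability that each marker is "high", given the disease state.
P_MARKER_HIGH = {
    "marker_a": {True: 0.90, False: 0.05},
    "marker_b": {True: 0.70, False: 0.10},
}


def posterior_disease(evidence):
    """Return P(disease | evidence).

    evidence maps a marker name to True (measured high) or False (not high).
    Unobserved markers are simply left out, which marginalizes them here
    because each marker depends only on the disease node.
    """
    unnormalized = {}
    for disease in (True, False):
        p = P_DISEASE[disease]
        for marker, is_high in evidence.items():
            p_high = P_MARKER_HIGH[marker][disease]
            p *= p_high if is_high else 1.0 - p_high
        unnormalized[disease] = p
    return unnormalized[True] / (unnormalized[True] + unnormalized[False])


print(posterior_disease({"marker_a": True}))
print(posterior_disease({"marker_a": True, "marker_b": True}))
```

Each additional piece of evidence updates the probability of disease, which is the behavior that makes such models attractive for diagnostic reasoning under uncertainty. Real networks have many more nodes and require dedicated inference algorithms.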
Applications of Bayesian network models in biomedical research and health care focus on the diagnosis and prediction of disease trajectory and treatment response in precision medicine approaches, but the models can also be applied to aid in health care planning and resource allocation for larger patient cohorts. Another important application of Bayesian networks is the analysis and interpretation of high-dimensional molecular data, for example in gene regulatory networks.
Common tools used to construct and analyze complex Bayesian networks are the bnlearn package for R 62 and the Bayes Net Toolbox for Matlab. 63 Options that do not require programming are the BayANet browser tool, which can be used to manually construct simple networks, or commercially available software, such as Bayesserver, Hugin, or Netica.
Logical modeling
The logical formalism originates from the seminal work of S. Kauffman and R. Thomas on a coarse-grained modeling formalism of gene regulatory networks.119,120 Since then, methods and tools have been developed, and the framework has been successfully applied to large regulatory networks encompassing genetic circuits as well as signaling pathways (see e.g., Abou-Jaoudé et al. 121 for a review). Basically, a logical model is defined by an interaction network where nodes represent regulatory components (genes, proteins, etc.), and signed directed links represent regulatory effects (activation or inhibition). Each node is associated with a discrete variable, usually Boolean, that represents the qualitative state or functional level of the corresponding regulatory component (activity, expression, concentration, etc.).
The dynamics of the model is specified by logical regulatory functions defining the state of each component, depending on the state of its regulators. Properties of interest of the resulting discrete dynamical systems refer to their attractors (sets of states in which the system is trapped). Attractors correspond to long-term behaviors and are, thus, associated with cell phenotypes. Hence, the modeler is interested in identifying the attractors, and also in assessing their reachability properties. Figure 2 provides an illustrative example of a Boolean model.

Fig. 2. A toy Boolean model.
Over the past two decades or so, efforts have been made to develop efficient methods and tools for the analysis of complex regulatory networks. 121 Importantly, the Consortium for Logical Models and Tools (http://colomoto.org/) aims at coordinating methodological efforts and promoting the usage of the Systems Biology Markup Language (SBML) qual standard format, which allows interoperability between tools.122,123 The consortium website maintains a page with software tools for logical modeling.
The logical formalism is well suited to situations that lack precise, quantitative, and kinetic data, which is generally the case for large regulatory networks. Despite its high abstraction level, it allows essential dynamical properties of the modeled systems to be recovered. Applicability of the formalism has been demonstrated in a wide range of biological fields, with the recent emergence of modeling studies of networks involved in complex diseases such as cancer. These studies tackle quite diverse issues, but they all relate to the existence of attractors and their reachability properties, under mutations (easily implemented in logical models by modifying the logical regulatory functions) or environmental conditions (represented by the values attributed to network input components).
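As an illustration of attractor identification, the sketch below exhaustively simulates a hypothetical three-component Boolean network under a synchronous updating scheme (all components updated at once). The regulatory functions are invented for this example: A and B inhibit each other and C is activated by A. Synchronous updating is only one possible assumption; other updating schemes can yield different cyclic attractors.

```python
from itertools import product


def update(state):
    """Synchronous update of the hypothetical network.

    A is active when B is inactive, B is active when A is inactive,
    and C copies the previous state of A.
    """
    a, b, c = state
    return (1 - b, 1 - a, a)


def find_attractors():
    """Follow every initial state until a state repeats; keep the cycle."""
    attractors = set()
    for state in product((0, 1), repeat=3):
        seen = []
        while state not in seen:
            seen.append(state)
            state = update(state)
        # States from the first repeated state onward form the attractor.
        cycle = seen[seen.index(state):]
        attractors.add(frozenset(cycle))
    return attractors


attractors = find_attractors()
for attractor in sorted(attractors, key=lambda s: (len(s), sorted(s))):
    kind = "stable state" if len(attractor) == 1 else "cyclic attractor"
    print(kind, sorted(attractor))
```

A mutation can be mimicked by replacing one regulatory function with a constant (for example, returning 0 for A to represent a loss of function) and rerunning the search. Exhaustive enumeration scales exponentially with the number of components, which is why dedicated tools rely on more efficient methods.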
Ordinary differential equation-based dynamic modeling
Dynamic (often also named kinetic) modeling by ordinary differential equations (ODEs) can accurately characterize how the quantities of the process of interest change over time, including transient processes and steady states. This approach can take into account different types of non-linearities that can determine complex behavior and cause emerging features such as oscillations, instabilities, and others that may not be observed or analytically predicted by other modeling methods. Systemic features such as stability of steady state, sensitivities, elasticities, and other features can be calculated analytically. The results of simulation can be directly compared with experimental results.
The mathematical and analytical side of ODEs is well developed owing to the rich history of their application in very different branches of research and industry working on simulation and optimization tasks. These features of ODE-based models come at the cost of requiring detailed information about the interactions between system elements. This kind of information is usually available only for human-built technical systems. Indeed, only the known and mathematically described interactions between elements can be exploited to design technical systems. The case of biology is different: The interactions have to be studied, estimated, and described by an appropriate equation, and the numerical values of equation parameters have to be determined from literature, databases, experiments, or parameter estimation methods.
ODE-based modeling is very universal in terms of applicability. It can handle metabolic, signaling, flow, and many other modeling tasks and their combinations because of the flexibility of the definition of process dynamics. The main limitation of ODE application in biology is the necessity to determine parameters of interaction dynamics between elements. Usually, we have insufficient knowledge to mathematically describe the type of interactions and parametrize them. Popular tools for ODE-based models with user-friendly interfaces that do not require programming skills are COPASI77,78 (http://copasi.org) and CellDesigner. 79 A good overview of ODE-based modeling tools for biomedical applications can be found on the SBML website (http://sbml.org/SBML_Software_Guide).
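A minimal ODE-based model can be sketched as follows. It assumes a hypothetical single species X that is produced at a constant rate and degraded with first-order kinetics, dX/dt = k_syn - k_deg * X, and integrates it with a classical fourth-order Runge-Kutta scheme written in plain Python. The parameter values are arbitrary. The model is chosen because its analytical solution is known, so the simulated time course and steady state can be checked directly.

```python
import math


def rhs(x, k_syn, k_deg):
    """Right-hand side of dX/dt = k_syn - k_deg * X."""
    return k_syn - k_deg * x


def simulate_rk4(x0, t_end, dt, k_syn, k_deg):
    """Integrate the model with fixed-step fourth-order Runge-Kutta."""
    n_steps = int(round(t_end / dt))
    x = x0
    trajectory = [(0.0, x)]
    for i in range(n_steps):
        k1 = rhs(x, k_syn, k_deg)
        k2 = rhs(x + 0.5 * dt * k1, k_syn, k_deg)
        k3 = rhs(x + 0.5 * dt * k2, k_syn, k_deg)
        k4 = rhs(x + dt * k3, k_syn, k_deg)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        trajectory.append(((i + 1) * dt, x))
    return trajectory


def analytical(t, x0, k_syn, k_deg):
    """Closed-form solution used to check the numerical result."""
    x_ss = k_syn / k_deg
    return x_ss + (x0 - x_ss) * math.exp(-k_deg * t)


# Hypothetical parameter values in arbitrary units.
trajectory = simulate_rk4(x0=0.0, t_end=40.0, dt=0.01, k_syn=2.0, k_deg=0.5)
t_final, x_final = trajectory[-1]
print(t_final, x_final, 2.0 / 0.5)
```

For realistic models with many species and stiff dynamics, the adaptive solvers built into the tools named above are a better choice than a fixed-step scheme.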
Stoichiometric modeling
Stoichiometric modeling is based on balanced equations of biochemical reactions and the law of mass conservation. The stoichiometric modeling approach is very popular for modeling metabolism, where it is used to describe, simulate, and optimize possible steady states. The advantage of the stoichiometric modeling approach is the very limited information that is needed about the object: biochemical reactions (determined by the enzymes encoded in the cellular genome) and their stoichiometry. Transport reactions of species through membranes are represented separately.
Constraint-based modeling is a modification of stoichiometric modeling in which the fluxes of particular reactions are bounded by lower and upper limits, giving a more accurate estimation of the possible metabolic behavior of cells in particular conditions. Stoichiometric modeling has been enriched with opportunities to encode associations of gene
The stoichiometric modeling approach has great application potential in systems medicine and precision medicine, 125 as all humans share the same metabolic reconstruction, which can be personalized for an individual by taking into account genetic and other information. Currently, the largest effort is the human metabolic reconstruction Recon3D, which incorporates 13543 reactions, 4140 unique metabolites, and 12890 protein structures. 81 It can be accessed and simulated online at http://vmh.life.
Recently, two stoichiometric modeling-based gender-specific whole-body metabolism reconstructions have been proposed: They capture the metabolism of 20 organs, six sex organs, six blood cells, gastrointestinal lumen, systemic blood circulation, and the blood–brain barrier representing 99% of human body weight except for the skeleton. At the whole-body scale, the model behavior can be constrained by physiological parameters such as heart rate, weight, height, and flow rates of urine and blood. Models can be parametrized by physiological, dietary, and omics parameters 126 to be used for precision medicine tailored for an individual.
The applicability of stoichiometric modeling is focused mostly on metabolism, as mass balance can be applied to metabolism. The advantage is that the models can be built at the genome scale and automatically drafted from a genome sequence. The most popular modeling tools are variations of the COBRA toolbox, 84 which is available as a Matlab toolbox as well as a Python package (COBRApy). 85 Other popular stoichiometric modeling tools are RAVEN 2.0 86 and Merlin. 87
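The steady-state mass balance and the flux bounds used in stoichiometric and constraint-based modeling can be illustrated with a toy example. The sketch below assumes a hypothetical linear pathway of three reactions (uptake of A, conversion of A to B, export of B), encodes it as a stoichiometric matrix S, and checks whether a candidate flux vector v satisfies S v = 0 within given bounds. It only tests feasibility; the optimization step used in practice (searching for the feasible flux vector that maximizes an objective) requires a linear programming solver and is not shown.

```python
# Rows: metabolites A and B. Columns: reactions
# R1 (uptake: -> A), R2 (A -> B), R3 (export: B ->).
# Entries are stoichiometric coefficients (negative = consumed).
S = [
    [1, -1, 0],   # metabolite A
    [0, 1, -1],   # metabolite B
]

# Hypothetical lower and upper flux bounds for R1, R2, R3 (arbitrary units).
BOUNDS = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]


def mass_balance(stoichiometry, fluxes):
    """Net production rate of each metabolite, i.e., the product S v."""
    return [
        sum(coeff * flux for coeff, flux in zip(row, fluxes))
        for row in stoichiometry
    ]


def is_feasible(stoichiometry, fluxes, bounds, tol=1e-9):
    """True if the fluxes are at steady state and respect all bounds."""
    balanced = all(abs(rate) <= tol for rate in mass_balance(stoichiometry, fluxes))
    within_bounds = all(
        low - tol <= flux <= high + tol for flux, (low, high) in zip(fluxes, bounds)
    )
    return balanced and within_bounds


print(is_feasible(S, [5.0, 5.0, 5.0], BOUNDS))    # balanced and within bounds
print(is_feasible(S, [5.0, 7.0, 5.0], BOUNDS))    # A is depleted, B accumulates
print(is_feasible(S, [20.0, 20.0, 20.0], BOUNDS)) # balanced but exceeds bounds
```

In a genome-scale model the same structure holds, only with thousands of rows and columns, which is why dedicated toolboxes are used.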
Stochastic modeling
An alternative to the (deterministic) interpretation of kinetic biochemical models using ordinary or partial differential equations, as described earlier, is stochastic modeling. Here, the system is viewed as a stochastic process. This perspective has the advantage that stochastic fluctuations in particle numbers are explicitly considered over time. These fluctuations are due to the random timings of reactive events, that is, single (bio-) chemical reactions taking place in the system. They are intrinsic to biochemical systems and their effects are, therefore, inseparable from the dynamics of the system.
The effects of these intrinsic fluctuations can also be very important for the functioning of the system, for example, in the case of phenotypic variation (one classic example can be found in Arkin et al. 127 ) or spontaneous switches in multistable systems. 128
Stochastic modeling has a long history. In 1976, D.T. Gillespie described an algorithm called the Direct Method or simply SSA (for stochastic simulation algorithm), which can be used to simulate stochastic time series of kinetic biochemical models.129,130 Simulation in this context means that each simulation run yields a different time course, as the method uses (pseudo-) random numbers in a randomized algorithm. However, all simulated time series are faithful samples from the underlying stochastic process, which is governed by a Chemical Master Equation. 131 Statistical properties of the model, such as mean values of concentrations, covariances between biochemical species, or distributions of period lengths in oscillating systems, can be calculated from a set of simulated time series.
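A minimal sketch of the Direct Method is given below for a hypothetical birth-death process, in which molecules of a single species are produced at a constant rate and degraded with a rate proportional to their number. The function name, parameter names, and values are assumptions made for illustration. At each iteration, the waiting time to the next reactive event is drawn from an exponential distribution with the total propensity as its rate, and the reaction that fires is chosen with probability proportional to its propensity.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=None):
    """Simulate one stochastic time course with Gillespie's Direct Method.

    Reactions: production (0 -> X, propensity k_prod) and
    degradation (X -> 0, propensity k_deg * X).
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod
        a_deg = k_deg * x
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


# Hypothetical parameters; each seed yields a different sample path.
times, counts = gillespie_birth_death(k_prod=10.0, k_deg=0.1, x0=0, t_end=50.0, seed=42)
print(len(times), counts[-1])
```

Statistical properties such as the mean particle number are then estimated from many such runs with different seeds.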
Stochastic modeling is quite universal in terms of applicability and should be considered whenever particle numbers in the system are very small, some subprocesses are slow, or instabilities within the system can lead to an amplification of intrinsic fluctuations. In practice, stochastic modeling requires the stoichiometry of a system and information about the kinetic functions as in deterministic modeling. In addition, all reversible reactions have to be split into a forward and backward component as they can influence the fluctuations separately, and care has to be taken when the model contains so-called lumped kinetics, for example, kinetic functions of reactions that are an approximation to a whole set of underlying elementary reactions.
As the exact simulation of stochastic time courses using Gillespie's algorithm can be computationally demanding, particularly for systems containing very fast reactions or a lot of particles, approximate SSAs have been developed to trade some accuracy for speed of calculation, for instance the τ-leaping method 132 or stochastic differential equations. Hybrid approaches, which try to combine the advantages of deterministic and stochastic simulation methods, are an important and promising subclass of approximate stochastic simulation methods (for a review see Pahle 133 ).
There are several software tools that allow stochastic simulations. For instance, both COPASI 77 and StochKit 93 provide several different exact and approximate SSAs.
Agent-based modeling
Agent-based modeling (ABM) has evolved as a simulation of two-dimensional movements of systems elements (agents) and their interactions depending on rules assigned to different types of agents. The great advantage of ABM application in biological system modeling is the relatively easy natural incorporation of space and stochasticity in three and more dimensions. 134 Agent-based models are composed of agents, environment, and a set of rules describing agent behavior in terms of possible interactions. 135 In modeling biological systems, the agents can be molecules of metabolites, enzymes, signaling molecules, ligands, as well as complexes of molecules, cells, organisms, or any other formations. Interactions can be binding, activation, biochemical reaction, and others.
Agent-based modeling can be used to find unexpected emergent features of system behavior that depend on changes in an agent's features or parameters. In other words, when a system behaves unexpectedly, agent-based modeling can help to reveal which features of system behavior emerge 136 as a consequence of the interactions assigned to an agent. Sometimes, relatively simple interaction rules can result in seemingly complex and organized behavior. ABM is modular: New agents can be introduced or rules of existing agents can be changed without re-organizing the model. 134 Another important advantage of ABM is the opportunity to involve agents with different levels of detail in the same model. Especially in studies about organisms or multiscale modeling, it is not practical to build all processes at the molecular level even if it would be possible. 136
ABM has a wide applicability range in terms of processes that can be modeled because of the freedom in rule definition for different agents. This freedom also limits ABM application, because it is difficult to formally analyze the impact of a particular feature or parameter on the behavior of the system, in contrast to equation-based modeling, where stability, sensitivity, and similar properties can be readily derived. ABM also has a high computational cost compared with equation-based modeling approaches. 134
Among many agent-based tools (see reviews of Allan 135 , Abar et al. 137 ), FLAME (Flexible Large-scale Agent-based Modeling Environment) 98 and SPARK (Simple PLatform for Agent-based Representation of Knowledge) 99 can be named as popular in systems biology applications.
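The three ingredients named above (agents, environment, and rules) can be sketched in a few lines of Python. In this hypothetical example, ligand agents perform a random walk on a two-dimensional periodic lattice and bind to a receptor agent when both occupy the same site. All names, the lattice size, and the two rules are assumptions made for illustration and are not taken from FLAME, SPARK, or any other tool.

```python
import random

class Agent:
    """A lattice agent; 'kind' is either "ligand" or "receptor"."""
    def __init__(self, kind, x, y):
        self.kind, self.x, self.y = kind, x, y
        self.bound = False

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def step(agents, size, rng):
    # Rule 1: each free ligand takes one random step (periodic boundaries).
    for a in agents:
        if a.kind == "ligand" and not a.bound:
            dx, dy = rng.choice(MOVES)
            a.x, a.y = (a.x + dx) % size, (a.y + dy) % size
    # Rule 2: a free ligand binds a free receptor located on the same site.
    free_receptors = {(a.x, a.y): a for a in agents
                      if a.kind == "receptor" and not a.bound}
    for a in agents:
        if a.kind == "ligand" and not a.bound:
            receptor = free_receptors.pop((a.x, a.y), None)
            if receptor is not None:
                a.bound = receptor.bound = True

def simulate(n_ligands, n_receptors, size, n_steps, seed):
    rng = random.Random(seed)
    # Receptors are immobile and placed on distinct lattice sites.
    all_sites = [(x, y) for x in range(size) for y in range(size)]
    agents = [Agent("receptor", x, y) for x, y in rng.sample(all_sites, n_receptors)]
    agents += [Agent("ligand", rng.randrange(size), rng.randrange(size))
               for _ in range(n_ligands)]
    bound_counts = []
    for _ in range(n_steps):
        step(agents, size, rng)
        bound_counts.append(sum(a.bound for a in agents if a.kind == "ligand"))
    return agents, bound_counts

agents, bound_counts = simulate(n_ligands=30, n_receptors=10, size=10,
                                n_steps=200, seed=0)
```

Adding a new agent type or changing a rule only touches the step function, which reflects the modularity mentioned above; the absence of a closed-form equation is also why formal analysis is harder than for equation-based models.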
Non-mechanistic (black box) modeling approaches
The advantage of non-mechanistic approaches is the prediction of system behavior when only information about inputs and outputs of a system is available. Non-mechanistic models using various methods (e.g., machine learning or artificial intelligence techniques) can be trained to classify input
The growing amount of available, voluminous, and rich data such as electronic health records data coupled with various omics layers have reignited interest in exploiting these methods. Only recently has biomedical research also started to embrace developments in the various omics layers of systems medicine, leading to the production of sizeable datasets amenable for extended modeling.139,140
In addition to the higher availability of biomedically relevant datasets, the underlying quality of data has important implications for the selection of modeling approaches. When comparisons were at low risk of bias, the quality of logistic regression and machine-learning models for clinical risk prediction was similar. Further, another layer of uncertainty is added by the fact that comparisons of clinical prediction models based on logistic regression and machine-learning algorithms suffered from poor methodology and reporting, especially in the validation phase. Finally, when comparing machine-learning algorithms with logistic regression in situations with high risk of bias, machine learning turned out to perform better. 139 This short example highlights the need for research that carefully identifies which algorithms are the most appropriate for different types of problems and that provides guidelines on how to use them to a wider audience in biomedicine.
Combination of different modeling approaches
Different modeling approaches can be combined to get new insights in the process of interest: Different methods shed light on different aspects of the process of interest just as different types of biological experiments and measurement technologies do.
Looking at simulations of dynamics, ODE-based models give deterministic simulation results: They are always identical as long as the model parameters are the same. When some elements are small in number, it is useful to run the same model in stochastic mode to see whether stochasticity has an impact on the process of interest. This exercise can be well managed by using COPASI software. 77
In the case of metabolism, stoichiometric and kinetic modeling can be combined to gain more knowledge about the possible steady-state limitations. This is useful because kinetic models are usually made at the pathway scale assuming that the organism will be able to deliver all needed cofactors and other necessary molecules. That is a risky assumption, but the ability of a steady state to operate at genome scale can be checked by genome-scale stoichiometric models of metabolism. 141
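The steady-state check mentioned above amounts to verifying that the stoichiometric matrix S multiplied by the flux vector v gives zero net production for every internal metabolite. The toy network below is purely hypothetical: a pathway A → B consumes a cofactor (called ATP here only for readability) whose supply by the rest of metabolism is represented by a single lumped reaction.

```python
def net_production(S, v):
    """Return S·v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """A flux vector is a steady state if no metabolite accumulates or drains."""
    return all(abs(rate) <= tol for rate in net_production(S, v))

# Columns (reactions): R1: -> A,  R2: A + ATP -> B,  R3: B ->,  R4: -> ATP
# Rows (metabolites):  A, B, ATP
S = [
    [1, -1,  0, 0],   # A
    [0,  1, -1, 0],   # B
    [0, -1,  0, 1],   # ATP
]

with_supply = [2.0, 2.0, 2.0, 2.0]      # cofactor is delivered by R4
without_supply = [2.0, 2.0, 2.0, 0.0]   # pathway flux assumed, but no cofactor supply

supply_ok = is_steady_state(S, with_supply)
supply_missing = is_steady_state(S, without_supply)
atp_balance = net_production(S, without_supply)[2]
```

In the second flux vector the pathway itself is balanced, but the cofactor balance is negative, which is the kind of hidden assumption that a genome-scale stoichiometric model can expose.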
Non-mechanistic or machine-learning models might be used to identify the impact of the particular input of a system and lead to the identification of important elements to be included in a mechanistic model. Machine learning can find new patterns in large datasets and propose important co-occurrences and dependencies. 11
Multiscale Modeling and Multiscale Computing
Fundamental concept
In engineering sciences, it has become a standard strategy to study materials (and eventually also the structures made thereof) by means of so-called multiscale methods. This modeling approach rests on the desire to understand the behavior of a heterogeneous material by genuinely taking into account its hierarchical organization. If a material is composed of several different constituents exhibiting different physical properties, it is evident that the corresponding physical properties of the overall material are somehow related to the constituents' properties, to the interactions between the constituents, their spatial distributions, and the volume each constituent occupies.
This concept has also been adopted in the field of systems medicine, where the aforementioned hierarchical organization does not necessarily involve continuum-type material phases, but rather discrete biological entities or processes; for example, multiscale modeling is key for interpreting and understanding the complex processes dealt with in systems medicine. 142
The human body, or any other complex biological system, can be considered as a hierarchically organized assembly of building blocks. Molecules and macromolecules (such as lipids, proteins, and DNA) assemble the cells that are found in tissues, organs, systems composed thereof, and finally in the whole organism.143,144 As a result of this, the aforementioned hierarchy can be identified at different levels of biological organization. 145
The level hierarchy is based on the increasing physical length scale, which correlates with an also increasing organizational complexity. Conventional modeling approaches often focus on processes taking place at just one level of organization, such as gene expression or tissue biomechanics, leaving out lower- or higher-order phenomena that influence the process under consideration. However, signals coordinating a physiological function generally communicate across the different levels through bottom
Limitations of modeling approaches at different biological scales
This subsection is devoted to summarizing some limitations of modeling approaches at the various spatial scales (sometimes also referred to as characteristic lengths) of biological systems. As an exemplary system, we consider the immune system and its processes, because of the wide availability of models at different spatial scales. Therefore, with “biological scale” we refer to the spatial dimension at which the respective processes typically occur.
Complex biological systems are arranged into modular and hierarchically structured elements: from molecules (RNA, DNA, proteins, etc.) to organelles; then to cells, tissues, organs, organisms, and ecosystems. Broadly speaking, the biological scales are sorted into three levels: microscopic (10⁻⁹–10⁻⁷ m, relevant for molecules, molecular interactions, and intracellular events), mesoscopic (10⁻⁶–10⁻⁴ m, relevant for cells and cellular processes), and macroscopic (10⁻³–10⁰ m, relevant for larger events, such as blood circulation). 148
– At the intracellular level, the immune transduction pathways enhancing or reducing inflammation as isolated cascades are taken into account in many works. However, the behavior of immune cells in an inflammatory environment is eventually determined by the concomitant engagement of many, intertwined pathways. 149 Future MSM should, therefore, focus on describing how different immune pathways interact with each other, possibly leading to both synergistic and antagonistic behaviors.
– At the mesoscopic scale, some effort has been made,150,151 but a consistent problem lies in the lack of quantitative reconstructions of the signaling networks among immune cells.
– At the macroscopic scale, only a few modeling approaches have been developed while taking into account the geometry and the compartmentalization of organs.152–154 It is worth mentioning the problem arising from integrating and bridging diverse modeling paradigms at different levels (such as discrete/continuous, deterministic/stochastic models, fast/slow characteristic times, temporal scales). Further difficulties involved are handling numerical instabilities, estimation of the model-governing parameters, model sensitivities, computational demands, as well as standardization and re-usability of the existing models. 155
– A major challenge entails the deluge of omic data. In particular, integration of data derived from proteomics, genomics, transcriptomics, and metabolomics to higher-level phenotypic features is a crucial yet unresolved problem. In this context, two comprehensive reviews devoted especially to bioinformatics resources have been published,156,157 of which the latter 157 focuses particularly on methods to be used across multiple scales. The current hope is that the wider availability of data, databases, and easy-to-use resources will push toward the convergence of omic data and MSM.
In addition to the human-derived data, one also needs to consider the effects of microbiome-derived signals, which were shown to influence to a large extent the genome transcription and metabolic behavior of mitochondria and human cell types in various ways in different tissues, building very complex structures of interactions at different scales (Fig. 3). This also means that once multiomics hierarchies are established for human-related data, they would need to be coupled to another hierarchy of microbiome-derived signals over the same (microbiome-related) omics layers.

FIG. 3. Interaction of multilevel and multiomics layers of information within the human microbiome system (reproduced from Stres and Kronegger, 158 initially reproduced with permission and modified from Hasin et al. 159 ). Circles represent the entire pool of molecules detected in various omic data layers. Genetic regulations and environmental impacts interact with all data layers, except the genome (GWAS). The potential interactions or correlations are represented by thin red and black arrows, respectively. A, archaea; B, bacteria; F, fungi; GlLip, glycolipids; GWAS, genome-wide association studies; LPS, lipopolysaccharides; mE, mobile elements; P, protozoa; PrGl, proteoglycans; V, viruses.
From uni- to multiscale modeling strategies applicable to biological properties and functions
A wide range of modeling techniques dealing with specific length scales of biological systems is available. 160
For example, at the intracellular scale, differential equations are typically used for the description of molecular processes occurring in the cell membrane or in the cytosol, involving mass-action or Michaelis
Alternatively, so-called microsimulations may be an option as well. For instance, the Gillespie algorithm allows the simulation of chemical or biochemical reaction systems, generating trajectories as possible solutions of a stochastic equation.129,130
On larger characteristic lengths, tissues or whole organs may be modeled as functional compartments, for which black-box modeling approaches are a popular choice. Clearly, such models are purely phenomenological, and the underlying mechanisms are (partly or completely) neglected, which potentially restricts the extrapolation potential of such models. On the other hand, when considering tissues or whole organs as collections of components, the prevalent modeling paradigm is based on the idea that their overall function can be described as the combined behavior of an array of individual units (i.e., cells), interacting and exchanging signals with the environment.
Multicellular systems of this kind were developed in the past to study solid tumor formation161,162 or to simulate the regeneration of complex organs such as the liver. 163 Further, the kinetic theory has been put forward as an alternative approach for deriving macroscopic equations from the dynamics observed at a lower scale. The underlying concept involves the so-called asymptotic method, based on which the macroscopic equations result from the limit of Boltzmann-type equations, which, in turn, are related to the statistical microscopic description.164,165
Intriguing examples of multiscale models also include the approach proposed in Refs.166,167 that is related to the field of hemodynamics. The authors propose the coupling of a local, accurate three-dimensional description of blood flow by means of Navier–Stokes equations in the region of interest (e.g., a specific artery) with a rigorous zero-dimensional lumped model of the remainder of the circulation system. 168
Another methodology worth mentioning aims at solving the problem of heterogeneity and multiscale modeling as well as the link between mathematical and computer models. 169 This methodology, massively used in theoretical computer science and software engineering, uses state transition diagrams170,171 (i.e., deterministic or probabilistic finite state automata) to describe the behavior of heterogeneous entities. However, this methodology does not scale well with the model complexity; thus, while providing a conceptual framework, it does not seem to be used in practice.
Other multiscale models aim at simulating a whole cell as a virtual cell 172 or e-cell.173,174 Similarly, but at the level of whole physiological systems or organs, one can turn to models of the heart, 164 of the liver, 175 and of the skeletal system 176 as valid examples of multiscale systems. Further examples include multiscale modeling approaches aiming at predicting tumor evolution, 177 modeling angiogenesis, 178 studying the signaling pathways that are relevant for specific kinds of cancer, 74 predicting cardiotoxicity, 179 and introducing so-called precision cardiology, 180 just to name a few. Also, the reader is referred to the numerous pertinent review articles (see, for instance, Refs.181–183 ).
Computational multiscale methods
Several techniques have been adopted from other fields for the simulation of biological models spanning different space/temporal levels, 184 including (but not limited to) the heterogeneous method, hybrid quantum mechanics-molecular mechanics, the equation-free method, the quasi-continuum, the multigrid, the multiscale numerical scheme, and the adaptive tabulation approach. 184 It should be noted that, to date, none of these methods has emerged as the multiscale method of choice for modeling biological phenomena, as each of them is characterized by specific advantages and disadvantages in terms of computational efficiency.
In contrast, the multiscale agent-based modeling paradigm seems to have gained consensus among researchers in the bio-modeling field. An example of using such a multiscale approach is the one related to the simulation of type I hypersensitive phenomena. This model consists of an agent-based formulation of the cell-cell/molecules interaction involved in the immune responses to a generic antigen combined with a detailed gene regulation model set up as a Boolean network. 185 What makes this approach particularly interesting is that genetic data can be integrated with cytological data, making a genotypic/phenotypic cause/effect analysis possible. 186 This approach has two main advantages: (1) It is the kind of information that clinicians are looking for; (2) the two modeling descriptions can be developed, analyzed, and validated independently from each other and only later combined. This paradigm allows for robustly scaling up to more complex phenomena.
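The combination of an agent-based description with a Boolean gene regulation network can be sketched as follows. Each cell agent carries its own copy of a toy two-gene network with mutual inhibition, and the cell-level phenotype is read from the state of that network. The genes, signals, update rules, and phenotype labels are hypothetical placeholders and do not reproduce the model of the cited work.

```python
def update_network(genes, signal_a, signal_b):
    """One synchronous update of a toy two-gene toggle switch."""
    return {
        "gene_a": (signal_a or genes["gene_a"]) and not genes["gene_b"],
        "gene_b": (signal_b or genes["gene_b"]) and not genes["gene_a"],
    }

class Cell:
    """Cell-level agent that embeds a gene-level Boolean network."""
    def __init__(self, position):
        self.position = position
        self.genes = {"gene_a": False, "gene_b": False}
        self.phenotype = "naive"

    def step(self, signal_a, signal_b):
        # Lower level: update the gene regulation network.
        self.genes = update_network(self.genes, signal_a, signal_b)
        # Upper level: the phenotype follows from the gene expression state.
        if self.genes["gene_a"] and not self.genes["gene_b"]:
            self.phenotype = "type_A"
        elif self.genes["gene_b"] and not self.genes["gene_a"]:
            self.phenotype = "type_B"

def local_signals(position):
    """Hypothetical environment: signal A on the left, signal B on the right."""
    return position < 3, position > 6

cells = [Cell(p) for p in range(10)]
for _ in range(5):
    for cell in cells:
        cell.step(*local_signals(cell.position))
phenotypes = [cell.phenotype for cell in cells]
```

Because update_network does not depend on the Cell class, the two descriptions can be developed and tested separately before being combined, which mirrors the second advantage listed above.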
Similar works incorporate sets of ODEs rather than Boolean networks in agent-based models. For example, in Beyer and Meyer-Hermann, 187 combining ODEs with agents for chemokine receptor internalization of lymphocytes in the context of tissue instability in arthritis is proposed. Another example in Ref. 188 describes the combination of molecular, cellular, and tissue scales in a spatial model of the intestine. Moreover, in Perfahl et al., 189 the domain size effects in vascular tumors in a 3D agent-based approach along with a reaction-diffusion system is discussed.
Another example of multiscale immune simulation, combining agents that represent the cellular mesoscopic level with ODEs that describe the time-dependent antigen presentation process, is provided by Kirschner and coworkers 190 in the context of the immune response to Mycobacterium tuberculosis. In another work, the same authors integrate information over relevant temporal scales to model major histocompatibility complex class II-mediated antigen presentation and to suggest new mechanisms and strategies for treatment and vaccines. 191
As a final note, it is worth mentioning the computational problem arising when both stochastic fluctuations and spatial inhomogeneity are included in one multiscale model. A useful approach in this case is based on coarse-grained methods. For instance, in Wylie et al., 192 the authors present an algorithm for the simulation of reaction-diffusion kinetics along with coarse-grained fields described by (stochastic or deterministic) partial differential equations, to model cell signaling dynamics under the influence of external fields.
General-purpose integration methods
The development of a multiscale model requires special care in, for instance, consideration of the involved time scales. Generally, lower-level processes develop on a time scale that is smaller than those on upper-level processes. Usually, low-level events are then considered to happen instantaneously, thus they are embedded as some kind of field at the upper levels. 193 When joining different models of processes occurring at separate scales, it is tempting to merely mix existing software modules with one another. However, such an approach fails to consider how inaccuracies of the variables at one level propagate to the model at another level.
A more precise approach, instead, would consider the whole model as unitary rather than the arrangement of smaller ones. Take, for instance, a microscopic cellular-level simulator that is coupled with a model of some signaling pathway; specifically, the phenotypic differentiation process of T lymphocytes into Th1, Th2, Treg, and Th17 is modeled at a cellular level by using individual agents whereas the gene regulations are modeled as a system of differential equations where the activation level of each gene depends on the activation level of each involved gene, and on the parameters relative to the network topology. 185 In this example, the lower-level description of gene regulation is controlled by the time step involved in the numerical resolution of the ODEs, whereas the cellular differentiation process implemented at the higher level for each lymphocyte is ruled by the information coming from the gene expression levels, and therefore follows an evolution that is loosely coupled with the former. The main justification of this adoption is that the two processes progress on quite different time scales.
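The loose coupling of two time scales described above can be sketched with a single cell agent whose gene activation level is integrated with a small time step, whereas the differentiation decision is taken only once per larger outer step. The one-gene ODE, its rate constants, the threshold, and the class name TCell are assumptions made for this sketch and are much simpler than the gene regulation network of the cited model.

```python
class TCell:
    """Agent with a fast intracellular variable and a slow cell-level decision."""
    def __init__(self, signal):
        self.signal = signal            # external stimulus seen by the cell
        self.gene_level = 0.0           # activation level between 0 and 1
        self.differentiated = False

    def integrate_gene(self, dt_fast, n_sub, k_on=1.0, k_off=0.25):
        # Fast scale: explicit Euler steps of
        # d(gene)/dt = k_on * signal * (1 - gene) - k_off * gene
        for _ in range(n_sub):
            rate = (k_on * self.signal * (1.0 - self.gene_level)
                    - k_off * self.gene_level)
            self.gene_level += dt_fast * rate

    def decide(self, threshold=0.7):
        # Slow scale: the agent rule only reads the current gene level.
        if self.gene_level > threshold:
            self.differentiated = True

def run(cell, n_outer, dt_fast=0.01, n_sub=100):
    """Return the outer step at which the cell differentiates, or None."""
    for outer_step in range(1, n_outer + 1):
        cell.integrate_gene(dt_fast, n_sub)   # many small ODE steps ...
        cell.decide()                          # ... per cell-level decision
        if cell.differentiated:
            return outer_step
    return None

first_step_with_signal = run(TCell(signal=1.0), n_outer=10)
first_step_without_signal = run(TCell(signal=0.0), n_outer=10)
```

The ratio n_sub between the two step sizes is the main design choice here: it controls how strongly the numerical error of the lower-level integration is allowed to influence the upper-level rule.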
Multiscale mechanics models
Although summarizing the state of the art in modeling the mechanical behavior of biological tissues is not the main focus of the review article, this side topic of systems medicine should not go unmentioned. In particular, it should be emphasized that, from a historical point of view, the concept of multiscale modeling was introduced in the field of continuum mechanics quite early, aiming at the estimation of the effective properties of hierarchically organized materials. The methods that have been developed for that purpose include (but are not limited to) continuum micromechanics,194,195 periodic homogenization, 196 and purely numerical approaches197,198 (see also Refs.199–201 for more in-depth reviews).
The fundamental concept of continuum micromechanics has turned out to be applicable and adaptable to a wide range of different kinds of mechanical behavior of classical engineering materials and biological tissues, such as elasticity,202,203 strength, 204 viscoelasticity,205,206 poroelasticity,207,208 and interface mechanics. 209 As revealed in the pioneering contribution by Dormieux and Kondo, 210 the concept of continuum micromechanics can be analogously applied to transport processes, such as diffusion, 211 or Darcy-type advection.212,213
The modeling concepts introduced earlier are particularly well suited for the field of systems medicine, given that many (if not most) biological processes are, in one way or another, driven and/or excited by solid or fluid mechanical stimuli. Fortunately, the intrinsic hierarchical structure of multiscale mechanics models allows their integration with the multiscale biological models that are dealt with in the From Uni- to Multiscale Modeling Strategies Applicable to Biological Properties and Functions section. See Scheiner et al., 214 Pastrama et al. 215 for related examples in the field of bone remodeling.
Future challenges
The more data are acquired, the more complex the systems that can be studied. This is expected to give rise to an expansion of the modeling landscape for mathematical models, covering the entire spectrum of deterministic and stochastic or mechanistic and empirical models. More recent multiscale modeling techniques such as deep learning implicitly account for interactions but still impose challenges toward interpretation and using knowledge about mechanistic processes. 216
At the same time, older modeling techniques can be combined and/or readapted to accommodate emerging data analysis needs imposed by system viewpoints, leading to novel modeling frameworks. One example is clustering, which can be seen as a type of modeling as well. Here, systems data can be used to identify similarity between samples from which patterns can be derived or from which hypotheses can be formulated regarding common mechanistic processes. No longer depending on metrics or geodesics as is done in cluster analysis, topological data analysis (TDA) 217 aims at applying principles of topology to analyze high-dimensional data that can be incomplete or exhibit varying levels of noise. Recent developments toward accommodating very large datasets or decomposing highly complex data spaces combine TDA with statistical and machine learning (e.g., AYASDI white paper, https://s3.amazonaws.com/cdn.ayasdi.com/wp-content/uploads/2018/11/12131418/TDA-Based-Approaches-to-Deep-Learning.pdf).
In studying complex biological events, it is essential to frame into an integrated view the diverse mechanisms enacted and the causal connections amid different elements composing the system.145,218 The definite feeling in the field is that much work is still to be done for the translation of mathematical theories, models, and practices to the fields of physiology and biology.219–222
A crucial unsolved problem concerns the lack of a theoretical framework to cope with the proper representation of the dynamical behaviors and coupling of a high-dimensional model of a lower scale with a low-dimensional model of a higher scale, so that the coupled model can be utilized to analyze higher-scale, complex events. 155 The ultimate aim of MSM is not only to provide models at different scales but, indeed, also to tie them in a coherent way so that fine-grained data from a lower scale can be coherently incorporated into the higher-scale coarse-grained model. Of course, the use of diverse modeling methods brings “breaches” among levels.
Multiscale modeling, thus, necessarily aims at addressing and solving the difficulty of bridging such gaps arising from the use of different approaches at different scales. It is not simple to face the issue and to accomplish this aim, but empirical approaches and principles can help. Studying the immune system and related diseases, several multiscale models have been built while making use of agents to represent the mesoscopic level of cells (e.g., the multicellular rule-based modeling in Chavali et al. 223 ) and employing ordinary and partial differential equations to describe the molecular intracellular and extracellular (e.g., cytokine diffusion) events. In such examples, level coupling is carried out in a forthright manner by exploiting concentrations as input variables for the agents representing cells.
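Such a level coupling can be sketched by letting cell agents read the local value of a concentration field that evolves according to a diffusion equation. The one-dimensional grid, the explicit finite-difference scheme, the zero-flux boundaries, and the activation threshold are assumptions chosen to keep the sketch short; they are not taken from the cited models.

```python
def diffusion_step(c, r):
    """One explicit finite-difference step; r = D*dt/dx**2 must not exceed 0.5."""
    padded = [c[0]] + c + [c[-1]]          # mirrored ends give zero-flux boundaries
    return [padded[i] + r * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]

class CellAgent:
    """Mesoscopic agent whose only input is the local concentration."""
    def __init__(self, site, threshold):
        self.site, self.threshold = site, threshold
        self.activated = False

    def sense(self, field):
        if field[self.site] > self.threshold:
            self.activated = True

# Hypothetical setup: a cytokine pulse released in the middle of the domain.
D, dx, dt = 1.0, 1.0, 0.25
r = D * dt / dx**2
field = [0.0] * 21
field[10] = 100.0
total_before = sum(field)

cells = [CellAgent(site=9, threshold=10.0),    # close to the source
         CellAgent(site=20, threshold=10.0)]   # far from the source

for _ in range(40):
    field = diffusion_step(field, r)           # continuous level
    for cell in cells:
        cell.sense(field)                      # discrete level reads the field
```

In this one-way coupling the agents only read the field; a two-way coupling would additionally let the agents secrete or consume the diffusing species.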
Although the representation of intracellular processes can be executed in many ways (e.g., generic decision systems such as Boolean networks or others) without openly counting the space variable for computational reasons (but also for the sake of simplicity), the processes of cytokine diffusion or cell movements among anatomical sections, for example, are typical spatial phenomena. Such processes may be represented as continuous (e.g., by partial differential equation) or as discrete (e.g., lattice gas), and computational efficiency represents a key limiting factor. One of the main aims of computational systems biology is to account for a holistic perspective and use both modeling and experiments to disclose how the system performs.145,224 Multiscale models that are suitable to exploit data at different levels coming from both lab and clinical data have the potential to close knowledge gaps among observations at the molecular and gene level and clinical development of complex pathologies. 184
Concluding Remarks
Mathematical modeling is just one of the tools in medical research. In the age of fast data growth, mathematical modeling becomes a way to delegate the analysis of data to the computer and reduce the amount of expensive and sometimes even impossible medical experiments. The adequate selection of a modeling method can save time, money, and other resources.
The choice of modeling approaches heavily depends on the scientific question, the features of the system of interest, and available data. A large amount of detailed data does not necessarily mean that very detailed modeling methods have to be used if the scientific question does not require a detailed answer. A simple modeling formalism might be sufficient and adequate if just a possible specific scenario is analyzed. It might be necessary to find just one reason (e.g., thermodynamics of one reaction in a metabolic pathway, exceeding of toxicity concentration by a single metabolite, insufficient surface of a cell, lack of energy) as to why a particular scenario is impossible and the question would be solved without a big and extensive effort.
In case of a detailed study, it might be necessary to go through several modeling approaches and change them if more data become available or the scientific question becomes more detailed. One can start with one method, such as the black-box (machine learning, artificial intelligence) method, to clarify the most influential input/output parameters and seek the elements most closely related to them to initiate a mechanistic modeling attempt with a method that does not require much data, but facilitates experimental planning to clarify the elements involved in the process. Later, it might be useful to switch to a more detailed mechanistic modeling approach that looks for drug targets or simulations of particular therapies.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This study was financially supported by European Cooperation in Science and Technology (COST) Action CA15120 OpenMultiMed and the project “Sustainable use of nature resources in the context of climate changes” no. ZD2016/AZ03.
