Abstract
Background:
Frequently occurring in cancer are the aberrant alterations of regulatory onco-metabolites, various oncogenes/epigenetic stochasticity, and suppressor genes, as well as the deficient mismatch repair mechanism, chronic inflammation, or those deviations belonging to the other cancer characteristics. How these aberrations that evolve overtime determine the global phenotype of malignant tumors remains to be completely understood. Dynamic analysis may have potential to reveal the mechanism of carcinogenesis and can offer new therapeutic intervention.
Aims:
We introduce simplified mathematical tools to model serial quantitative data of cancer biomarkers. We also highlight an introductory overview of mathematical tools and models as they apply from the viewpoint of known cancer features.
Methods:
Mathematical modeling of potentially actionable genomic products and how they proceed overtime during tumorigenesis are explored. This report is intended to be instinctive without being overly technical.
Results:
To date, many mathematical models of the common features of cancer have been developed. However, the dynamic of integrated heterogeneous processes and their cross talks related to carcinogenesis remains to be resolved.
Conclusions:
In cancer research, outlining mathematical modeling of experimentally obtained data snapshots of molecular species may provide insights into a better understanding of the multiple biochemical circuits. Recent discoveries have provided support for the existence of complex cancer progression in dynamics that span from a simple 1-dimensional deterministic system to a stochastic (ie, probabilistic) or to an oscillatory and multistable networks. Further research in mathematical modeling of cancer progression, based on the evolving molecular kinetics (time series), could inform a specific and a predictive behavior about the global systems biology of vulnerable tumor cells in their earlier stages of oncogenesis. On this footing, new preventive measures and anticancer therapy could then be constructed.
Introduction
The pathogenesis of malignant transformation that leads to an inexorably progressing disease is intricately connected to the abnormal integration of several biological networks.1–3 The objective analysis of the holistic approach in cancer research is to obtain a global quantitative description of all key interactions within the cell to ultimately predict how and why cells function the way they do in response to stressors such as hypoxia, DNA damage, retinoblastoma protein degradation, and oncogene activation or in response to external factors such as nutrient or oxygen deprivation. 4
As we came to realize that the initial stage of tumor development follows a complex and multistep process that proceeds sequentially until a full-blown malignant phenotype is achieved and maintained, this begs the questions “what to do with and how to tie together this expanded type of data”?5–7
Fortunately, most of the tumors have a limited diversity in the key genes of cancer development and progression. Furthermore, functional consequences of cancer tissue heterogeneity may be minor. 8
The aim of this report is a modest interpretive synopsis pertaining on how to model the dynamic behavior of interacting biological species in cancer.
Materials and Methods
We begin this review with some mathematical definitions. We then focus on mathematical tools to model only sequential quantitative data set measurements overtime of those biochemical elements to systematically study the determinants of early cancer. This may prove useful in outlining new or emergent therapeutic prospects.7,9
Important cancer biomarkers to consider when modeling parameters of interest
There are many possible underlying dysregulated signaling pathways in cancer.2,3
However, most of the components fall into the following broad categories that warrant consideration. There are those related to ligands/receptors and oncogenes such as PI3k/AKT, TGF, MAPK, P53, RB, CDK, CYCLIN, TKR, NF-κB, miRNA, mTOR complex, P16, TERT, NOTCH, WNT, PTEN, and HEDGEHOG. 2 There are those related to the genes determining pluripotency such as c-MYC, OCT4, SOX2, NANOG, and KLF4. Expressions of those relevant molecular “populations” in these pathways are themselves under the control of multiple feedback loops and under tight homeostatic control and cross talk.3,7
Finally, there are those related to other core circuit systems and comprise the redox state such as cytosolic NADH/NAD+ ratio, the mitochondrial electromechanical potential and mitochondrial transport chain, the energy charge inside the cell (ATP+1/2 ADP/ ATP+ADP+AMP ratio), and the reactive oxygen species (ROS) scavenger NADPH/NADP+ ratio.3,10
In this report, interaction with environmental factors and other as yet unknown processes will not be discussed here. 11
Results
Facing complex biological data in cancer research
A mathematical model relates the dependent variables (such as a population growth or a particular molecular concentration growth) by means of a mathematical equation demonstrating the output cell response for a given input. Through molecular growth, we refer to a time series of concentration growth overtime of proteins, messenger RNAs (mRNAs), or other biomolecules all of which were cited above. Moreover, the model’s objective is to quantify the system’s behavior by the parameters that affect the process dynamics.
Steady state or equilibrium
Actual biological systems do not always exist in equilibrium because they contain many negative feedback loops by which the system’s action is based on some previous state and not on the current state. Feedbacks are necessary so that the system may assume a resting and an active state. Because of the consequent time-delayed effects, this situation is likely to cause the system to miss or overshoot its goal either before the equilibrium target or before the “carrying capacity” of the system is reached (see infra). Studying equilibrium of a biologic process is important not because equilibrium reflects reality but because it is an important reference point for analyzing the final behavior of the model. The pure behavior of the model is then obtained by “shocking” the system out of equilibrium and knowing how the system responds to sudden input using a “step function.” Equilibrium is stable if it does not change following this small but swift disturbance. A step function is a process that, when forced on the input, the output changes from zero and achieves a higher steady state in an instant. A steady-state level of any biomolecule is important for its optimal function. Computer programs can be set up to simulate the step function.
In mathematics, a function is a pairing of any set of inputs observed by changing the independent variables with their corresponding output responses. Functions can be linear or nonlinear, continuous or discrete. A function is continuous when its graph is a single unbroken curve. Polynomial functions, trigonometric, exponentials, logarithmic, and root functions are all continuous functions at every number in their domains. 12
For an exponential input, of first-order or nth-order polynomial, the response will also be an exponential almost every time with the exception when there is resonance or near resonance between input and output frequencies. 13
A graph is the best way to represent a function in which one element depends on another. Power functions and biological oscillations are typically nonlinear.
Cancer biology worlds are also nonlinear and are often discontinuous. In a discontinuous function, the rate of change at any point cannot be determined no matter how much we zoom in on that particular point. Linear systems exhibit growth, decay, equilibrium, or oscillations. A step function that is generated by a switch-like interaction can also be used to approximate the discontinuous function by a continuous one. 14
Critical or fixed points
Fixed or critical points represent equilibria or steady states. Fixed points can be stable or unstable and only stable fixed point represents a stable equilibrium. It is also fundamental that one determines what is happening near the neighborhood of the critical points to estimate robustness as well as finding the limits of a given system’s parameters. The critical points are found at the intersection of the graph, and either the 45° straight diagonal line of zero population growth or the identity coordinate y (output) = x (input), where the function’s rate of change is 0 corresponding to constant operating conditions. 12 In other words, the diagonal line is where inputs get converted to outputs and vice versa.
Evaluating what happens near the fixed points is equivalent to a linearization technique at those points to determine the stability. If the system that begins close to the equilibrium approaches the steady state converges in a finite time reaching a fixed point, it is then stable; if it diverges from the fixed points and it grows indefinitely as time increases, it is then unstable, meaning it “approaches” infinity.12,15 Cancer is considered to be an unstable process and by definition is neither in a steady state nor in equilibrium. Any dynamic system can settle down to equilibrium, keep repeating in periods, or become chaotic. 16 In contrast to oscillations, chaos never settles down to a periodic state and depends totally on the initial conditions.
For nonlinear dynamics in discrete time, it is simple to check for stability by linearization of the model locally, ie, when the time interval becomes shorter or using nonlinear ordinary differential equation (ODE). After linearizing the function, its transform is then computed (to eliminate dependence on time) and the result is just an easier algebraic function of the output. The output is then plotted using a graphing calculator. However, when we are dealing with too many interdependent nonlinear partial derivative equations, the disadvantage of linearization is that it becomes mathematically intractable and computationally prohibitive to implement. 17
Carrying capacity
In nonlinear systems, when expressing the time evolution of molecular species of interest, the logistic model is used to predict the maximum growth limit in the long run. The simplest example of nonlinear population growth is the logistic equation or the sigmoidal stretched out “s” curve similar to the Hill function. Here, the molecular growth moves slowly, and then rapidly, before slowing down and tapering off through a negative feedback interaction. A critical stage in the evaluation of this function is to establish the upper value or the final steady state of the population size. The tapering off upper value is called the population’s “carrying capacity.”
When the molecular population is above the carrying capacity, it decreases, and when it is below the carrying capacity, it increases. In the logistic molecular growth model, the fixed points—here it means the final carrying capacity or the dependent variable—may switch stability as the parameter of growth rate (the independent variable) varies depending on the changes of external feedback factors. A more practical method to solve this type of growth capacity that is drifting in values is nondimensionalization of parameters. This is based on the fact that when parameters become dimensionless, the analysis is simplified because the parameters are reduced to either unity or less than unity. Nondimensionalization is obtained by dividing each parameter’s value by the average of the total sum values or by their steady-state parameters value.
For future predictions of the maximum growth, applying the discrete logistic model is preferable to continuous exponential functions when there might be periodicity, sequences in the process, or when molecular successive generations do not overlap as time passes. When time is thus inherently discrete and consequently the carrying capacity becomes a drifting parameter, a more reliable model called the logistic difference equation should be considered.14,15,17–19 The difference equations can easily be simulated on digital computers. Thus, the logistic function treats time as continuous, whereas the difference equation looks at discrete time steps.
Multistability
The presence of positive feedback loop can induce multiple steady states. Example of 2 possible equilibria or bistability is when considering a system composed of 2 genes that express transcription factors regulating each other, but in a negative manner. 20 A bistable system that can be constantly switched between 2 steady states following a transit input perturbation is called a toggle switch. 20 Bistability of regulatory circuits occurs when the rate of positively autoregulated gene product, by means of positive feedback loops, is strong compared with its degradation rate. MAPK is an example of a bistable signal in which the feedback loop enhances the robustness of 2 stable output states in response to a stimulus. 21 Moreover, on interconnection, such as in gene circuit, where the behavior of downstream system significantly affects the behavior of the upstream system, multistability may occur. The generally accepted reason for this behavior is that the activated state ignores random small changes of signals (noise). If there are environmental heavy changes, some phenotypes will keep on ensuring survival. Furthermore, bistability generally widens the distribution of protein concentration per cell in which the distribution of proteins becomes bimodal. 22 Multistability includes an additional intermediate state called the hybrid state.
Multistable nonlinear dynamics
This is a system with multiple state equilibria in which a small variation leads to diverse steady states. Multistable systems cannot be described by a deterministic continuous equation but by a discrete stochastic modeling that evolves according to discrete jump Markov process. This leads to a probabilistic description of the system dynamics, ie, the likelihood of states and output variables is random and whose properties (mean, standard deviation) are governed by a probability density function (PDF). 17
In a nonlinear dynamic system, there are interactions between its elements—part of the output has a feedback loop into the input. Unlike a time-independent Markovian process, the nonlinear dynamic is recursive and is time-dependent on previous events. 23 Nonlinear dynamic inputs of numerical data can be transformed to a reduced-order model which is derived using dynamic mode decomposition (DMD) algorithm. Dynamic mode decomposition extracts from flow of data only dynamically relevant or dominant features that contribute to the overall evolution of sequence.24,25 Dynamic mode decomposition is a dimensionality reduction technique and is data driven where only snapshots and a set collection of time series of state measurements are needed. 24
Stochastic systems
In biology, a stochastic system is a dynamic biological system that displays a noisy behavior possibly because of random fluctuation of all or parts of its components,26-28 for example, the stochastic noise related to production and degradation of RNA molecules and the stochasticity from association and from dissociation of the transcription factors in the gene regulatory networks. Because of the stochastic noise, repeated experimental data do lead to different results. Because the stochastic system is a Markov process, several update functions need to be defined to formulate the system.28,29 Based on the idea that a Markov process is aperiodic and irreducible, its probability distribution will end up in a stationary state in the long run. Its probability distribution can be approximated in a computationally favored way using a likelihood-based Markov chain Monte Carlo (MCMC) method. This is a repeated random sampling method from a uniform probability distribution (0, 1). The stable stationary state of Markov process allows the identification of RNA molecular PDF and by which deterministic approximation of the process is valid. 29
Stochastic systems may have a deterministic linear component that is due either to an external event such as environmental changes or to internal fluctuations inherent to random binding and unbinding of molecular elements. One important indication of stochastic behavior is bimodality or multiequilibria.27,29
As alluded to above, ODE cannot be applied to solve stochastic process which is usually discrete. Stochastic process can be otherwise described only using stochastic calculus in which the random function is approximated by dividing it into infinite numbers of small time steps which are then linear.
The uncertainty or the stochastic effect perturbing the system is generally assumed to have a Gaussian or normal PDF of mean zero and variance σ2, similar to continuous version of a random walk or to the dynamics of diffusion-based phenomena. Solving the random function, ie, stochastic part is available in MATLAB (Mathworks.com) which provides a random number generated from a normal PDF (0, 1), ie, of mean 0 and variance of 1. After many runs simulating the stochastic process numerous times, the average of the runs will converge to the deterministic solution.
For the time evolution of a large ensemble of stochastic systems and averaging out their stochastic effects, a “master equation” is applied. 27 This equation, which is a system of ODEs, expresses the dynamics of a population fraction in a particular state that jumps to other state. Because the probability of the next transition depends on the current state only, the master equation is considered a continuous-time discrete-space Markov chain.
In the simplest cases, the master equation is analytically solvable. For larger models, more computationally efficient simulations of growth have been developed. The MCMC types of simulations are used that produce a random walk through the possible states of the system.
A stochastic simulation algorithm is also available to describe each particular trajectory of individual stochastic element separately. 30 The reason for stochasticity is due to the presence of many independent parts interacting with feedback control elements, cooperatively or competitively that create robustness, bistability, or oscillating runaway behavior.28,31–33
Robustness
Because there is often redundancy in biological systems, any small alteration of system’s variables even when part of particular pathways is removed, the system can still maintain its performance and ignores these disturbances. This resiliency to large perturbation is interpreted as robustness,34–36 which is closely related to the notion of Claude Bernard’s homeostasis. To test the robustness of findings, sensitivity and specificity analyses should be conducted. In cancer, the parameters are not evolutionary fine-tuned; therefore, malignant cells maintain their robustness by multiple synergistic interactions as we will read later.
Hill function
Based on the law of mass action of chemical kinetics, the reaction rate is proportional to the product of the reactants’ concentration. The kinetics of the equation is described by parameters called rate constants. In this situation, the metabolic rate is constant at high substrate concentration, thereby equivalent to a steady state. However, often enough enzymes have more than one binding site and their affinity changes. Positive cooperativity among macromolecules in enzymatic reaction at a steady state can be described by a sigmoid Hill function, which is a wide-spread function in modeling nonlinear responses, and in particular the regulatory pathways. Nonlinear relation can be log-transformed into a linear function.
Gene transcription networks are usually viewed as interconnected transcriptional components. An example for the use of a Hill function is relating the input of the production rate of a gene to the transcription factor concentration. While the Hill function makes the response faster, interconnectivity has an opposite effect.
Oscillating functions
Oscillations are common dynamic behaviors because they determine the most needed information of “delays” in the biological world of networks.36,37
Examples of oscillations in time are the TNF gene expression, oscillations in NF-κB (nuclear factor κB), and p53 signaling, and hormone secretion. 38 These systems are inherently nonlinear and exhibit varying behavior over time.
Negative feedback loops affect an oscillatory behavior. Positive feedback loops produce robustness and facilitate bistability.35,36
Oscillation means that if the system is perturbed slightly, it always returns to its periodic orbit. Cardiac rhythm, circadian rhythm, hormone secretions, and certain biochemical reactions oscillate spontaneously. Another example is glycolysis, a biochemical metabolic process that proceeds in oscillatory pattern.
If there is stochastic transcription of mRNA, stochastic synthesis of proteins, and in addition there is stochastic degradation of the same elements, the global behavior leads to a nonlinear dynamical system. Such example involves parameters that govern the degradation rates of both proteins and mRNA. To estimate the stable stationary-state solution and to solve these states, the response is transformed. At any fixed time, any periodic function with interacting parameters can be transformed into a sum of harmonics or frequencies with amplitudes or modes depending on the time at which the snapshots were taken. Most of the higher order exponential terms are ignored. Thus, the random distribution of inputs will end up with a simple system and lower dimension ODEs with a corresponding real solution.15,26
Nonequilibrium systems
A random type of motion, termed Brownian motion of a small macroscopic particle due to collisions with other molecules of the fluid when immersed in a liquid, may be applied to approximate the dynamics of nonequilibrium systems. Based on the central limit theorem, the equilibrium PDFs of the fluctuating stochastic noise or “error” obey a Gaussian PDF. 15
As alluded to earlier and because this process is discrete, it cannot be linearized and ODE cannot be applied. A statistical description by probabilistic manner is used instead. A computer-based MCMC simulation then determines the time evolution of the drift’s PDFs from the deterministic trajectory, thus removing the uncertainty of the probabilistic component of the random variable’s movement.4,18
Discerning chaos and chaotic system’s prediction
In a simple linear or periodic system, when taken in isolation, its future behavior is well determined regardless of the initial starting points. In contrast, when that system is acted on by multiple dynamic and nondeterministic components (⩾3 dimensions), different long-term trajectories will occur. 16 In this higher dimension, the motion becomes chaotic. Overall, chaos is not equivalent to randomness because early on it is predictable (deterministic) but within a “time horizon” after which the system becomes effectively unpredictable. 16 Examples of chaotic systems are the movement of the galaxies, the weather, and in biological context, tumor cells; each system has its own time horizon that spans from days (for weather prediction) to millions of years (for galaxies).
After a transient time, the system settles into an irregular self-similar oscillation that persists without ever repeating and becomes aperiodic. 16 This view was labeled as Panta Rhei—meaning everything flows or “all entities move and nothing remains still”—by Heraclitus more than 2500 years ago. We now call this condition “fractal.”
The distance or the separation between the trajectories grows exponentially fast, on average. Their rate of separation is the property of the system. Their distances follow a simple rule and can be quantified as a function of time. Because it is exponential, the logarithm of the distance is approximated by a trending line that is growing in time. The system can also compound or be damped nonlinearly overtime and self-similar without ever actually repeating. 16 For very complicated nonlinear dynamics with 1 degree of freedom and in which time is discrete, the difference equation can be applied at least early on in the deterministic transient period of the system. 16 Chaos then can have a simplistic mathematical model to represent it, and its dynamics is then defined by a recursive iterated steps. The population level at any given time step depends on the growth rate parameter and the previous time step’s population level.39–41 Thus, iterating the process many times by changing the growth rate parameter allows us to visualize the global behavior at a glance in a meaningful way.
You have gathered the experimental biological data in a series of snapshots, now what?
In this section, we provide the broad overview for modeling time-evolution analysis that can be applied to describe the behavior of a complex process such as tumorigenesis. Before analyzing the temporal sequential data points, the data should initially be normalized by either log-transforming or by dividing each value by their mean or by their steady-state value across total time interval.
For the analysis of high-dimensional time evolution data analysis, the available steps are given in the following sections.
Intuitively, visualize the data and make a rough sketch of scatter plot of each time point in the data set.12,14 The next step is to graph the time series of the independent values to the dependent ones, and then check what the function looks like using computer programs. The packages MATLAB, Pylab, and Mathematica have such programs. Is the graph linear or polynomial (quadratic, cubic, or quartic, or higher order)? Is it periodic trigonometric, or is it power or exponential function? The exponential function grows much faster than the power function and is of interest because it occurs very frequently in mathematical models of nature. Finally, is there a need to reflect, shift, stretch, or compress the function to obtain the desired graph?
Ordinary least square regression
We build and graph the ordinary least square (OLS) that allows us to deduce the model with the use of a graphing computer, and the best fitting polynomial is chosen. Now the question is, “How tight is the fit?” To get the perfect fit, we use the coefficient of determination or R2 value for test data. This coefficient is dimensionless and scale independent with a range from 0 (worst) to 1 (best). The higher range of this coefficient means that the model fits tightly and that it explains most the variability in the estimates. So we select a model that has the highest R2 value.
Bayesian computation of large data
This approach seeks to infer the parameters from the available observations. Bayesian classification can be used to classify the function (output) of variable attributes, numeric or categorical, based on the highest computer probability (likelihood) of the class that explains the given observations. The maximum likelihood is then chosen that explains the single best parameter which maximizes the observed data. Because writing the posterior distribution analytically becomes intractable in these cases, it will be necessary to use computational approaches, such as MCMC techniques which will be outlined later.42,43 For instance, it is an approach that is suitable to address differential gene expression. 23 In the inferential statistics, the biggest impediment is that the prior or the apriori likelihood of a class (ie, without knowing the current attributes) remains unidentified. The other hindrance is that Bayesian classifier cannot differentiate classes if the attributes are correlated (ie, multicollinearity) with similar distribution parameters’ PDFs.
Solving the problem of multicollinearity
When we are dealing with too many variables, which is often the case in cancer research, multicollinearity may be a problem. Multicollinearity means that a large set of potential covariates exist, and that those variables are highly correlated with high variance between samples and with completely different values of the regression coefficients. In an ideal data set, we need high variations between the elements, meaning the diagonals of the data regression matrix, but we want small covariations, meaning all off-diagonal entries of the matrix. In that situation, neither OLS nor t test should be used.
What do we do if there is a problem with multicollinearity? The standard approach is to eliminate the variables that are closely correlated with the other variables and do not include them in the model.
Ridge regression is the preferred method, however, and more importantly, it is only used as diagnostic for multicollinearity. Here, we inflate the variances of the diagonals’ matrix by a small “bias factor,” less than 1 to match the covariance. If the coefficients of the independent variables change in magnitude or if they flip in sign when compared with those in the original OLS, then we chose the smallest amount of bias factor that keeps the slope estimates relatively stable.
The second method is using the principal component analysis (PCA) in which each sample is displayed and graphed. The distances (variability) between data points are represented on the coordinates in this graph. The largest distances provide information about the network’s dominant properties. Unfortunately, these variabilities are completely decorrelated from each other, making this method too difficult to interpret, and the correlation with the original data becomes meaningless. Of note, in light of the fact that signaling networks are dynamic and nonlinear due to multiple feedback loops and that some are oscillatory, PCA-based methods cannot be applied in these sinusoidal mode behaviors.
The third method is “Centering.” This is done by rescaling, ie, by subtracting each dependent variable or attribute from the mean. Its disadvantage is that it applies only when we have very few interaction terms.
Finally, if the number of relevant variables surpasses the number of available observations, then the least absolute shrinkage and selection operator (LASSO) regression should be used to achieve sparsity. 44 The LASSO further reduces the coefficients of regression by eliminating the ones with zero values and keeping the nonzero ones thus achieving sparsity and therefore there will be fewer features to keep in the model.
In silico validation
All models need to be validated before applying them for long-term prediction.43,45
Due to the presence of noise (or error) in estimating the parameters, we cross-validate to determine whether the model can be applied to an independent data set in the real world. This is done by splitting the data into a training data set and a testing data set. The next step is to choose the set with the lowest error. Then, the simplest model possible not only accounts for the data at hand but also predicts new data effectively. Other ideas based on cross-validation include use of resampling to directly estimate the prediction error and leave-one-out cross-validation when the data set is not too large.
MCMC technique and simulation
The MCMC is actually similar to a random walk that models each state at each time point as a linear process that proceeds in steps. When the next step occurs, the process can be in the same state or moves to another state. All movements between states are represented by a probability and the sum of each state’s probability must add up to one. This model will find the probability of being in the final steady state many steps into the process. 4 The time that each process spent in each state before it jumps to the next state has an exponentially distributed waiting time or a Poisson distribution, for example, reading off the DNA template by RNA polymerase.
The results from the analysis that is based on only few observations can be highly subjective because the analysis could have missed other possibilities. For the inquiry to be more reliable or more accurate to simulate reality, we must figure out the PDFs of each unknown feature. A more precise analysis is done by computer simulation in which the computer generates, for each unknown factor, a random number that is between 0 and 1. The simulation applies each random number to each unknown factor’s PDF and finds the value corresponding to the random number PDF. The process is repeated to predict the final distribution. Running numerous iterations including random number sampling to simulate a noise model can be applied. This method leaves us better informed about the final decision because it is based on all possible outcomes and not only restricted to a very few.
Applications of Basic Tools in Computational Modeling of Tumorigenesis
Quantitative modeling of complex gene regulation in eukaryote from proteomic data
A unifying model of carcinogenesis implicates the induction of genetic instability (somatic mutations), inflammation, and the release of microenvironmental forces. Specifically, these are the regions of intermittent and chronic hypoxia, ROS, and the acidotic milieu generated by active glycolysis. The synergistic action of those processes feed backs DNA genotoxicity and increases the accumulation of further mutations, inflammation, immune suppression, and tumor growth. Furthermore, cell death response is inhibited, and more hypoxia-induced glycolysis and acidosis accumulate. 46 The dynamics of somatic mutations can be modeled as the main drivers of carcinogenesis. 47 Once mutation rate of a particular gene has been known, the alteration likelihood of cardinal signaling pathway can then be defined. 48 These signaling pathways are involved in balancing energy requirement, survival, cell cycle progression through the restriction point, and the biomass formation.
The transition between on-and-off gene activities is considered a stochastic multistep process. This switch depends on a set of sequentially assembled factors for histone modifications. Its description includes a specific Markovian chain. Assuming that the lifetime of mRNAs is short compared with that of proteins, it can be integrated out; then, the burst size is geometrically distributed and the initiation waiting time of the on-state is exponential. As observed experimentally available from single cell data, proteins are produced in random bursts resulting in a gamma-distributed steady-state protein concentration and exponentially distributed number of protein molecules per burst.
Gamma distribution in this context depends on 2 parameters: the burst size and the mean number of proteins produced per burst. Analytical solution to stochasticity in gene expression can then be linked to the steady-state distribution of a given protein concentration in a cell population. 26
Computational analysis of nonlinear oncogenic and inflammatory signals
To identify the key component of a biological system, a basic numerical method is used which approximates the curve function with a series of slopes (rate of change of the reaction) to monitor a species concentration over time. 21 There are other more advanced numerical integration of complex reactions with Michaelis-Menten kinetics and comprising multiple biochemical species that concurrently solves all ODEs of the system’s attributes. 49
The binding of ligand to receptors such as the insulinlike growth factor or to the epidermal growth factor allows the activation of various signaling pathways such as the PI3K-AKT, EGF, and MAPK/ERK pathway. MAPK maintains energy balance at both the cellular level and whole body level. It monitors the cellular ratios of AMP/ATP and ADP/ATP.
Mammalian target of rapamycin complex 1 (mTORC1) is a serine/threonine kinase that promotes ribosomal protein synthesis and controls cellular growth by monitoring nutrient availability, cellular energy status, oxygen levels, and mitogenic signals. Activation of mTORC1 is achieved through phosphatidylinositol 3-kinase and AKT. Aberrant activation of the mTORC1 pathway has been implicated in a variety of cancers. The E2F-RB pathway controls the G1-to-S-phase transition through the restriction point by the timely activation of genes involved in DNA synthesis and cell cycle control. Experimental evidence, based on regression-type analyses, show that E2F-RB pathway is a nonlinear bistable switch. 50 The switch is generated by 2 positive feedback loops. As we learned this notion before, bimodularity—meaning high and low different concentrations in different cells—is produced by stochastic fluctuations of protein concentrations. Activation by growth factors stimulation relieves inhibition of the transcription factor E2F by RB and allows the migration of E2F through a cascade of events leading to the activation of multiple target genes and cellular proliferation. The most common genetically mutated pathways in human malignancies are the E2F-RB pathway and the p53 tumor suppressor pathway.
Inflammation involves multiple signaling pathways. When applied to time series data in the blood, PCA can identify the critical drivers of inflammatory phenotype that have most influence on cancer progression. The transcription factor NF-κB is particularly important in the regulation of inflammation, apoptosis, and proliferation in cancer cells. Inducers of NF-κB include inflammatory cytokines, growth factors, lipopolysaccharide, oxidative stress, and viral products. Experimental and computer simulation implicated NF-κB in the control of TNF-induced apoptosis of the inflamed tissues.
Monte Carlo strategy, using sampling to generate random sets of parameter values for simulations, then recapitulates in silico key feature of the inflammatory mediators’ dynamic. Computationally, this involves the concentration for each component and the kinetic rates for interactions to be determined. Data quantification of time series parameters is essential for prediction accuracy, and nonlinear ODE of biochemical pathways is then applied for that purpose.
Dynamic Bayesian networks are great frameworks to infer such large network interactions by determining how well the network can explain observed data. For pairwise nonsequential sampling of nonlinear systems (contained in the dependent and independent variables), DMD provides the best-fit linear approximation. In this method, computation can still identify the relationship between future and past values of time series data taken from multiple runs of an experiment to reduce the noise.24,25,51 Dynamic mode decomposition is similar to the difference equation in which the recursive sampling for the data migrates the values of a parameter to the origin becoming sufficiently stationary. Dynamic mode decomposition is used to lower dimensions for dynamical systems with irregular time interval by collecting only few snapshots.
Ordinary least square and ridge regression both have drawbacks based on low predictive accuracy. Furthermore, they should be used only in linear dynamics. If moderate-sized effects in a large biologic data, Lasso is best. 44
Computational modeling of metabolic energetics and the immune response
In all living organisms, metabolites are derived from only 12 precursor metabolites such as glucose-6-P, ribose-5-P, fructose-6-P, erythrose-4-P, glyceraldehyde-3-P, oxaloacetate, 3-P-glycerate, P-enolpyruvate, 2-oxyglutarate, pyruvate, succinyl-CoA, and acetyl-CoA. Their biosynthesis requires the use of 3 pairs of cofactors such as ATP/ADP, NADH/NAD, and NADPH/NADP. In addition, there are approximately 50 building blocks which need to be included in any mathematical model chosen. 52
The enzyme pyruvate kinase (PMK) isoform 2 catalyzes the crucial step in glycolysis, thus facilitating the switch from oxidative phosphorylation to glucose metabolism and generating ATPs in cancer cells (Warburg effect). Nuclear translocation of PMK2 reprogramming is both activated by and activates hypoxia-inducible factor 1–dependent transcription. Both phosphofructokinase and PMK2 have been shown to generate glucose metabolic transition mediated by lactate and AKT.
Certain enzymes play critical roles in the metabolism of cancer. ALDH3B1 acts against oxidative stress derived from lipid peroxidation. IDO1 is involved in tryptophan metabolism and immune suppression. 3
The citric acid cycle is catalyzed by 8 enzymes of which α-ketoglutarate dehydrogenase has the lowest activity and thus is considered the rate-limiting enzyme in the cycle. Computational modeling of the mitochondrial respiratory and energy metabolism encompasses integrating a series of ODEs for each state variable with respect to time. Model simulation is run till it reaches a steady state. The basic components included in the model are the reactions at electron and substrates (ATP/ADP/AMP) transport system and the transporters/antiporters of K+/H+ cations with their own parameter values. 53
Flux balance analysis is also available for dynamic simulation. Flux balance analysis is a mathematical model that uses the flow of metabolites through a stoichiometric metabolic network. 54
The overcoming of immune incursion can be described by the predator-prey iterations (logistic function type model as we noted earlier) that is simulated by a population-based model between tumor and immune cells. 54 To accommodate their energy needs, immune cells use the glycolytic pathways. The immune check point ligand present on cancer cells can suppress the immune cells check point proteins and the glycolytic signaling pathways such as AKT and mTOR. This process impairs immune cells and leads to promoting cancer progression. 55
Study of biological oscillators
Sustained oscillations of interacting proteins or genes could be generated by negative feedback loops. Many biological oscillators including p53, cell cycle, RB-E2F, NF-κB, and cardiac and circadian rhythms also have positive feedback loops. The feedback loops transform the p53 responses from periodic pulses to a single sustained pulse. The p53 has major roles in genomic stability. In response to DNA damage, p53 can show a transient response leading to cell cycle arrest and DNA repair. 54 Pulsatile responses induce cell apoptosis, and sustained responses mainly trigger senescence. The additional loops have been shown, through computational studies and tested using Monte Carlo approach, to impart some performance advantages over simple negative feedback loops. Positive-plus-negative oscillations appear to be more robust and to have better adjustable frequency without compromising output. The parameterized oscillator models were simulated over a wide range of the Hill function coefficients. 56
Epithelial-mesenchymal transition
In cancer, epithelial-mesenchymal transition (EMT) is a necessary switch from an epithelial to a mesenchymal phenotype that enhances metastatic cell dissemination. Different pathways, a variety of transcription factors and microRNAs (miRNAs), have been shown to activate EMT, including TGF-β, p53, NF-κB, WNT, Notch, EGF receptor (EGFR), and Hedgehog. Transcriptomic analysis has revealed that TGF-β is a master regulator of EMT. 57 The EGFR is most frequently altered in cancer through either overexpression or mutation. The ROS, high in hypoxic condition, leads to an enhanced production of TGF-β. Recent works focused more on computational systems biology models of EMT rather than on the kinetics of individual players in the network and have elucidated the underlying mechanisms of this transition.54,58
Epithelial-mesenchymal transition is regulated by intercalated systems of 2 positive and 2 negative feedback loops between miR-mediated reactions (including translational regulation, alternative splicing, and epigenetic modifiers) and EMT-inducing transcription factors such as ZEB1/2, SNAIL1/2, and TWIST1. Activation of EGFR induces high miR/low ZEB levels (epithelial phenotype), and TGF-β induces high ZEB/low miR levels (mesenchymal phenotype). The combination of these states can produce a multistable EMT phenotype. 54
Total integration of multi “omics” data
To acquire the expected value of a continuous variable, we integrate the nonzero values of the variable times the probability mass density (PMD). For a discrete and random variable, we use a sum instead of an integral. Getting PMD requires 2 steps: identifying the type of function by reviewing the data graph and then we look up the table of moment-generating function (MGF). The probability distribution of a random variable is uniquely determined by MGF.
Analyzing complex biological systems such as cancer comprises high-dimensional global sets. These data include stochastic genome transcriptions, epigenetic alterations, proteome, metabolome, and signaling molecules. These biological systems and their interactions maintain phenotype stability and robustness which are strongly chosen by evolution. Positive feedback enhances sensitivity, whereas negative feedback dampens diverse perturbations imposed by the environment ensuring stability for homeostasis. 59
Both normal and cancerous cells use time integral of the error which is fed back into the system. The error is the difference between the actual output and the desired steady-state output. This type of control maintains the error as low as possible despite perturbations in the input. 60
Because the examined number of independent variables is very high and the number of their associated samples is low, a proposed and easy model based on nonlinear regression can be implemented. When the system is first linearized and then its transfer function is determined, the other efficient algorithm uses an iterative approximation to estimate a solution for a system of linear equations. In a manner similar to autoregression algorithm, we use the efficient and preferred Gauss-Seidel method with an initial guess value to solve system of equations iteratively in which convergence is guaranteed if the matrix of coefficients is square and diagonally dominant. If no convergence, then we use stopping criteria in percentage. The other method is a finite difference method where the systems unknowns are solved simultaneously knowing the boundary constraints values.
Protein-protein interactions with a functional role can be selected based on their degree of centrality. 61 Network interconnections that are responsible for the behavior of the cell are dominated by a power-law degree distribution indicating that a few hubs hold the most redistribution points and enzymes are considered in their links. 62 The networks include protein-protein interactions, metabolic, signaling, and transcription-regulatory networks. Deciding which species are to be included in the evaluation of kinetic data is crucial. It should be based on both the previously accumulated experimental research in the literature and the theoretical intuitive considerations.
Conclusions and Outlook
The intricately connected genetic somatic mutations/epigenetic alterations and the environmental cues differentially affect cancer initiation and clonal expansion making cancer an intriguing disease to decipher.1–3,63 The extensive progress that has elucidated the complex biological functions necessary for cellular communications has led to improvement in our understanding related to cell cancer survival and proliferation. 3
For example, to simulate temporal dynamics of molecular networks in cancer, kinetic modeling has revealed transient, oscillatory, sustained, or toggle-switch-like systems properties among cancer regulatory networks. In these situations, it is crucial to identify the nodes that control this dynamic circuitry. These different specific adaptation states and response dynamics depend greatly on the amplitude and duration of signals as well as the different feedback adjustments.
A high-throughput biological technology has resulted in a large amount of high-dimensional data that we are only now starting to investigate cancer as an entire system.42,43
Is mathematical modeling all that is needed to simulate the real world? 63 The answer is not conclusive because interaction with environmental factors and other as yet unknown processes remains not well understood. Furthermore, it remains unclear whether what we are modeling captures the essence of how cancer cells operate. The answer is becoming increasingly distinct, in the near future. With further refinement in mathematical modeling as outlined in this review and experimental validation, it may become even pure.
Footnotes
Funding:
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
AKA and FC conceptualized the manuscript. AKA and FC helped in methodology of the data. AKA and FC investigated the manuscript. AKA, FC, and BB analysed the data. AKA and FC resources the data. AKA and FC wrote the original draft of the manuscript. AKA, FC, and BB contributed to the writing, review and editing of the manuscript. AKA and BB visualized the manuscript. AKA supervised the manuscript.
