Abstract

While the world recovers from the COVID-19 pandemic, another outbreak of contagious disease remains the most likely future risk to public safety. Now is therefore the time to equip health authorities with effective tools to ensure they are operationally prepared for future events. We propose a direct approach to obtain reliable, nearly instantaneous time-varying reproduction numbers for contagious diseases, using only the number of infected individuals as input and utilising the dynamics of the susceptible–infected–recovered (SIR) model. Our approach is based on a multivariate nonlinear regression model simultaneously assessing parameters describing the transmission and recovery rate as a function of the SIR model. Shortly after the start of a pandemic, our approach enables estimation of daily reproduction numbers. It avoids numerous sources of additional variation and provides a generic tool for monitoring the instantaneous reproduction numbers. We use Norwegian COVID-19 data as a case study and demonstrate that our results are well aligned with changes in the number of infected individuals and the change points following policy interventions. Our estimated reproduction numbers are notably less volatile, provide more credible short-time predictions for the number of infected individuals, and are thus clearly favorable compared with the results obtained by two other popular approaches used for monitoring a pandemic. The proposed approach contributes to increased preparedness for future pandemics of contagious diseases, as it can be used as a simple yet powerful tool to monitor a pandemic, provide short-term predictions, and thus support decision making regarding timely and targeted control measures.

INTRODUCTION

When the COVID-19 pandemic hit the world in early 2020, governments were forced to take immediate, wide-ranging actions to “flatten the curve.” The reproduction number was the key parameter for monitoring the virus spread and became an indicator for policy measures taken by authorities. As most models for its assessment were developed and refined using existing datasets from earlier pandemics, and the knowledge about the new SARS-CoV-2 virus was deficient (Muralidar et al., 2020), a more dynamic way of thinking was needed. Predictions early in the pandemic were based on scarce information sources, which were unavoidably noisy and often incomplete. Consequently, models relying on multiple sources with considerable uncertainty, both in data and in model specification, escalate the overall uncertainty considerably (Elderd et al., 2006).

In this article, we propose a methodology to estimate nearly instantaneous time-varying reproduction numbers for contagious diseases, requiring only the number of infected individuals as input. Our approach is based on extracting the parameters in the susceptible–infected–recovered (SIR) dynamics (Hethcote, 2000) by optimal tracking of observed infectious cases through estimation of a multivariate nonlinear regression model. The SIR model includes two parameters, transmission and recovery, controlling the transitions between different states over time. A combination of these parameters results in a basic reproduction number (ℜ₀), which is the average number of secondary infections caused by a primary case in a fully susceptible population (Rothman and Greenland, 1998).

The classical SIR-type model is heavily based on the number of confirmed positive polymerase chain reaction (PCR) test results (Rahbari et al., 2021). However, the extensions of SIR often include unavailable or scarce data, such as the number of exposed (Girardi and Gaetan, 2023) or quarantined individuals (Mehra et al., 2020), the number of patients hospitalized because of viral infection (Storvik et al., 2023), and imported cases and mobility patterns (Storvik et al., 2023; Engebretsen et al., 2023). Is it actually necessary to feed the model with multiple, often unreliable, sources of data in order to provide a sound tool to decision makers? Without reliable input variables, the uncertainty in the output may escalate quickly. Potential new waves of unexpected viruses, or new strains of SARS-CoV-2, would demand new measures to maintain public safety, as highlighted by the Norwegian and other governments (Government, 2022). Also, the COVID-19 task force of the Royal Statistical Society, in one of its 10 Covid “lessons learned,” urges governments to “build an effective infectious disease surveillance system to monitor the spread of disease” in future pandemics (Royal Statistical Society, 2022).

In this article, we claim and show that it is possible to obtain reliable reproduction numbers by a direct approach, using only the confirmed number of positive PCR tests. By estimating a multivariate nonlinear regression model for a given set of infected cases, we provide a framework enabling an easy estimation of nearly instantaneous time-varying reproduction numbers. In this way, our approach fills a gap in a large body of literature using numerous methods to estimate the SIR model parameters, and thus reproduction numbers, including Bayesian methods (Verity et al., 2020), Monte Carlo approaches (Cano et al., 2020), machine learning (Raissi et al., 2019; Alanazi et al., 2020), and likelihood-based methods (White and Pagano, 2008). Our proposed approach solves a mathematical inverse problem, also considered by Taghizadeh and Mohammad-Djafari (2022) and Marinov and Marinova (2022). Taghizadeh and Mohammad-Djafari (2022) solve a classical nonlinear least-squares problem, providing fixed SEIR [SIR with “exposed” (E) state] model parameters, and further model time dependency in the transmission parameter through a periodic function. Periodicity is, however, not necessarily present in an outbreak of contagious disease, as COVID-19, for example, showed. Marinov and Marinova (2022) consider transmission and recovery parameters in the SIR model as piece-wise functions of time, thus resulting in constant reproduction numbers over specific periods of time. We argue that daily updated reproduction numbers are of crucial importance in an evolving pandemic, particularly in the exponential growth phases, where waiting to introduce measures to flatten the curve or to urge hospitals to increase preparedness might be very costly. Our approach addresses these shortcomings.

Our approach is generic and could be applied to any data stemming from contagious diseases that follow the SIR dynamics. We use Norwegian COVID-19 data as a case study and benchmark our results against the numbers produced by the R package EpiEstim (Cori et al., 2013) and by Storvik et al. (2023). The approach presented in the study of Storvik et al. (2023) was extensively used by the Norwegian health authorities to guide decisions regarding the measures taken, thus acting as a gold standard for our study. Fraser (2007) suggested a method for estimating the reproduction numbers combining a time-since-infection model with the underlying number of infections, implemented as a ready-to-use tool in the R package EpiEstim by Cori et al. (2013). This approach is recommended in the literature (Gostic et al., 2020) and was used in other European countries (Lithuanian Presidential Office, 2022) to monitor and control the COVID-19 pandemic. Through sensitivity analyses, we demonstrate that our method gives results that are well aligned with changes in the number of infected individuals and change points following policy interventions. Our estimated reproduction numbers are less variable and clearly favorable compared with the results obtained by these two popular approaches.

METHODS

Our proposed nonlinear regression approach is based on the SIR model, a compartmental mathematical model of contagious diseases. In the SIR model, the virus- or bacteria-exposed population is divided into states $S$, $I$, and $R$, where the state $S$ is the number of susceptible individuals, that is, the part of the total population which is at risk of being infected (immune-naïve population). The share of individuals that has been infected is measured by state $I$. Recovered individuals move from state $I$ to $R$, meaning that state $R$ is the share of the total population which has recovered after the infection. The SIR dynamics in a population of size $N$ can be expressed by a system of ordinary differential equations: $\begin{matrix} \frac{d S (t)}{d t} = - \frac{β}{N} I (t) S (t), \\ \frac{d I (t)}{d t} = \frac{β}{N} I (t) S (t) - γ I (t), \\ \frac{d R (t)}{d t} = γ I (t). \end{matrix}$ (1)

The components $S (t)$ , $I (t)$ , and $R (t)$ represent the number of individuals in each state at time $t$ and $S (t) \geq 0$ , $I (t) \geq 0$ , $R (t) \geq 0$ . According to the SIR model assumptions, individuals can only occupy one state at a time, and hence $N = S (t) + I (t) + R (t)$ . In the simplest epidemic models, the SIR parameters $β$ and $γ$ are assumed constant; however, they are likely to change with time either because of introduced measures or because of natural variation of environmental conditions (Pellis et al., 2022).
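
As an illustration of system (1), the following minimal Python sketch integrates the three equations forward with a simple Euler scheme for constant $β$ and $γ$. It is not the authors' Matlab implementation; the function name `simulate_sir`, the population size, and the parameter values are hypothetical and chosen only for illustration.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=24):
    """Forward-Euler integration of system (1) with constant beta and gamma.

    Returns one (S, I, R) tuple per day, starting with the initial state.
    """
    n = s0 + i0 + r0                      # total population, N = S + I + R
    dt = 1.0 / steps_per_day              # step length in days
    s, i, r = float(s0), float(i0), float(r0)
    path = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta / n * i * s * dt
            new_recoveries = gamma * i * dt
            s = s - new_infections
            i = i + new_infections - new_recoveries
            r = r + new_recoveries
        path.append((s, i, r))
    return path


# Illustrative values only (not taken from the Norwegian data).
path = simulate_sir(s0=999_000, i0=1_000, r0=0, beta=0.6, gamma=0.5, days=30)
```

Because every person leaving one state enters another, the sum $S + I + R$ stays equal to $N$ along the simulated path, up to floating-point rounding.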

The parameter $β$ in the SIR model is the infection rate (or transmission rate), quantifying how fast the infection is transmitted. It can also be interpreted as the average number of people that each infectious person can infect every day (Liu et al., 2022). The parameter $γ$ denotes the recovery rate. In many studies, the parameter $γ$ is kept constant, whereas $β$ is modeled as a function of time (Calvetti et al., 2020). However, the recovery rate $γ$ likely depends on demographic properties of a population (Girardi and Gaetan, 2023), and if these change, $γ$ could change as well. In our suggested approach, both $β$ and $γ$ are assumed to be time-varying parameters, as also argued in Mehra et al. (2020). The ratio $β / γ$ from the SIR model defines the basic reproduction number ℜ₀. An infection outbreak can start in a fully susceptible population only when ℜ₀ > 1 (Hethcote, 2000). We will focus on ℜ_E, the effective reproduction number (Pellis et al., 2022), showing how many new cases will be infected if control measures are introduced or some individuals have become immune, that is, the number ℜ₀ calculated over time as an outbreak progresses.

Usually, when applying SIR-type models to monitor a pandemic, the components $S$ , $I$ , and $R$ are computed assuming some initial values of both the components and parameters in (1). The approach suggested in this article involves solving an inverse problem. To extract the parameters $β$ and $γ$ , this problem can be defined within the framework of a multivariate nonlinear regression model, that is, for each $t$ (day in a pandemic) ${(β, γ)}^{T} = F (I) + ε$ (2)

Here (β, γ)^T is a two-dimensional column vector of dependent variables, with ${()}^{T}$ denoting the transpose, and $ε$ being a bivariate i.i.d. noise. $F$ is a function that maps the number of infected into a two-dimensional column vector of $β$ and $γ$ and is defined through a numerical algorithm approximating a mathematical inverse problem. We describe the details in this algorithmic approach of defining $F$ next. The evolution of the number of infected, $I$ , is defined through the SIR dynamics in (1), where $I$ depends on both $β$ and $γ$ , and is interrelated with $S$ and $R$ through this dynamics. In defining $F$ , we view $I$ as a mapping of $β$ and $γ$ into $I$ , that is, $(β, γ) \mapsto I (β, γ) .$ (3)

The function $F$ is the inverse of this mapping, that is, $F (I) = {(β, γ)}^{T}$ . Finding $F$ is an ill-posed problem, as $F$ is not uniquely defined. Moreover, the map of the parameters $β$ and $γ$ into the number of infected, $I$ , in (3) is not known analytically but is complexly defined through a nonlinear differential equation. To select one $F$ , we minimize a squared difference between the simulated and the observed number of infected for varying $β$ and $γ$ . Inevitably, $F$ becomes a nonlinear function of $I$ in our approach. In Box 1, we present an example demonstrating how a unique inverse function can be identified in a special, approximative situation.

Box 1.

Example of inverse problem

In the beginning of a pandemic the number of susceptible, $S$, is close to the total population, i.e., $S \approx N$. One can therefore approximate the number of infected, $I$, in the SIR dynamics (1) by the ordinary differential equation $\frac{d I (t)}{d t} = (β - γ) I (t)$, with the solution $I (t) = I_{0} e^{(β - γ) t}$, where $I_{0} = I (0)$. Let us assume that $γ$ is known and focus on the functional relationship $β \mapsto I (β) = I_{0} e^{(β - γ)}$, for $β \geq γ$ and time $t = 1$. This is a strictly increasing function in $β$, and therefore, there exists a unique inverse function $F (I) = γ + \ln (\frac{I}{I_{0}})$, for $I \geq I_{0}$. $F$ is the inverse in the sense that $F (I (β)) = β$ and $I (F (I)) = I$. Then, minimizing the function $g (β) = {(I (β) - I)}^{2}$ for a given $I$ results in an optimal $β^{*} = \ln (\frac{I}{I_{0}}) + γ$, for which $g (β^{*}) = 0$ [recall minimization in (4)]. We observe that $β^{*} = F (I)$.

This example demonstrates that even in a situation of the simplified SIR model, $F$ becomes a nonlinear function of $I$ . Furthermore, as we advocate in our article, in the general situation, the inverse problem does not provide us with a unique function $F$ ; we rather define it using a nonlinear regression [see (4)].
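
The inverse in Box 1 can be checked numerically. The short Python sketch below evaluates the forward map $β \mapsto I (β)$ and its inverse $F$ for $t = 1$; the function names and the numbers ($γ = 0.5$, $I_{0} = 200$, $β = 0.62$) are hypothetical and serve only as an illustration.

```python
import math


def infected_after_one_day(beta, gamma, i0):
    """Forward map of Box 1: I(beta) = I0 * exp(beta - gamma) at t = 1."""
    return i0 * math.exp(beta - gamma)


def inverse_beta(i_obs, gamma, i0):
    """Inverse map of Box 1: F(I) = gamma + ln(I / I0), valid for I >= I0."""
    return gamma + math.log(i_obs / i0)


gamma, i0 = 0.5, 200.0        # assumed known recovery rate and initial cases
beta_true = 0.62              # hypothetical transmission rate
i_obs = infected_after_one_day(beta_true, gamma, i0)
beta_recovered = inverse_beta(i_obs, gamma, i0)

# g(beta*) = (I(beta*) - I)^2 should vanish at the recovered value.
g_at_optimum = (infected_after_one_day(beta_recovered, gamma, i0) - i_obs) ** 2
```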

The steps in computing $F$ are as follows. Given the observed daily values of $I$, defined as a k-day moving average of daily measured numbers (and in addition back-calculated values of $S$ and $R$), we simulate the SIR dynamics over the next 24 hours for different values of the parameters $β$ and $γ$. We then select the pair of parameters giving a state $I$ the next day with a value closest to the observed value in terms of the mean squared difference. In the simulation, we use a simple Euler discretization procedure with hourly time steps. More precisely, denoting by $\hat{I} (t)$ the observed k-day moving average number of infected on day $t$, we numerically solve the optimization problem: $(\hat{β} (t), \hat{γ} (t)) ≔ \arg \min_{β, γ} {(I (t + 1) - \hat{I} (t + 1))}^{2} .$ (4)

Here, based on the initial values $\hat{S} (t)$, $\hat{I} (t)$, and $\hat{R} (t)$ at time $t$ for the SIR dynamics, we simulate $I (t + 1)$ for the next day by the time-step iterations: $S_{h + 1} (t + 1) = S_{h} (t + 1) - \frac{β}{24 N} I_{h} (t + 1) S_{h} (t + 1),$ (5) $I_{h + 1} (t + 1) = I_{h} (t + 1) + \frac{β}{24 N} I_{h} (t + 1) S_{h} (t + 1) - \frac{γ}{24} I_{h} (t + 1),$ (6) where $h = 0, 1, 2, \dots, 23$ indexes the hourly time steps, each of length 1/24 of a day (as in Table 1), and $R$ is given by the relation $N = S + I + R$. Moreover, $S_{0} (t + 1) = \hat{S} (t)$ and $I_{0} (t + 1) = \hat{I} (t)$. We emphasize that the within-day simulations are indicated by an index $h$, and the daily values range over $t$, which we use to keep track of the day in question. Stepping over the days $t$ in the study period, we obtain a time series of fitted values of parameters $\hat{β} (t)$ and $\hat{γ} (t)$, providing the best estimates under the assumption of an SIR dynamics for the virus spread.
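
The one-day simulation and the minimization in (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: the optimizer is replaced by a plain grid search over a hypothetical range of 0.3 to 0.7 for both parameters, and the function names and input values (apart from $N$) are invented for the example.

```python
def simulate_next_day(s0, i0, beta, gamma, n, steps=24):
    """Hourly Euler steps over one day; returns the simulated I(t + 1)."""
    dt = 1.0 / steps
    s, i = float(s0), float(i0)
    for _ in range(steps):
        new_infections = beta / n * i * s * dt
        s = s - new_infections
        i = i + new_infections - gamma * i * dt
    return i


def fit_day(s0, i0, i_target, n, lo=0.3, hi=0.7, grid_step=0.01):
    """Grid search (an assumption) for the pair minimising the squared error."""
    best_err, best_beta, best_gamma = float("inf"), None, None
    n_points = round((hi - lo) / grid_step) + 1
    for a in range(n_points):
        beta = lo + a * grid_step
        for b in range(n_points):
            gamma = lo + b * grid_step
            err = (simulate_next_day(s0, i0, beta, gamma, n) - i_target) ** 2
            if err < best_err:
                best_err, best_beta, best_gamma = err, beta, gamma
    return best_beta, best_gamma, best_err


# Hypothetical one-day input: 500 infected today, 510 observed tomorrow.
beta_hat, gamma_hat, err_hat = fit_day(
    s0=5_200_000, i0=500, i_target=510, n=5_305_127
)
r_e = beta_hat / gamma_hat  # daily reproduction number for this day
```

Many pairs $(β, γ)$ reproduce the same next-day value almost equally well, which is the non-uniqueness discussed above; the pair returned therefore depends on the search strategy and, for an iterative optimizer, on the starting values.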

Wrapping up, we use ideas from inverse problems to apply a numerical approach in defining a nonlinear function $F$ mapping the number of infected into parameters $β$ and $γ$ . As this map is not unique, we use a regression on the observed number of infected to select one representation of it. This way, we estimate a multivariate nonlinear regression model simultaneously assessing the parameters $β$ and $γ$ as functions of the stochastic SIR model dynamics. The numerical algorithm returning the function to be minimized is depicted in Table 1, and the process flow chart is shown in Figure 1.

FIG. 1.

Process flowchart illustrating the numerical algorithm.

Table 1.

Numerical Algorithm

    function SIR24(β, γ)
        Import data for I, S, and R
        for k = 1 : 168
            Initialize y
            y = 0;
            Initialize SIR
            S_{0} = S(k, :);
            I_{0} = I(k, :);
            R_{0} = R(k, :);
            Set SIR target values
            \hat{S} = S(k + 1, :);
            \hat{I} = I(k + 1, :);
            \hat{R} = R(k + 1, :);
            Simulate SIR over next day
            for i = 2 : 24
                S(i) = S(i - 1) - ((β(k) / N) * I(i - 1) * S(i - 1) / 24);
                I(i) = I(i - 1) + ((β(k) / N) * I(i - 1) * S(i - 1) / 24) - γ(k) * I(i - 1) / 24;
                R(i) = N - S(i) - I(i);
            end
            y = y + sqrt((I(24) - \hat{I})^2);
        end
        return SIR24(β, γ) = y

SIR, susceptible–infected–recovered.

Marinov and Marinova (2022) solve a similar inverse problem. We, however, approach it differently. Noisy and uncertain data might cause highly volatile solutions. We handle it through the k-day moving average of number of infected individuals, instead of minimizing over subperiods of fixed length of several days resulting in piece-wise constant estimated transmission and recovery parameters. A moving average makes trends in data more apparent, and it is the trend, and not a day-to-day variation, which is of prime interest under the pandemic. Furthermore, we estimate the two parameters ( $β$ and $γ$ ) by only minimizing with respect to the $I$ component, the only actually measurable input in a SIR context. Marinov and Marinova (2022) optimize with respect to both $S$ and $I$ , leaving the definition of $S$ unaddressed.

Data

We apply our proposed approach on Norwegian COVID-19 data retrieved from the Github-repository (2020), an open source that contains daily measurements of key variables for the COVID-19 pandemic, on national level. The daily number of registered infected cases as confirmed by PCR tests is the main input variable, that is, state $I$ in the SIR model. The chosen time window for the main analysis in this study is January 19 to July 11, 2021. It comprises the third wave of the pandemic in Norway, dominated by the alpha variant of the SARS-CoV-2 strain (GISAID initiative, 2021).

The susceptible state ($S$) at day 1 in the chosen period includes the number of inhabitants not yet infected by the index date ($N$ = 5,305,127), after combining the population total according to Statistics Norway with the number of infected individuals. The numbers in state $S$ in subsequent days are updated by subtracting the number of newly infected from the remaining not infected individuals. Finally, it is assumed that the number in state $R$ at day $t$ is the same as the number in state $I$ at day $t - 5$, that is, $R (t) = I (t - 5)$, as the mean contagious period was shown to be 5 days in a prospective, longitudinal, community cohort study (Hakki et al., 2022).
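
A minimal Python sketch of this back-calculation is given below. It rests on one reading of the description, namely $S (t) = S (t - 1) - I (t)$ and $R (t) = I (t - 5)$; the function name and the short case series are hypothetical.

```python
def back_calculate_states(infected, s_start, lag=5):
    """Back-calculate S and R from a daily series of infected cases.

    Assumed reading of the text: S is reduced each day by that day's
    infected cases, and R(t) equals I(t - lag). R is None for the first
    `lag` days, where I(t - lag) is not observed.
    """
    s = [float(s_start)]
    for new_cases in infected[1:]:
        s.append(s[-1] - new_cases)
    r = [float(infected[t - lag]) if t >= lag else None
         for t in range(len(infected))]
    return s, r


# Hypothetical daily case numbers; s_start is the value of N given in the text.
cases = [300, 310, 305, 320, 330, 340, 350]
s_series, r_series = back_calculate_states(cases, s_start=5_305_127)
```

The first five values of the recovered series are undefined, which is one reason the opening days of a study period cannot be used for estimation.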

Sensitivity analyses

Several steps were taken to assess the credibility of the estimated reproduction numbers.

Step 1—Structural changes. To assess how well the estimated reproduction numbers correspond to changes in number of reported infected individuals, an analysis of structural changes (Bai and Perron, 1998) in the number of infected cases as well as estimated reproduction numbers was performed.

Step 2—Benchmarking. As our proposed approach requires only the observed data for the $I$ state, we compare our estimates with the estimates obtained from the R package EpiEstim, an approach also based on the number of infected (Cori et al., 2013). In line with Storvik et al. (2023), we assume a serial interval with mean of 7.5 days and standard deviation (SD) of 3 days. We further benchmark our estimates to reproduction numbers calculated by Storvik et al. (2023), which was used by the authorities to monitor the pandemic in Norway.

Step 3—Different wave. We apply our approach to the data from the first wave of the pandemic, covering the time window April 19, 2020, to January 16, 2021. This period is characterized by large variations because of the lack of well-established testing regimes and multiple virus mutations.

Step 4—Short-time prediction. We derive 1-, 3-, and 5-day predictions for the number of infected individuals based on reproduction numbers calculated by all three approaches. Based on the definition of the reproductive number, not the SIR dynamics, the predictions were calculated as $\hat{I} (t) = I (t - p) \cdot ℜ_{t - p - 1}^{p}$ (7) where $p$ is the prediction horizon (days). The root mean square error (RMSE) was used to assess the results.
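
The prediction rule (7) and its RMSE can be sketched in Python as below. The function names and the synthetic series, which grows by exactly 10% per day, are hypothetical; with a constant reproduction number of 1.1 the rule reproduces such a series exactly.

```python
import math


def predict_infected(infected, r_numbers, p):
    """Apply (7): prediction for day t is I(t - p) * R(t - p - 1) ** p."""
    return {t: infected[t - p] * r_numbers[t - p - 1] ** p
            for t in range(p + 1, len(infected))}


def rmse(predictions, infected):
    """Root mean square error between predictions and observed values."""
    squared = [(pred - infected[t]) ** 2 for t, pred in predictions.items()]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic illustration: cases grow by 10% per day, R is constant at 1.1.
observed = [100.0 * 1.1 ** t for t in range(6)]
r_series = [1.1] * 6

pred_1day = predict_infected(observed, r_series, p=1)
pred_3day = predict_infected(observed, r_series, p=3)
error_1day = rmse(pred_1day, observed)
error_3day = rmse(pred_3day, observed)
```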

Step 5—Sensitivity and elasticity indexes. We calculate the sensitivity index (SI) for $ℜ_{E}$ and $I$ and elasticity index (EI) for $ℜ_{E}$ relative to parameters $β$ and $γ$ . The SI is defined as change in $ℜ_{E}$ or $I$ relative to change in one of the parameters, whereas EI is defined as the percentage change in $ℜ_{E}$ relative to the percentage change in one of the parameters.

The analysis was performed in Matlab R2022b (for extraction of SIR model parameters), R v4.4.1 (EpiEstim package v2.2–4 for calculating the reproduction numbers for comparative purposes), and STATA v17 (xtbreak package for structural change analysis). Figures were generated in Matlab R2022b.

RESULTS

Daily reproduction numbers—empirical analysis

The SARS-CoV-2 testing activity showed clear weekend effects, with markedly fewer tests registered on Sundays and Mondays. This weekend effect was also confirmed by the empirical autocorrelation function, revealing a 7-day pattern (not shown). We therefore use a 7-day moving average of $I$ instead of observed daily numbers. The first 12 days in the study period were thus included to allow for a 7-day moving average of $I$ and 5-day delay in $I$ with respect to $R$ state. Hence, the proposed approach was effectively applied for data from the period January 31 to July 11, 2021. Time series of the daily measured $I$ as well as the 7-day moving average of $I$ values are presented in Figure 2a. Smoothing by a 7-day moving average procedure clearly removes most of this pattern, making the suggested approach more stable.
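
The smoothing step can be illustrated with a short Python sketch. A trailing window is assumed here, since the text does not state whether the average is trailing or centered; the function name and the weekly pattern are hypothetical.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first window - 1 days give no output."""
    averaged = []
    running_sum = 0.0
    for idx, value in enumerate(values):
        running_sum += value
        if idx >= window:
            running_sum -= values[idx - window]   # drop the day leaving the window
        if idx >= window - 1:
            averaged.append(running_sum / window)
    return averaged


# Hypothetical counts with a strict weekly pattern (low values on two days).
daily = [100, 120, 130, 125, 110, 60, 55] * 3
smoothed = moving_average(daily, window=7)
```

Because each 7-day window contains every weekday exactly once, a purely weekly pattern averages out to a constant level.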

FIG. 2.

(a) Observed numbers of infected (black line), a 7-day moving average (blue line), and simulated number of infected (red line); (b) a 7-day moving average of estimated time-varying reproduction number ℜ_E with 95% confidence intervals (CIs, “method 1” and “method 2” are explained in the text); black horizontal line denotes ℜ_E = 1. Vertical black and gray lines indicate structural changes in the observed number of infected and the estimated reproduction numbers, respectively.

In our minimization algorithm, where we extract the parameters for the reproduction number ℜ_E, we choose starting values for both $β$ and $γ$ equal to 0.5 for each daily time step (Girardi and Gaetan, 2023). The resulting series of $I (t)$ generated based on estimated $β$ and $γ$ parameters are presented in Figure 2a along with the observed infected cases $\hat{I} (t)$. There is an evidently good fit with hardly any deviation from the observed numbers. The calculated RMSE of 1.4 implies an average deviation of about 1.4 infected cases from the observed numbers, a very low number given the range of observed infections of ∼200–1000 over the study period.

The estimated parameters $\hat{β}$ and $\hat{γ}$ were further combined into daily ℜ_E values (ℜ_E $(t)$ ), each defined as $\hat{β} (t) / \hat{γ} (t)$ . Daily ℜ_E values showed a great deal of variation, and thus we present the values smoothed by a 7-day moving average in Figure 2b. The figure also depicts the 95% confidence intervals (CIs), derived by two different methods as follows. The distribution of estimated daily reproduction numbers is nearly normal (p = 0.041 for the Kolmogorov–Smirnov test) with mean 1.0 and SD 0.1. Empirically, ∼95% of the values should be within 1.96 SDs around the mean, implying a half-width of the CI for daily ℜ_Es of size 0.19 (method 1). Such a 95% CI would correspond to the worst-case scenario. Another CI was constructed by taking autocorrelations into account, which implies reduced effective sample size and thus larger standard errors and consequently wider 95% CIs than in the case of independent observations. To obtain an effective sample size, the sample size of $n$ is divided by $τ = 1 + 2 \sum_{k = 1}^{\infty} | ρ_{k} |$ , with number of lags truncated at a point where autocorrelations become close to zero (Kass et al., 1998). We have calculated $τ$ to be 16.4 for the first 100 lags, providing an effective sample size of 10, and consequently half-width of the CI for ℜ_E of 0.06 (method 2), markedly narrower than the one in method 1 (Fig. 2b).
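
The two half-widths can be reproduced approximately with the Python sketch below. It assumes that method 1 uses $1.96 \cdot SD$ and that method 2 divides this by the square root of the effective sample size, which is a reading of the text rather than a stated formula; the function names and the short test series are hypothetical.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def effective_sample_size(x, max_lag):
    """n / tau with tau = 1 + 2 * sum of |rho_k| over lags 1..max_lag."""
    tau = 1 + 2 * sum(abs(autocorrelation(x, k)) for k in range(1, max_lag + 1))
    return len(x) / tau, tau


def ci_half_width(sd, n_eff=1.0, z=1.96):
    """Half-width z * sd / sqrt(n_eff); n_eff = 1 corresponds to method 1."""
    return z * sd / math.sqrt(n_eff)


half_width_1 = ci_half_width(0.1)             # SD of 0.1 as reported
half_width_2 = ci_half_width(0.1, n_eff=10)   # effective sample size of 10

# Hypothetical short series, only to exercise the functions.
series = [0.9, 1.1, 1.0, 0.95, 1.05, 1.0, 0.9, 1.1, 1.0, 1.05]
n_eff, tau = effective_sample_size(series, max_lag=3)
```

With the rounded SD of 0.1, the first value is about 0.196 and the second about 0.062, close to the reported 0.19 and 0.06.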

Sensitivity analysis

Step 1—Structural changes. According to the analysis of structural changes, three breakpoints were identified in the time series for the number of infected (black vertical lines in Fig. 2a) and two for the daily ℜ_E series (gray vertical lines in Fig. 2b). The first and second breakpoints in the number of infected clearly show increasing and declining numbers, respectively, whereas the third point indicates flattening. The first breakpoint in the daily ℜ_E series occurs at the time where ℜ_E becomes larger than 1, whereas the second one corresponds to the time point where ℜ_E is clearly below 1. From the time point of the first breakpoint in ℜ_E, there is about an 11-day delay until the breakpoint in the number of infected, indicating a clear increase. The second breakpoint in ℜ_E corresponds well to the tipping point in the $I$ curve, where a slower decline is observed. This decline accelerates after the second breakpoint in the $I$ curve. Interestingly, the control measures were relaxed in Oslo and Viken, the two most populated counties in Norway, on February 16, just 1 week before ℜ_E exhibited the first breakpoint and 2 weeks before the first breakpoint in $I$. The second breakpoint in $I$, indicating a decreasing trend, appears just after a couple of weeks with strict measures introduced before and during Easter at the end of March.

Step 2—Benchmarking. The estimated reproduction numbers are clearly more stable than the numbers obtained by the approach of Cori et al. (2013) (Fig. 3a). The latter approach overestimates the reproduction numbers in intervals where an increase in $I$ is observed and exhibits a delay by ∼4–5 days. The reproduction numbers reported in Storvik et al. (2023) show no such acceleration in values as seen in numbers by Cori’s approach (Fig. 3b). However, the 95% CI is particularly wide, and the reproduction numbers seem to move in many cases in the opposite direction than what ours and Cori et al.’s (2013) numbers indicate.

FIG. 3.

(a) A 7-day moving average (MA) of estimated reproduction numbers by our approach and by the approach of Cori et al. (2013) with corresponding 95% CIs; (b) a 7-day moving average of estimated reproduction numbers by our approach and by the approach in Storvik et al. (2023) with corresponding 95% CIs. Black horizontal line denotes ℜ_E = 1. Vertical black and gray lines indicate structural changes in observed number of infected and estimated reproduction numbers, respectively.

Step 3—Different wave. We next run our approach for ℜ_E on the Norwegian data from the beginning of the pandemic, including the period April 19, 2020–January 16, 2021 (Fig. 4). The model fit was good also for this period, resulting in an RMSE of 2.0 indicating an average deviation of about two infected cases from the observed numbers. There was only one breaking point identified in the series of reproduction numbers, where the clear increase toward 1 was observed. It corresponds well with slightly growing numbers in state $I$ . Also, in number of infected only one breaking point was identified, though much later in the pandemic, where the spread seemed to approach a tipping point. An expected covariation between the number of infected and the reproduction number is clear in this period as well despite low numbers in the beginning and very volatile and high numbers at the end.

FIG. 4.

(a) Observed numbers of infected (black line), a 7-day moving average (blue line) and simulated number of infected (red line); (b) a 7-day moving average of estimated time-varying reproduction number ℜ_E with 95% CIs; black horizontal line denotes ℜ_E = 1. Vertical black and gray lines indicate structural changes in observed number of infected and estimated reproduction number, respectively.

Comparison of our reproduction numbers to the corresponding numbers obtained by the approach of Cori et al. (2013) and Storvik et al. (2023) shows remarkable discrepancies (Fig. 5), indicating that our approach is considerably less sensitive to large variations in the number of infected individuals.

FIG. 5.

Reproduction numbers by the three approaches. Black horizontal line denotes ℜ_E = 1.

Step 4—Short-time prediction. This sensitivity analysis applies the reproduction numbers derived by our and the two other approaches as predictor for a short-term (1-, 3-, and 5-day) prediction for the number of infected individuals (Fig. 6). In all three cases, the approach by Cori et al. (2013) performs worst, whereas our approach is convincingly best according to RMSE. The pronounced volatility and delay observed when benchmarking our results to Cori et al. (2013) and Storvik et al. (2023) propagate into the short-term prediction. Although a 1-day prediction by our approach performs very well, more variation is observed in 3- and 5-day predictions. When the number of infected escalates, the variation increases as well. The increased variation in predictions seems to closely follow this pattern and the breakpoints in the number of infected individuals.

FIG. 6.

Number of infected individuals predicted from reproduction numbers as defined in (7). The RMSE (*) is calculated by skipping predictions up to day 19 to avoid escalated values by approach of Cori et al. (2013).

Step 5—Sensitivity and elasticity analysis. It is easy to show that the SI for $ℜ_{E}$ relative to $β$ is $1 / γ$, which implies that when the infection rate $β$ increases by one unit, $ℜ_{E}$ increases by $1 / γ$. With respect to $γ$, the SI for $ℜ_{E}$ can be defined as $- β / (γ (t) (γ (t) + Δ γ))$, where $Δ γ = γ (t + 1) - γ (t)$. As $γ$ varies little around 0.5 in our dataset, SI is approximately $- β / 0.25 = - 4 β$, implying that when the recovery rate increases by one unit, $ℜ_{E}$ decreases by about four times the infection rate. SI for the number of infected individuals, $I$, relative to $β$ and $γ$ is illustrated in Figures 7a and 7b, respectively, demonstrating that $I$ increases with increasing $β$ and decreasing $γ$. We notice that the example in Box 1 relates directly to the SI for $I$ relative to $β$. Furthermore, it is easy to show that the EI for $ℜ_{E}$ relative to $β$ is 1. The EI for $ℜ_{E}$ relative to $γ$ is defined as $- γ (t) / γ (t + 1)$ and illustrated in Figure 7c together with the regression equation showing that a 1% increase in $γ$ results in a ∼1% reduction in $ℜ_{E}$. Figure 7d depicts EI for $ℜ_{E}$ relative to $γ$ as a function of time.
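
The indexes quoted in Step 5 follow from $ℜ_{E} = β / γ$ in a few lines. The derivation below is a sketch using the definitions given in the sensitivity analysis section; the shorthand $\mathrm{SI}$ and $\mathrm{EI}$ with subscripts, and $\mathcal{R}_E$ for ℜ_E, is introduced here only for compactness.

```latex
\begin{aligned}
\mathrm{SI}_{\beta} &= \frac{\partial \mathcal{R}_E}{\partial \beta}
  = \frac{1}{\gamma}, \\
\mathrm{SI}_{\gamma} &= \frac{1}{\Delta\gamma}
  \left( \frac{\beta}{\gamma(t) + \Delta\gamma} - \frac{\beta}{\gamma(t)} \right)
  = -\frac{\beta}{\gamma(t)\,(\gamma(t) + \Delta\gamma)}
  \approx -\frac{\beta}{0.25} = -4\beta
  \quad \text{for } \gamma \approx 0.5, \\
\mathrm{EI}_{\beta} &= \mathrm{SI}_{\beta} \cdot \frac{\beta}{\mathcal{R}_E}
  = \frac{1}{\gamma} \cdot \frac{\beta \gamma}{\beta} = 1, \\
\mathrm{EI}_{\gamma} &= \mathrm{SI}_{\gamma} \cdot \frac{\gamma(t)}{\mathcal{R}_E}
  = -\frac{\beta}{\gamma(t)\,(\gamma(t) + \Delta\gamma)}
    \cdot \frac{\gamma(t)^{2}}{\beta}
  = -\frac{\gamma(t)}{\gamma(t + 1)}.
\end{aligned}
```

Since $γ (t + 1)$ stays close to $γ (t)$, the last expression is close to $-1$, in line with the roughly 1% reduction in $ℜ_{E}$ per 1% increase in $γ$.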

FIG. 7.

(a) Sensitivity of the model (1) relative to infection rate, $β$ ; (b) sensitivity of the model (1) relative to recovery rate, $γ$ ; (c) elasticity of ℜ_E relative to recovery rate, $γ$ (blue line illustrates the regression equation); (d) daily elasticity index for ℜ_E relative to recovery rate, $γ$ .

DISCUSSION

We propose an approach for estimating nearly instantaneous daily reproduction numbers based only on the number of infected. The approach uses the SIR model dynamics through a multivariate nonlinear regression model, allowing for simultaneous assessment of the time-varying transmission and recovery rate, parameters defining the reproduction number. In some respects, our approach resembles machine learning methodology in the pandemic context (for example, Raissi et al., 2019; Alanazi et al., 2020). However, instead of training the model on a huge amount of input variables, we inform it by the SIR dynamics consisting of ordinary differential equations, making the model parsimonious and easy to apply and fine-tune. As the SIR is an epidemiological model, it points our approach toward the area of physics-informed learning yet not at the expense of numerous input variables, vast amount of data, and computer-intense training.

The proposed method is a direct approach requiring minimal computational resources. It could provide an attractive tool for monitoring the instantaneous reproduction number and thus allow for nowcasting of the pandemic. Only 12 days from the start of a pandemic, our approach enables estimation of daily reproduction numbers for monitoring and for decisions regarding control measures. As the only input required is the number of infected individuals, this approach avoids numerous sources of additional variation following from multiple input variables in complex and computationally intensive models (e.g., Storvik et al., 2023).

When benchmarked against the reproduction numbers of Cori et al. (2013), our results show the same tendencies but are much less explosive in periods with increasing infection rates. Cori et al. (2013) define the reproduction number at a given time point as the ratio between the number of new infections at that time point and the weighted sum of infections up to that time point. This definition implies a long-term memory, and one may therefore ask whether the mixture of historical and current data slows down, or exaggerates, the response to changes. Furthermore, the approach of Cori et al. (2013) suffers from systematic bias in the initial estimation period (O’Driscoll et al., 2021). Comparison with the reproduction numbers obtained by Storvik et al. (2023) shows that our estimated values track trends in infection rates more effectively (more quickly and more correctly) and are much less volatile. Two points are worth mentioning here. Storvik et al. (2023) inform their SEIR model by, among other things, time series of hospitalized individuals. Since hospitalization lags symptom onset by up to 10 days (Faes et al., 2020), the reproduction numbers projected from such an SEIR model become outdated and fail to reflect the real-time dynamics of the pandemic. Moreover, the model might imply causality problems, as hospitalizations occur after infection, which, in turn, is closely linked to the reproduction numbers. In our view, it is counterintuitive to include hospitalizations as input in a model for determining future reproduction numbers. Furthermore, the number of parameters in the approach by Storvik et al. (2023) is very large and thus requires multiple inputs and significant computational resources.
Although powerful but costly processors handle the computational issues in the Western world, this might cause serious issues for timely decisions based on such models in less economically developed countries, where epidemics are a nearly everyday problem. Multiple, often highly uncertain or difficult-to-assess, inputs only escalate the problem. The distinct delay and strikingly wide 95% CIs in the numbers by Storvik et al. (2023) are likely due to the complexity of the model and a not necessarily plausible choice and number of input variables, as well as additional stochastically specified parameters.
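
The memory effect of a ratio-type definition, as described above for Cori et al. (2013), can be illustrated with a few lines of Python. This is only a minimal sketch of the ratio itself, not the full framework of Cori et al. (2013), which it makes no attempt to reproduce. The function name, the weights, and the incidence series are hypothetical and chosen purely for illustration.

```python
def ratio_reproduction_numbers(incidence, weights):
    """Return I_t / sum_s w_s * I_{t-s} for each day t, or None where undefined."""
    lag = len(weights)
    estimates = []
    for t, new_cases in enumerate(incidence):
        if t < lag:
            estimates.append(None)  # not enough history yet
            continue
        # Weighted sum of past incidence: weights[0] applies to the previous day.
        pressure = sum(w * incidence[t - s] for s, w in enumerate(weights, start=1))
        estimates.append(new_cases / pressure if pressure > 0 else None)
    return estimates


# Hypothetical weights over lags of 1, 2 and 3 days (they sum to 1).
weights = [0.2, 0.5, 0.3]
# Hypothetical incidence that doubles once on day 3 and then stays flat.
incidence = [10, 10, 10, 20, 20, 20, 20]
estimates = ratio_reproduction_numbers(incidence, weights)
print(estimates)
```

In this toy series, the estimate jumps to about 2 on the day of the change and returns to 1 only once the whole weighting window has filled with the new level, although incidence is flat from day 3 onward.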

The covariation between the profiles of the series of infected numbers and the series of reproduction numbers in our sensitivity analysis was clear when applying the suggested approach to the data from the beginning of the pandemic, which exhibit high volatility in the number of infected individuals. This is a sign of a robust tool. The sensitivity analysis consisting of short-term predictions by the three approaches also demonstrates the superiority of our approach. It is of interest and high relevance that a simple prediction of the number of infected individuals based on the definition of the reproduction number, instead of the SIR dynamics, apparently performs very well. Finally, sound results of the sensitivity analyses and elasticity indexes close to 1 further emphasize the strengths of the suggested approach.

Our approach was developed by assessing several model specifications, which are discussed next.

The best model fit in terms of RMSE was achieved with a 7-day moving average of the number of infected individuals as input. The model struggled with some convergence problems when run on observed daily data, likely because of the apparent weekend effect and the considerable variations in observed values. As a moving average smooths data and makes trends more perceivable, we argue that such averaged input data make the model more plausible than the approach suggested by Marinov and Marinova (2022). While a 7-day moving average seems to be a credible choice for COVID-19 data, the choice of window should, however, be considered for each contagious disease individually.
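
The smoothing step can be sketched as follows. The text above does not state whether the window is trailing or centered, so a trailing window is assumed here. The function name and the daily counts are hypothetical.

```python
def moving_average(values, window=7):
    """Trailing moving average; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for day, value in enumerate(values):
        running_sum += value
        if day >= window:
            running_sum -= values[day - window]  # drop the value leaving the window
        smoothed.append(running_sum / window if day >= window - 1 else None)
    return smoothed


# Hypothetical daily counts with a pronounced weekend dip, repeated for three weeks.
daily_counts = [100, 110, 120, 115, 105, 40, 30] * 3
smoothed_counts = moving_average(daily_counts, window=7)
print(smoothed_counts[6:])
```

Because every full 7-day window contains each weekday exactly once, the weekend dip in this toy series is averaged out completely, which is the kind of weekly reporting pattern the paragraph above refers to.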

The assumed initial value of 0.5 for the transmission rate $β$ is well within the interval reported in other studies (Girardi and Gaetan, 2023; Liu et al., 2022). Values of the recovery rate $γ$ are usually reported to be below 0.2 (McGowan et al., 2021), although values of 0.6 or higher also occur (AlQadi and Bani-Yaghoub, 2022); we assume an initial value of 0.5. Interpreted as the number of days a person is infectious ($1 / γ$, i.e., 2 days), this value seems too high. However, initial values of $γ$ lower than 0.5 caused numerical problems. This might indicate complex interactions in the SIR model and should be a subject for future studies. Besides, parameter identifiability in SIR models is a known problem discussed in the literature (Melikechi et al., 2022). The extracted values of the two parameters of the SIR model, transmission rate and recovery rate, varied around the initial values. It is, however, important to point out that even though these values are within the range reported in the literature, they are not inherently biological rates but rather rates driven jointly by biological, environmental, and behavioral factors in the population (Buch et al., 2023).

Numerical problems occurred when attempting to minimize a sum of squares with respect to all three SIR model components simultaneously. The back-calculation of the $S$ and $R$ components based on information from the $I$ component stems from the literature in the field. The assumptions made might not be sufficiently sensitive to capture all interactions between the three components. It therefore seemed justified to minimize only with respect to the number of infected individuals, which is, moreover, the only quantity actually observed. Marinov and Marinova (2022) do optimize over both the $S$ and $I$ components; however, their definition of $S$ remains unclear.
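
The idea of restricting the objective to the $I$ component can be illustrated with a toy example. This is not the multivariate nonlinear regression used in this article. It is a minimal sketch that assumes a forward-Euler SIR with daily steps, rates held constant within the window, nobody recovered at the start of the window, and a plain grid search in place of a proper optimizer. All function names and numbers are hypothetical.

```python
def simulate_infected(beta, gamma, s0, i0, days):
    """Forward-Euler SIR with daily steps; returns the I trajectory only."""
    n = s0 + i0  # assumption: nobody has recovered at the start of the window
    s, i = float(s0), float(i0)
    infected = [i]
    for _ in range(days):
        new_infections = beta * s * i / n
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        infected.append(i)
    return infected


def sse_infected(params, observed, s0):
    """Sum of squared errors on the I component only."""
    beta, gamma = params
    fitted = simulate_infected(beta, gamma, s0, observed[0], len(observed) - 1)
    return sum((f - o) ** 2 for f, o in zip(fitted, observed))


def grid_fit(observed, s0, grid):
    """Pick the (beta, gamma) pair on the grid with the smallest SSE on I."""
    candidates = [(b, g) for b in grid for g in grid]
    return min(candidates, key=lambda p: sse_infected(p, observed, s0))


# Candidate rates 0.05, 0.10, ..., 1.00 (hypothetical search grid).
grid = [round(0.05 * k, 2) for k in range(1, 21)]
# Synthetic "observations" generated with known rates, so the fit can be checked.
observed = simulate_infected(0.6, 0.5, s0=10_000, i0=100, days=14)
best_beta, best_gamma = grid_fit(observed, s0=10_000, grid=grid)
print(best_beta, best_gamma)
```

On noise-free synthetic data the search returns the rates used to generate it. With real, noisy counts and a nearly constant susceptible fraction, pairs with the same difference $β - γ$ fit almost equally well, which is one face of the identifiability issue mentioned earlier.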

To summarize, the comparative analysis demonstrates clearly better properties of our reproduction numbers compared with other approaches, both within the same class of models (Storvik et al., 2023) and across a different class (Cori et al., 2013) widely used in assessing the SARS-CoV-2 virus. Our reproduction numbers are a function of two virological parameters, transmission rate and recovery rate. The numbers of infected individuals simulated from the estimated parameter values are very close to the observed ones, showing that the parameters capture the virus properties well. The same is confirmed by a short-term prediction. The assumptions made in our analyses seem plausible but should be assessed again when modeling other viruses. For example, even though the SARS-CoV-2 virus had a contagious period of on average 5 days, other viruses may have shorter or longer contagious periods. The contagious period of a novel virus is, however, usually known from early case reports in the initial phase of an outbreak.

CONCLUSIONS

Our suggested approach is entirely based on the number of infected individuals. Hence, to calculate reliable reproduction numbers, it requires a strong testing regime in an evolving pandemic. Uncertainties due to under-registration of infected individuals propagate into uncertainties in the estimated instantaneous reproduction numbers. However, this is the only source of uncertainty, in contrast to models relying on multiple (and numerous) sources, each with its own associated uncertainty, both in data and in model specification, which may rapidly lead to an escalation of the overall uncertainty and thereby a loss of information content.

Footnotes

ACKNOWLEDGMENTS

The authors are grateful to Geir Storvik and colleagues for providing their estimates of reproduction numbers in Norway, which were used for comparative purposes. The authors also thank the Editor and the Referees for valuable and constructive comments and questions, which improved the article considerably.

AUTHORS’ CONTRIBUTIONS

All authors contributed to the study conception and design. The analysis plan was prepared by J.Š.B. and F.E.B. Material preparation, data extraction, and analysis were performed by J.Š.B. The first draft of the article was written by J.Š.B., F.E.B., and E.R.N. who contributed significantly to previous versions of the article. J.Š.B., F.E.B., and E.R.N. read and approved the final article.

DATA SHARING

The dataset used and analyzed in this article is an open data source available at the GitHub repository (Github-repository, 2020).

AUTHOR DISCLOSURE STATEMENT

The authors declare that they have no competing interests.

FUNDING INFORMATION

No funding was received for this article.

References

1. Alanazi, Kamruzzaman, Alruwaili, et al. Measuring and preventing COVID-19 using the SIR model and machine learning in smart health care. J Healthc Eng, 2020; 2020:8857346; doi: 10.1155/2020/8857346

2. AlQadi, Bani-Yaghoub. Incorporating global dynamics to improve the accuracy of disease models: Example of a COVID-19 SIR model. PLoS One, 2022; 17(4):e0265815; doi: 10.1371/journal.pone.0265815

3. Bai, Perron. Estimating and testing linear models with multiple structural changes. Econometrica, 1998; 66(1):47–78; doi: 10.2307/2998540

4. Buch, Johndrow, Dunson. Explaining transmission rate variations and forecasting epidemic spread in multiple regions with a semiparametric mixed effects SIR model. Biometrics, 2023; 79(4):2987–2997; doi: 10.1111/biom.13901

5. Calvetti, Hoover, Rose, et al. Metapopulation network models for understanding, predicting, and managing the coronavirus disease COVID-19. Front Phys, 2020; 8:261; doi: 10.3389/fphy.2020.00261

6. Cano, Morales, Bendtsen. COVID-19 modelling: The effects of social distancing. Interdiscip Perspect Infect Dis, 2020; 2020:1–7; doi: 10.1155/2020/2041743

7. Cori, Ferguson, Fraser, et al. A new framework and software to estimate time-varying reproduction number during epidemics. Am J Epidemiol, 2013; 178(9):1505–1512; doi: 10.1093/aje/kwt133

8. Elderd, Dukic, Dwyer. Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases. PNAS, 2006; 103:15693–15697; doi: 10.1073/pnas.06008161

9. Engebretsen, Palomares AD-L, Rø, et al. A real-time regional model for COVID-19: Probabilistic situational awareness and forecasting. PLoS Comput Biol, 2023; 19(1):e1010860; doi: 10.1371/journal.pcbi.1010860

10. Faes, Abrams, van Beckhoven, et al. Time between symptom onset, hospitalisation and recovery or death: Statistical analysis of Belgian COVID-19 patients. Int J Environ Res Public Health, 2020; 17(20):7560; doi: 10.3390/ijerph17207560

11. Fraser. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One, 2007; 2(8):e758; doi: 10.1371/journal.pone.0000758

12. Girardi, Gaetan. An SEIR model with time-varying coefficients for analysing the SARS-CoV-2 epidemic. Risk Anal, 2023; 43(1):144–155; doi: 10.1111/risa.13858

13. GISAID initiative. 2021. Available from: https://covariants.org/per-country?country=Norway [Last accessed: March 23, 2023].

14. Github-repository. 2020. Available from: https://github.com/folkehelseinstituttet/surveillance_data [Last accessed: November 18, 2022].

15. Gostic, McGough, Baskerville, et al. Practical considerations for measuring the effective reproductive number, R_t. PLoS Comput Biol, 2020; 16(12):e1008409; doi: 10.1371/journal.pcbi.1008409

16. Government. 2022. Available from: https://www.regjeringen.no/en/aktuelt/long-term-covid-19-strategy-to-normalise-everyday-life/id2907426 [Last accessed: December 1, 2022].

17. Hakki, Zhou, Jonnerby, et al. Onset and window of SARS-CoV-2 infectiousness and temporal correlation with symptom onset: A prospective, longitudinal, community cohort study. Lancet Respir Med, 2022; 10(11):1061–1073; doi: 10.1016/S2213-2600(22)00226-0

18. Hethcote. The mathematics of infectious diseases. SIAM Rev, 2000; 42(4):599–653; doi: 10.1137/S0036144500371907

19. Kass, Carlin, Gelman, et al. Markov chain Monte Carlo in practice: A roundtable discussion. Am Stat, 1998; 52(2):93–100; doi: 10.2307/2685466

20. Lithuanian Presidential Office. 2022. Available from: www.lrp.lt/lt/sveikatos-ekspertu-taryba [Last accessed: June 15, 2023].

21. Liu, Hendeby, Gustafsson. Joint estimation of states and parameters in stochastic SIR models. IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Bedford, 2022; doi: 10.1109/MFI55806.2022.9913861

22. Marinov, Marinova. Inverse problem for adaptive SIR model: Application to COVID-19 in Latin America. Infect Dis Model, 2022; 7(1):134–148; doi: 10.1016/j.idm.2021.12.001

23. McGowan, Grantz, Murray. Quantifying uncertainty in mechanistic models of infectious disease. Am J Epidemiol, 2021; 190(7):1377–1385; doi: 10.1093/aje/kwab013

24. Mehra AHA, Shafieirad, Abbasi, et al. Parameter estimation and prediction of COVID-19 epidemic turning point and ending time of a case study on SIR/SQAIR epidemic models. Comput Math Methods Med, 2020; 2020:1465923; doi: 10.1155/2020/1465923

25. Melikechi, Young, Tang, et al. Limits of epidemic prediction using SIR models. J Math Biol, 2022; 85(4):36; doi: 10.1007/s00285-022-01804-5

26. Muralidar, Ambi, Sekaran, et al. The emergence of COVID-19 as a global pandemic: Understanding the epidemiology, immune response and potential therapeutic targets of SARS-CoV-2. Biochimie, 2020; 179:85–100; doi: 10.1016/j.biochi.2020.09.018

27. O’Driscoll, Harry, Donnelly, et al. Comparative analysis of statistical methods to estimate the reproduction number in emerging epidemics, with implications for the current coronavirus disease 2019 (COVID-19) pandemic. Clin Infect Dis, 2021; 73(1):e215–e223; doi: 10.1093/cid/ciaa1599

28. Pellis, Birrell, Blake, et al. Estimation of reproduction numbers in real time: Conceptual and statistical challenges. J R Stat Soc Ser A Stat Soc, 2022; 185(Suppl 1):S112–S130; doi: 10.1111/rssa.12955

29. Rahbari, Moradi, Abdi. rRT-PCR for SARS-CoV-2: Analytical considerations. Clin Chim Acta, 2021; 516:1–7; doi: 10.1016/j.cca.2021.01.011

30. Raissi, Ramezani, Seshaiyer. On parameter estimation approaches for predicting disease transmission through optimization, deep learning and statistical inference methods. LiB, 2019; 6(2); doi: 10.30707/LiB6.2Raissi

31. Rothman, Greenland. Modern Epidemiology. 2nd ed. Lippincott Williams & Wilkins; New York. 1998.

32. Royal Statistical Society. 2022. Available from: https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/statistics,-data-and-covid-(1)/ [Last accessed: April 1, 2023].

33. Storvik, Palomares AD-L, Engebretsen, et al. A sequential Monte Carlo approach to estimate a time varying reproduction number in infectious disease models: The Covid-19 case. J R Stat Soc Ser A, 2023; 186(4):616–632; doi: 10.1093/jrsssa/qnad043

34. Taghizadeh, Mohammad-Djafari. SEIR modeling, simulation, parameter estimation, and their application for COVID-19 epidemic prediction. Phys Sci For, 2022:18; doi: 10.3390/psf2022005018

35. Verity, Okell, Dorigatti, et al. Estimates of the severity of coronavirus disease 2019: A model-based analysis. Lancet Infect Dis, 2020; 20(6):669–677; doi: 10.1016/S1473-3099(20)30243-7

36. White, Pagano. A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic. Stat Med, 2008; 27(16):2999–3016; doi: 10.1002/sim.3136