
Abstract

The novel coronavirus disease 2019 (COVID-19) is a highly transmissible contagious disease that has spread worldwide and is reported to place a burden on public health worldwide.

This study aimed to determine the probability of epidemic occurrence at any reasonable time horizon in any region of interest by applying novel statistical methods directly to raw clinical data. This paper describes a novel bio-system reliability approach, particularly suitable for multi-regional health and stationary environmental systems observed over a sufficient period of time, resulting in a reliable long-term forecast of the outbreak probability of a highly pathogenic virus.

For this study, daily recorded COVID-19 patient numbers in the most affected Swedish regions were chosen. This work aims to benchmark state-of-the-art methods, making it possible to extract the necessary information from dynamically observed patient numbers while considering relevant territorial mapping.

The method proposed in this paper opens up the possibility of accurately predicting epidemic outbreak probability for multi-regional biological systems. The suggested methodology can be used in various public health applications, based on their clinical survey data.

Key findings are:

A novel spatiotemporal health system reliability method has been developed and applied to COVID-19 epidemic data.

Accurate multi-regional epidemic occurrence prediction is made.

Epidemic threshold confidence bands are given.

Keywords

COVID-19, epidemic outbreak, probability forecast, public health, SARS-CoV-2

Introduction

The main motivation of this study was to apply a novel reliability method, newly developed by the authors, to a contemporary unfiltered health system data set.

Statistical aspects of COVID-19 (SARS-CoV-2) and other similar recent epidemics have received much attention in the research community.¹ Generally, it is challenging to calculate realistic biological system reliability factors and outbreak probabilities under actual epidemic conditions using conventional theoretical statistical methods.^2–11 This is usually due to the many degrees of freedom and random variables governing dynamic biological systems spread over extensive terrain.^12–16 In principle, the reliability of a complex bio-system may be assessed accurately and straightforwardly either from a sufficient number of measurements or by direct Monte Carlo simulation (if such a model is available). For COVID-19, however, observations are available only from the beginning of 2020 onwards. Motivated by this limitation, the authors have introduced a novel reliability method for biological and health systems to predict and manage epidemic outbreaks more accurately. This study focused on the COVID-19 epidemic in Sweden, with attention to cross-correlations between different regions within the same climatic zone. Sweden was chosen because of the extensive health observations and related research available online.^4,17–29 For other studies related to statistical variations per country, see Gondauri et al.¹⁷

In this paper, an epidemic outbreak is viewed as an unexpected incident that may occur in any region of a given country at any time; therefore, the spatial spread is accounted for. Moreover, a specific non-dimensional factor $λ$ is introduced to predict the latter epidemic risk at any time and any place.

Biological systems are subjected to ergodic environmental influences. The other alternative is to view the process as being dependent on specific environmental parameters whose variation in time may be modelled as an ergodic process on its own.

The incidence data of COVID-19 in 21 Swedish regions from February 2020 until the time of writing were retrieved from the public website.³⁰ As this dataset is reported per region, the biological system under consideration can be regarded as a multi-degree-of-freedom (MDOF) dynamic system with highly inter-correlated regional components/dimensions. Some recent studies have already used statistical tools to predict COVID-19 development. For the linear log model, see Chu.²

Note that while this study aims to reduce the risk of future epidemic outbreaks by predicting them, it is solely focused on daily registered patient numbers and not symptoms. Long-lasting COVID-19 symptoms, the so-called ‘long COVID’, their risk factors, and whether a protracted course can be predicted early in the disease are therefore not addressed here. Figure 1 presents the map of Sweden's regions (counties).

Figure 1.

Left: Map of Sweden with regions (counties) with recorded COVID cases. Right: Coronavirus image.

Methods

Consider an MDOF bio-system represented by a response vector, an environmental load vector, or a combined response/load vector $R(t) = (X(t), Y(t), Z(t), \dots)$ that has been either measured or simulated over a sufficiently long time period $(0, T)$. The maxima of the unidimensional components of $R$ are denoted as $X_{T}^{\max} = \max_{0 \leq t \leq T} X(t)$, $Y_{T}^{\max} = \max_{0 \leq t \leq T} Y(t)$, $Z_{T}^{\max} = \max_{0 \leq t \leq T} Z(t), \dots$. Let $X_{1}, \dots, X_{N_{X}}$ be the consecutive local maxima of the component process $X = X(t)$, occurring at discrete increasing times $t_{1}^{X} < \dots < t_{N_{X}}^{X}$ within $(0, T)$. Identical definitions follow for the other MDOF components $Y(t), Z(t), \dots$, namely $Y_{1}, \dots, Y_{N_{Y}}$; $Z_{1}, \dots, Z_{N_{Z}}$ and so on. For simplicity, all $R(t)$ components, and hence their maxima, are assumed to be non-negative. Then

$$P = \iiint_{(0, 0, 0, \dots)}^{(η_{X}, η_{Y}, η_{Z}, \dots)} p_{X_{T}^{\max}, Y_{T}^{\max}, Z_{T}^{\max}, \dots}\left(X_{T}^{\max}, Y_{T}^{\max}, Z_{T}^{\max}, \dots\right) \, dX_{T}^{\max} \, dY_{T}^{\max} \, dZ_{T}^{\max} \dots$$

(1)

being the probability of dynamic system survival, with the critical values of the system components denoted as $η_{X}$, $η_{Y}$, $η_{Z}$, …, and $p_{X_{T}^{\max}, Y_{T}^{\max}, Z_{T}^{\max}, \dots}$ being the joint probability density function (PDF) of the individual component maxima. Correspondingly, system failure is the event $X_{T}^{\max} > η_{X} \cup Y_{T}^{\max} > η_{Y} \cup Z_{T}^{\max} > η_{Z} \cup \dots$, with $\cup$ being the logical unity operator «or». As the system number of degrees of freedom (NDOF) is large, it is not practically feasible to estimate the joint PDF $p_{X_{T}^{\max}, Y_{T}^{\max}, Z_{T}^{\max}, \dots}$ directly, and therefore the survival probability $P$. The latter probability $P$, however, needs to be estimated, as well as the expected system lifetime, according to equation (1). The bio-system unidimensional components $X, Y, Z, \dots$ are now re-scaled and non-dimensionalised as follows

$$X \to \frac{X}{λ η_{X}}, \quad Y \to \frac{Y}{λ η_{Y}}, \quad Z \to \frac{Z}{λ η_{Z}}, \quad \dots$$

(2)

making all bio-responses non-dimensional and giving them the same target failure limit when $λ = 1$, with the target probability $P = P(1)$. Equation (2) may now be used to define $P(λ)$ as a function of the non-dimensional level $λ$. Next, the local maxima of the unidimensional system components are merged into one temporally non-decreasing synthetic vector $\vec{R} = (R_{1}, R_{2}, \dots, R_{N})$ in accordance with the corresponding merged time vector $t_{1} \leq \dots \leq t_{N}$, $N = N_{X} + N_{Y} + N_{Z} + \dots$. Each local maximum $R_{j}$ is an actually encountered local maximum of a bio-system component, corresponding to either $X(t)$, $Y(t)$, $Z(t)$ or other system components. The constructed synthetic vector $\vec{R}$ has no data loss (see Figure 2).

Figure 2.

Example of how two components, X and Y, are merged to create a new synthetic vector $\vec{R}$ .

Now the non-decreasing synthetic vector $\vec{R}$ and its corresponding temporally non-decreasing occurrence times have been fully introduced.
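The merging step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `local_maxima` and `merge_local_maxima`, the strict-inequality definition of a local maximum, and the two short toy series are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def local_maxima(series: List[float]) -> List[Tuple[int, float]]:
    """Return (time index, value) pairs for strict interior local maxima."""
    return [
        (t, series[t])
        for t in range(1, len(series) - 1)
        if series[t - 1] < series[t] > series[t + 1]
    ]


def merge_local_maxima(
    components: Dict[str, List[float]]
) -> List[Tuple[int, str, float]]:
    """Merge the local maxima of all components into one list ordered by
    non-decreasing occurrence time, keeping every maximum (no data loss)."""
    merged = [
        (t, name, value)
        for name, series in components.items()
        for t, value in local_maxima(series)
    ]
    # Sort by time; ties are broken by component name for reproducibility.
    merged.sort(key=lambda item: (item[0], item[1]))
    return merged


# Toy, already non-dimensional series for two hypothetical components.
components = {
    "X": [0.1, 0.4, 0.2, 0.3, 0.6, 0.5],
    "Y": [0.2, 0.1, 0.5, 0.3, 0.2, 0.4],
}
merged = merge_local_maxima(components)
R = [value for _, _, value in merged]   # synthetic vector of local maxima
times = [t for t, _, _ in merged]       # merged, non-decreasing time vector
```

In this toy case the length of `R` equals the sum of the per-component counts of local maxima, in line with $N = N_{X} + N_{Y} + \dots$.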

Figure 3 presents a schematic flowchart, sketching the suggested methodology as a tool for epidemic spread surveillance.

Figure 3.

Flowchart, sketching suggested methodology.

Results

Prediction of influenza-like epidemics has long been a focus of attention in epidemiology and mathematical biology. It is well known that public health dynamics constitute a highly non-linear, multidimensional and spatially cross-correlated dynamic system that is always challenging to analyse. Previous studies have used a variety of approaches to model influenza-like cases. This section illustrates the efficiency of the above-described methodology by applying it to real-life COVID-19 datasets, presented as time series of new daily recorded infected patients spread over large terrains.

COVID-19 and influenza are highly transmissible contagious diseases that spread worldwide with considerable morbidity and mortality. They occur most frequently in late autumn, winter and early spring, reaching their peak prevalence mostly in winter. Seasonal influenza epidemics caused by influenza A and B viruses typically occur annually during winter in temperate regions and present an enormous burden on worldwide public health, resulting in around 3–5 million cases of severe illness and 250,000–500,000 deaths worldwide each year, according to the World Health Organization (WHO).³

This section presents a real-life application of the above-described method. The statistical data in the present section are taken from the official Swedish website.³⁰ The website provides the number of newly diagnosed cases every day in Sweden from 22 January 2020 to 6 May 2022. Patient numbers from 21 different Swedish regions were chosen as components $X, Y, Z, \dots$, thus constituting an example of a 21-dimensional (21D) dynamic biological system. To unify all 21 measured time series $X, Y, Z, \dots$, scaling was performed according to equation (2), making all 21 responses non-dimensional with the same failure limit equal to 1. Failure limits, or in other words epidemic thresholds, $η_{X}, η_{Y}, η_{Z}, \dots$ were chosen differently for different regions in this paper and were set equal to twice the maxima observed over the 2-year period. Next, all local maxima from the 21 measured time series were merged into one single time series by keeping them in non-decreasing time order: $\vec{R} = (\max\{X_{1}, Y_{1}, Z_{1}, \dots\}, \dots, \max\{X_{N}, Y_{N}, Z_{N}, \dots\})$, with the whole vector $\vec{R}$ being sorted according to the non-decreasing times of occurrence of these local maxima.
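The threshold choice and scaling described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the code used in the study: the function names, the region labels `region_a` and `region_b`, and the toy case counts are all hypothetical, and the default `factor = 2.0` simply mirrors the stated choice of twice the observed maxima.

```python
from typing import Dict, List


def epidemic_thresholds(
    daily_cases: Dict[str, List[float]], factor: float = 2.0
) -> Dict[str, float]:
    """Threshold per region: `factor` times the observed maximum.

    Every series is assumed to contain at least one positive value,
    otherwise the scaling below would divide by zero.
    """
    return {region: factor * max(series) for region, series in daily_cases.items()}


def scale_by_thresholds(
    daily_cases: Dict[str, List[float]],
    thresholds: Dict[str, float],
    lam: float = 1.0,
) -> Dict[str, List[float]]:
    """Apply X -> X / (lam * eta_X) to every regional series, as in equation (2)."""
    return {
        region: [x / (lam * thresholds[region]) for x in series]
        for region, series in daily_cases.items()
    }


# Hypothetical toy counts for two made-up regions (not real data).
daily_cases = {"region_a": [10, 40, 25], "region_b": [5, 8, 20]}
thresholds = epidemic_thresholds(daily_cases)
scaled = scale_by_thresholds(daily_cases, thresholds)
```

With this choice of thresholds, every scaled series peaks at 0.5 by construction, so the common failure limit of 1 lies at twice the largest observed value in each region.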

Figure 4 presents the number of new daily recorded patients plotted as a surface. Figure 5 presents the number of new daily recorded patients as a 21D vector $\vec{R}$, consisting of assembled regional new daily patient numbers. Note that the vector $\vec{R}$ does not have a physical meaning on its own, as it is assembled from different regional components with different epidemic backgrounds. The index $j$ is just a running index of the local maxima encountered in a non-decreasing time sequence.

Figure 4.

Number of new daily recorded patients plotted as a surface over regions and time.

Figure 5.

Number of new daily recorded patients as 21D vector $\vec{R}$ . Left: As it is. Right: Scaled by equation (12).

Figure 6 (left) presents the extrapolation according to equation (9) towards the epidemic outbreak level with a 100-year return period, indicated by the horizontal dotted line, and somewhat beyond; a cut-on value of $λ = 0.1$ was used. Dotted lines indicate the extrapolated 95% confidence interval (CI) according to equation (10). According to equation (5), $p(λ)$ is directly related to the target failure probability $1 - P$ from equation (1). Therefore, in agreement with equation (5), the system failure probability $1 - P \approx 1 - P_{k}(1)$ can be estimated. Note that in equation (5), $N$ corresponds to the total number of local maxima in the unified response vector $\vec{R}$. The conditioning parameter $k = 5$ was found to be sufficient, as convergence with respect to $k$ was reached; see equation (6). Figure 6 exhibits a reasonably narrow 95% CI, which is an advantage of the proposed method.^31–35

Figure 6.

Left: 100-year return level (horizontal dotted line) extrapolation of $p_{k}(λ)$ towards the critical level (indicated by a star) and beyond. Extrapolated 95% CI indicated by dotted lines. Right: COVID Swedish national statistics, SODP plot.

Note that, while novel, the above-described methodology has the clear advantage of utilising available measured data sets efficiently, owing to its ability to treat health system multi-dimensionality and to perform accurate extrapolation based on a quite limited data set. Note that the predicted non-dimensional $λ$ level, indicated by a star in Figure 6 (left), represents the probability of an epidemic outbreak in any Swedish region in the years to come.

The second-order difference plot (SODP) originated from the Poincaré plot. The SODP allows the statistical behaviour of consecutive differences in time series data to be observed.
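As a minimal sketch of how SODP points can be computed, the Python snippet below assumes the common definition in which each point pairs one consecutive difference $x_{n+1} - x_{n}$ with the next one, $x_{n+2} - x_{n+1}$. The function name `sodp_points` and the toy series are assumptions for illustration only; the plotting step itself is omitted.

```python
from typing import List, Tuple


def sodp_points(series: List[float]) -> List[Tuple[float, float]]:
    """Return the SODP scatter points (x[n+1] - x[n], x[n+2] - x[n+1])."""
    first_diffs = [b - a for a, b in zip(series, series[1:])]
    # Pair each consecutive difference with the one that follows it.
    return list(zip(first_diffs, first_diffs[1:]))


# Toy series (not real patient numbers).
points = sodp_points([3, 5, 4, 8, 6])
```

Each pair can then be drawn as one point of a scatter plot, with the first difference on the horizontal axis and the following difference on the vertical axis.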

Figure 6 (right) presents the SODP plot. This kind of plot can be used for data pattern recognition and comparison with other data sets, for example, for the entropy artificial intelligence (AI) recognition approach.³⁶ Note that extreme value theory (EVT) is asymptotic and 1DOF, while this study introduces MDOF and sub-asymptotic approaches. To summarise, the predicted non-dimensional $λ$ level, indicated by the star in Figure 6 (left), represents the probability of an epidemic outbreak in any Swedish region in the years to come. The methodology's limitation lies in its assumption of quasi-stationarity of the underlying bio-environmental process.^37,38

Discussion

Traditional health systems reliability methods dealing with observed time series do not have the advantage of dealing efficiently with systems possessing high dimensionality and cross-correlation between different system responses. The key advantage of the introduced methodology is its ability to study the reliability of high dimensional non-linear dynamic systems.

Despite its simplicity, the present study successfully offers a novel multidimensional modelling strategy and a methodological avenue for forecasting an epidemic during its course.

This paper studied recorded COVID-19 patient numbers from 21 different Swedish regions, constituting an example of a 21-dimensional (21D) dynamic biological system observed in 2020–2022. The novel reliability method was applied to new daily patient numbers as a multidimensional system in real-time. The theoretical reasoning behind the proposed method is given in detail. Note that the use of either direct measurement or direct Monte Carlo simulation for dynamic biological system reliability analysis is attractive; however, dynamic system complexity and high dimensionality require the development of novel, robust and accurate techniques that can deal with a limited data set, utilising the available data as efficiently as possible.

The main conclusion is that Sweden's public health system under local environmental and epidemiologic conditions is well managed. The predicted 100-year return period risk level $λ$ of an epidemic outbreak is very low.

Various authors with different approaches have shown the usage of statistics through EVT and other models in medicine. One such method used the block maxima approach, while another used the Peak Over Threshold approach to estimate the distribution of extremes. Even though both these studies showed their suitability for estimating the extreme values, each of them had its limitations, with one of them requiring a large amount of data.

This study further aimed to develop a general-purpose, robust, and straightforward multidimensional reliability method. The method introduced in this paper has been previously validated by application to a wide range of simulation models, but only for one-dimensional system responses; in general, very accurate predictions were obtained. Both measured and numerically simulated time series responses can be analysed. It is shown that the proposed method produced a reasonable confidence interval. Thus, the suggested methodology may become appropriate for reliability studies of various non-linear dynamic biological systems. Finally, the suggested methodology can be used in many public health applications. The presented COVID-19 example does not limit the areas in which the new method can be applied.

The major limitations of the suggested approach are as follows:

A. Manipulated underlying data set

B. Underlying trend

C. System non-stationarity

While, as mentioned, limitation C is not a major obstacle for the suggested method, future work may extend the current findings to address limitation B, namely to identify the underlying trend.

Footnotes

Availability of data and materials

The datasets analysed during the current study are available online.³⁰ The authors confirm that all methods were performed following the relevant guidelines and regulations according to the Declaration of Helsinki.

Contributorship

All authors contributed equally to this work.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval

Not applicable.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Informed Consent

Not applicable, as no patients were involved in this study.

ORCID iD

Oleg Gaidai

References

1. Chen, Lei, Zhang, et al. Using extreme value theory approaches to forecast the probability of outbreak of highly pathogenic influenza in Zhejiang, China. PLoS ONE 2015; 10: e0118521.

2. Chu. A statistical analysis of the novel coronavirus (COVID-19) in Italy and Spain. PLOS ONE 2021. https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/statistik-och-analyser/bekraftade-fall-i-sverige/

3. World Health Organization. Influenza Fact Sheet. 2014 Mar [cited 10 June 2014]. Geneva: World Health Organization. Available at: http://www.who.int/mediacentre/factsheets/fs211/en/index.html

4. Gaidai, Cao, Xing, et al. Piezoelectric energy harvester response statistics. Micromachines (Basel) 2023; 14: 271; Algorithms Group, 2010. NAG Toolbox for MatLab. Oxford, UK: NAG Ltd.

5. Rice SO. Mathematical analysis of random noise. Bell System Tech J 1944; 23: 282–332.

6. Madsen, Krenk, Lind NC. Methods of structural safety. Englewood Cliffs: Prentice-Hall Inc., 1986.

7. Kim EK, Seok JH, Oh JS, et al. Use of Hangeul Twitter to track and predict human influenza infection. PLoS One 2013; 7: e69305. PMID: 23894447.

8. Lazebnik, Blumrosen. Advanced multi-mutation with intervention policies pandemic model. IEEE Access 2022; 10: 22769–22781.

9. Di Giamberardino, Iacoviello, Papa, et al. A data-driven model of the COVID-19 spread among interconnected populations: epidemiological and mobility aspects following the lockdown in Italy. Non-Linear Dyn 2021; 106: 1239–1266.

10. Gaidai O, Xing Y and Xu X. Novel methods for coupled prediction of extreme wind speeds and wave heights. Sci Rep 2023. https://doi.org/10.1038/s41598-023-28136-8.

11. Viguerie, Lorenzo, Auricchio, et al. Simulating the spread of COVID-19 via a spatially-resolved susceptible–exposed–infected–recovered–deceased (SEIRD) model with heterogeneous diffusion. Appl Math Lett 2021; 111: 106617. ISSN 0893-9659.

12. Bajiya, Bugalia, Tripathi JP. Mathematical modeling of COVID-19: impact of non-pharmaceutical interventions in India. Chaos 2020; 30: 113143. PMID: 33261327.

13. Bugalia, Bajiya, Tripathi, et al. Mathematical modeling of COVID-19 transmission: the roles of intervention strategies and lockdown. Math Biosci Eng 2020; 17: 5961–5986. PMID: 33120585.

14. Bugalia, Tripathi, Wang. Mathematical modeling of intervention and low medical resource availability with delays: applications to COVID-19 outbreaks in Spain and Italy. Math Biosci Eng 2021; 18: 5865–5920. PMID: 34517515.

15. Akman, Chauhan, Ghosh, et al. The hard lessons and shifting modeling trends of COVID-19 dynamics: multiresolution modeling approach. Bull Math Biol 2022; 84: 3.

16. Romero-Severson, Sanche, et al. Estimating the reproductive number R0 of SARS-CoV-2 in the United States and eight European countries and implications for vaccination. J Theor Biol 2021; 517: 110621. Epub Feb 13, 2021. PMID: 33587929; PMCID: PMC7880839.

17. Gondauri, Mikautadze, Batiashvili. Research on COVID-19 virus spreading statistics based on the examples of the cases from different countries. Electron J Gen Med 2020; 17: em209.

18. Ditlevsen, Madsen. Structural reliability methods. Chichester (UK): John Wiley & Sons, Inc, 1996.

19. Zhu, Zhang, Wang, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med 2020.

20. Leung GM. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet 2020: 1–3; Johns Hopkins University Center for Systems Science and Engineering. Coronavirus COVID-19 Global Cases. Available at: https://coronavirus.jhu.edu/map.html (Accessed March 25, 2020).

21. Gaidai, Yan, Xing. Prediction of extreme cargo ship panel stresses by using deconvolution. Front Mech Eng 2022.

22. Deng. Coronavirus disease 2019 (COVID-19): what we know? J Med Virol 2020; 92: 719–725.

23. McGoogan JM. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese center for disease control and prevention. JAMA 2020; 2019: 3–6.

24. World Health Organization. Coronavirus disease 2019 (COVID-19) Situation Report - 70. 30 March 2020.

25. Bailey NTJ. The total size of a general stochastic epidemic. Biometrika 1953; 40: 177.

26. Becker, Britton. Statistical studies of infectious disease incidence. J R Statist Soc B 1999; 61: 287–307.

27. Lan, et al. Positive RT-PCR test results in patients recovered from COVID-19. JAMA 2020.

28. Kermack, McKendrick AG. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character 1927; 115: 700–721.

29. Bailey NTJ. Maximum-likelihood estimation of the relative removal rate from the distribution of the total size of an intra household epidemic. J Hyg (Lond) 1954; 52: 400–402. PMID: 13212043; PMCID: PMC2217790.

30. https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/statistik-och-analyser/bekraftade-fall-i-sverige/

31. Gaidai, Cao, Loginov. Global cardiovascular diseases death rate prediction. Curr Probl Cardiol 2023.

32. Gaidai, Cao, Xing, et al. Extreme springing response statistics of a tethered platform by deconvolution. International Journal of Naval Architecture and Ocean Engineering 2023. https://doi.org/10.1016/j.ijnaoe.2023.100515

33. Gaidai, Xing, Balakrishna, et al. Improving extreme offshore wind speed prediction by using deconvolution. Heliyon 2023. https://doi.org/10.1016/j.heliyon.2023.e13533

34. Gaidai, Xing. Prediction of death rates for cardiovascular diseases and cancers. Cancer Innovation 2023.

35. Gaidai, Wang, Yakimov. COVID-19 multi-state epidemic forecast in India. Proceedings of the Indian National Science Academy 2023.

36. Yayık, Kutlu, Altan. Regularised HessELM and Inclined Entropy Measurement for Congestive Heart Failure Prediction. Cornell University. https://arxiv.org/abs/1907.05888

37. Gaidai, Yan, Xing, et al. A novel statistical method for long-term coronavirus modelling. F1000 research 2022. https://orcid.org/0000-0003-0883-48542

38. Gaidai, Yan, et al. Novel methods for wind speeds prediction across multiple locations. Sci Rep 2022; 12: 19614.

Multi-regional COVID-19 epidemic forecast in Sweden
