Sage Journals: Discover world-class research

Abstract

Background:

Respiratory-related mortality in Bangladesh exhibits pronounced seasonal trends, yet systematic forecasting to guide public health interventions remains limited. This study analyzes temporal patterns and projects future mortality using nationally certified cause-of-death data.

Methods:

We applied a Seasonal Autoregressive Integrated Moving Average (SARIMA) model to monthly respiratory mortality records (January 2018–December 2024, 84 data points). Model selection was guided by AIC/BIC criteria, with performance evaluated via fivefold time-series cross-validation (training/test splits: Fold 1: 24/12, Fold 2: 36/12, Fold 3: 48/12, Fold 4: 60/12, Fold 5: 72/12). Seasonality was quantified through decomposition, and 18-month forecasts were generated with 95% prediction intervals.

Results:

Mortality showed strong seasonal peaks in July, August (seasonal level 79.5% of July's), and January (66.2% of July's), aligning with the monsoon and winter periods. The optimal SARIMA(1,2,3)(0,1,1)[12] model achieved good accuracy: mean absolute scaled error (MASE) = 0.41, MAE = 259.123, and RMSE = 337.238 (cross-validation mean: MASE = 0.58, MAE = 520.13). Forecasts projected stable monthly averages of 1724 (95% CI: 782–2667) deaths in 2025 and 1711 (593–2830) in January–June 2026, with the highest occurrence in July, August, and January. Residual diagnostics supported model adequacy (Ljung-Box p = 0.971); the residuals departed from normality (Shapiro-Wilk p < 0.001), but the departure was judged negligible.

Conclusion:

Respiratory mortality in Bangladesh follows predictable seasonal patterns, with monsoon and winter peaks demanding targeted interventions. The SARIMA model provides actionable forecasts for preemptive public health measures, such as pre-monsoon vaccination campaigns and pollution control. Future research should explore environmental covariates to enhance precision.

Keywords

respiratory mortality; medical certificate cause of deaths (MCCD); SARIMA model; time-series forecasting; Bangladesh; seasonal trends; public health planning; mortality surveillance; LMICs; high-risk periods

Significance for public health

This study enhances our understanding of respiratory mortality trends in Bangladesh by revealing distinct seasonal peaks in July, August, and January, coinciding with monsoon rains and winter months. Using a rigorously validated SARIMA model, it generates reliable 18-month forecasts, equipping public health officials with evidence to guide interventions. These may include strategically timed vaccination drives (e.g. pre-monsoon in April–May), air pollution controls, and optimized healthcare staffing during high-risk periods. The findings advocate for integrating time-series forecasting into national disease surveillance, offering a proactive tool to curb respiratory mortality in resource-limited settings like Bangladesh. Future models could be strengthened by incorporating environmental data (e.g. air quality, temperature), sharpening predictive accuracy for tailored responses. Such data-driven strategies are invaluable for epidemic preparedness and efficient use of scarce medical resources, enabling policymakers to mitigate seasonal outbreaks, reduce healthcare strain, and save lives.

Introduction

Respiratory diseases continue to be a leading cause of mortality globally, with South Asia, including Bangladesh, facing particularly high burdens due to a combination of environmental, socio-economic, and healthcare-related factors.^1,2 The occurrence of respiratory diseases is influenced by a multitude of factors, such as air pollution, temperature fluctuations, and viral outbreaks, which often follow seasonal patterns; respiratory-related morbidity and mortality statistics may therefore also show temporal or seasonal fluctuations.³ Accurate identification of these temporal patterns and the ability to forecast future trends are crucial for an effective public health response. The Medical Certification of Cause of Death (MCCD), defined by the 20th World Health Assembly, provides a standardized method for capturing mortality data,^4,5 but its application, especially in time series analysis in Bangladesh, remains underexplored.⁶ This study aims to fill this gap by analyzing 7 years of monthly MCCD data on respiratory-related mortality using a Seasonal Autoregressive Integrated Moving Average (SARIMA) model.⁷ The goal is to identify seasonality in respiratory mortality trends and generate reliable forecasts of future mortality, thereby supporting evidence-based decision-making, resource allocation, and health system responses. By employing SARIMA modeling, this research aims to elucidate mortality dynamics and improve the prediction of high-risk periods for respiratory conditions, which can inform timely public health interventions and optimize healthcare resource distribution.

Objectives

The primary objectives are: (1) to detect and quantify any seasonal patterns in respiratory disease related mortality, (2) to develop and validate an optimized SARIMA model for forecasting future trends, and (3) to identify high-risk months and generate 18-month forecasts with associated uncertainty. The findings aim to support public health authorities in prioritizing preventive strategies and allocating resources more effectively during critical periods.

Methodology

Study design

This is a secondary data-based observational study. The STROBE guideline for observational studies was followed for reporting this study.

Study place

All data pertain to Bangladesh.

Study period

The study was conducted between January and May 2025; the data (monthly number of deaths medically certified as resulting from respiratory conditions) cover the period from January 2018 to December 2024.

Inclusion and exclusion criteria

The study included all monthly counts of deaths medically certified as resulting from respiratory conditions, as reported in the national MCCD database for Bangladesh from January 2018 to December 2024. No data points were excluded from the analysis, as the objective was to model the complete, nationally reported time series. The sole criterion for inclusion was the availability of a validated monthly data point within the specified timeframe.

Variables

Monthly number of deaths due to respiratory diseases, according to Medical Certificate Cause of Deaths (MCCD) database of Bangladesh.

Operational definition

Death medically certified as resulting from respiratory conditions refers to cases where a qualified medical practitioner formally documents the underlying cause of death as attributable to diseases or disorders of the respiratory system. This classification follows the International Classification of Diseases (ICD-10/ICD-11) coding standards and also adheres to World Health Organization (WHO) guidelines for cause-of-death reporting.^8,9

Data source and preprocessing

Data pertaining to monthly number of deaths medically certified as resulting from respiratory conditions occurring between January 2018 and December 2024 were obtained from the MCCoD module of Real-time Health Information Database, which is a publicly displayed web-based digital dashboard, displaying live health statistics, run by Directorate General of Health Services (DGHS) under the Ministry of Health and Family Welfare (MOHFW) of Bangladesh.¹⁰ The dataset, comprising 84 monthly observations (January 2018–December 2024), was processed and validated programmatically. This validation included checks for file integrity, the presence of essential columns, and most importantly, the continuity of the monthly time series. The process confirmed a complete series with no temporal gaps or missing values, forming a robust foundation for SARIMA modeling.

Seasonality detection and decomposition

To evaluate underlying patterns, each time series was decomposed using additive seasonal decomposition via statsmodels.tsa.seasonal.seasonal_decompose() with a fixed periodicity of 12 months. Seasonality was considered present if the mean absolute seasonal component exceeded 10% of the mean absolute observed values. While this threshold is commonly used as a practical heuristic in applied time series analysis, it is not established as a formal or universal standard in statistical literature.¹¹ This method allowed visual and quantitative assessment of trend, seasonal, and residual components.

Each time series Y_t was decomposed into three components:

Y_{t} = T_{t} + S_{t} + R_{t}

where:

Y_t is the observed value at time t,

T_t is the trend component,

S_t is the seasonal component, and

R_t is the residual (irregular) component.

Seasonality was considered present if the mean absolute seasonal component exceeded 10% of the mean absolute observed values, defined as:

\frac{\operatorname{mean} | S_{t} |}{\operatorname{mean} | Y_{t} |} > 0.10

This criterion allowed for a quantitative threshold to identify meaningful seasonal patterns in mortality data. The strength of seasonality was reported using this ratio.
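The ratio criterion above can be illustrated with a minimal sketch. It uses a plain-Python classical additive decomposition (centered moving-average trend, month-wise means of the detrended series) as a stand-in for statsmodels' seasonal_decompose, and a synthetic series rather than the MCCD data; the function name seasonal_strength is hypothetical.

```python
import math


def seasonal_strength(y, period=12):
    """Return (seasonal_indices, ratio), where ratio = mean|S_t| / mean|Y_t|."""
    n = len(y)
    half = period // 2
    # Trend T_t: centered 2 x period moving average (suitable for an even period).
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    # Seasonal S_t: average of the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    indices = [m - centre for m in means]  # centered so the indices sum to zero
    seasonal = [indices[t % period] for t in range(n)]
    ratio = (sum(abs(s) for s in seasonal) / n) / (sum(abs(v) for v in y) / n)
    return indices, ratio


# Synthetic monthly series (NOT the study data): linear trend plus a 12-month sine wave.
y = [1000 + 5 * t + 200 * math.sin(2 * math.pi * t / 12) for t in range(84)]
indices, ratio = seasonal_strength(y)
verdict = "seasonal" if ratio > 0.10 else "not seasonal"
print(f"seasonality ratio = {ratio:.3f} ({verdict})")
```

With real data the ratio depends on the level of the series as well as on the seasonal amplitude, which is one reason the 10% cut-off is best read as a heuristic.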

SARIMA model specification

The SARIMA (Seasonal Autoregressive Integrated Moving Average) model is a widely used time series forecasting model that extends the ARIMA model to account for seasonality in the data; it was first described by Box and Jenkins in 1970.¹² In addition to the non-seasonal autoregressive (AR), integration (I), and moving average (MA) components, it includes seasonal autoregressive (SAR), seasonal differencing, and seasonal moving average (SMA) components. The equation for a SARIMA(p,d,q)(P,D,Q)_s model is:

\begin{aligned} & (1 - ϕ_{1} L - ϕ_{2} L^{2} - \dots - ϕ_{p} L^{p}) (1 - Φ_{1} L^{s} - Φ_{2} L^{2 s} - \dots - Φ_{P} L^{P s}) {(1 - L)}^{d} {(1 - L^{s})}^{D} X_{t} \\ & \quad = (1 + θ_{1} L + θ_{2} L^{2} + \dots + θ_{q} L^{q}) (1 + Θ_{1} L^{s} + Θ_{2} L^{2 s} + \dots + Θ_{Q} L^{Q s}) Z_{t} \end{aligned}

Where:

$X_{t}$ is the observed time series; the differencing operators in the equation remove its trend and seasonality.

$L$ represents the lag operator, which shifts the time series back by one period ($L X_{t} = X_{t-1}$).

$s$ represents the seasonal period (e.g. 12 for monthly data).

$ϕ_{1}, ϕ_{2}, \dots, ϕ_{p}$ are the non-seasonal autoregressive coefficients.

$Φ_{1}, Φ_{2}, \dots, Φ_{P}$ are the seasonal autoregressive coefficients.

$d$ is the order of non-seasonal differencing.

$D$ is the order of seasonal differencing.

$θ_{1}, θ_{2}, \dots, θ_{q}$ are the non-seasonal moving average coefficients.

$Θ_{1}, Θ_{2}, \dots, Θ_{Q}$ are the seasonal moving average coefficients.

$Z_{t}$ is a white noise error term.

This equation extends the basic ARIMA model by incorporating both non-seasonal and seasonal components, which are commonly seen in time series data exhibiting seasonality. In the SARIMA model, we specify the values of $p$ , $d$ , $q$ for non-seasonal components, as well as $P$ , $D$ , $Q$ for seasonal components, and $s$ represents the seasonal period. The seasonal components help capture the recurring patterns or seasonality in the data.
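To make the differencing operators concrete, the following minimal sketch applies (1 − L)^d (1 − L^s)^D to a synthetic series (not the study data). The function names are illustrative only, and the orders d = 2, D = 1, s = 12 are used simply because they match the configuration tested in this study.

```python
def difference(x, lag=1):
    """Apply (1 - L^lag): return x_t - x_{t-lag} for every t where both terms exist."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def sarima_difference(x, d=2, D=1, s=12):
    """Apply (1 - L)^d (1 - L^s)^D to x, i.e. the differencing part of the SARIMA equation."""
    out = list(x)
    for _ in range(D):
        out = difference(out, lag=s)   # seasonal differencing
    for _ in range(d):
        out = difference(out, lag=1)   # non-seasonal differencing
    return out


# Synthetic series: quadratic trend plus a fixed pattern that repeats every 12 months.
pattern = [30, 10, 0, -10, -5, 5, 40, 25, 0, -15, -20, 15]
x = [0.5 * t * t + 3 * t + pattern[t % 12] for t in range(84)]

w = sarima_difference(x, d=2, D=1, s=12)
print(len(x), "->", len(w), "observations after differencing")
print("largest absolute value left:", max(abs(v) for v in w))
```

Each differencing pass shortens the series by its lag, so d = 2 with D = 1 and s = 12 leaves 70 of 84 monthly points for estimation. In this noise-free example the differenced series is identically zero, because the quadratic trend and the fixed seasonal pattern are removed completely.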

We implemented a grid search algorithm to identify optimal Seasonal Autoregressive Integrated Moving Average (SARIMA) parameters. In this analysis, these components were selected as follows:

● Non-seasonal terms (p, d, q):

○ p (autoregressive order): Tested values 0–3

○ d (differencing order): Tested values 0–2 (stationarity confirmed via Augmented Dickey-Fuller test)

○ q (moving average order): Tested values 0–3

● Seasonal terms (P, D, Q, s):

○ P, D, Q: Tested values 0–1 (due to limited data length)

○ s, also denoted m (seasonal period): Fixed at 12 for monthly data

Stationarity was assessed with Augmented Dickey-Fuller (ADF) tests and ensured through differencing selection during the grid search. Non-seasonal differencing (d) was tested up to order 2 and seasonal differencing (D) up to order 1; non-stationary or invalid configurations (e.g. p = q = 0) were excluded during model fitting. The ADF tests indicated that second-order non-seasonal differencing (d = 2) was required, while first-order seasonal differencing (D = 1) addressed annual patterns without overcorrection. The grid search covered 384 SARIMA parameter combinations (4 × 3 × 4 × 2 × 2 × 2): non-seasonal p = 0–3, d = 0–2, q = 0–3 and seasonal P = 0–1, D = 0–1, Q = 0–1, with m = 12 for monthly data. The model minimizing the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) was selected to balance fit and parsimony. Grid search was chosen over inspection of ACF/PACF plots for its robustness in handling complex patterns (e.g. COVID-19 volatility in 2020–2021), an approach supported by Hyndman and Athanasopoulos.¹¹ In the absence of seasonality, the analysis would have defaulted to non-seasonal ARIMA models, with the same validation through cross-validation and residual diagnostics and with forecasts produced without seasonal components.
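The size of the search space can be checked with a short sketch that enumerates the candidate orders with itertools.product. The selection helper is an assumption for illustration (lowest AIC, with BIC as tie-breaker), since the text does not state how the two criteria were combined, and the scores shown are placeholders rather than fitted values.

```python
from itertools import product

p_values = range(0, 4)                      # p = 0-3
d_values = range(0, 3)                      # d = 0-2
q_values = range(0, 4)                      # q = 0-3
P_values = D_values = Q_values = range(0, 2)  # P, D, Q = 0-1
SEASONAL_PERIOD = 12

# Each candidate is ((p, d, q), (P, D, Q, m)).
grid = [
    ((p, d, q), (P, D, Q, SEASONAL_PERIOD))
    for p, d, q, P, D, Q in product(
        p_values, d_values, q_values, P_values, D_values, Q_values
    )
]
print(len(grid))  # 4 * 3 * 4 * 2 * 2 * 2 = 384


def select_best(scores):
    """Return the candidate with the lowest (AIC, BIC) pair.

    `scores` maps a candidate to an (aic, bic) tuple; comparing tuples ranks by
    AIC first and uses BIC only to break ties (an assumed rule).
    """
    return min(scores, key=lambda candidate: scores[candidate])


# Placeholder scores only, to show the selection step. Real values would come
# from fitting every candidate in `grid` to the mortality series.
toy_scores = {grid[0]: (12.0, 15.0), grid[1]: (10.0, 14.0), grid[2]: (10.0, 13.0)}
best = select_best(toy_scores)
print(best)
```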

Residual diagnostics

Residual normality was assessed using Jarque-Bera and Shapiro-Wilk tests, with Q-Q plots and histograms for visualization. Autocorrelation was tested via the Ljung-Box test (lag 12). Volatility was evaluated using Engle’s ARCH-LM test, with GARCH(1,1) modeling if significant.
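For reference, the Ljung-Box statistic at lag h is Q = n(n + 2) Σ r_k² / (n − k), summed over k = 1, ..., h, where r_k is the lag-k sample autocorrelation of the residuals. The following minimal sketch computes Q in plain Python on simulated residuals (not the model's residuals); in practice a library routine would also return the p-value. The function name is hypothetical.

```python
import random


def ljung_box_q(resid, max_lag=12):
    """Ljung-Box Q statistic for autocorrelation up to `max_lag`."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    denom = sum(v * v for v in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        # Lag-k sample autocorrelation r_k.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


rng = random.Random(0)
white_noise = [rng.gauss(0, 1) for _ in range(84)]   # simulated, uncorrelated residuals
alternating = [1, -1] * 42                            # strongly autocorrelated series

# 21.026 is the 5% critical value of a chi-square distribution with 12 degrees of
# freedom; when testing residuals of a fitted model, the degrees of freedom are
# often reduced by the number of estimated ARMA parameters.
CRITICAL_12_DF = 21.026
for name, series in [("white noise", white_noise), ("alternating", alternating)]:
    q = ljung_box_q(series, max_lag=12)
    print(f"{name}: Q = {q:.3f}, exceeds critical value: {q > CRITICAL_12_DF}")
```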

Volatility assessment

Volatility was assessed by applying Engle’s ARCH-LM test to the SARIMA residuals, testing the null hypothesis of no volatility clustering (α = 0.05). If significant ARCH effects were detected (p < 0.05), a GARCH(1,1) model was to be fitted to quantify volatility persistence, with the conditional variance estimates then used to adjust the forecast intervals. Model diagnostics included visualizations of squared residuals and GARCH-fitted volatility, alongside parameter estimates (ARCH term α and GARCH term β).

Cross-validation procedure

A fivefold time-series cross-validation was implemented using TimeSeriesSplit (n_splits = 5, test_size = 12). Training/test splits were:

Fold 1: 24/12 (Jan 2018–Dec 2019/Jan 2020–Dec 2020)

Fold 2: 36/12 (Jan 2018–Dec 2020/Jan 2021–Dec 2021)

Fold 3: 48/12 (Jan 2018–Dec 2021/Jan 2022–Dec 2022)

Fold 4: 60/12 (Jan 2018–Dec 2022/Jan 2023–Dec 2023)

Fold 5: 72/12 (Jan 2018–Dec 2023/Jan 2024–Dec 2024)

The final model used all 84 points for training. Performance metrics (MAE, RMSE, MASE) were averaged across folds.
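The fold boundaries listed above follow from an expanding-window scheme with a fixed 12-month test block. The sketch below reproduces the index arithmetic in plain Python so that it runs without scikit-learn; it is intended to mirror TimeSeriesSplit(n_splits=5, test_size=12) on 84 observations, and the function name is hypothetical.

```python
def expanding_window_splits(n_obs, n_splits=5, test_size=12):
    """Return (train_indices, test_indices) pairs with an expanding training window."""
    splits = []
    for i in range(n_splits):
        # The last n_splits * test_size observations are divided into test blocks.
        test_start = n_obs - (n_splits - i) * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        splits.append((train_idx, test_idx))
    return splits


splits = expanding_window_splits(84)
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold}: train = {len(train_idx)} months, test = {len(test_idx)} months")
```

Every test block lies strictly after its training window, so no future observation leaks into model fitting.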

Forecasting and uncertainty quantification

An 18-month forecast of respiratory mortality was generated using the fitted SARIMA model. The 95% prediction intervals were derived from the state-space representation of the SARIMA model. Additionally, annual aggregates of the forecast were calculated to support policy planning and intervention timing. High-risk months were identified from the seasonal decomposition, which flagged July, August, and January as critical periods with heightened relative risk.

Software implementation

All analyses were performed in Python, version 3.13.3.¹³ The code was written and edited in Sublime Text (Sublime HQ Pty Ltd., Woollahra, Sydney) and run with the corresponding Python interpreter. The following libraries were imported and used to analyze the data and generate the plots: pandas,¹⁴ numpy,¹⁵ matplotlib,¹⁶ statsmodels,¹⁷ scikit-learn (including sklearn.metrics),¹⁸ XGBoost,¹⁹ LightGBM,²⁰ and itertools.²¹

Ethical considerations

This analysis was based on publicly available, anonymized, and aggregated data, displayed in public dashboards by the Government of Bangladesh for mass-dissemination of the information in the form of monthly and yearly statistics. As such, it did not require institutional review board approval to conduct this study.

Results

Time series model performance and forecast for respiratory mortality

The analysis was conducted on April 14, 2025, using monthly number of deaths medically certified as resulting from respiratory conditions with a total observation window spanning from January 2018 to December of 2024.

Seasonality and risk periods

Seasonality was detected in the monthly counts of deaths medically certified as resulting from respiratory conditions, with a seasonality strength of 0.10 on a 0–1 scale (Figure 1). Seasonal decomposition showed that mortality was highest in July (normalized to 100.0%), followed by August (79.5%) and January (66.2%). Under this normalization, July serves as the reference point and the percentages express the seasonal level of other months relative to July: August’s level was 79.5% of July’s, and January’s was 66.2%. These results indicate that respiratory-related mortality tends to peak during the monsoon and winter months, with July showing the strongest seasonal effect. The number of medically certified respiratory deaths also showed a general increasing trend.

Figure 1.

Decomposition of the time-series of deaths medically certified as resulting from respiratory conditions for January, 2018 to December, 2024.

Model selection and fit

The optimal forecasting model was identified as a Seasonal ARIMA model with configuration SARIMA(1, 2, 3)(0, 1, 1)[12], indicating a first-order non-seasonal autoregressive component, second-order non-seasonal differencing, a third-order non-seasonal moving average component, first-order seasonal differencing, and a seasonal moving average term of order 1 with annual seasonality. ADF tests indicated that second-order differencing was required for stationarity, and seasonal differencing addressed annual patterns without overcorrection. The model achieved an Akaike Information Criterion (AIC) of 1033.48 and a Bayesian Information Criterion (BIC) of 1046.97, indicating a good fit relative to the competing models.
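As an illustration, substituting the selected orders (p = 1, d = 2, q = 3, P = 0, D = 1, Q = 1, s = 12) into the general SARIMA equation gives the form below. The fitted coefficient values are not reported in this section, so the coefficients are left symbolic:

(1 - ϕ_{1} L) {(1 - L)}^{2} (1 - L^{12}) X_{t} = (1 + θ_{1} L + θ_{2} L^{2} + θ_{3} L^{3}) (1 + Θ_{1} L^{12}) Z_{t}

Because P = 0, no seasonal autoregressive term appears on the left-hand side, and the model has five coefficients to estimate (ϕ_{1}, θ_{1}, θ_{2}, θ_{3}, Θ_{1}) in addition to the error variance.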

Cross-validation performance

To evaluate model generalizability while preserving temporal dependencies, we implemented a fivefold time series cross-validation scheme (Figure 2, Table 1) with fixed 12-month test windows using scikit-learn’s TimeSeriesSplit. Each fold maintained strict chronological ordering, with training sets expanding incrementally (starting with 24 months and culminating in 72 months of training data) and each subsequent fold testing on a unique 12-month out-of-sample period (2020–2024). This design ensured that all forecasts were evaluated on unseen data while accounting for seasonal patterns and pandemic-era disruptions. Performance metrics (MAE, RMSE, MASE) were aggregated across all folds to robustly estimate real-world forecasting accuracy.

Figure 2.

Cross validation results (fivefold time series) for SARIMA model (1,2,3)(0,1,1)[12].

Table 1.

Cross-validation performance metrics across five folds.

Fold	MAE	RMSE	MASE	Description
1	567.80	630.31	1.21	Fold 1 performance
2	623.58	733.19	0.69	Fold 2 performance
3	783.30	909.98	0.49	Fold 3 performance
4	299.83	342.95	0.27	Fold 4 performance
5	326.15	401.94	0.23	Fold 5 performance
Mean	520.13	603.67	0.58	Mean across all folds

The model showed generally good forecasting performance, with the mean absolute scaled error (MASE) remaining below 1 in Folds 2 through 5. Performance was less favorable in Fold 1 (MASE = 1.21), which was trained on only 24 months and tested on 2020; this likely reflects COVID-19 disruptions (2020–2021) and the resulting structural shifts in the respiratory mortality pattern. Subsequent folds stabilized as the data normalized.

Across all five folds, the model achieved a mean absolute error (MAE) of 520.13 and a root mean square error (RMSE) of 603.67. The overall MASE was 0.58, indicating that the model outperformed a naive seasonal benchmark (MASE < 1) in most validation periods. The cross-validation metrics reflect variability across time periods, whereas the final-model metrics represent aggregated performance.
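As a reminder of how the scaled error is read, the sketch below computes MASE as the out-of-sample MAE divided by the in-sample MAE of a seasonal naive forecast (each month predicted by the same month one year earlier). The choice of a 12-month seasonal naive scaling is an assumption based on the benchmark named above, and the numbers are toy values, not study data.

```python
def mase(y_train, y_test, y_pred, m=12):
    """Mean absolute scaled error with a seasonal naive (lag-m) in-sample scale."""
    mae = sum(abs(a - f) for a, f in zip(y_test, y_pred)) / len(y_test)
    naive_errors = [abs(y_train[t] - y_train[t - m]) for t in range(m, len(y_train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale


# Toy data: a repeating 12-month pattern that rises by 10 each year.
base = [100 + 10 * (k % 3) for k in range(12)]
y_train = base + [v + 10 for v in base]      # two years of history
y_test = [v + 20 for v in base]              # the following year
y_pred = [v + 15 for v in base]              # forecasts that are 5 too low each month

value = mase(y_train, y_test, y_pred)
print(value)  # forecast MAE is 5 and the seasonal naive MAE is 10
```

A value below 1 means the forecast errors are smaller, on average, than those of the seasonal naive benchmark on the training data; a value above 1, as in Fold 1, means they are larger.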

Residual diagnostics

Diagnostic tests showed that the residuals of the SARIMA model (Figure 3) deviated from a normal distribution, as indicated by the Jarque-Bera test (Test Statistic: 193.475, p < 0.001) and the Shapiro-Wilk test (Test Statistic: 0.897, p < 0.001). Such a deviation could affect the accuracy of parametric prediction intervals, so diagnostic plots were examined to assess the adequacy of the SARIMA model. The Q-Q plot showed general alignment with the theoretical quantiles, with mild deviations in the tails. The histogram of residuals displayed a roughly symmetric, bell-shaped distribution, suggesting approximate normality in practical terms. Minor departures from normality are common in empirical time series data and are not considered sufficient to invalidate the model for inference or forecasting purposes.¹¹ It is helpful but not essential for residuals to follow a normal distribution: normality simplifies the calculation of prediction intervals but is often difficult to achieve or enforce.¹¹ Finally, to address the small degree of non-normality attributable to pandemic outliers, model robustness was checked with bootstrapped CIs, which covered 95% of observed values.

Figure 3.

Residual diagnostics for SARIMA model (1,2,3)(0,1,1)[12].

The Ljung-Box test (lag 12) showed no significant autocorrelation in the residuals (Test Statistic: 4.563, p = 0.971), indicating that the residuals behave as white noise and supporting the adequacy of the SARIMA model in capturing temporal dependencies in the data. Furthermore, the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the residuals revealed no significant serial correlation.

Data transformation needs and volatility assessment to deal with residuals

Log transformation was to be applied only if both (1) the Shapiro-Wilk p-value improved (original p < transformed p) and (2) the variance ratio (transformed/original) fell below 0.9. Log transformation was not applied: it did not improve normality sufficiently (Shapiro-Wilk p = 0.018 after transformation, which was judged an insufficient improvement) and would have artificially collapsed the variance (ratio: 0.000), making the original scale preferable for unbiased forecasting.

The volatility assessment detected no significant volatility clustering (ARCH test p > 0.05); therefore, no GARCH augmentation was required and the original SARIMA model was retained.

Overall model accuracy

The final model performance, estimated using bootstrapping techniques, is summarized below:

Mean Absolute Error (MAE): 259.123 (95% CI: 213.546–313.806)

Root Mean Squared Error (RMSE): 337.238 (95% CI: 252.522–449.872)

Mean Absolute Scaled Error (MASE): 0.410 (95% CI: 0.320–0.516)

The cross-validation and bootstrapped metrics differ because they serve different purposes: cross-validation evaluates stability across time periods, while bootstrapping quantifies sampling uncertainty. These values indicate that the model performs well in forecasting monthly respiratory mortality, with relatively low scaled error and acceptable absolute error margins, supporting its validity for practical forecasting applications.
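The bootstrap scheme behind these intervals is not described in this section. One common choice, shown here purely as an assumed illustration, is a percentile bootstrap that resamples the absolute errors with replacement; the errors below are simulated, and the function name is hypothetical.

```python
import random


def bootstrap_mae_ci(abs_errors, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean absolute error.

    Resamples the absolute errors independently with replacement, which ignores
    any serial dependence between them (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n = len(abs_errors)
    replicates = sorted(
        sum(rng.choices(abs_errors, k=n)) / n for _ in range(n_boot)
    )
    lower = replicates[int(n_boot * alpha / 2)]
    upper = replicates[int(n_boot * (1 - alpha / 2)) - 1]
    point = sum(abs_errors) / n
    return point, lower, upper


# Simulated absolute errors for 84 months (NOT the model's actual errors).
sim = random.Random(42)
abs_errors = [abs(sim.gauss(0, 300)) for _ in range(84)]

point, lower, upper = bootstrap_mae_ci(abs_errors)
print(f"MAE = {point:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

If the errors are serially correlated, a block bootstrap that resamples runs of consecutive months would be a more cautious alternative.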

Forecasting

An 18-month forecast was generated using the selected SARIMA model, covering January 2025 through June 2026 (Table 2, Figure 4).

Table 2.

Forecasted respiratory mortality cases with 95% confidence intervals.

Date	Month-year	Forecast	95% CI (lower–upper)
2025-01-01	Jan 2025	2052.39	1437.09–2667.69
2025-02-01	Feb 2025	1856.17	986.36–2725.99
2025-03-01	Mar 2025	1724.97	804.35–2645.58
2025-04-01	Apr 2025	1640.30	699.19–2581.40
2025-05-01	May 2025	1640.53	685.81–2595.25
2025-06-01	Jun 2025	1617.85	651.22–2584.47
2025-07-01	Jul 2025	1885.05	906.78–2863.32
2025-08-01	Aug 2025	1781.09	791.03–2771.15
2025-09-01	Sep 2025	1632.03	630.06–2633.99
2025-10-01	Oct 2025	1587.84	574.43–2601.26
2025-11-01	Nov 2025	1572.80	548.60–2597.01
2025-12-01	Dec 2025	1702.11	670.08–2734.14
2026-01-01	Jan 2026	1863.00	806.24–2919.76
2026-02-01	Feb 2026	1799.76	711.46–2888.05
2026-03-01	Mar 2026	1711.39	600.25–2822.53
2026-04-01	Apr 2026	1638.47	506.91–2770.04
2026-05-01	May 2026	1639.73	488.29–2791.17
2026-06-01	Jun 2026	1614.37	442.94–2785.80

Year	Average monthly forecast	Average lower 95% CI	Average upper 95% CI
2025	1724.43	782.08	2666.77
2026 (Jan–Jun)	1711.12	592.68	2829.56

Figure 4.

Forecasts from the SARIMA model (1,2,3)(0,1,1)[12].

Point forecasts and 95% prediction intervals are provided in Table 2.

Average monthly forecasted mortality counts were 1724.43 (95% CI: 782.08–2666.77) for 2025 and 1711.12 (95% CI: 592.68–2829.56) for January–June 2026, with the highest single-month forecast in January 2025 (2052.39), consistent with the winter peak identified in the seasonal analysis.
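The annual averages can be reproduced directly from the point forecasts in Table 2. The short sketch below does the arithmetic; note that the 2026 figure averages six months only (January to June).

```python
# Point forecasts copied from Table 2.
forecast_2025 = [2052.39, 1856.17, 1724.97, 1640.30, 1640.53, 1617.85,
                 1885.05, 1781.09, 1632.03, 1587.84, 1572.80, 1702.11]
forecast_2026 = [1863.00, 1799.76, 1711.39, 1638.47, 1639.73, 1614.37]  # Jan-Jun only

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

avg_2025 = sum(forecast_2025) / len(forecast_2025)
avg_2026 = sum(forecast_2026) / len(forecast_2026)

# Rank the 2025 months from highest to lowest forecast.
ranked_2025 = sorted(zip(months, forecast_2025), key=lambda pair: pair[1], reverse=True)
top_three = [name for name, _ in ranked_2025[:3]]

print(f"2025 average: {avg_2025:.2f}")
print(f"2026 (Jan-Jun) average: {avg_2026:.2f}")
print("Highest 2025 forecasts:", top_three)
```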

Comparison with machine learning and other non-linear models

We further compared the SARIMA model with other machine learning models (Prophet, XGBoost, and Random Forest) and with Average, Weighted, and Stacked Ensemble techniques. The process included feature engineering (temporal lags, rolling statistics), hyperparameter tuning, and evaluation via time-series cross-validation and Diebold-Mariano tests. Random Forest was the best standalone machine learning model (MAE: 384.70), while SARIMA outperformed all models in the comparison (MAE: 230.93). Ensembles, particularly the Weighted Ensemble (MAE: 261.11), offered balanced performance. Because these techniques did not improve on the SARIMA model or its forecasts, the detailed methodology and results are omitted here and supplied as Supplemental material with this article.

Discussion

This study examined trends and forecasted respiratory-related mortality in Bangladesh using time series analysis of medically certified death records from January 2018 to December 2024. The Seasonal Autoregressive Integrated Moving Average (SARIMA) model provided a robust framework to capture temporal dependencies and generate forecasts through mid-2026.

A key finding of this analysis was the detection of seasonality in respiratory-related mortality. Mortality consistently peaked during the monsoon and winter months, with July emerging as the month with the highest number of deaths, followed by August and January. National surveillance data indicate that seasonal influenza occurs annually among hospitalized patients across Bangladesh, typically showing a single peak during the rainy season (May–September), and this pattern is often linked to increased mortality among older adults.^22,23 Our finding of a peak in July followed by August is consistent with these observations.

In neighboring India, also a lower middle-income country, with a population of approximately 1.2 billion people, influenza virus circulation usually peaks during the monsoon season (June–September), with secondary peaks during winter (November–February).²⁴ Most influenza-associated deaths there occur within the typical influenza season, which generally runs from April to September each year.²⁵ This evidence also supports our findings. Information from both Bangladesh and India suggests that influenza may be an important contributor to increased respiratory-related mortality in these countries. However, as of 2025, Bangladesh still lacks a defined seasonal influenza vaccination policy.²⁶

In a 2023 spatio-temporal study of patients in Rajshahi, Bangladesh, the number of respiratory patients was considerably higher during January–February than during March–April.²⁷ However, that study enrolled patients only during January–April of 2019 and 2020 and therefore offers limited insight into other months. Nevertheless, the third peak we identified, in January, agrees with its findings. Another study in Bangladesh, based on prescription data, indicated that respiratory tract infections in northwestern Bangladesh peak in March, with a smaller rise in June.²⁸ This suggests potential regional variability in respiratory disease dynamics, underscoring the need for spatially disaggregated or panel data models for more targeted public health planning. Our findings, by contrast, reflect national-level mortality patterns and, as the first analysis of this kind, may give a clearer overall picture than previous work.

The findings of the current study therefore reveal clear seasonal patterns in respiratory mortality that hold important implications for public health planning and intervention. One might expect the main brunt of respiratory mortality to be borne by the winter months, owing to acute exacerbations of asthma and similar conditions; however, our study shows that the monsoon season ranks ahead of the winter in this respect, probably because of influenza and other infectious agents. This phenomenon is more important than it may seem: researchers have recently reported that frequencies of clinically reported influenza (CRI) were higher in patients with STEMI and elevated troponin levels, and that STEMI risk appeared greater during influenza seasons in unadjusted analysis, although these associations were not statistically significant after adjustment for confounders.²² They recommended further research to understand these relationships better and to investigate the potential benefits of infection control measures and influenza vaccination in reducing AMI incidence.²²

Although Bangladesh is geographically located in the Northern Hemisphere and classified within the Tropical Asia influenza vaccination zone, it adopts the Southern Hemisphere influenza vaccine formulation due to the similarity between circulating virus strains in Bangladesh and those typically found in the Southern Hemisphere.²⁹ The World Health Organization recommends that influenza vaccination timing be aligned with local disease patterns; in Bangladesh, this corresponds to administering vaccines during the pre-monsoon months (April–May) to offer protection ahead of the mid-year influenza peak.³⁰

The SARIMA(1,2,3)(0,1,1)[12] model was selected based on information criteria and demonstrated adequate fit and generalizability. Cross-validation results indicated consistent performance across most temporal folds, with a mean MASE of 0.58, suggesting improved accuracy over naïve seasonal benchmarks. While Fold 1 showed diminished performance (MASE = 1.21), potentially due to structural disruptions such as the COVID-19 pandemic, the bootstrapped error metrics (MAE: 259.123; RMSE: 337.238; MASE: 0.41) support the model’s utility for practical forecasting.

The decomposition analysis of respiratory-related deaths in Bangladesh from 2018 to 2024 reveals a clear upward trend over time, with a pronounced spike in 2021 likely linked to the COVID-19 pandemic. A consistent seasonal pattern emerges, with mortality peaking during the mid-year months, aligning with the monsoon and influenza season. While the trend and seasonality explain much of the variation, residuals highlight irregular fluctuations, particularly during periods of public health crises. These findings confirm both long-term increases and predictable seasonal surges in respiratory mortality, supporting the utility of time series models for forecasting and preparedness.

The forecasts for respiratory-related mortality in Bangladesh indicate that elevated death counts will persist throughout 2025 and the first half of 2026, with seasonal fluctuations. The average projected monthly mortality is approximately 1724 deaths in 2025 and 1711 in January–June 2026, a relatively steady burden. Peaks are expected in January and July 2025, in line with the typical seasonal pattern. The forecast intervals are wide (e.g. 95% CI for Jan 2025: 1437–2668), reflecting uncertainty but also capturing the potential for surges. Notably, July and August show higher predicted values than the adjacent months, reinforcing the seasonal risk during the monsoon period. These projections suggest that continued vigilance and readiness are necessary, especially around peak periods.

While the SARIMA model demonstrated strong overall predictive accuracy, visual inspection of the fitted values (Figure 3) reveals a slight lead in the model’s peaks and troughs relative to the actual data post-2020, a phenomenon that can sometimes indicate over-differencing. Furthermore, the forecasts in Figure 4 project a smoothed, repetitive seasonal pattern, which, while statistically sound, may not capture unforeseen shocks or gradual changes in underlying risk factors. These characteristics rightly raise the question of the model’s added value beyond seasoned public health expert judgment. Experts in Bangladesh are indeed aware of the general monsoon and winter peaks in respiratory illness. However, the primary value of this model is not in replacing this expert knowledge, but in augmenting it with a quantitative, data-driven framework. It provides:

Quantitative Precision: It moves beyond general seasonal awareness to provide specific monthly mortality forecasts (e.g. ~1885 deaths in July 2025, against a 2025 monthly average of ~1724), which are crucial for logistical planning, such as determining vaccine doses, bed allocation, and staffing needs.

Objective Benchmarking: The model offers an objective baseline against which to measure the impact of unusual events (e.g. a new virus variant or extreme pollution event). A significant deviation from the forecast would immediately signal an anomaly requiring investigation.

Uncertainty Quantification: The wide prediction intervals honestly communicate the inherent uncertainty in long-term health forecasting, especially in a dynamic environment like Bangladesh. This helps policymakers plan for a range of scenarios, from the expected to the severe.

Therefore, the model is best viewed not as a crystal ball, but as a robust, evidence-based tool that supports and refines proactive public health action, complementing the crucial role of domain expertise. Overall, these findings demonstrate the utility of SARIMA-based forecasting for respiratory mortality surveillance in Bangladesh. By identifying high-risk periods and projecting future burden, the model provides a foundation for proactive interventions such as public health messaging, clinical preparedness, and targeted resource distribution. As climate variability, air quality, and urbanization continue to influence respiratory health in Bangladesh, such predictive tools are vital for adaptive, evidence-driven public health strategies.
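The benchmarking role described above amounts to comparing each newly observed monthly count with the forecast interval for that month. The following is a minimal sketch of such a check; the function name `flag_anomalies` and all numbers are hypothetical and serve only to illustrate the idea, not to reproduce the study's forecasts.

```python
def flag_anomalies(observed, lower, upper):
    """Return (index, value, direction) for counts outside their interval."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have the same length")
    flags = []
    for i, (y, lo, hi) in enumerate(zip(observed, lower, upper)):
        if y > hi:
            flags.append((i, y, "above"))
        elif y < lo:
            flags.append((i, y, "below"))
    return flags


# Hypothetical interval bounds and observations for three consecutive months,
# made up for illustration; they are not taken from the study.
lower = [1400, 1350, 1300]
upper = [2700, 2650, 2600]
observed = [1900, 2800, 1250]

print(flag_anomalies(observed, lower, upper))
# [(1, 2800, 'above'), (2, 1250, 'below')]
```

With 95% intervals, roughly one month in twenty would be expected to fall outside the bounds by chance alone, so a single flag is a prompt for investigation rather than proof of an anomaly.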

Strengths

Robust Model Selection: The study employed an exhaustive grid search, guided by AIC/BIC, to identify the optimal SARIMA model; because these criteria penalize additional parameters, the search limits the risk of overfitting.

High Predictive Accuracy: The model demonstrated strong forecasting performance in cross-validation (mean MASE = 0.58, mean MAE = 520.13); a MASE below 1 indicates that it outperformed the naïve benchmark used for scaling.

Policy-Relevant Insights: The identification of high-risk months (July, August, January) provides actionable data for public health interventions, such as pre-monsoon vaccination drives and pollution control measures.

Transparent Methodology: Bootstrapped confidence intervals, rigorous residual diagnostics, and a fixed randomization seed (42) enhance the reliability and reproducibility of the findings.
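For readers unfamiliar with the MASE figures quoted among the strengths, the sketch below shows one common way to compute the metric: the forecast MAE divided by the in-sample MAE of a seasonal naïve forecast. The seasonal scaling with m = 12 is an assumption made here; the study may have scaled by a different naïve benchmark, and the data are synthetic.

```python
from statistics import mean


def mase(train, actual, forecast, m=12):
    """Mean absolute scaled error, scaled by the in-sample seasonal naive MAE.

    train    : in-sample observations used to fit the model
    actual   : held-out observations
    forecast : forecasts for the held-out period
    m        : seasonal period (12 for monthly data; an assumption here)
    """
    if len(train) <= m:
        raise ValueError("training series must be longer than one season")
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    # In-sample error of forecasting each month with the same month one year earlier.
    scale = mean(abs(train[t] - train[t - m]) for t in range(m, len(train)))
    if scale == 0:
        raise ValueError("seasonal naive error is zero; MASE is undefined")
    mae = mean(abs(a - f) for a, f in zip(actual, forecast))
    return mae / scale


# Synthetic example (not study data): each year is 10 deaths above the last,
# so the seasonal naive forecast is off by exactly 10 in-sample.
train = [100 + 10 * (t // 12) for t in range(36)]
actual = [130] * 12
forecast = [125] * 12

print(mase(train, actual, forecast))  # 0.5
```

Values below 1 mean the forecast errors are smaller, on average, than those of the naïve benchmark over the training period.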

Limitations

Short Time Series and Sample Size: The analysis was limited to 84 months of data, which may affect the robustness of seasonal parameter estimates. A formal sample size calculation was not conducted as the study utilized the entire population of available national monthly data for the period; nevertheless, the limited temporal scope remains a constraint on the model’s stability.

While our cross-validation strategy robustly assessed model performance, the limited time span (84 months) constrained the absolute number of independent test periods.

Although the SARIMA model residuals showed statistically significant non-normality in formal testing (Jarque-Bera p < 0.001), several factors support their practical adequacy for public health forecasting: (1) the Ljung-Box test confirmed residual whiteness (p = 0.971), indicating proper capture of temporal dependencies; (2) cross-validation demonstrated robust out-of-sample performance (mean MASE = 0.58); and (3) the model successfully identified clinically meaningful seasonal patterns. Minor departures from normality are common in empirical time series data and are not considered sufficient to invalidate the model for inference or forecasting purposes.¹¹ For public health decision-making, the model’s interpretability and demonstrated forecasting precision outweigh strict adherence to normality assumptions.

Lack of Covariate Integration: The model did not incorporate environmental or socio-economic covariates, which could improve precision. Future studies should explore SARIMAX models.
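The constraint on the number of test periods noted above follows directly from the fold layout reported in the Methods (training/test sizes of 24/12, 36/12, 48/12, 60/12, and 72/12). The sketch below enumerates such expanding-window folds for 84 monthly observations; the function name and its parameters are illustrative assumptions rather than the study's own code.

```python
def expanding_window_folds(n_obs=84, initial_train=24, horizon=12, step=12):
    """Return (train_indices, test_indices) pairs for an expanding-window scheme.

    Each fold trains on all observations up to a cut-off and tests on the next
    `horizon` observations; the cut-off then moves forward by `step`.
    """
    folds = []
    train_end = initial_train
    while train_end + horizon <= n_obs:
        train_idx = range(0, train_end)
        test_idx = range(train_end, train_end + horizon)
        folds.append((train_idx, test_idx))
        train_end += step
    return folds


for k, (train_idx, test_idx) in enumerate(expanding_window_folds(), start=1):
    print(f"Fold {k}: train={len(train_idx)}, test={len(test_idx)}")
# Fold 1: train=24, test=12
# Fold 2: train=36, test=12
# Fold 3: train=48, test=12
# Fold 4: train=60, test=12
# Fold 5: train=72, test=12
```

With 84 months, a 24-month initial window, and 12-month test blocks, no more than five non-overlapping test periods are available, which is why a longer series would be needed for more folds.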

Conclusion

This study used SARIMA modeling to analyze respiratory-related mortality in Bangladesh from 2018 to 2024 and to forecast it for 2025–2026, revealing distinct seasonal patterns with peaks in July–August (monsoon) and January (winter). The findings align with existing evidence on seasonal influenza surges and environmental risk factors, reinforcing the need for timely public health interventions such as pre-monsoon vaccination campaigns and air pollution mitigation strategies. The SARIMA(1,2,3)(0,1,1)[12] model demonstrated strong predictive accuracy (MASE: 0.41; MAE: 259.12), projecting stable but elevated mortality in 2025–2026 (around 1700 deaths per month), with the highest risk in January and the mid-monsoon months. The wide forecast intervals (e.g. 95% CI for Jan 2025: 1437–2668) highlight uncertainty due to external shocks (e.g. pandemics, climate extremes), underscoring the need for adaptive surveillance systems that can trigger contingency plans when observations move outside the forecasted range.

Supplemental Material

sj-docx-1-phj-10.1177_22799036251395248 – Supplemental material for Temporal trends and forecasting of respiratory mortality in Bangladesh: A SARIMA model for seasonal mortality risk and public health action

Supplemental material, sj-docx-1-phj-10.1177_22799036251395248 for Temporal trends and forecasting of respiratory mortality in Bangladesh: A SARIMA model for seasonal mortality risk and public health action by Pratyay Hasan, Tazdin Delwar Khan, Minhajul Abedin and Mohammad Emdadul Haque in Journal of Public Health Research

Footnotes

Acknowledgements

This study was based on data obtained from the publicly available Real-time Health Information Database, maintained by the Directorate General of Health Services (DGHS), Bangladesh. The authors are solely responsible for the analysis, interpretation, and conclusions presented in this manuscript.

ORCID iDs

Pratyay Hasan

Tazdin Delwar Khan

Authors’ contributions

Pratyay Hasan: Conceptualization, Methodology, Software, Data Curation, Formal Analysis, Supervision, Visualization, Project Administration, Resources, Writing – Original Draft, Writing – Review & Editing. Tazdin Delwar Khan: Conceptualization, Methodology, Data Curation, Supervision, Visualization, Project Administration, Resources, Writing – Original Draft, Writing – Review & Editing. Minhajul Abedin: Methodology, Software, Data Curation, Formal Analysis, Visualization, Writing – Review & Editing. Mohammad Emdadul Haque: Conceptualization, Supervision, Project Administration, Writing – Review & Editing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Pratyay Hasan (MBBS, MPH) serves as an Editorial Review Board Member for the Journal of Public Health Research (Sage). The author declares that there are no other conflicts of interest related to this work.

Data availability statement

All data used in this study are submitted with this manuscript.

Transparency declaration

The lead author (Pratyay Hasan) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained. We (the authors) confirm that we have adhered to relevant EQUATOR guidelines, and the reporting method is referenced in the abstract and methods section of our paper. All authors have read and approved the final version of the manuscript. Pratyay Hasan had full access to all of the data in this study and takes complete responsibility for the integrity of the data and the accuracy of the data analysis.

Supplemental material

Supplemental material for this article is available online.

References

1. Korzh. Global burden of chronic respiratory diseases and risk factors, 1990–2019: an update from the Global Burden of Disease Study 2019, https://repo.knmu.edu.ua/handle/123456789/32262 (2023, accessed 7 May 2025).

2. Bishwajit, Tang, Yaya, et al. Burden of asthma, dyspnea, and chronic cough in South Asia. Int J Chron Obstruct Pulmon Dis 2017; 12: 1093–1099.

3. Mirsaeidi, Motahari, Taghizadeh Khamesi, et al. Climate change and respiratory infections. Ann Am Thorac Soc 2016; 13: 1223–1230.

4. Eng, L Ellingsen, Pedersen, et al. Cause of death certificates in nursing homes: does quality matter? A retrospective review from two counties in Norway. Scand J Public Health 2024; 52: 711–717.

5. World Health Organization. Official Records of the World Health Organization, No. 160: Twentieth World Health Assembly. World Health Organization, 1967.

6. Hazard, Chowdhury, Adair, et al. The quality of medical death certification of cause of death in hospitals in rural Bangladesh: impact of introducing the international form of medical certificate of cause of death. BMC Health Serv Res 2017; 17: 688.

7. Wang, Zhang (eds). Theory and application with seasonal time series. 1st ed. Nankai University Press, 2008.

8. World Health Organization (WHO). International Guidelines For Certification and classification (Coding) of COVID-19 as cause of death: Based on ICD. World Health Organization, 2020.

9. World Health Organization (WHO). International Statistical Classification of diseases and Related Health Problems, 10th revision (ICD-10). World Health Organization, 2016.

10. Directorate General of Health Services (DGHS). Medical Certification of Cause of Death (MCCOD) Module (Real-Time Health Information Dashboard). Directorate General of Health Services, Ministry of Health and Family Welfare, Government of Bangladesh, https://dashboard.dghs.gov.bd/pages/dashboard_mccod_test.php (2023, accessed 7 May 2025).

11. Hyndman, Athanasopoulos. Forecasting: principles and practice. (Paperback ed.). OTexts 2014, pp. 52–57. https://otexts.com/fpp3/

12. Box GEP, Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, 1970.

13. Van Rossum, Drake. Python 3 Reference Manual. CreateSpace, 2009.

14. McKinney, et al. Data structures for statistical computing in Python. In: Proceedings of the 9th Python in Science Conference. Austin, TX, 2010, pp. 51–56.

15. Harris, Millman, van der Walt, et al. Array programming with NumPy. Nature 2020; 585: 357–362.

16. Hunter JD. Matplotlib: A 2D graphics environment. Comput Sci Eng 2007; 9: 90–95.

17. Seabold S and Perktold J. Statsmodels: econometric and statistical modeling with Python. In: 9th Python in Science Conference (SciPy 2010), Austin, TX, 2010.

18. Pedregosa, Varoquaux, Gramfort, et al. Scikit-Learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.

19. Chen, Guestrin. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2016, pp. 785–794.

20. Meng, Finley, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 2017; 30: 3149–3157.

21. Van Rossum. The Python Library Reference, release 3.8.2. Python Software Foundation, 2020.

22. Ahmed, Aleem, Roguski, et al. Estimates of seasonal influenza-associated mortality in Bangladesh, 2010-2012. Influenza Other Respi Viruses 2018; 12: 65–71.

23. Zaman, Alamgir, Rahman, et al. Influenza in outpatient ILI case-patients in national hospital-based surveillance, Bangladesh, 2007-2008. PLoS One 2009; 4: e8452.

24. Chadha, Potdar, Saha, et al. Dynamics of influenza seasonality at sub-regional levels in India and implications for vaccination timing. PLoS One 2015; 10: e0124122.

25. Narayan, Iuliano, Roguski, et al. Burden of influenza-associated respiratory and circulatory mortality in India, 2010-2013. J Glob Health 2020; 10: 010402.

26. Haider, Hassan MZ. Seasonal influenza surveillance and vaccination policies in the WHO South-East Asian region. BMJ Glob Health 2025; 10: e017271.

27. Roy, Ahmed, Ghosh, et al. Spatio-temporal evaluation of respiratory disease based on the information provided by patients admitted to a medical college hospital in Bangladesh using geographic information system. Heliyon 2023; 9: e19596.

28. Khan, Hasan, Islam, et al. Forecasting respiratory tract infection episodes from prescription data for healthcare service planning. Int J Data Sci Anal 2021; 11: 169–180.

29. Institute of Epidemiology, Disease Control and Research (IEDCR), Bangladesh. Seasonal Influenza Vaccine and Surveillance Report, https://iedcr.portal.gov.bd/sites/default/files/files/iedcr.portal.gov.bd/page/baa8a58f_5df8_4ca6_8e03_61a42bcbfe86/2023-02-28-05-07-9715c5d71887c53bc74a5f2f03715826.pdf (2023).

30. World Health Organization (WHO). Influenza Surveillance Landscape Analysis, Bangladesh, https://iris.who.int/bitstream/handle/10665/352557/Bangladesh-eng.pdf (2019).
