Abstract
Although significant progress has been made in reducing malaria transmission in Zimbabwe, the path to elimination remains challenging. The disease remains a persistent threat, particularly in resource-constrained areas such as Mberengwa, creating an urgent need to understand the demographic, behavioural, socioeconomic, and structural factors influencing long-lasting insecticide-treated net use and case severity. This study investigated these factors using individual malaria case data to inform the development of locally tailored strategies for malaria elimination. Individual malaria case data from 2019 to 2024 were extracted from the District Health Information System-2 Tracker database. Data were triangulated with line list and health facility register data to verify records and fill in missing data. The resulting 662 cases were analysed using stratified descriptive analysis, multivariate logistic regression, and Random Forest classification models. The annual Test Positivity Rate declined gradually overall, despite seasonal peaks. A critical finding was the disparity between long-lasting insecticide-treated net ownership (95%) and use (7.7%), suggesting that ownership does not translate to protective use. In the multivariate logistic regression, none of the tested variables were significant determinants of long-lasting insecticide-treated net use. However, Random Forest modelling identified age, time to seek care, religious group, distance to health facilities, and education level as the 5 most influential factors. For malaria case severity, greater distance to a health facility (P < .001) and increasing age (P = .002) were consistently identified as significant factors. The Random Forest model discriminated case severity better than logistic regression.
The findings of this study highlight that effective malaria elimination requires a combined focus on behavioural change, structural improvements in healthcare access, and data-driven programming supported by advanced analytics. Tailored malaria elimination strategies must address the long-lasting insecticide-treated net use gap and structural barriers.
Keywords
Plain English Summary
This study used detailed individual patient information collected between 2019 and 2024 to understand the main factors influencing long-lasting insecticide-treated net use and to predict serious cases in Mberengwa District, Zimbabwe. Overall, the number of malaria cases has been slowly decreasing, although it usually increases during the rainy seasons. A noticeable decline began after April 2020, possibly reflecting the impact of increased malaria elimination activities or changes in health-seeking behaviour during the COVID-19 pandemic. Regarding treated bed nets, despite very high ownership (95%), very few people were using them (7.7%). This large gap indicates that simply having a net is insufficient to prevent infection. None of the factors analysed in the regression analysis were found to be important indicators of net use. However, advanced computer models identified age, time to seek treatment, religion, distance to clinic, and education level as key factors. Older age and living far from the clinic were consistently associated with an increased risk of severe malaria. This result strongly suggests that addressing practical access challenges is important, as early care is necessary for everyone, particularly for vulnerable groups such as older adults. Finally, this study showed that advanced computer models were more effective at assessing the risk of more serious malaria than traditional statistical methods. These models are useful in public health because they can uncover hidden relationships that traditional statistical tools may miss, leading to more accurate forecasts and improved health programmes. Encouraging proper use of bed nets and strengthening healthcare access, especially in rural areas, remain critical for malaria elimination in Mberengwa, Zimbabwe.
✓ This study utilised individual malaria case data (2019-2024) to explore the gap between long-lasting insecticidal net ownership and use, and to identify predictors of malaria severity.
✓ The number of malaria cases showed a marked decline after April 2020, reversing the earlier upwards trend.
✓ The Random Forest model identified age, time to seek care, religion, distance to health facility, and education level as the most important predictors of long-lasting insecticidal net use.
✓ Greater distance to health facilities and increasing age were consistently identified as significant factors predicting malaria severity.
✓ Integrating machine learning (Random Forest) enhanced the predictive performance and provided deeper insights into the determinants of malaria severity and long-lasting insecticide-treated net use.
Introduction
The World Health Organisation (WHO) has established an ambitious objective to reduce global malaria incidence and mortality rates by at least 90% by 2030. 1 Substantial progress in malaria control has led countries, including Zimbabwe, to reorient their efforts towards the pre-elimination phase in low-transmission districts. 2 The success of this phase is critically influenced by the complex interplay of demographic, socioeconomic, behavioural, and structural factors that shape transmission dynamics and influence the effectiveness of elimination strategies.3,4 In Zimbabwe, where malaria remains a persistent challenge, especially in regions characterised by limited resources, it is imperative to implement integrated and contextually tailored strategies to disrupt residual transmission.
Mberengwa District, located in the Midlands Province, was enrolled in malaria pre-elimination activities in 2018 after achieving a significant reduction in annual malaria incidence to less than 5 cases per 1000 population. 5 These gains were primarily driven by intensified vector control, enhanced surveillance, and improved case management. 6 Despite this success, the district continues to grapple with persistent local transmission, often occurring as seasonal surges that threaten elimination efforts. 7 Pregnant women, young children, and mobile populations are among the most vulnerable groups.8,9 Key gaps remain in understanding the complex interactions of demographic, socioeconomic, behavioural, and structural determinants that drive low Long-lasting Insecticidal Net (LLIN) utilisation and predict malaria case severity in this low-incidence context. The mass and continuous distribution of LLINs constitutes the principal vector control intervention in the district, a strategy prioritised since Mberengwa’s enrolment in pre-elimination activities. 10 Distribution efforts aim for high sleeping space and population-protected coverage; however, the true rate of consistent protective use among the population, particularly among confirmed malaria cases, remains unquantified and is critical. Furthermore, the precise burden and predictors of malaria case severity in this transitioning pre-elimination context have not been adequately characterised.
The ultimate success of malaria elimination depends on addressing the unique local challenges that sustain residual transmission. 11 To maintain and accelerate progress towards elimination, there is an urgent need for granular local evidence that refines strategies to reduce malaria morbidity and mortality. Case-based surveillance provides individual-level malaria case data, which are required to identify micro epidemiological risk patterns, enabling resource allocation and targeted interventions. 12 Despite the availability of comprehensive case-based surveillance data in Mberengwa, there is a paucity of research that explicitly examines how demographic, behavioural, and structural factors interact to influence disparities in LLIN use and predict malaria case severity in this unique pre-elimination setting. Elucidating these complex and interacting determinants is essential to support data-driven programming and ensure the implementation of locally tailored public health strategies.
This study utilised retrospective individual malaria case data (2019-2024) to investigate the demographic, socioeconomic, behavioural, and structural factors that drive disparities in LLIN use and predict malaria case severity in the Mberengwa District. To achieve this aim, the study sought to (1) analyse the temporal trends of reported malaria cases, (2) identify and characterise the determinants of LLIN use, (3) identify and characterise the determinants of malaria case severity, and (4) compare the predictive performance of traditional statistical models (logistic regression) and machine learning approaches (random forest). These objectives were formulated to provide empirical evidence to guide focussed surveillance efforts, optimise resource distribution, and inform policy adjustments necessary for the final stages of malaria elimination. The insights derived from this study are necessary to target and optimise locally tailored malaria elimination strategies and resource deployment in the Mberengwa District.
Methods
Study Design
This was a retrospective observational study. This approach leveraged routinely collected individual malaria case records and health facility registers from 2019 to 2024. The design is a pragmatic approach that draws on existing routine data to conduct trend analyses and elucidate the determinants of LLIN use and malaria severity. A critical limitation of this retrospective observational study is the inherent inability to definitively establish cause-and-effect relationships. 13 However, its key advantage is the efficient use of readily available routine surveillance data for vital trend analysis and informing future malaria elimination strategies without requiring time- and resource-demanding primary data collection. 14
The study included fully investigated, laboratory-confirmed malaria case records (defined as positive results by Rapid Diagnostic Test or microscopy) reported between 2019 and 2024 that retained complete data following a systematic triangulation of records from the District Health Information System-2 Tracker database, facility line lists, and malaria registers. This systematic inclusion process resulted in a final dataset of 662 cases, representing approximately 89% of all reported cases. While restricting the analysis to laboratory-confirmed malaria cases may introduce a selection bias that could exaggerate the observed relationships, this methodological decision strengthens diagnostic validity and ensures the use of high-quality surveillance data. 15
Study Setting
The study was conducted in the Mberengwa District, a malaria pre-elimination setting in the southern part of Zimbabwe’s Midlands Province (Figure 1). Administratively, the district is managed by the Rural District Council and comprises 37 wards serviced by 37 health facilities. The local environment is defined by a distinct climatic pattern that includes a pronounced rainy season (generally October to March), which creates conditions conducive to malaria vector mosquito proliferation. 16 Following the transition to pre-elimination, vector control in Mberengwa emphasised a targeted foci-based response and the mass and continuous distribution of LLINs as the principal intervention. Diverse socio-economic activities in the district, such as artisanal mining, farming, cross-border trading, and fishing, significantly influence population dynamics and exposure risks, which are crucial for case-based surveillance systems.17,18 This specific combination of low incidence, structured surveillance, and complex socioeconomic activities makes Mberengwa a critical transitional landscape for studying malaria determinants.

Figure 1. Map of the study area.
Data Sources and Case Definition
The data utilised in this study comprised secondary individual malaria case records routinely collected in the Mberengwa District between 2019 and 2024. The primary data records used were case notifications, case investigations, and foci investigation forms documented in the DHIS-2 Tracker database. In the process of data triangulation, these records were integrated with information obtained from health facility malaria line lists, antenatal care (ANC), foci, and malaria registers. All data were routinely collected by trained health facility workers and district health information personnel as part of the National Malaria Control Programme’s (NMCP) enhanced surveillance process. Triangulation was employed to complement the primary database by filling in missing information and cross-verifying the data where possible.19,20 The process aimed to improve data completeness and enhance the credibility and robustness of the study findings while minimising data loss. In the event of conflicting information between sources during cross-verification, facility registers and validated line lists were prioritised for clinical and demographic details.
Malaria cases were defined and classified according to the WHO and the NMCP guidelines. Specifically, a malaria case was defined as a person in whom the presence of malaria parasites in the blood was confirmed by a quality-controlled laboratory diagnostic test.21,22 In this study, individuals with positive malaria Rapid Diagnostic Test (RDT) or microscopy results were considered confirmed malaria cases. Diagnostic quality assurance procedures are routinely supervised by the NMCP to ensure consistency across facilities.
Outcome Variables
This study investigated 2 primary outcome variables: malaria case severity and LLIN use. Malaria case severity was classified as a binary outcome: severe or uncomplicated. Severe malaria was defined based on standard clinical indicators, such as convulsions, repeated vomiting, unconsciousness, or other WHO-defined danger signs.21,23 This classification was determined by trained health facility clinicians based on the clinical presentation. On the other hand, uncomplicated malaria was defined as a patient who presented with symptoms suggestive of malaria and a positive parasitological test, but without any signs or symptoms of severe malaria. 22
An LLIN is technically defined as a factory-treated net that does not require retreatment and is designed to maintain its efficacy for at least 3 years. 24 In this study, LLIN use was classified as a binary outcome (use or no use). LLIN use, the second outcome variable, was defined as self-reported sleeping under a factory-treated mosquito net the night before the survey. This variable was collected as a standard component of routine case investigation forms within the Zimbabwean NMCP malaria surveillance system.
These 2 variables were chosen as key outcomes, reflecting both preventive behaviour and clinical burden relevant to elimination progress.25,26 LLINs are the principal vector control intervention in the district. Monitoring malaria case severity is critical for promptly identifying and managing severe cases, which is essential for preventing mortality and further transmission. 27 The analysis of these outcomes is intended to inform strategic decision-making, effective resource allocation, and timely public health actions in pursuit of elimination.
Predictor Variables
This study investigated a comprehensive range of potential predictors of LLIN use and the severity of malaria. The investigated predictors represented a combination of data collected at the individual level (such as age, sex, and pregnancy status), behavioural level (eg, sleeping outdoors and travel history), and structural/environmental level (such as distance to a health facility and presence of breeding sites). All predictor variables were chosen based on their established or hypothesised influence on malaria transmission and prevention, as substantiated by the existing literature and the local malaria epidemiological context.28 -30 Only the predictor variables accessible from the case-based surveillance data in triangulated sources were considered for the analysis.
Pregnancy status was used as a key individual-level predictor of malaria severity. However, consistent with the limitations of retrospective studies, the availability and definition of this variable relied entirely on whether the pregnancy was known, reported, and formally documented in the DHIS-2 tracker database or complementary health facility registers such as ANC registers. Reliance on documented status poses a limitation, as cases involving undiagnosed or unreported pregnancies among women of childbearing age may have been misclassified.
Sampling and Sample Size
The study utilised a complete enumeration (census) of all fully investigated malaria case records reported during the period 2019 to 2024. Of the 744 reported cases in the Mberengwa District during this period, 662 cases with complete data were identified for inclusion. These cases represented approximately 89% of all reported cases during the study period. Cases were excluded if missing information could not be complemented or cross-verified through the data triangulation process, primarily because of incomplete forms or reporting gaps (11% of cases excluded). Given that the proportion of cases with missing data was below 15%, complete-case analysis (listwise deletion) was deemed appropriate, minimising the potential bias introduced by missingness. 31 The flow diagram of the sampling process is presented in Figure 2.

Figure 2. Study sample selection process from routine malaria case-based surveillance records (2019-2024).
Data Management and Quality Assurance
Data were extracted and cleaned using a predefined Microsoft Excel template designed to harmonise and integrate all variables required for analysis. Rigorous data cleaning and validation procedures were performed, including the detection of missing data, duplicate entries, outliers, and logical inconsistencies. Missing data were assessed across all key predictor variables to determine the extent of missingness. To address incomplete data, this study employed complete-case exclusion. Cases were excluded if they lacked essential information for the required variables, even after a rigorous triangulation process with health facility registers and malaria line lists. Variables such as age, which were collected using different formats, were standardised to facilitate merging and analysis. To ensure security, the final datasets were stored on a password-protected computer, with access restricted to the researchers. In addition, all research files were frequently backed up to a secure external hard drive that was kept separate from the main computer system.
Statistical Analysis
All data manipulation and statistical analyses were performed using R software version 4.5.1 (2025-06-13), utilising packages such as pROC, randomForest, and segmented, in addition to base R functions. The analysis code and anonymised data are available upon reasonable request to ensure transparency and reproducibility. Statistical analysis commenced with the characterisation of malaria cases in the Mberengwa District by calculating baseline summary statistics. Frequencies and percentages were calculated for categorical variables, while medians and corresponding Interquartile Ranges (IQR) were computed for continuous variables, such as age.
Temporal trends in malaria transmission were analysed using monthly and annual malaria case-based surveillance data to detect significant changes over time. Monthly malaria test positivity rate (TPR) trends were modelled using segmented regression, with a 3-month moving average applied to smooth short-term fluctuations. To quantify changes in trend slopes, segmented (join-point) regression analysis was performed using the segmented package in R. This analysis fitted multiple linear regression segments to the time-series data, identifying statistically significant points (breakpoints) at which the rate of change in malaria cases shifts. 32
Pearson’s chi-squared test was applied on categorical variables across each outcome variable stratum, while Fisher’s exact test was employed where expected cell counts were less than 5. The Wilcoxon rank-sum test was used for continuous variables, such as age and distance to health facilities. Statistical significance was determined by calculating the P-values, with a significance threshold of P < .05.
Variables for inclusion in the multivariate logistic regression model were selected based on statistical significance in the stratified analysis and their epidemiological relevance. All factors with a P-value < .20 in the stratified analysis were considered eligible for entry into the multivariable model. Additionally, variables known from prior literature or biological plausibility to influence each outcome variable were included despite having P-values greater than .20, to control for potential confounding. This stepwise approach allows researchers to systematically include variables that have a meaningful impact on each outcome, thereby ensuring that the model remains robust and relevant to real-world scenarios. 33 Multicollinearity was assessed for independent variables by calculating the Variance Inflation Factor (VIF). Variables with VIF values greater than 10 were considered to have unacceptable multicollinearity. 34
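As a brief illustration of the VIF threshold, the Python sketch below uses the standard definition VIF = 1 / (1 − R²), where R² comes from regressing one predictor on the others. For the simplest case of two predictors, R² is the squared Pearson correlation. The function names and sample values are hypothetical, not the study's code or data.

```python
import math


def pearson_r(x, z):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    sxx = sum((a - mean_x) ** 2 for a in x)
    szz = sum((b - mean_z) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)


def vif_from_r_squared(r_squared):
    """VIF = 1 / (1 - R^2); R^2 is from regressing a predictor on the others."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictor values, chosen to be exactly uncorrelated.
predictor_a = [1, 2, 3, 4]
predictor_b = [1, -1, -1, 1]
vif_uncorrelated = vif_from_r_squared(pearson_r(predictor_a, predictor_b) ** 2)

# The cut-off of 10 used in the text corresponds to R^2 = 0.9.
vif_at_threshold = vif_from_r_squared(0.9)
```

Read this way, a VIF above 10 means that more than 90% of a predictor's variance is explained by the other predictors.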
Random Forest (RF) classification models were developed to complement the established multivariate logistic regression analysis. The purpose of deploying these RF models was to investigate the predictive power and relative importance of independent variables concerning both LLIN use and the severity of malaria cases. These RF models were implemented using R software, specifically employing the caret and randomForest packages. Notably, for the LLIN use model, the synthetic minority oversampling technique (SMOTE) was applied via the themis package before model training to correct for class imbalance. For data handling, the models were trained on 70% of the dataset and were subsequently validated using the remaining 30%. This division was performed using stratified random sampling. The specific parameters defined for the RF models included between 500 and 700 trees, along with the default
The comparison of predictive performance focussed on Logistic Regression and Random Forest models to assess the relative utility of traditional linear statistical methods against advanced machine learning classifiers designed to capture complex, non-linear relationships. Although other machine learning classifiers were initially considered, this study focussed on comparing these 2 contrasting modelling approaches to provide a pragmatic assessment of their suitability for surveillance data. The performance of the models was assessed using Receiver Operating Characteristic (ROC) curves and their corresponding areas under the curve (AUC). Model training relied on 10-fold cross-validation (CV). Given the nature of the outcomes, CV was performed using stratified sampling by outcome to ensure that each fold maintained the outcome proportions of the original dataset. The DeLong test was used to statistically compare the AUCs and determine the significance of the observed differences in predictive accuracy. Since 2 distinct comparisons were performed (LLIN use and malaria severity), the resulting P-values from the DeLong test were adjusted using the Bonferroni correction method to control the Family-Wise Error Rate. 35 Accordingly, the statistical significance threshold for these comparisons was set at α = .025, that is, .05 divided by 2 tests.
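Two of the quantities above can be made concrete with a short Python sketch. It computes the AUC through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half) and the Bonferroni-adjusted threshold. It does not implement the DeLong test. The scores are hypothetical and the function names are illustrative only.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney formulation.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def bonferroni_threshold(alpha, n_tests):
    """Per-comparison significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Hypothetical predicted probabilities, for illustration only.
severe_scores = [0.9, 0.8, 0.4]
uncomplicated_scores = [0.7, 0.3, 0.2, 0.1]
example_auc = auc_from_scores(severe_scores, uncomplicated_scores)

# Two model comparisons (LLIN use and severity), as described in the text.
adjusted_alpha = bonferroni_threshold(0.05, 2)
```

In this toy example, 11 of the 12 pairs are ranked correctly. An AUC near 0.5 means the scores order positive and negative cases no better than chance.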
Ethical Considerations
Ethical approval was granted by the Institutional Review Board of the National University of Science and Technology. The protocol number for the study was
Results
Between January 2019 and December 2024, 744 cases of malaria were reported in the Mberengwa District. Data triangulation between the DHIS2 Tracker, Excel line list, and health facility registers resulted in 662 (89%) cases with complete data for inclusion in the analysis.
Baseline Characteristics of Reported Cases
A total of 662 malaria cases were analysed, with a median age of 18 years (Interquartile Range [IQR], 11-33). The sex distribution was nearly even, with 51.7% males (n = 342) and 48.3% females (n = 320). The most frequently reported occupations were unemployed (43.4%, n = 287) and students (41.8%, n = 277). The majority of individuals attained a secondary education level (46.4%, n = 307), followed by those with no formal education (31.9%, n = 211). More than half of the reported cases (63.7%, n = 422) had a recent travel history. A complete profile detailing the distribution and summary statistics of the baseline characteristics is presented in Table 1.
Table 1. Baseline Demographic, Socioeconomic, and Behavioural Profile of Reported Cases in Mberengwa District, 2019 to 2024.
Temporal (Monthly and Annual) Malaria Case Trends
The monthly and annual trends of the Malaria Test Positivity Rate (TPR) in the Mberengwa District from 2019 to 2024 are presented in Figure 3, along with a 3-month moving average. The trends revealed an established seasonal transmission pattern, with peaks generally coinciding with the rainy season (October-November). Despite these seasonal increases, a gradual decline in the overall magnitude of TPR was observed over the study period. To precisely identify significant shifts in malaria transmission dynamics, segmented regression (join-point regression) was employed, focussing on the Test Positivity Rate (TPR) over the study period. The analysis identified a single statistically significant breakpoint, estimated to be in April 2020. From January 2019 to April 2020, the TPR increased at a rate of 0.38% per month. After April 2020, the TPR trend shifted to a declining rate of −0.11% per month.
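To make the reported slopes concrete, the Python sketch below fits a continuous one-breakpoint piecewise-linear model by grid search over candidate breakpoints. It is a simplified stand-in for the R segmented package, not the study's code, and the series is synthetic. The series is built from a hypothetical intercept of 2.0, the reported slopes of 0.38 and −0.11 per month, and a breakpoint at month 16 (April 2020, counting January 2019 as month 1).

```python
def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_one_breakpoint(t, y, candidates):
    """Least-squares fit of y = b0 + b1*t + b2*max(0, t - c) for each candidate c.

    Returns (sse, c, [b0, b1, b2]) for the candidate with the smallest
    sum of squared errors. b2 is the change in slope after the breakpoint.
    """
    best = None
    for c in candidates:
        rows = [[1.0, float(ti), max(0.0, ti - c)] for ti in t]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
        coef = solve_3x3(xtx, xty)
        sse = sum(
            (yi - sum(ci * ri for ci, ri in zip(coef, r))) ** 2
            for r, yi in zip(rows, y)
        )
        if best is None or sse < best[0]:
            best = (sse, c, coef)
    return best


# Synthetic, noise-free series: 72 months, slope 0.38 up to month 16, then -0.11.
months = list(range(1, 73))
synthetic_tpr = [2.0 + 0.38 * t - 0.49 * max(0, t - 16) for t in months]

# Candidates avoid the edges, where the hinge term is collinear or nearly empty.
sse, best_c, coef = fit_one_breakpoint(months, synthetic_tpr, range(4, 69))
slope_before = coef[1]
slope_after = coef[1] + coef[2]
```

On real, noisy data the breakpoint would carry a confidence interval, and the number of breakpoints would itself need testing. The sketch only shows how one breakpoint and two slopes are encoded in a single regression.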

Figure 3. Monthly malaria test positivity rates (TPR) segmented trends.
LLIN Use Determinants
LLIN Use, Stratified Analysis, and Logistic Regression
Stratified descriptive analysis was used to examine the association between demographic, socioeconomic, and environmental factors and LLIN use (Table 2). Despite high household LLIN ownership (95%, n = 575 households with ≥1 LLINs), self-reported LLIN use the previous night was strikingly low (7.7%, n = 51 cases). Additionally, only 10.1% (19 out of 188) of individuals residing near active breeding sites reported using LLIN the night before the survey. Stratified descriptive analysis revealed a statistically significant association between recent travel history and LLIN use (P = .05). Specifically, 51% (26 out of 51) of individuals who used LLINs reported a recent travel history, compared to a higher proportion of 65% (396 out of 611) among those who did not use LLINs. All other analysed variables yielded P-values above the significance threshold (P > .05), indicating that they had no statistically significant association with LLIN use.
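The travel history comparison can be reproduced from the counts given above. The Python sketch below computes Pearson's chi-squared statistic for the resulting 2 × 2 table without a continuity correction, which is an assumption about how the original test was run. The function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: LLIN users (26 travelled, 25 did not); non-users (396 travelled, 215 did not).
statistic, p_value = chi_square_2x2(26, 25, 396, 215)
```

Under these assumptions the statistic is close to 3.9 and the P-value is just under .05, consistent with the rounded value reported above. Applying a continuity correction would push the P-value above the threshold, so the association is best read as borderline.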
Table 2. Determinants of LLIN Use.
Wilcoxon rank-sum test.
Pearson χ2 or Fisher’s exact test.
Education level was categorised as: no formal education (including children below school-going age), primary, secondary, and tertiary.
A multivariate logistic regression model was employed to identify the independent determinants of LLIN use and to account for potential confounding variables. This model included recent travel history based on the inclusion threshold (P < .2) in the stratified analysis. Additionally, several other variables with P-values above .2, such as age, sex, outdoor sleeping behaviour, occupation, and LLIN ownership, were incorporated because of their recognised theoretical, epidemiological, or clinical plausibility as predictors of health behaviours. Initial checks confirmed no concerns regarding multicollinearity, with VIF values for all predictors ranging from 1.004 to 2.038. The multivariate logistic regression analysis demonstrated that none of the included predictor variables were significantly associated with LLIN use, as all respective P-values were above the established threshold of P < .05 (Table 2). Specifically, factors such as age (P = .25), recent travel (P = .15), outdoor sleeping (P = .63), and being married (P = .07) were not significantly associated with LLIN use after adjusting for other factors. Further machine learning analysis was performed using a random forest model to explore potential nonlinear relationships and variable importance.
Random Forest Analysis for LLIN Use
A Random Forest model was developed to capture the potential nonlinear effects. Before balancing, 7.7% (36 out of 464) of the individuals in the training dataset reported sleeping under an LLIN the previous night. After SMOTE, both groups were equally represented (428 each), which enabled more robust model training. The test dataset (n = 198) was not altered to preserve the real-world class distribution. The Random Forest model achieved an out-of-bag (OOB) error rate of 4.6% and a classification accuracy of 88.9% on the test dataset. However, this accuracy largely reflects the predominance of non-users: the model correctly identified most non-users (specificity = 96.2%) but failed to correctly predict any LLIN users (sensitivity = 0%), indicating difficulty in distinguishing the minority class. The overall discriminative ability was poor, with an area under the ROC curve (AUC) of 0.538, only marginally better than chance.
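The arithmetic behind these figures can be checked with a short Python sketch. The split sizes follow from a 70% stratified split of the 51 users and 611 non-users. The count of 176 correctly classified non-users is not reported in the text; it is an assumption inferred from the stated specificity and accuracy. LLIN users are treated as the positive class.

```python
total_users, total_nonusers = 51, 611

# 70% of each class goes to training (stratified split), rounded to whole cases.
train_users = round(0.70 * total_users)        # 36
train_nonusers = round(0.70 * total_nonusers)  # 428

# The remainder forms the untouched test set.
test_users = total_users - train_users           # 15
test_nonusers = total_nonusers - train_nonusers  # 183
test_size = test_users + test_nonusers           # 198

# Confusion matrix on the test set, with LLIN users as the positive class.
true_positives = 0     # sensitivity of 0%: no user was predicted correctly
true_negatives = 176   # assumption, inferred from the reported percentages

sensitivity = true_positives / test_users
specificity = true_negatives / test_nonusers
accuracy = (true_positives + true_negatives) / test_size
```

With these counts, a model that never identifies a single user still scores close to 89% accuracy. For an outcome this imbalanced, sensitivity and AUC are therefore more informative than accuracy.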
The variable importance analysis (Figure 4) based on the Mean Decrease Accuracy revealed that age (58.3), time to seek care (57.9), religious group (48.5), distance to health facility (48.1), and education level (47.1) were the 5 most influential factors. Other variables contributing to the model, although with lower importance scores, included sex (39.4), recent travel history (39.3), pregnancy status (36.9), occupation (33.7), and availability of active breeding sites (30.8).

Figure 4. Random forest analysis of LLIN use.
Malaria Severity Determinants
Stratified Descriptive Analysis and Logistic Regression for Severity
The vast majority of the analysed malaria cases were classified as uncomplicated (89.3%, n = 591), with 10.7% (n = 71) classified as severe malaria. The median age for severe cases was 26 years (IQR 11-47), which was significantly higher than 18 years (IQR 11-32) for uncomplicated cases (Wilcoxon rank-sum test, P = .02). Distance to a health facility was also significantly associated with severity (P < .001), with a median distance of 9 km (IQR 6-12) for severe cases compared with 6 km (IQR 5-7) for uncomplicated cases. Stratified analysis revealed a notable difference in the proportion of cases residing near an active breeding site between the severity groups (38% of severe cases and 27% of uncomplicated cases). However, this association was not statistically significant (P = .06). All other investigated variables, such as time to seek care (P = .13), sex (P = .7), and pregnancy status (P = .10), showed no significant association with malaria case severity in the stratified descriptive analysis (Table 3).
Table 3. Determinants of Malaria Case Severity.
The multivariate logistic regression model included all variables meeting the inclusion threshold of P < .2 (age, distance to health facility, time to seek care, occupation, active breeding sites near home, recent travel history, and pregnancy status). Other variables deemed clinically or epidemiologically relevant to malaria case severity were also included (sex and previous LLIN use). Multicollinearity was low across all variables, with all Variance Inflation Factor (VIF) values <5. The analysis identified age as a significant factor (P = .002), with each additional year slightly increasing the odds of developing severe malaria (AOR = 1.03; 95% CI: 1.01-1.05). Distance to a health facility showed a strong association with severe malaria (P < .001). Compared to the reference group (Far, >10 km), residing at a near distance (<5 km) strongly reduced the odds of severe malaria (AOR = 0.06; 95% CI: 0.02-0.14), as did a moderate distance (5-10 km; AOR = 0.11; 95% CI: 0.05-0.21). Pregnancy status was not statistically significant in the final multivariate model (P = .053). However, pregnant women showed substantially increased estimated odds of progression to severe malaria (AOR = 7.39; 95% CI: 0.81-55.56), although the wide confidence interval indicates an imprecise estimate. The remaining variables, including time to seek care (P = .86), LLIN use the previous night (P = .70), and the presence of active breeding sites near the home (P = .43), were not significantly associated with malaria severity.
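A per-year odds ratio of 1.03 looks small, but it compounds across an age gap. The Python sketch below works this through from the reported adjusted odds ratios, assuming the log-odds are linear in age as in a standard logistic regression. The 20-year gap is an arbitrary illustration and the function name is hypothetical.

```python
import math

aor_per_year = 1.03                  # reported AOR for each additional year of age
beta_age = math.log(aor_per_year)    # corresponding logistic regression coefficient


def odds_multiplier(age_difference_years):
    """Odds ratio implied for a given age gap, assuming log-odds linear in age."""
    return math.exp(beta_age * age_difference_years)


# Illustrative comparison: two otherwise similar cases 20 years apart in age.
odds_ratio_20_years = odds_multiplier(20)

# Inverting the reported AOR for near (<5 km) versus far (>10 km) residence
# gives the odds ratio for far versus near.
aor_near_vs_far = 0.06
aor_far_vs_near = 1 / aor_near_vs_far
```

Under this assumption, a 20-year age gap corresponds to roughly 1.8 times the odds of severe malaria, and living more than 10 km from a facility corresponds to roughly 17 times the odds compared with living within 5 km.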
Random Forest Analysis for Case Severity
The model achieved an overall out-of-bag (OOB) error rate of 11.9%, indicating good overall classification accuracy. However, the model achieved high sensitivity (99.4%) for the large uncomplicated class but low specificity (9.5%) for the small severe class. This indicates the difficulty in accurately identifying severe cases, potentially owing to the low proportion of severe cases in the dataset. Variable importance metrics indicated that the top 5 factors with the highest importance scores for case severity were distance to health facility (Mean Decrease Accuracy = 13.51; Mean Decrease Gini = 18.68), age (3.10; 15.48), marital status (3.04; 3.99), treatment compliance (1.73; 2.78), and recent travel history (1.70; 2.09). The model further revealed minimal or negative contribution of variables such as residence (0; 0), case detection type (−1.18; 2.37) and LLIN use (−1.30; 1.30). These findings suggest that both individual-level and healthcare access factors play important roles in determining malaria case severity. Figure 5 summarises the relative importance of the factors analysed using the random forest model.

Figure 5. Random forest variable importance for malaria case severity.
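Because severe cases make up only about one tenth of the dataset, overall accuracy can look high even when few severe cases are detected. The Python sketch below illustrates this with the class sizes and class-wise rates reported above. It assumes, purely for illustration, that the reported rates apply to all 662 cases; the source does not state which sample they were computed on, and the function name is hypothetical.

```python
def class_summary(n_majority, n_minority, recall_majority, recall_minority):
    """Summarise accuracy measures for a two-class problem.

    recall_majority / recall_minority are the proportions of each class
    that the model labels correctly.
    """
    total = n_majority + n_minority
    correct = recall_majority * n_majority + recall_minority * n_minority
    return {
        # Share of all cases classified correctly
        "overall_accuracy": correct / total,
        # Mean of the two class-wise recalls; 0.5 means no better than chance
        "balanced_accuracy": (recall_majority + recall_minority) / 2,
        # Accuracy of a rule that labels every case as the majority class
        "majority_baseline": n_majority / total,
    }


# 591 uncomplicated and 71 severe cases; 99.4% and 9.5% correctly classified.
summary = class_summary(591, 71, 0.994, 0.095)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, overall accuracy (about 0.90) is close to the accuracy obtained by labelling every case as uncomplicated (about 0.89), whereas balanced accuracy is only about 0.54. This is one reason why class-wise measures and the AUC are more informative than overall accuracy for imbalanced outcomes.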
Model Comparison
The performance of the multivariate Logistic Regression and Random Forest models was evaluated using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) values (Figure 6). For LLIN use, the difference in AUC between Logistic Regression (AUC = 0.53) and Random Forest (AUC = 0.54) was not statistically significant (Z = −0.08, P = .94), indicating comparable discrimination performance between the 2 models. In contrast, for malaria case severity, the Random Forest model achieved a significantly higher AUC (0.76) than the Logistic Regression model (0.55), a difference of −0.21 when expressed as Logistic Regression minus Random Forest (95% CI: −0.35 to −0.07; Z = −3.01; P = .003). This indicates that the Random Forest algorithm provided better discrimination between severe and uncomplicated malaria cases in this dataset.

Figure 6. Receiver operating characteristic (ROC) curve for logistic regression and random forest models predicting malaria severity and LLIN use.
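For readers less familiar with the AUC, it can be read as the probability that a randomly chosen positive case (here, a severe case) receives a higher predicted score than a randomly chosen negative case. The Python sketch below computes it directly from that definition. The labels and scores are toy values invented for illustration and are not study data, and this pairwise calculation is not necessarily the routine used in the analysis.

```python
def auc(labels, scores):
    """Area under the ROC curve from its pairwise (rank-based) definition.

    labels: 1 for positive cases, 0 for negative cases.
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): two negatives, two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.2f}")  # 3 of the 4 positive-negative pairs are ordered correctly
```

On this reading, an AUC near 0.5, as reported for both LLIN use models, means the model ranks cases little better than chance, whereas 0.76 means that about three in four severe-uncomplicated pairs are ordered correctly.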
Discussion
Summary and Interpretation of Key Findings
This study provides a comprehensive investigation of the demographic, socioeconomic, behavioural, and structural factors associated with observed disparities in LLIN use and malaria case severity in the Mberengwa District. Temporal trend analysis based on the Test Positivity Rate (TPR %) revealed pronounced seasonal peaks consistent with the traditional rainy season (October to March), reflecting established transmission patterns where rainfall increases vector breeding sites. Crucially, the overall trend exhibited a gradual decline in cases, with a significant breakpoint in April 2020. This shift is plausibly linked to intensified malaria elimination activities, such as targeted vector control, social behaviour change, and enhanced surveillance. However, this trend should be interpreted cautiously, as the onset of the COVID-19 pandemic likely introduced altered health-seeking and reporting behaviours that potentially contributed to the observed decline through disruptions in routine surveillance and healthcare access. 36
A key finding related to prevention efforts was the striking disparity between household LLIN ownership (95%) and self-reported LLIN use (7.7%) among the included malaria cases. This low usage rate strongly suggests that access alone is insufficient to guarantee protective use, highlighting the urgent need to prioritise robust social and behaviour change (SBC) programmes. This discrepancy may be influenced by behavioural and cultural factors, such as perceived low mosquito density, discomfort during hot seasons, and misconceptions about malaria risk.37,38 Operational challenges, including net wear and tear, inadequate replacement, and household sleeping arrangements, may further limit consistent LLIN use despite their availability. 39 The initial stratified analysis highlighted that recent travel history was significantly associated with LLIN use (P = .05). This supports existing evidence that travel patterns complicate adherence, possibly due to inconsistent sleeping environments or altered risk perceptions away from home. 40
Subsequent multivariate logistic regression analysis, adjusted for sociodemographic and environmental confounders, failed to identify any statistically significant independent factors associated with LLIN use (P > .05). This outcome strongly suggests that the tested factors do not exert strong independent linear effects on LLIN use when evaluated collectively within this population. Despite this null result in the linear model, the Random Forest analysis identified age, time to seek care, religious group, distance to health facilities, and education level as the top 5 factors related to LLIN use.
The notable discrepancy between the model outputs illustrates the complementary value of the 2 analytical approaches. Logistic Regression remains crucial for linear hypothesis testing, whereas the Random Forest model can uncover complex nonlinear relationships. Previous epidemiological studies have shown that random forest models can capture complex interactions missed by traditional models. 41 This capability allowed the Random Forest model to rank variables that may contribute to LLIN use dynamics, despite the lack of significant linear associations. However, the variables identified by the Random Forest model should be interpreted with caution. The lack of significant linear effects found by Logistic Regression, combined with the Random Forest model’s failure to classify any positive LLIN use (sensitivity = 0%), suggests that the influential variables reflect, at most, weak non-linear associations. This reflects the limited informational value available for modelling this rare behavioural outcome (LLIN use was reported in only 7.7% of cases).
Despite the application of the Synthetic Minority Over-sampling Technique (SMOTE) to address the marked class imbalance during training, the Random Forest model failed to correctly classify any positive LLIN use instances in the test set. This failure highlights the weakness of the predictive signal associated with this rare behavioural outcome and illustrates the limitations of both linear and complex nonlinear models in reliably predicting rare events.
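The core idea behind SMOTE can be sketched in a few lines: each synthetic minority-class record is placed at a random point on the line segment between an existing minority record and one of its nearest minority neighbours. The Python sketch below is a simplified illustration of that interpolation step only. It is not the implementation used in this study; the function name, parameters, and feature values are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    minority: list of numeric feature tuples from the minority class.
    k: number of nearest minority neighbours considered for each base point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority records (not study data).
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]
new_points = smote_like(minority, n_new=4)
for point in new_points:
    print(point)
```

Because every synthetic record lies between existing minority records, oversampling of this kind rebalances the training set without adding new information about the outcome, which is consistent with the weak predictive signal described above.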
Regarding malaria severity, most cases (89%) were uncomplicated. Both stratified analysis and multivariate logistic regression consistently identified age and distance to a health facility as significant predictors of severity. Increased vulnerability with each additional year likely reflects physiological factors, such as immature immunity in young children and waning immunity in older adults, increasing the risk of progression to severe malaria.42,43 The analysis also showed that pregnancy status was not statistically significant in the multivariate model (P = .053), although the estimated effect size was large (AOR = 7.39; 95% CI: 0.81-55.56). The failure of this large estimate to reach significance likely reflects statistical instability arising from the extremely small number of pregnant women (n = 5), which produced a very wide 95% confidence interval and insufficient statistical power to exclude a null effect. Although it did not meet the P < .05 threshold, this finding aligns with established epidemiological evidence regarding the increased susceptibility of pregnant women to malaria infection and its severity, attributed to immunological changes during pregnancy. 44
The significant role of distance to a health facility as an independent factor associated with severity points to structural barriers to accessing timely care in rural settings. Interestingly, although a far distance (>10 km) was associated with severe malaria, the time taken to seek care after symptom onset was not statistically significant (P = .86). This suggests that physical distance presents a more formidable structural barrier to accessing immediate care than individual delay once symptoms appear. However, this discrepancy may stem from differing measurement precision, since distance is an objective structural variable, while time to seek care is a self-reported measure reliant on patient recall. Consequently, measurement error in the self-reported data may have reduced the statistical power to detect an association with the time to seek care. Random Forest analysis supported these findings by identifying distance to health facilities, age, marital status, treatment compliance, and recent travel history as the top 5 most important variables in modelling malaria severity. This supports the consideration of Random Forest alongside traditional models rather than relying on statistical significance alone. 45
In contrast to LLIN use, the Random Forest model demonstrated enhanced predictive performance for malaria case severity compared to the logistic regression model. This improvement indicates that machine learning approaches can more effectively capture the complex interactions among the structural and demographic factors influencing disease severity. The Random Forest model captured nonlinear relationships and variable interactions that the logistic model could not, providing an enhanced capacity to identify these complex patterns. This finding demonstrates the utility of machine learning methods in enhancing predictive insights in complex epidemiological datasets. 46 By accommodating heterogeneous variables, Random Forest provides a more flexible framework for modelling malaria severity risk patterns in real-world surveillance data.
Implications for Policy and Practice
The findings of this case-based surveillance study in the Mberengwa District have several important implications for the development of future policies and practices targeting malaria control and elimination. The observed disparity between LLIN ownership and actual use among the surveyed cases requires immediate policy attention. This necessitates a clear policy shift towards prioritising behaviour change programmes designed specifically to enhance LLIN use. Interventions must be targeted based on the identified influential variables, customising messages to resonate with specific age groups and locations.
Community-based health workers should strengthen LLIN distribution. This intervention is necessary to improve LLIN use among populations characterised by delayed care-seeking, lower education, or limited access to health facilities. Furthermore, policies should extend malaria prevention efforts to include mobile and travelling populations by promoting consistent LLIN use during travel. Integrating portable net distribution and targeted awareness campaigns along major travel routes can strengthen protection among these high-risk groups.
The consistent identification of distance to a health facility as a major predictor of severity highlights the urgent need to address the structural barriers to timely access to care. Policies must focus on strengthening rural healthcare systems by improving the community health worker (CHW) network, constructing new clinics or health posts, and enhancing outreach services in hard-to-reach areas to mitigate the barriers imposed by physical distance. These actions must be integrated into Zimbabwe’s malaria elimination framework. Given the estimated increased odds of progression to severe status among pregnant women, policies must prioritise effective preventive measures, including appropriate and timely case management, as this group is globally recognised as highly vulnerable.
Finally, the enhanced predictive discrimination of Random Forest models for case severity demonstrates the utility of integrating advanced analytical techniques into malaria surveillance and programme evaluation at the subnational level. This approach offers insights to inform more targeted resource allocation by identifying complex relationships missed by traditional regression. Machine learning should be leveraged to complement traditional regression, which remains indispensable for hypothesis testing and estimating the specific effect sizes (AORs) of individual variables.
Strengths and Limitations of the Study
A notable strength of this study is the use of multiple analytical approaches, combining traditional statistical methods with machine learning techniques, which allowed a comprehensive analysis of the predictors of LLIN use and the severity of malaria cases. These complementary approaches helped capture complex nonlinear relationships that traditional statistical approaches can miss. 47 Another strength of this study was the triangulation of different case-based surveillance data sources, such as the DHIS2 Tracker, malaria line lists, and health facility registers. This improved data completeness and reduced bias, allowing the analysis of a substantial proportion (89%) of malaria cases. This multi-source approach enhances data reliability and addresses common challenges related to data quality in surveillance studies. 48 Additionally, the joint analysis of structural, socioeconomic, demographic, and behavioural factors aligns with established health disparities frameworks. 49
Nonetheless, this study has some limitations that must be acknowledged when interpreting the findings. First, approximately 11% of the reported cases were excluded due to incomplete information, which may have introduced non-random bias and limited the generalisability of the findings. Second, key variables such as LLIN use, time to seek care, and occupation were self-reported, potentially introducing recall or social desirability bias. This bias may be particularly relevant to the finding that the distance to a health facility was a significant predictor of malaria severity, whereas the time to seek care was not.
Third, the retrospective observational design restricted causal inference, limiting the interpretation to associations rather than cause-and-effect relationships. 13 Fourth, despite the inclusion of plausible covariates, residual confounding may persist because unmeasured factors are not captured in routine surveillance data. 50 Fifth, although the Random Forest model demonstrated high predictive performance, this may partly reflect overfitting to the training data rather than genuine generalisable accuracy, underscoring the need for external validation to confirm model stability and reproducibility. Finally, although the SMOTE algorithm was applied to address class imbalance, Random Forest variable importance measures may still exhibit instability in relatively small or imbalanced datasets, warranting cautious interpretation of their relative contributions.
Conclusions
This study investigated the demographic, behavioural, socioeconomic, and structural drivers of malaria severity and LLIN use in the Mberengwa District using case-based surveillance data from 2019 to 2024. The findings revealed an overall decline in the Test Positivity Rate, marked by a breakpoint in April 2020, likely attributable to a combination of enhanced malaria elimination activities and changes in health behaviour potentially related to the COVID-19 pandemic. A critical finding is the striking disparity between high household LLIN ownership (95%) and low use (7.7%), emphasising that access alone is insufficient for protective behaviour. Distance to health facilities and age consistently emerged as key predictors of malaria severity, underlining the influence of structural factors on severity outcomes. Random Forest models demonstrated superior predictive performance for case severity compared to traditional logistic regression, underscoring the potential of machine learning to generate predictive insights to guide malaria elimination programming in Zimbabwe. To sustain progress towards elimination, there is an urgent need to prioritise robust, tailored behaviour change programmes that specifically address the barriers to LLIN use related to travel, distance to access points, and specific sociodemographic groups. Priority must also be given to improving access to health services in hard-to-reach areas by strengthening CHW programmes and targeted outreach services to mitigate the progression of uncomplicated cases to severe disease. Effective malaria elimination requires a combined focus on behavioural changes, structural improvements and data-driven programming supported by advanced analytics.
Footnotes
Authors’ Note
Ethical Considerations
Ethical approval for this study was obtained from the National University of Science and Technology Institutional Review Board (IRB; Ethics No:
Dissemination
The findings will be disseminated to the relevant stakeholders through meetings, conferences, seminars, and peer-reviewed journals.
Authors’ Contributions
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
