Sage Journals

Abstract

Objective

To develop and interpret a machine learning model for classifying HIV viral load suppression (VLS) from routinely collected clinical data in a low-resource Ethiopian cohort, enabling early identification of patients at risk of treatment failure.

Methods

A retrospective cohort study was conducted using electronic medical records of 4,152 patients on antiretroviral therapy (ART) at the University of Gondar Comprehensive Specialized Hospital, Ethiopia (March 2005–December 2024). Eight machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosting, Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Decision Tree, and XGBoost) were trained and optimized to classify binary VLS outcomes. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). The best-performing model was interpreted using SHapley Additive exPlanations (SHAP) to identify the most influential predictors and the direction of their effects.

Results

The optimized Gradient Boosting model achieved the highest performance with 76% accuracy, 0.74 F1-score, and 0.79 AUC-ROC. Baseline CD4 category and duration on ART (months) emerged as the most influential predictors. SHAP analysis revealed that longer ART duration and higher baseline CD4 count were associated with increased odds of suppression, while advanced WHO clinical stage (Stage 4) and male sex were associated with unsuppressed viral load. Individual-level predictions were visualized using waterfall plots to enhance clinical interpretability.

Conclusion

An interpretable Gradient Boosting model can reliably predict viral load suppression using routinely collected clinical data in resource-limited settings. The model’s predictions align with established clinical knowledge, offering a potential decision-support tool for identifying patients at risk of treatment failure at this single site, pending external validation in other cohorts and settings.

Keywords

HIV viral load suppression; machine learning; predictive modeling; SHAP; low-resource settings; clinical decision support

Background

Viral load measurement is a critical parameter in the management of infectious diseases such as HIV, hepatitis, and COVID-19 (Coronavirus Disease 2019).^1,2 It quantifies the amount of virus in a patient’s blood, aiding diagnosis, monitoring of disease burden, and assessment of treatment efficacy.^3,4 Predicting viral load is therefore a high priority in low-resource settings, where access to frequent laboratory testing may be limited.^5,6 In such contexts, predictive models can serve as valuable tools for estimating viral load levels, enabling healthcare providers to plan treatment that improves patient outcomes.^7–9

HIV/AIDS (Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome) remains one of the most significant global health challenges, with an estimated 39 million people living with HIV worldwide as of 2022 and approximately 1.5 million new infections annually.¹⁰ Sub-Saharan Africa bears the greatest burden, accounting for approximately 67% of all people living with HIV and 72% of AIDS-related deaths globally.¹¹ In Ethiopia, over 500,000 people are living with HIV, and the epidemic remains a leading cause of morbidity and mortality despite expanded access to antiretroviral therapy.^7,12 These stark disparities highlight the urgency of optimizing strategies for viral load monitoring and treatment management in resource-limited settings.

The burden of HIV/AIDS in sub-Saharan Africa, which accounts for the majority of new infections and AIDS-related deaths, remains a major public health challenge.^10,13 Ethiopia, one of the most affected countries, has made substantial progress in expanding access to ART through its health system. However, considerable challenges remain, including inadequate health infrastructure, inequitable distribution of resources, and limited capacity for regular patient monitoring.¹² In this context, novel approaches to disease management, including AI (Artificial Intelligence) and ML (machine learning), could improve the efficiency and effectiveness of HIV care.^14,15

In recent years, ML, a subfield of AI, has become one of the most powerful predictive modeling tools in healthcare. ML algorithms can identify patterns and relationships in large datasets that traditional statistical methods may not readily detect.^16–18 This capability is particularly relevant to viral load prediction, where multiple factors interact in complex ways.¹⁴ However, the performance of ML models depends heavily on data quality, the choice of algorithm, and the strategies used to address common challenges such as class imbalance and missing data.^16,18

In many LMICs (Low- and Middle-Income Countries), including Ethiopia, data quality challenges are pervasive and multifaceted.^19,20 At the University of Gondar Comprehensive and Specialized Hospital, missing or incomplete data are common in both demographic and clinical fields. Examples include missing age or residence information, inconsistent documentation of ART initiation dates, and incomplete CD4 or viral load test results. Additionally, manual data entry errors and the lack of standardized formats in the EMR system contribute to inaccuracies in medical records. These issues hinder the robustness of ML models, which rely heavily on clean, complete datasets.²¹ Therefore, careful preprocessing, imputation strategies, and quality control mechanisms are necessary to mitigate the effects of poor data quality and improve model reliability.

The University of Gondar Comprehensive and Specialized Hospital, located in the Amhara National Regional State, is also an active hub for the care and study of HIV-infected individuals. Its ART clinic, established in 2005, provides free services; it began enrolling patients in August 2005 and has since enrolled more than 15,000 patients, more than 5,000 of whom are in active follow-up. Because the clinic’s EMR (Electronic Medical Records) system stores extensive data on patient demographics, clinical history, immunological status, and treatment outcomes, it is a suitable source for developing and validating predictive models. Using these data, researchers can identify factors associated with viral load suppression and develop tools to support clinical decision-making.

Although ML is a promising tool for viral load prediction, few comprehensive studies have compared different algorithms and imbalance-handling techniques for this task. Most of the existing literature comes from high-income settings, where data availability and healthcare infrastructure differ substantially from those in LMICs.^14,15 This gap underscores the need for studies that evaluate ML models in resource-constrained settings, where poor data quality and class imbalance are often more pronounced.

This study therefore seeks to close this gap through a comparative performance analysis of eight commonly used machine learning algorithms, namely Random Forest, Support Vector Machines, Gradient Boosting, Decision Tree, Naive Bayes, XGBoost, Logistic Regression, and k-Nearest Neighbors, for viral load prediction.

The findings of this study have important implications for public health practice and policy. Accurate viral load prediction can support the timely identification of patients at risk of treatment failure, enabling healthcare providers to intervene early and prevent adverse outcomes.²² Beyond this, ML models could reduce reliance on expensive and time-consuming laboratory tests, improving access to and the sustainability of HIV care in resource-constrained settings.²³ By optimizing ML models for viral load prediction, this study aims to contribute to the global effort to combat HIV/AIDS and to improve the quality of life of people living with the virus.

Several recent studies from sub-Saharan Africa have demonstrated the potential of machine learning for predicting HIV outcomes in resource-limited settings.^24,25 These studies applied machine learning models to predict retention and viral suppression in HIV treatment cohorts, achieving AUCs of 0.77–0.82. A previous study at the same Ethiopian institution as our current work predicted virological failure using conventional statistical methods, but without a comprehensive algorithm comparison or model interpretability analysis.⁷ Studies from Kenya have explored predictive modeling for loss to follow-up, while research from Guinea has examined determinants of viral suppression using traditional regression approaches.^26,27 Despite these contributions, few studies have comprehensively compared multiple algorithms on balanced datasets, addressed class imbalance systematically, or integrated model explainability techniques such as SHAP in LMIC contexts. This study builds on and extends this growing body of literature by providing a rigorous algorithm comparison, balanced data handling, and clinically interpretable predictions tailored to Ethiopian HIV care settings.

This study advances previous research by comprehensively comparing multiple machine learning models on a balanced dataset within a real-world LMIC setting. Unlike prior studies that may have been constrained by imbalanced data or an over-reliance on traditional statistical methods, this work uses a balanced dataset, which supports a more robust and less biased evaluation of model performance.^28,29 Furthermore, moving beyond a primary focus on accuracy, this research places significant emphasis on explainability through SHAP values, so that the models are not only predictive but also interpretable and clinically relevant.

This study advances the field in four specific ways: (1) it provides the first comprehensive comparison of eight machine learning algorithms for viral load suppression prediction using a large Ethiopian cohort, establishing benchmark performance metrics; (2) it demonstrates that ensemble methods (particularly Gradient Boosting) can achieve clinically useful accuracy even with routinely collected data in resource-limited settings, without requiring expensive laboratory infrastructure; (3) it integrates model explainability (SHAP) to bridge the gap between ‘black box’ predictions and clinical interpretability, addressing a critical barrier to AI adoption in low-resource healthcare; and (4) it offers a practical, deployable framework that identifies modifiable and non-modifiable risk factors, enabling targeted interventions. By explicitly linking algorithmic performance to clinical actionability, this work moves beyond proof-of-concept toward implementation-ready decision support for HIV care in settings where viral load testing is constrained.

Methods and materials

Study design and setting

This study adheres to the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) statement for prediction model development and validation. A quantitative approach was adopted, in which machine learning techniques were used to predict viral load status among PLHIV (People Living with Human Immunodeficiency Virus). This retrospective cohort study was conducted at the University of Gondar Comprehensive and Specialized Hospital, one of the leading healthcare institutions in Gondar City, Amhara National Regional State, Ethiopia. Gondar City, located about 748 km northwest of Addis Ababa, has a population of 457,938 and is one of the largest hubs for health service delivery and research. The hospital’s ART clinic was established in March 2005 and provides free ART services to over 7 million people in Gondar province and surrounding regions. At the time of this study, the clinic had enrolled 15,933 patients, of whom 5,481 were currently on treatment.

The site was chosen deliberately, not only because of data availability but also because of its distinct demographic and epidemiological profile. The region has a high burden of HIV, with a substantial proportion of PLHIV receiving ART. The hospital also serves as a referral center for surrounding rural districts, providing a heterogeneous mix of urban and rural patients, which may enhance the generalizability of the findings. Finally, its healthcare infrastructure allows for long-term follow-up and relatively comprehensive medical records, which are essential for a viral load suppression prediction study.

Total population sampling, rather than random sampling, was used: all eligible patients recorded in the electronic medical records between March 2005 and December 2024 were included, giving a cohort of 4,152 patients.

Data source and study population

Secondary data were obtained from the EMR-ART database of the University of Gondar Comprehensive and Specialized Hospital and comprised the electronic medical records of PLHIV who received ART at the hospital between March 2005 and December 2024. Inclusion criteria were: (1) age ≥15 years at ART initiation, (2) confirmed HIV diagnosis, and (3) complete baseline clinical and laboratory data available at the time of ART initiation. Patients were excluded if they had incomplete baseline records, had transferred in from another facility without a full clinical history, or were lost to follow-up within one month of ART initiation. The final cohort comprised 4,152 patients who met all eligibility criteria and were followed until death, loss to follow-up, or the end of the study period.

Data were extracted from the EMR database using a standardized, structured query that retrieved all patients with a confirmed HIV diagnosis who initiated ART between March 2005 and December 2024. Baseline information was taken from the first clinical encounter, at which ART was initiated, and outcome and follow-up information from the last clinical encounter. The extracted data were exported in CSV format, with patient names replaced by unique study identifiers to protect confidentiality. To verify extraction accuracy, two independent researchers checked a random 5% sample of the extracted records against the source documents.
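The de-identification and verification steps can be sketched with pandas. This is a minimal sketch, not the authors’ extraction code: the column names `mrn` and `patient_name`, the `S00001`-style identifier format, and the helper names `pseudonymize` and `draw_audit_sample` are hypothetical, and the toy records stand in for the real EMR export.

```python
import pandas as pd


def pseudonymize(records: pd.DataFrame, key_col: str = "mrn",
                 direct_identifiers: tuple = ("mrn", "patient_name")) -> pd.DataFrame:
    """Replace direct identifiers with sequential study IDs (hypothetical column names)."""
    out = records.copy()
    # One study ID per unique record number, numbered in order of first appearance.
    unique_keys = out[key_col].drop_duplicates().tolist()
    id_map = {key: f"S{i + 1:05d}" for i, key in enumerate(unique_keys)}
    out.insert(0, "study_id", out[key_col].map(id_map))
    # In practice the key-to-ID map would be stored separately and securely;
    # this sketch simply discards it.
    return out.drop(columns=list(direct_identifiers))


def draw_audit_sample(records: pd.DataFrame, fraction: float = 0.05, seed: int = 42) -> pd.DataFrame:
    """Randomly select a fraction of rows for manual checking against source documents."""
    return records.sample(frac=fraction, random_state=seed)


# Toy stand-in for the exported EMR records.
raw = pd.DataFrame({
    "mrn": [1001 + i for i in range(40)],
    "patient_name": [f"Patient {i}" for i in range(40)],
    "baseline_cd4": list(range(100, 140)),
})
deidentified = pseudonymize(raw)
audit = draw_audit_sample(deidentified)
print(deidentified.head(3))
print(f"{len(audit)} of {len(deidentified)} records selected for verification")
```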

The source population comprised all persons living with HIV on ART who attended the hospital’s ART clinic for care. The dataset covered demographic, clinical, immunological, and treatment-related variables needed for predictive modeling. The outcome variable was viral load status, categorized into two classes: Suppressed and Unsuppressed. The independent variables included sociodemographic characteristics (age, sex, marital status, occupational status, religion, and place of residence); clinical factors (duration on ART, WHO clinical stage, duration with HIV, and TB co-infection); hematologic and immunological factors (baseline and current CD4 counts and viral load); and treatment-related factors (adherence to ART, regimen line, initiation and discontinuation of TPT (Tuberculosis Preventive Therapy), and CPT (Cotrimoxazole Preventive Therapy) usage), which were added to the model for improved interpretability.

Regarding the handling of longitudinal data, this study employed a cross-sectional snapshot design using the most recent clinical encounter for each patient as the outcome assessment point, while incorporating key historical variables (e.g., duration on ART in months, baseline CD4 count) that capture cumulative treatment exposure and immunological history. This approach was deliberately chosen for three reasons: (1) clinical decision-making in resource-limited settings typically relies on current patient status and treatment history rather than full temporal trajectories; (2) the electronic medical record system at the study site does not consistently capture all intermediate visit data with complete temporal granularity, making comprehensive longitudinal modeling potentially unreliable; and (3) the primary objective was to develop a practical screening tool that could be deployed using routinely available cross-sectional data, maximizing real-world applicability.
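A minimal pandas sketch of the snapshot design, assuming a hypothetical visit-level table with `patient_id`, `visit_date`, `cd4`, and `vl_status` columns: baseline values come from each patient’s first encounter, the outcome from the most recent one, and duration on ART is converted from days to months using an assumed average month length of 30.44 days. The function name `build_snapshot` is illustrative.

```python
import pandas as pd


def build_snapshot(encounters: pd.DataFrame) -> pd.DataFrame:
    """Collapse visit-level records into one row per patient (hypothetical column names)."""
    enc = encounters.sort_values(["patient_id", "visit_date"])
    # Baseline = first encounter (ART initiation); outcome = most recent encounter.
    # drop_duplicates keeps whole rows, so values are never mixed across visits.
    first = enc.drop_duplicates("patient_id", keep="first").set_index("patient_id")
    last = enc.drop_duplicates("patient_id", keep="last").set_index("patient_id")
    snapshot = pd.DataFrame({
        "baseline_cd4": first["cd4"],
        "vl_status": last["vl_status"],
    })
    # Cumulative exposure: months between ART initiation and the most recent visit.
    days_on_art = (last["visit_date"] - first["visit_date"]).dt.days
    snapshot["duration_on_art_months"] = (days_on_art / 30.44).round(1)
    return snapshot.reset_index()


encounters = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime(
        ["2010-01-01", "2011-01-01", "2012-01-01", "2015-06-01", "2016-06-01"]
    ),
    "cd4": [200, 310, 350, 90, 150],
    "vl_status": [None, None, "Suppressed", None, "Unsuppressed"],
})
snap = build_snapshot(encounters)
print(snap)
```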

Data preprocessing

To assess the mechanism of missingness, we conducted systematic analyses comparing patients with complete versus missing data across key variables. Using Little’s Missing Completely at Random (MCAR) test, we found evidence that data were not MCAR (p < 0.001), suggesting a combination of Missing at Random (MAR) and Missing Not at Random (MNAR) mechanisms. For example, missing CD4 counts were more common among patients with poor adherence documentation, indicating MAR. To address potential bias from single (mean/mode) imputation, we implemented multiple sensitivity analyses: (1) complete-case analysis restricted to patients with no missing data (n=2,847) yielded similar model performance (Gradient Boosting accuracy 75% vs. 76% with imputation), suggesting minimal imputation-induced bias; (2) we compared single imputation against multiple imputation by chained equations (MICE) on a random subset, finding no significant differences in coefficient estimates or model performance; and (3) we created missingness indicator variables for features with >5% missingness to allow the model to learn potential informative missingness patterns. These analyses suggest that our imputation approach did not substantially bias the final model.
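The imputation sensitivity analysis can be sketched with scikit-learn on synthetic data. Two caveats apply: scikit-learn’s `IterativeImputer` is inspired by MICE but returns a single imputation by default, so a full multiple-imputation analysis would repeat it (for example with `sample_posterior=True` and different seeds) and pool the results; and `add_indicator=True` adds indicators for every feature with missing values rather than only those above the 5% threshold used in the study. The synthetic missingness is completely random, so this sketch illustrates the mechanics only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  enables IterativeImputer
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

SEED = 42
X, y = make_classification(n_samples=600, n_features=8, random_state=SEED)
rng = np.random.default_rng(SEED)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan  # synthetic 10% MCAR missingness, illustration only

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
strategies = {
    # add_indicator=True appends a 0/1 "was missing" column for each feature with missing values.
    "single (mean) imputation": SimpleImputer(strategy="mean", add_indicator=True),
    "iterative (MICE-style) imputation": IterativeImputer(random_state=SEED, add_indicator=True),
}
scores = {}
for name, imputer in strategies.items():
    # The imputer sits inside the pipeline, so it is re-fitted on each training fold only.
    pipe = make_pipeline(imputer, GradientBoostingClassifier(random_state=SEED))
    scores[name] = cross_val_score(pipe, X_missing, y, cv=cv, scoring="accuracy")

# Complete-case analysis: keep only rows with no missing values.
complete = ~np.isnan(X_missing).any(axis=1)
scores["complete-case"] = cross_val_score(
    GradientBoostingClassifier(random_state=SEED),
    X_missing[complete], y[complete], cv=cv, scoring="accuracy",
)
for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy {fold_scores.mean():.3f}")
```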

The data were preprocessed following a structured pipeline to ensure their quality and reliability. Because machine learning requires high-quality data, handling missing values is a crucial preprocessing step. To assess the extent of data quality issues, we examined the proportion of missing values across all features. As illustrated in Figure 1, variables showed varying degrees of missingness. ‘Baseline CD4 Count’ and ‘Recent CD4 Count’ had the highest levels of missing data, with 15.2% and 15.0% of values missing, respectively. Other variables, such as ‘Duration on ART in Months’ and ‘Functional Status’, had 1% to 5% missing data, while most features had less than 2% missing values. Overall, approximately 1.7% of the dataset was missing. These missing values were imputed according to the nature of each variable. For continuous variables, such as CD4 counts and ART duration, mean imputation was used, based on the assumption of approximately symmetric distributions with minimal outliers. For categorical variables, including marital status, education level, and functional status, mode imputation was used, assuming that the most frequent category reasonably represents the underlying distribution in the absence of systematic bias. This strategy was chosen to preserve the integrity of the dataset while minimizing information loss and retaining as many observations as possible for subsequent analyses. The SimpleImputer class of the scikit-learn library was used to fill in the missing values.
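A minimal sketch of the missingness summary and the mean/mode strategy, using scikit-learn’s `SimpleImputer` inside a `ColumnTransformer`; the toy records and column names are hypothetical stand-ins for the EMR variables.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy records with hypothetical column names standing in for the EMR extract.
df = pd.DataFrame({
    "baseline_cd4": [250.0, np.nan, 410.0, 180.0],
    "duration_on_art_months": [120.0, 36.0, np.nan, 60.0],
    "marital_status": ["Married", "Married", np.nan, "Divorced"],
    "functional_status": ["Working", np.nan, "Working", "Ambulatory"],
})

# Per-feature and overall missingness, as summarized in Figure 1.
missing_pct = df.isna().mean().mul(100).round(1)
overall_missing_pct = df.isna().to_numpy().mean() * 100

numeric_cols = ["baseline_cd4", "duration_on_art_months"]
categorical_cols = ["marital_status", "functional_status"]
imputer = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), numeric_cols),               # mean for continuous
    ("cat", SimpleImputer(strategy="most_frequent"), categorical_cols),  # mode for categorical
])
# In a real pipeline the imputer should be fitted on the training split only.
imputed = pd.DataFrame(imputer.fit_transform(df), columns=numeric_cols + categorical_cols)
print(missing_pct)
print(imputed)
```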

Figure 1.

Percentage of missing data by feature.

Data preprocessing also included encoding, a crucial step in which categorical variables were converted to numeric form using one-hot and label encoding. Variables were treated as categorical if they took two or more discrete (non-continuous) values. One-hot encoding replaces a categorical variable with a set of binary (0/1) indicator columns, one per category, whereas label encoding maps each category to an integer code.

Feature selection

From the EMR, we extracted all relevant features and performed a correlation analysis to identify strongly correlated features that could affect model performance or introduce bias. The correlation heatmap shows the strength and direction of linear relationships among the variables, with a specific focus on factors associated with viral load status, the primary outcome of this study. Viral load status showed modest positive correlations with Recent CD4 Count (r = 0.35), Recent_CD4_Category (r = 0.35), and Baseline CD4 Count (r = 0.24), suggesting that higher CD4 levels may be associated with better viral suppression. Weaker positive correlations were observed with Duration on ART in Months (r = 0.11) and CPT Use (r = 0.05). Conversely, Age showed a slight negative correlation with viral load status (r = -0.07), weakly suggesting that younger individuals may have more favorable viral outcomes (Figure 2). Most other features showed weak or negligible correlations with the outcome, so this analysis provides only initial insights into potentially influential predictors, which were further examined through multivariable modeling. Because the recent CD4 count, and the Recent CD4 Category (low/high) derived directly from it, are closely tied to the current suppression status rather than preceding it, both were removed from the final analysis. More generally, to prevent data leakage, we ensured that all predictor variables were based on information available before or at the time of viral load measurement, not after.
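The correlation screen and the removal of leakage-prone columns can be sketched with pandas on synthetic data. Everything here is hypothetical: the outcome is simulated, `recent_cd4` is constructed to depend partly on the outcome to mimic a leakage-prone variable, and the cut-off of 350 used for the category is an illustrative assumption, not the study’s definition.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "baseline_cd4": rng.normal(300, 100, n),
    "age": rng.normal(47, 11, n),
})
# Synthetic outcome (1 = suppressed) that depends partly on baseline CD4.
df["vl_suppressed"] = (df["baseline_cd4"] + rng.normal(0, 150, n) > 300).astype(int)
# A "recent" CD4 that is partly a consequence of the outcome: a leakage-prone feature.
df["recent_cd4"] = df["baseline_cd4"] + 200 * df["vl_suppressed"] + rng.normal(0, 50, n)
df["recent_cd4_category"] = (df["recent_cd4"] >= 350).astype(int)  # hypothetical cut-off

# Pearson correlation of every candidate feature with the outcome, strongest first.
corr_with_outcome = (
    df.corr()["vl_suppressed"].drop("vl_suppressed").sort_values(key=np.abs, ascending=False)
)
print(corr_with_outcome.round(2))

# Drop the outcome and the leakage-prone columns before any modelling.
LEAKAGE_COLUMNS = ["recent_cd4", "recent_cd4_category"]
X = df.drop(columns=["vl_suppressed", *LEAKAGE_COLUMNS])
y = df["vl_suppressed"]
```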

Figure 2.

Correlation heatmap analysis for features.

Data balancing

In this dataset, the dependent variable (viral load status) was almost evenly distributed, with suppressed cases representing 50.4% and unsuppressed cases 49.6% (Figure 3). This natural balance removed the need for synthetic oversampling techniques, allowing models to be trained directly on the authentic data distribution. The near-equal representation of both outcomes reduces the risk of a classifier biased toward a majority class. This is particularly advantageous in clinical settings, as it supports the development of predictive models that assess viral load suppression reliably for both outcome groups.
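The class proportions and a stratified 80/20 split can be checked with a short sketch that rebuilds the outcome vector from the Table 2 counts (2,092 suppressed, 2,060 unsuppressed); the features are placeholders, and the 1 = suppressed coding is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Outcome vector rebuilt from the counts in Table 2 (1 = suppressed, 0 = unsuppressed).
y = np.array([1] * 2092 + [0] * 2060)
X = np.zeros((len(y), 1))  # placeholder features; only the outcome matters here

print(f"Suppressed share: {y.mean():.1%}")  # → Suppressed share: 50.4%

# A stratified 80/20 split keeps the near 50:50 ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(f"train: {y_train.mean():.3f}, test: {y_test.mean():.3f}")
```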

Figure 3.

Class distribution of output variable.

Machine learning models

For the estimation of viral load status, we evaluated eight ML algorithms spanning linear, probabilistic, kernel-based, instance-based, and tree-based (including ensemble) models. Logistic regression was used as a baseline because of its interpretability and wide acceptance in clinical research. Ensemble methods such as random forest and gradient boosting (including optimized variants such as XGBoost) were selected for their ability to handle non-linear relationships and to reduce overfitting through averaging or boosting. Support vector machines, Naïve Bayes, and k-nearest neighbors were included to assess performance under different assumptions about class boundaries and data similarity. Decision trees were also used because of their intuitive rule-based structure, which aligns with the need for model interpretability in medical settings. Together, these algorithms balance predictive performance and interpretability, which is essential for clinical applicability and stakeholder trust. Each algorithm carries its own assumptions: logistic regression assumes a linear relationship between the predictors and the log-odds of the outcome, and independence of errors; Naive Bayes assumes that features are conditionally independent given the outcome class; SVM assumes that the classes are separable, possibly after a kernel transformation, and depends on a correct kernel choice (an RBF kernel was used after linear and polynomial kernels were tested on the data); tree-based algorithms (Random Forest, Gradient Boosting, Decision Tree, XGBoost) make no distributional assumptions and handle non-linear relationships automatically. Deep learning models, such as neural networks, were not included because the relatively small dataset could lead to overfitting and reduced generalizability. Furthermore, deep learning models often require extensive computational resources and hyperparameter tuning, which were beyond the scope of this study.
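A minimal sketch of how the eight candidate models might be instantiated in scikit-learn with the fixed seed of 42. Several details are assumptions rather than reported settings: the Gaussian Naive Bayes variant, feature standardization for logistic regression, SVM, and KNN, and default hyperparameters before tuning. XGBoost comes from the separate `xgboost` package, so it is added only if that package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 42
models = {
    # Scaling is added for models whose fit depends on feature scale
    # (regularized, margin-based, and distance-based models).
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(random_state=SEED),
    "Gradient Boosting": GradientBoostingClassifier(random_state=SEED),
    "Naive Bayes": GaussianNB(),  # Gaussian variant is an assumption
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=SEED)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision Tree": DecisionTreeClassifier(random_state=SEED),
}
try:  # XGBoost lives in the separate `xgboost` package
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(random_state=SEED)
except ImportError:
    print("xgboost not installed; skipping XGBoost")

# Synthetic data; training accuracy only confirms each model fits, it is not a performance estimate.
X, y = make_classification(n_samples=300, n_features=10, random_state=SEED)
train_accuracy = {name: model.fit(X, y).score(X, y) for name, model in models.items()}
print(train_accuracy)
```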
Before modeling, we performed feature selection using correlation analysis and mutual information to retain the most informative independent variables and reduce noise, thereby enhancing model performance and interpretability.
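Mutual-information feature selection can be sketched as a scikit-learn pipeline step, so that selection is learned from the training split only and the holdout stays untouched. The number of retained features (`k=8`) and the synthetic data are illustrative assumptions.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

SEED = 42
X, y = make_classification(n_samples=400, n_features=15, n_informative=4, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=SEED
)

# Mutual information captures non-linear dependence that a correlation coefficient can miss.
# The estimator adds small random noise to continuous features, so the seed is fixed.
mi_score = partial(mutual_info_classif, random_state=SEED)
pipe = Pipeline([
    ("select", SelectKBest(score_func=mi_score, k=8)),  # k = 8 is an arbitrary choice
    ("model", GradientBoostingClassifier(random_state=SEED)),
])
pipe.fit(X_train, y_train)  # feature selection sees training data only
selected = pipe.named_steps["select"].get_support(indices=True)
print("selected feature indices:", selected)
print("holdout accuracy:", round(pipe.score(X_test, y_test), 3))
```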

Model training and optimization

Model training and hyperparameter tuning were conducted in a standard computational environment using Python’s scikit-learn library. Training took approximately 3–8 minutes per algorithm, with simpler models (logistic regression, decision tree) converging faster than ensemble methods (Gradient Boosting, XGBoost), which required longer because of iterative boosting. Five-fold stratified cross-validation was used during training to reduce overfitting and support generalizability. A fixed random seed (42) was used for all models to enable exact reproducibility. The testing set (20% holdout) was never accessed during hyperparameter tuning or feature selection; it was used only for final model evaluation.
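The evaluation protocol described above (a 20% holdout set aside first, stratified 5-fold cross-validation on the remaining data, and a fixed seed of 42) can be sketched in scikit-learn as follows, using synthetic data in place of the cohort.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

SEED = 42
X, y = make_classification(n_samples=500, n_features=12, random_state=SEED)

# 1) Set aside the 20% holdout first; it is not touched again until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=SEED
)

# 2) Stratified 5-fold cross-validation on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
cv_results = cross_validate(
    GradientBoostingClassifier(random_state=SEED), X_train, y_train,
    cv=cv, scoring=["accuracy", "f1", "roc_auc"],
)
for metric in ("accuracy", "f1", "roc_auc"):
    fold_scores = cv_results[f"test_{metric}"]
    print(f"CV {metric}: {fold_scores.mean():.3f} ± {fold_scores.std():.3f}")

# 3) Once all choices are frozen, refit on the full training set and score the holdout once.
final_model = GradientBoostingClassifier(random_state=SEED).fit(X_train, y_train)
holdout_accuracy = final_model.score(X_test, y_test)
print(f"holdout accuracy: {holdout_accuracy:.3f}")
```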

The dataset was split into training (80%) and testing (20%) subsets. To optimize model performance, hyperparameters were tuned for each model using grid search. For logistic regression, tuning covered the regularization strength (C), explored on a logarithmic scale from 10⁻⁴ to 10³, and the penalty type (L1 and L2, corresponding to Lasso and Ridge regularization, respectively). For Random Forest, the tuned hyperparameters were the number of trees (50 to 500), the maximum depth (5 to 50), and the minimum samples per leaf (1 to 10). For the gradient boosting models (Gradient Boosting and XGBoost), the key hyperparameters were the learning rate (0.001 to 0.3), the number of estimators (50 to 500), and the maximum depth (3 to 15). For support vector machines (SVM), tuning covered the kernel type (linear, polynomial, and RBF (Radial Basis Function)), the regularization parameter C (10⁻³ to 10²), and gamma (10⁻⁴ to 10³). Decision trees were tuned by varying the maximum depth (3 to 20) and the minimum samples required to split a node (2 to 10), and by evaluating both the Gini impurity and entropy criteria.
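The search spaces above can be written as scikit-learn parameter grids. This is a sketch under assumptions: the specific grid points within each reported range are chosen for illustration, F1 as the tuning metric is assumed, and only the logistic-regression search is executed to keep the example fast. The `gradient_boosting` grid uses parameter names that `XGBClassifier` also accepts.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

SEED = 42
# Coarse grids spanning the reported ranges (the exact grid points are assumptions).
param_grids = {
    "logistic_regression": {"C": np.logspace(-4, 3, 8), "penalty": ["l1", "l2"]},
    "random_forest": {"n_estimators": [50, 100, 200, 500], "max_depth": [5, 10, 20, 50],
                      "min_samples_leaf": [1, 5, 10]},
    "gradient_boosting": {"learning_rate": [0.001, 0.01, 0.1, 0.3],
                          "n_estimators": [50, 100, 200, 500], "max_depth": [3, 5, 10, 15]},
    # gamma is ignored by the linear kernel, so some combinations are redundant.
    "svm": {"kernel": ["linear", "poly", "rbf"], "C": np.logspace(-3, 2, 6),
            "gamma": np.logspace(-4, 3, 8)},
    "decision_tree": {"max_depth": [3, 5, 10, 20], "min_samples_split": [2, 5, 10],
                      "criterion": ["gini", "entropy"]},
}

X, y = make_classification(n_samples=400, n_features=10, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=SEED
)

# The liblinear solver supports both L1 (lasso) and L2 (ridge) penalties.
search = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grids["logistic_regression"],
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED),
    scoring="f1",  # assumed tuning metric
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```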

For KNN (k-Nearest Neighbors), tuning covered the number of neighbors (1 to 50) and the distance metric (Euclidean, Manhattan, and Minkowski). As described above, 5-fold cross-validation was applied during training to reduce overfitting and help ensure that the models generalize to unseen data. By systematically tuning hyperparameters and incorporating these safeguards against overfitting, each model was developed to balance predictive performance, generalizability, and computational efficiency.
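The KNN search can be sketched the same way. Standardizing features before computing distances is an assumption not stated in the text; note also that scikit-learn’s `minkowski` metric with its default `p=2` is identical to Euclidean distance, so two of the three options coincide unless `p` is also tuned.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42
X, y = make_classification(n_samples=400, n_features=10, random_state=SEED)

# Standardizing first (an assumption) keeps any single feature from dominating the distance.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {
    "knn__n_neighbors": list(range(1, 51)),
    # With the default p=2, "minkowski" is the same distance as "euclidean".
    "knn__metric": ["euclidean", "manhattan", "minkowski"],
}
search = GridSearchCV(
    pipe, param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED),
    scoring="f1",  # assumed tuning metric
)
search.fit(X, y)
print(search.best_params_)
```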

Statistical analysis

Descriptive statistics were computed to characterize the study population, with means and standard deviations reported for continuous variables (e.g., age, CD4 counts) and frequencies with percentages for categorical variables (e.g., sex, marital status, WHO stage). Group comparisons between suppressed and unsuppressed viral load categories were performed using independent t-tests for normally distributed continuous variables and chi-square tests for categorical variables.
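The two group comparisons can be sketched with SciPy on a synthetic cohort whose age distribution loosely follows the reported mean and SD; the outcome labels are random, so no real association is expected.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
n = 400
# Synthetic cohort; the outcome is assigned at random.
df = pd.DataFrame({
    "vl_status": rng.choice(["Suppressed", "Unsuppressed"], size=n),
    "age": rng.normal(46.7, 11.4, size=n),
    "sex": rng.choice(["F", "M"], size=n, p=[0.6, 0.4]),
})

# Continuous variable: independent two-sample t-test (Student's; pass equal_var=False for Welch's).
age_sup = df.loc[df["vl_status"] == "Suppressed", "age"]
age_unsup = df.loc[df["vl_status"] == "Unsuppressed", "age"]
t_stat, t_p = stats.ttest_ind(age_sup, age_unsup)

# Categorical variable: chi-square test of independence on the sex x outcome table.
# For a 2x2 table SciPy applies Yates' continuity correction by default.
contingency = pd.crosstab(df["sex"], df["vl_status"])
chi2, chi_p, dof, expected = stats.chi2_contingency(contingency)
print(f"age: t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"sex: chi2 = {chi2:.2f}, dof = {dof}, p = {chi_p:.3f}")
```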

Evaluation metrics

Model performance was evaluated using a suite of metrics selected for clinical relevance and robustness. The area under the receiver operating characteristic curve (AUC-ROC) quantified overall discriminative power. Primary emphasis was placed on class-specific precision and recall to ensure high sensitivity and reliability for all patient groups, and the F1-score was used as the primary summary metric to balance these competing clinical priorities. To compare the eight machine learning models statistically, paired evaluation metrics were computed across the same cross-validation folds. This approach supports a robust and fair comparison and aligns model selection with both statistical validity and clinical relevance in HIV management under resource-constrained conditions.
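The class-specific metrics and one way of making a paired comparison across folds can be sketched with scikit-learn and SciPy. The text does not name the statistical test used, so the paired t-test here is an assumption; because cross-validation folds share training data, fold scores are not independent, and p-values from this naive test tend to be optimistic.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

SEED = 42
X, y = make_classification(n_samples=600, n_features=12, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=SEED
)

# Class-specific precision and recall (average=None returns one value per class) and AUC.
gb = GradientBoostingClassifier(random_state=SEED).fit(X_train, y_train)
pred = gb.predict(X_test)
per_class_precision = precision_score(y_test, pred, average=None)
per_class_recall = recall_score(y_test, pred, average=None)
auc = roc_auc_score(y_test, gb.predict_proba(X_test)[:, 1])

# Paired comparison: a CV object with a fixed random_state yields identical folds for both models.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
f1_gb = cross_val_score(GradientBoostingClassifier(random_state=SEED), X_train, y_train,
                        cv=cv, scoring="f1")
f1_lr = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train,
                        cv=cv, scoring="f1")
t_stat, p_value = stats.ttest_rel(f1_gb, f1_lr)
print(per_class_precision, per_class_recall, round(auc, 3))
print(f"paired t-test on fold F1: t = {t_stat:.2f}, p = {p_value:.3f}")
```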

Overall, the process involves collecting EMR data, preprocessing it, selecting and training models, optimizing performance, and evaluating results to identify the best model (Figure 4).

Figure 4.

Comprehensive machine learning modeling pipeline.

Implementation

All models were implemented in Python using machine learning libraries, including scikit-learn. Preprocessing, training, and testing were performed in a Jupyter Notebook environment to support the reproducibility and transparency of the analysis. Methods and results were documented for future research and verification.

Results

Descriptive statistics

The dataset consisted of 4,152 patients, of whom 50.4% had suppressed and 49.6% had unsuppressed viral load. The mean age of participants was 46.72 years (SD = 11.38), and the majority were female (59.51%) (Table 1).

Table 1.

Socio-demographic variables.

Variable	Category	Frequency	Percentage (%)
Sex	Female	2471	59.51
	Male	1681	40.49
Marital status	Married	2700	65.03
	Divorced	698	16.81
	Never Married	437	10.53
	Widowed	317	7.63
Education level	Secondary Education	2145	51.66
	No Education	805	19.39
	Primary Education	789	19.00
	Higher Education	413	9.95
Residence	Urban	3177	76.52
	Rural	975	23.48
Religion	Orthodox	3910	94.17
	Muslim	206	4.96
	Protestant	26	0.63
	Catholic	9	0.22
	Other	1	0.02

Regarding clinical characteristics, the median duration on ART was approximately 12.7 years (152.5 months), and most patients were in WHO clinical stage 4. Suppressed (50.4%) and unsuppressed (49.6%) viral load were almost equally represented. This near-equal distribution differs from many routine HIV care settings, where suppression rates typically exceed 80–90%, and reflects the study’s sampling strategy and setting: (1) the cohort includes patients enrolled over 19 years (2005–2024), spanning eras with less effective early ART regimens and higher failure rates; (2) as a tertiary referral center, the University of Gondar Hospital manages a disproportionate number of complex, treatment-experienced patients transferred from peripheral clinics after initial treatment failure; (3) the retrospective design captured all available patients rather than only those with optimal outcomes; and (4) the study period predates and overlaps with the global scale-up of viral load monitoring, so some patients may have had undetected viremia for extended periods. Although this balance is advantageous for model training because it avoids class imbalance, it may not reflect suppression rates in optimally managed cohorts, which should be considered when generalizing the findings (Table 2).

Table 2.

Clinical-related variables.

Variable	Category	Frequency	Percentage (%)
BMI	Normal	2607	62.79
	Overweight	733	17.65
	Underweight	630	15.17
	Obese	182	4.38
TB screening result	No	4070	98.03
	Yes	82	1.97
Functional status	Working	3861	92.99
	Ambulatory	163	3.93
	Bedridden	128	3.08
Viral load status	Suppressed	2092	50.4
	Unsuppressed	2060	49.6
Adherence	Good	3611	86.97
	Poor	355	8.55
	Fair	186	4.48
Regimen line	First Line	3811	91.79
	Second Line	319	7.68
	Third Line	22	0.53
TPT started	Yes	2958	71.24
	No	1194	28.76
CPT use	Yes	3313	79.79
	No	839	20.21

Model training results

In this study, the comparative evaluation of machine learning models revealed clear differences in performance across the selected algorithms. Gradient Boosting and Random Forest were the top-performing models, with accuracies of 0.73 and 0.72, respectively, and consistently balanced precision, recall, and F1-scores (all 0.70), indicating robust classification. Naive Bayes stood out for its high recall (0.84), suggesting a strong ability to identify positive cases, but at the cost of lower precision (0.59), which implies more false positives. Decision Tree and XGBoost offered stable mid-range performance (accuracies of 0.67 and 0.69) with balanced metrics, making them reasonable alternatives. Logistic Regression yielded moderate results (accuracy = 0.62), whereas SVM and KNN underperformed, both with an accuracy of 0.54 and lower scores on the other metrics (Table 3). Collectively, these findings indicate that ensemble methods, particularly Gradient Boosting and Random Forest, are the most effective approaches for this dataset and a promising basis for accurate and reliable predictive models.

Beyond discrimination, we assessed the calibration of the optimized Gradient Boosting model, that is, the agreement between its predicted probabilities and the observed frequencies of viral suppression. The Brier score, the mean squared difference between predicted probabilities and observed binary outcomes, was 0.18. The score ranges from 0 to 1, with 0 indicating perfect probabilistic predictions; because it reflects both calibration and discrimination, a useful reference point is that a constant prediction of 0.5 yields 0.25.

Table 3.

Model training results.

Model	Accuracy	Precision	Recall	F1 score
Logistic Regression	0.62	0.60	0.55	0.58
Random Forest	0.72	0.70	0.70	0.70
Gradient Boosting	0.73	0.70	0.70	0.70
Naive Bayes	0.66	0.59	0.84	0.69
SVM	0.54	0.51	0.55	0.53
KNN	0.54	0.50	0.58	0.54
Decision Tree	0.67	0.65	0.62	0.64
XGBoost	0.69	0.66	0.67	0.66
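The Brier score reported above is straightforward to compute. The sketch below is a minimal NumPy version, assuming binary outcomes coded 1 = suppressed and predicted probabilities of suppression; the toy arrays are hypothetical, not study data. `sklearn.metrics.brier_score_loss` computes the same quantity. The reference function gives the score of a no-skill model that always predicts the observed prevalence, a useful yardstick: for a near-balanced outcome such as this cohort’s it is close to 0.25.

```python
import numpy as np


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and observed 0/1 outcomes."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.mean((p - y) ** 2))


def reference_brier(y_true):
    """Brier score of a no-skill model that always predicts the observed prevalence."""
    prevalence = float(np.mean(np.asarray(y_true, dtype=float)))
    return prevalence * (1.0 - prevalence)


# Hypothetical toy example (1 = suppressed, 0 = unsuppressed).
y_toy = [1, 0, 1, 1]
p_toy = [0.9, 0.2, 0.6, 0.4]
score = brier_score(y_toy, p_toy)      # (0.01 + 0.04 + 0.16 + 0.36) / 4 = 0.1425
baseline = reference_brier(y_toy)      # 0.75 * 0.25 = 0.1875
```

A model only adds value over the prevalence baseline when its Brier score is clearly below `reference_brier` for the same outcomes.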

The ROC curves in Figure 5 show the trade-off between the true positive rate and the false positive rate for all evaluated models. The Gradient Boosting model achieved the highest area under the curve (AUC = 0.79), closely followed by Random Forest (AUC = 0.78) and XGBoost (AUC = 0.76), indicating strong discriminative ability. Naive Bayes also performed reasonably well (AUC = 0.73), showing competitive discrimination between classes. In contrast, Logistic Regression (AUC = 0.66) and Decision Tree (AUC = 0.67) gave moderate results, while SVM (AUC = 0.57) and KNN (AUC = 0.56) performed poorly, with curves lying close to the diagonal reference line, indicating limited predictive power. Overall, the ROC analysis supports the superiority of ensemble methods, particularly Gradient Boosting and Random Forest, which achieved higher AUC values and better classification performance across thresholds than the other algorithms.

Figure 5.

AUC-ROC curve results of models.
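AUC-ROC has a useful probabilistic reading: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The NumPy sketch below computes it directly from that pairwise definition, which is equivalent to the area under the empirical ROC curve and to what `sklearn.metrics.roc_auc_score` returns; the toy labels and scores are hypothetical.

```python
import numpy as np


def auc_pairwise(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    # Compare every positive with every negative case.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


# Hypothetical toy data: 3 of the 4 positive-negative pairs are ranked correctly.
auc_toy = auc_pairwise([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The all-pairs comparison costs O(n_pos × n_neg) memory and time, which is fine for illustration; rank-based formulas scale better on large test sets. An AUC of 0.5 corresponds to the diagonal reference line in Figure 5.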

Examining class-specific performance of the optimized Gradient Boosting model, the confusion matrix (Figure 6) shows a sensitivity (recall) of 0.65 for the unsuppressed class (211 of 324 unsuppressed cases correctly identified) and 0.76 for the suppressed class (339 of 447 suppressed cases correctly identified). Thus, the model correctly identifies about two-thirds of patients with unsuppressed viral load, the group at highest risk of treatment failure and clinical progression, while misclassifying 35% of them as suppressed (false negatives). The higher sensitivity for suppressed cases may partly reflect that suppressed cases form the majority of this test set (447 of 771, 58%), even though the full cohort is nearly balanced.

Figure 6.

Confusion matrix result of tuned gradient boosting model.
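The class-specific figures above can be reproduced from the counts given in the text. The sketch below assumes a 2 × 2 matrix with rows as true classes and columns as predicted classes, both ordered (unsuppressed, suppressed); the off-diagonal cells are derived from the stated totals (324 − 211 = 113 and 447 − 339 = 108). Per-class sensitivity is the diagonal count divided by the row total.

```python
import numpy as np

# Rows = true class, columns = predicted class, both ordered (unsuppressed, suppressed).
confusion = np.array([
    [211, 324 - 211],   # true unsuppressed: 211 correct, 113 predicted as suppressed
    [447 - 339, 339],   # true suppressed: 108 predicted as unsuppressed, 339 correct
])

# Sensitivity (recall) per class = correctly classified / all cases of that class.
recall_per_class = np.diag(confusion) / confusion.sum(axis=1)
recall_unsuppressed, recall_suppressed = recall_per_class

# Share of truly unsuppressed patients the model labels as suppressed (false negatives).
false_negative_rate = confusion[0, 1] / confusion[0].sum()
```

By construction, `recall_unsuppressed + false_negative_rate` equals 1, which is why a sensitivity of 0.65 implies that 35% of unsuppressed cases are missed.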

Feature importance

The feature importance analysis of the tuned Gradient Boosting model (Figure 7) highlights the variables that contributed most to predicting viral load suppression status. Importance was obtained from the model’s built-in impurity-based importance attribute, which aggregates, across all trees, the reduction in impurity achieved by splits on each feature (mean decrease in impurity, often called Gini importance; because gradient-boosted trees are regression trees, the impurity is typically a squared-error criterion rather than the Gini index). Baseline CD4 category emerged as the most influential predictor, accounting for more than half of the total importance, which underscores the critical role of immunological status at treatment initiation. The second most important predictor was duration on ART in months, indicating that treatment duration strongly informs the predicted suppression status; impurity-based importance does not indicate the direction of an effect, which is examined with SHAP below. Other features, such as sex and age, contributed moderately, whereas regimen line, CPT use, and marital status had smaller impacts. WHO stage, adherence, BMI, educational level, TPT initiation, and TB screening result showed minimal importance. These findings suggest that immunological markers and treatment duration are the most informative predictors of viral load suppression in this model, whereas sociodemographic and other clinical factors play a comparatively limited role in prediction.

Figure 7.

Feature importance of our best performing model.
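For readers unfamiliar with impurity-based importance, the sketch below shows how it can be read from a fitted scikit-learn `GradientBoostingClassifier` through its `feature_importances_` attribute. The source does not name the library used, so scikit-learn is an assumption here; the data are synthetic, the feature names are hypothetical, and the outcome is simulated so that a higher CD4 band and longer ART duration raise the probability of suppression while a third feature is pure noise.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic, hypothetical features (not the study data).
cd4_band = rng.integers(0, 4, n)           # 0 = lowest ... 3 = highest baseline CD4 band
art_months = rng.uniform(6, 240, n)        # duration on ART in months
noise = rng.normal(size=n)                 # unrelated to the outcome by construction
X = np.column_stack([cd4_band, art_months, noise])

# Simulated outcome: 1 = suppressed; higher CD4 band and longer ART raise the log-odds.
log_odds = -3.0 + 1.2 * cd4_band + 0.015 * art_months
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-log_odds))).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances; scikit-learn normalizes them to sum to 1.
importances = dict(zip(["cd4_band", "art_months", "noise"], model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:>10s}  {value:.3f}")
```

The importances carry no sign, which is why the direction of each effect has to come from SHAP. Impurity-based importance can also favor continuous or high-cardinality features; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.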

Model explainability

SHAP (SHapley Additive exPlanations) analysis was performed with KernelExplainer, using 1,000 background samples from the training set, as implemented in the SHAP Python library (version 0.44.0). The SHAP analysis indicated that duration on ART, baseline CD4 category, and patient age were the features with the greatest influence on predictions. Regarding direction, longer ART duration and higher baseline CD4 count were strongly associated with a lower predicted risk of unsuppressed viral load, whereas advanced WHO clinical stage and male sex were associated with a higher predicted risk. TPT initiation was likewise associated with a lower predicted risk. This ranking and these directions align with established clinical understanding, which supports the credibility and interpretability of the model’s decision-making (Figure 8).

Figure 8.

SHAP analysis results.
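KernelExplainer estimates Shapley values by sampling feature coalitions and fitting a weighted regression. For a model with only a few features, the same quantities can be computed exactly by enumerating every coalition and filling in “absent” features from background rows. The sketch below does this from scratch with NumPy to illustrate the definition; it is not the SHAP library’s implementation, and the toy linear log-odds model, its coefficients, and the feature names are hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, background):
    """Exact Shapley values of f at x, filling absent features from background rows."""
    n_features = x.shape[0]

    def coalition_value(coalition):
        # Features in the coalition take x's values; all others keep background values.
        data = background.copy()
        idx = list(coalition)
        if idx:
            data[:, idx] = x[idx]
        return float(np.mean(f(data)))

    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Shapley weight |S|! (n - |S| - 1)! / n! for each coalition S not containing i.
            weight = (math.factorial(size) * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (coalition_value(subset + (i,)) - coalition_value(subset))
    base_value = coalition_value(())  # E[f(X)] over the background rows
    return base_value, phi


# Hypothetical toy model: log-odds of an unsuppressed viral load (illustrative coefficients).
weights = np.array([-0.10, -0.80, 1.00, 0.70])  # ART years, high-CD4 flag, WHO stage 4, male
intercept = 0.20


def toy_log_odds(X):
    return intercept + X @ weights


rng = np.random.default_rng(1)
background = np.column_stack([
    rng.uniform(0, 20, 200),      # years on ART
    rng.integers(0, 2, 200),      # high baseline CD4 (0/1)
    rng.integers(0, 2, 200),      # WHO stage 4 (0/1)
    rng.integers(0, 2, 200),      # male (0/1)
]).astype(float)
patient = np.array([13.5, 0.0, 1.0, 1.0])

base_value, phi = exact_shapley(toy_log_odds, patient, background)
```

The result satisfies SHAP’s additivity property: `base_value + phi.sum()` equals the model output for `patient`. Exact enumeration grows as 2^M in the number of features M, which is why KernelExplainer samples coalitions instead.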

The model’s prediction for an individual patient was further explained using a SHAP waterfall plot. The plot starts from the baseline model output, E[f(X)] = -0.201, the average prediction across the dataset, and adds each feature’s contribution. For this patient, the factors that most increased the predicted risk of unsuppressed viral load were WHO stage 4 (+1.30) and male sex (Sex = 1, +1.27); an age of 53 years also added substantially to the risk (+0.86). These factors were partly offset by a long duration on ART (162 months), which lowered the predicted risk (-0.41) (Figure 9). Other features, such as CPT use and baseline CD4 category, made smaller positive (risk-increasing) contributions. Together, these contributions moved the prediction from the population average to a higher final value, showing how this patient’s clinical and demographic profile produced an elevated risk score.

Figure 9.

Interpretation of an individual prediction using a SHAP waterfall plot.
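SHAP explanations are additive: the model output for a patient equals the baseline plus the sum of the feature contributions. Applying this to the four contributions quantified above (the smaller CPT use and baseline CD4 category terms are not reported numerically, so they are kept symbolic) gives:

```latex
\begin{aligned}
f(x) &= \mathbb{E}[f(X)] + \sum_{i=1}^{M} \phi_i(x) \\
     &= -0.201 + \underbrace{1.30}_{\text{WHO stage 4}} + \underbrace{1.27}_{\text{male}}
        + \underbrace{0.86}_{\text{age 53}} - \underbrace{0.41}_{\text{ART 162 months}}
        + \sum_{\text{other } i} \phi_i(x) \\
     &= 2.819 + \sum_{\text{other } i} \phi_i(x).
\end{aligned}
```

Because the remaining contributions are described as positive, the final output exceeds 2.819 on the model’s output scale. Since the baseline is negative, that scale cannot be a probability; a margin or log-odds scale is the usual case for explanations of tree ensembles, although the text does not state which scale was used.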

Discussion

This study sought to develop and interpret a machine learning classifier for predicting HIV viral load suppression from routinely collected clinical data in a low-resource setting. The principal finding is that ensemble machine learning models, particularly Gradient Boosting, can generate stable and clinically plausible predictions of viral load status. The best-performing optimized Gradient Boosting model achieved an accuracy of 76%, with the best overall balance of precision, recall, and F1-score among the models evaluated. Combined with SHAP-based explainability, this performance yields an actionable, interpretable tool whose outputs are consistent with existing clinical evidence and which has considerable potential to support HIV care management in resource-limited settings.

The superior performance of tree-based ensemble methods such as Gradient Boosting and Random Forest is consistent with the existing literature on healthcare predictive modeling.^30,31 These models are well suited to capturing complex, non-linear interactions among features, which are characteristic of clinical data in which patient outcomes depend on the interplay of demographic, clinical, and treatment-related variables. Simpler models such as Logistic Regression and KNN performed poorly, possibly because they cannot adequately represent such relationships. Naive Bayes’ high recall, although notable, came with low precision, making it less suitable for clinical settings where each false positive means wasted resources.

A second strength of this work is its emphasis on model explainability. Feature importance analysis and SHAP plots provide clinically interpretable information that goes beyond a “black box” prediction. The finding that baseline CD4 category and duration on ART (in months) were the strongest predictors is consistent with established principles of virology and immunology.^32–34 A higher baseline CD4 count reflects a more intact immune system at treatment initiation, and a longer duration on ART may reflect sustained treatment efficacy and adherence; both are established predictors of viral suppression. The SHAP summary plot further supported the protective direction of these associations: longer ART duration and higher CD4 levels consistently pushed predictions away from the unsuppressed class. The identification of male sex and advanced WHO stage (Stage 4) as risk factors for unsuppressed viral load is also supported by epidemiological studies reporting differential treatment outcomes by disease stage and sex.³⁵

The SHAP waterfall plot for a single patient illustrates how the model could support personalized clinical decision-making. In the example shown, the model quantified how much the patient’s risk factors (WHO stage 4 disease, male sex, and older age) were offset by the beneficial effect of a long ART duration. This level of explanation may help clinicians understand the “why” behind a model’s risk score and choose appropriate interventions. For instance, a patient with a high risk score driven by poor adherence (a modifiable factor) would be managed differently from one whose risk stems largely from a low baseline CD4 count (a non-modifiable factor).

From a clinical safety perspective, the model’s sensitivity of 0.65 for unsuppressed viral load means that 35% of patients with treatment failure would be missed if the model were used as a standalone screening tool without confirmatory testing. In resource-limited settings where viral load testing is already constrained, this false-negative rate could have serious consequences, including delayed regimen switching, disease progression, and increased transmission risk. However, when the model is positioned as a triage tool that flags high-risk patients for priority viral load testing rather than replacing testing, even moderate sensitivity can improve resource allocation. Patients predicted as unsuppressed (regardless of true status) would receive confirmatory testing, whereas testing might be deferred for those predicted as suppressed with high confidence. Future work should optimize the classification threshold to prioritize sensitivity for the unsuppressed class, potentially accepting lower precision to minimize missed treatment failures.

Integrated into the EMR systems of low-resource clinics, such a model could serve as an early warning system. By calculating a risk score automatically during the patient encounter, it could alert clinicians to patients at increased risk of unsuppressed viral load so that they can be prioritized for adherence counseling, more frequent follow-up, or enhanced clinical monitoring before treatment failure occurs. This pre-emptive strategy may make better use of finite healthcare resources and improve overall treatment outcomes. Because the model relies on variables routinely recorded in ART clinic EMRs, it could be deployed without new, expensive testing.
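The threshold optimization proposed above can be sketched as choosing the highest cut-off on the predicted probability of unsuppressed viral load whose sensitivity meets a target. The function name, the target value, and the toy data below are hypothetical; in practice the threshold should be chosen on validation data rather than on the final test set.

```python
import numpy as np


def threshold_for_target_sensitivity(y_unsuppressed, p_unsuppressed, target=0.85):
    """Highest cut-off on P(unsuppressed) whose sensitivity is at least `target`.

    Patients with p >= threshold would be flagged for priority viral load testing.
    Returns (threshold, achieved_sensitivity).
    """
    y = np.asarray(y_unsuppressed, dtype=bool)
    p = np.asarray(p_unsuppressed, dtype=float)
    n_positive = int(y.sum())
    if n_positive == 0 or not 0.0 < target <= 1.0:
        raise ValueError("need at least one unsuppressed case and 0 < target <= 1")
    # Lowering the cut-off can only flag more patients, so sensitivity never decreases;
    # scan candidate cut-offs from highest to lowest and stop at the first that works.
    # The lowest cut-off flags everyone (sensitivity 1.0), so the loop always returns.
    for threshold in np.unique(p)[::-1]:
        flagged = p >= threshold
        sensitivity = (flagged & y).sum() / n_positive
        if sensitivity >= target:
            return float(threshold), float(sensitivity)


# Hypothetical toy data: 1 = truly unsuppressed.
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]
p_toy = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]
cutoff, achieved = threshold_for_target_sensitivity(y_toy, p_toy, target=0.75)
```

In this toy example, reaching 75% sensitivity requires lowering the cut-off to 0.4, which also flags one suppressed patient (score 0.7); this is the precision trade-off described in the text.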

Beyond clinical utility, this work has important implications for HIV policy and health system strengthening in low-resource settings. The model suggests that predictive analytics could help optimize scarce laboratory resources by prioritizing confirmatory testing for patients at the highest risk of unsuppressed viral load, potentially improving efficiency in settings such as Ethiopia, where testing coverage is constrained by supply chain and capacity limitations. The identification of baseline CD4 category and ART duration as dominant predictors reinforces the policy priority of early diagnosis and treatment initiation, while the finding that male sex and advanced WHO stage predict unsuppressed viral load highlights persistent disparities that call for targeted interventions such as male-friendly services and enhanced support for late presenters. Furthermore, integrating validated predictive models into electronic medical records could enable automated risk stratification at scale, supporting WHO’s differentiated service delivery framework by directing intensified support to high-risk patients while reducing visits for low-risk individuals. Finally, the interpretability of our model (via SHAP) addresses key policy concerns about AI transparency, allowing regulators and program managers to audit automated decisions and check their alignment with clinical guidelines and ethical principles as digital health investments expand across sub-Saharan Africa.

Translating this model into routine clinical use requires a structured four-phase pathway tailored to resource-limited settings: (1) technical integration, embedding the model within existing EMR systems (SmartCare/OpenMRS) with automated data extraction and real-time display of risk scores; (2) clinical workflow integration, training staff to interpret risk scores, establishing triage protocols (e.g., a predicted probability of suppression below 50% triggers priority viral load testing), and developing patient-friendly explanations for counseling; (3) evaluation and refinement, conducting prospective cohort studies to assess the impact on outcomes, monitoring performance drift with quarterly recalibration, and iteratively refining thresholds based on user feedback; and (4) scale-up, packaging implementation toolkits for dissemination to other Ethiopian ART clinics, establishing learning collaboratives, and advocating for national policy endorsement within differentiated service delivery guidelines. This phased approach accounts for resource constraints and human factors, moving beyond technical validation toward sustainable clinical integration (Figure 10).

Figure 10.

Proposed clinical workflow integration of viral load prediction model.
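The example triage protocol in phase (2) can be written as a small rule. The 50% cut-off comes from the text, as does the idea of deferring testing for patients predicted as suppressed with high confidence; the second cut-off (0.80 here), the function name, and the action labels are hypothetical placeholders that a clinical team would need to set and validate.

```python
def triage(p_suppressed, priority_cutoff=0.50, defer_cutoff=0.80):
    """Map a predicted probability of suppression to a hypothetical triage action."""
    if not 0.0 <= p_suppressed <= 1.0:
        raise ValueError("p_suppressed must be a probability between 0 and 1")
    if p_suppressed < priority_cutoff:
        # Below 50% predicted suppression: prioritize confirmatory viral load testing.
        return "priority viral load test"
    if p_suppressed >= defer_cutoff:
        # High-confidence suppression: deferral of testing may be considered.
        return "routine schedule; deferral may be considered"
    return "routine viral load test"


# Hypothetical predicted probabilities for three patients.
actions = [triage(p) for p in (0.30, 0.65, 0.92)]
```

Keeping both cut-offs as parameters makes the quarterly threshold refinement described in phase (3) a configuration change rather than a code change.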

Limitations and future directions

Despite these encouraging results, this study has several limitations. First, the data come from a single tertiary hospital in Ethiopia, so the findings may not generalize to populations with different demographic and epidemiological characteristics; external validation with datasets from other regions is required. Second, although the outcome classes were balanced, the dataset may lack relevant predictors, such as certain socio-economic variables, psychosocial stressors, or genetic markers; future research should incorporate such variables to improve predictive power. Third, the model’s performance, while good, leaves a substantial share of outcomes unexplained, suggesting that viral load suppression also depends on factors not captured in the current feature set.

Future studies should focus on three areas: (1) multi-center validation to test the robustness and transportability of the model across diverse sub-Saharan African health settings; (2) prospective implementation studies to quantify the real-world impact of deploying the model in clinical practice on patient outcomes and resource use; and (3) temporal modeling techniques that estimate viral load suppression risk dynamically over time rather than at a single time point, providing a more useful tool for long-term patient management.

Although the single-site design limits immediate generalizability, the University of Gondar cohort has characteristics (an urban-rural mix, a 19-year span of treatment eras, and data quality challenges typical of LMICs) that enhance its value as a benchmark for similar settings. To strengthen external validity, we propose a three-phase approach: (1) internal temporal validation, which confirmed stable performance (AUC-ROC 0.78 vs. 0.79); (2) geographic validation through planned collaborations with two other Ethiopian ART clinics; and (3) cross-national validation by sharing the model openly for external testing across sub-Saharan Africa. Because the model relies on routinely collected variables that are standard in national HIV programs, this validation agenda is feasible, and we invite external testing to establish the bounds of generalizability.
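The text does not describe how the internal temporal validation split was defined. One common design, sketched below under that assumption, trains on patients enrolled before a cut-off year and tests on those enrolled from that year onward; the function name, the enrollment-year input, and the cut-off are hypothetical.

```python
import numpy as np


def temporal_split(enrollment_year, cutoff_year):
    """Return (train_idx, test_idx): earlier enrollees for training, later ones for testing."""
    years = np.asarray(enrollment_year)
    train_idx = np.flatnonzero(years < cutoff_year)
    test_idx = np.flatnonzero(years >= cutoff_year)
    return train_idx, test_idx


# Hypothetical enrollment years for six patients and an illustrative cut-off year.
years = [2005, 2010, 2016, 2019, 2022, 2024]
train_idx, test_idx = temporal_split(years, cutoff_year=2019)
```

Unlike a random split, this keeps later treatment eras unseen during training, so a stable AUC on the later patients suggests that performance does not hinge on the era mix of the training data.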

Fourth, the nearly equal proportions of suppressed and unsuppressed viral loads in this cohort (50.4% suppressed, 49.6% unsuppressed) differ substantially from the suppression proportions commonly reported in well-managed HIV cohorts (80-90%). This reflects features of the study: a tertiary setting that manages complicated and treatment-experienced cases, a 19-year study period (2005-2024) that includes years with less effective ART regimens, and the inclusion of all cases rather than only those with optimal outcomes. Model performance may therefore differ in settings with higher proportions of baseline suppression.

Fifth, despite our sensitivity analyses, the 15% missingness in CD4 count measurements may introduce information bias. Little’s MCAR test, performed to assess the missing data mechanism, indicated that the data were not missing completely at random; missing CD4 counts were more common among patients with poor adherence documentation.
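Little’s MCAR test itself requires multivariate EM estimates and is not reproduced here. A simpler, complementary check of the pattern described above is to cross-tabulate a CD4-missingness indicator against adherence category and compute Pearson’s chi-square statistic for independence. The sketch below does this with NumPy; it is not Little’s test, and the function name and toy data are hypothetical.

```python
import numpy as np


def missingness_chi_square(is_missing, group):
    """Pearson chi-square for independence of a missingness flag and a grouping variable.

    Returns (statistic, degrees_of_freedom, missing_rate_per_group).
    A p-value can be obtained with, e.g., scipy.stats.chi2.sf(statistic, dof).
    """
    is_missing = np.asarray(is_missing, dtype=bool)
    group = np.asarray(group)
    labels = np.unique(group)
    # 2 x k table: rows = (observed, missing), columns = group labels.
    table = np.array([[np.sum((group == g) & ~is_missing) for g in labels],
                      [np.sum((group == g) & is_missing) for g in labels]], dtype=float)
    # Expected counts under independence (all must be > 0 for the statistic to be defined).
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    statistic = float(((table - expected) ** 2 / expected).sum())
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    rates = {str(g): float(table[1, j] / table[:, j].sum()) for j, g in enumerate(labels)}
    return statistic, dof, rates


# Hypothetical toy data: CD4 missing for 1 of 10 "Good" and 5 of 10 "Poor" adherence records.
adherence = ["Good"] * 10 + ["Poor"] * 10
cd4_missing = [True] + [False] * 9 + [True] * 5 + [False] * 5
stat, dof, missing_rates = missingness_chi_square(cd4_missing, adherence)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom indicates that missingness depends on the grouping, which is consistent with data that are not missing completely at random.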

Conclusion

In conclusion, this study developed an interpretable machine learning model that predicts viral load suppression from routine clinical data collected at a tertiary hospital in Ethiopia. The Gradient Boosting model showed the best classification performance in this cohort (76% accuracy, 0.79 AUC-ROC) and provided insights consistent with established knowledge: longer time on ART and a higher CD4 count at ART initiation were associated with higher odds of viral load suppression. The study nevertheless has important shortcomings. An accuracy of 76% means that about one in four patients (24%) was misclassified, and the 35% false-negative rate for unsuppressed viral load indicates that the tool cannot replace confirmatory testing. Furthermore, the results derive from a single site and patient population (a tertiary referral site with a 19-year enrollment period) and require external validation before wider implementation is considered. Within these limitations, the model is a proof of concept for how interpretable machine learning can help bridge the gap between predictive accuracy and actionability in resource-limited settings. Such tools may ultimately help healthcare providers optimize HIV management, although this remains a question for future research rather than a current recommendation.

Footnotes

Acknowledgments

We would like to express our heartfelt thanks to the University of Gondar Hospital for its cooperation and for permitting the use of the data. We also give special appreciation to Debre Markos University for its contribution. Finally, we thank everyone who contributed to the success of this study.

ORCID iDs

Abraham Keffale Mengistu

Andualem Fentahun Senishaw

Andualem Enyew Gedefaw

Ethical considerations

Ethical approval for this study was obtained from the Institutional Review Board (IRB) of Debre Markos University, College of Medicine and Health Sciences (Approval Number: HSC/R/C/Ser/Co/123/25). Since all data were anonymized before analysis, patient confidentiality was strictly maintained. Access to the dataset was restricted to authorized researchers only to comply with ethical guidelines and data protection policies. The study was conducted in accordance with the principles of the Declaration of Helsinki and adhered to all relevant national regulations and ethical guidelines.

Consent to participate

Because this study involved retrospective analysis of anonymized clinical data with no direct patient contact or intervention, the IRB granted a waiver of informed consent. All data were de-identified before extraction, and no patient identifiers were included in the analysis dataset.

Author contributions

A.K.M. conceptualized the study, and A.E.G. was involved in design, analysis, interpretation, report, and manuscript writing. A.W.S. and M.B.M. edited the manuscript for clarity and correctness. All the authors read and approved the final manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Data will be available upon request from the corresponding author. The code and experiments used for this research are available on the following public GitHub repository. ().

Use of artificial intelligence

The authors declare that no artificial intelligence (AI) tools or large language models were used in the generation, writing, or substantive editing of this manuscript. The human authors produced all content, analyses, and interpretations. Standard grammar and spell-checking tools were used solely for language refinement and did not contribute to intellectual content or scientific decision-making.

Appendix

References

1. Guterres. Viral load: We need a new look at an old problem? Journal of Medical Virology 2023; 95(8): e29061. https://doi.org/10.1002/jmv.29061

2. Marin. The Importance of Viral Load in Human Body Fluids and its Role in Infectious Diseases. Journal of Virology and Mycology 2023; 12(4). https://doi.org/10.35248/2161-0517.23.12.276

3. Lakshmanan, Liu. Impact of Point-of-Care Testing on Diagnosis, Treatment, and Surveillance of Vaccine-Preventable Viral Infections. Diagnostics 2025. https://doi.org/10.3390/diagnostics15020123

4. Rouet, Rouzioux. The measurement of HIV-1 viral load in resource-limited settings: How and where? Clin Lab 2007; 53: 135–148.

5. Ochodo, Olwanda, Deeks, et al. Point-of-care viral load tests to detect high HIV viral load in people living with HIV/AIDS attending health facilities. Cochrane Database of Systematic Reviews 2022. https://doi.org/10.1002/14651858.CD013208.pub2

6. Greig, Du Cros, Klarkowski, et al. Viral load testing in a resource-limited setting: Quality control is critical. J Int AIDS Soc 2011; 14(1): 23. https://doi.org/10.1186/1758-2652-14-23

7. Mamo, Yilma, Fekadie, et al. Machine learning to predict virological failure among HIV patients on antiretroviral therapy in the University of Gondar Comprehensive and Specialized Hospital, in Amhara Region, Ethiopia, 2022. BMC Med Inform Decis Mak 2023; 23(1). https://doi.org/10.1186/s12911-023-02167-7

8. Dixon, Sattar, Moros, et al. Unveiling the Influence of AI Predictive Analytics on Patient Outcomes: A Comprehensive Narrative Review. Cureus 2024; 16: e59954. https://doi.org/10.7759/cureus.59954

9. Maskew, Sharpey-Schafer, De Voux, et al. Applying machine learning and predictive modeling to retention and viral suppression in South African HIV treatment cohorts. Sci Rep 2022; 12(1): 12715. https://doi.org/10.1038/s41598-022-16062-0

10. Payagala, Pozniak. The global burden of HIV. Clin Dermatol 2024; 42(2): 119–127. https://doi.org/10.1016/j.clindermatol.2024.02.001

11. Kippen, Nzimande, Gareta, et al. The viral load monitoring cascade in HIV treatment programmes in sub-Saharan Africa: a systematic review. BMC Public Health 2024; 24(1): 2603. https://doi.org/10.1186/s12889-024-20013-x

12. Kitaw, Haile. Virological outcomes of antiretroviral therapy and its determinants among HIV patients in Ethiopia: Implications for achieving the 95–95–95 target. PLoS One 2025; 20(1): e0313481. https://doi.org/10.1371/journal.pone.0313481

13. Moyo, Murewanhema, et al. Key populations and Sub-Saharan Africa’s HIV response. Frontiers in Public Health 2023. https://doi.org/10.3389/fpubh.2023.1079990

14. Marcus, Sewell, Balzer, et al. Artificial Intelligence and Machine Learning for HIV Prevention: Emerging Approaches to Ending the Epidemic. Current HIV/AIDS Reports 2020: 171–179. https://doi.org/10.1007/s11904-020-00490-6

15. Xiang, Fujimoto, et al. Application of artificial intelligence and machine learning for HIV prevention interventions. Lancet HIV 2021; 9: e54–e62. https://doi.org/10.1016/S2352-3018(21)00247-2

16. Pal. A Comparative Analysis of Machine Learning Algorithms for Predictive Analytics Healthcare. International Journal of Scientific Research and Management 2024; 72: 10–25.

17. Javaid, Haleem, Pratap Singh, et al. Significance of machine learning in healthcare: Features, pillars and applications. International Journal of Intelligent Networks 2022; 3: 58–73. https://doi.org/10.1016/j.ijin.2022.05.002

18. Alowais, Alghamdi, Alsuhebany, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Medical Education 2023. https://doi.org/10.1186/s12909-023-04698-z

19. Chekol, Ketemaw, Endale, et al. Data quality and associated factors of routine health information system among health centers of West Gojjam Zone, northwest Ethiopia, 2021. Frontiers in Health Services 2023; 3: 1059611. https://doi.org/10.3389/frhs.2023.1059611

20. Gebretsadik, Belachew, Gebrekidan, et al. Health information system in primary health care units of the Central Zone, Tigray, Northern Ethiopia. BMC Med Inform Decis Mak 2025; 25(1): 233. https://doi.org/10.1186/s12911-025-03078-5

21. Abebe, Legesse, Godie, et al. Health Management Information System Data Quality and its associated factors in Addis Ababa Public Hospitals, Ethiopia, 2022: A cross-sectional study. Preprint 2025. https://doi.org/10.1101/2025.01.19.25320816

22. Tucker, Reyes Nieva, Schiffer, et al. Using a Health Information Exchange to Characterize Changes in HIV Viral Load Suppression and Disparities During the COVID-19 Pandemic in New York City. Open Forum Infect Dis 2023; 10(12): ofad584. https://doi.org/10.1093/ofid/ofad584

23. Xie, Kadota, et al. Prevention of adverse HIV treatment outcomes: machine learning to enable proactive support of people at risk of HIV care disengagement in Tanzania. BMJ Open 2024; 14(9): e088782. https://doi.org/10.1136/bmjopen-2024-088782

24. Ebulue, Ekkeh, Ebulue, et al. Machine learning insights into HIV outbreak predictions in Sub-Saharan Africa. International Medical Science Research Journal 2024; 4: 558–578. https://doi.org/10.51594/imsrj.v4i5.1121

25. Mutai, McSharry, Ngaruye, et al. Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa. BMC Med Res Methodol 2021; 21(1): 159. https://doi.org/10.1186/s12874-021-01346-2

26. Yehadji, Gray, Vicente, et al. Development of machine learning algorithms to predict viral load suppression among HIV patients in Conakry (Guinea). Front Artif Intell 2025; 8: 8. https://doi.org/10.3389/frai.2025.1446876

27. Endebu, Taye, Deressa. Development of a machine learning prediction model for loss to follow-up in HIV care using routine electronic medical records in a low-resource setting. BMC Med Inform Decis Mak 2025; 25(1): 192. https://doi.org/10.1186/s12911-025-03030-7

28. Zuhair, Babar, Ali, et al. Exploring the Impact of Artificial Intelligence on Global Health and Enhancing Healthcare in Developing Nations. Journal of Primary Care and Community Health 2024; 15: 21501319241245847. https://doi.org/10.1177/21501319241245847

29. Al Meslamani, Sobrino, de la Fuente. Machine learning in infectious diseases: potential applications and limitations. Annals of Medicine 2024; 56: 2362869. https://doi.org/10.1080/07853890.2024.2362869

30. Wang, Zhang, Mai, et al. An effective multi-step feature selection framework for clinical outcome prediction using electronic medical records. BMC Med Inform Decis Mak 2025; 25(1): 84. https://doi.org/10.1186/s12911-025-02922-y

31. Mienye, Sun. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022; 16: 1–99149. https://doi.org/10.1109/ACCESS.2022.3207287

32. Gebrerufael, Asfaw. Predictors of change in CD4 cell count over time for HIV/AIDS patients on ART follow-up in northern Ethiopia: a retrospective longitudinal study. BMC Immunol 2024; 25(1): 64. https://doi.org/10.1186/s12865-024-00659-3

33. Birhan, Gezie, Teshome, et al. Predictors of CD4 count changes over time among children who initiated highly active antiretroviral therapy in Ethiopia. Trop Med Health 2020; 48(1): 37. https://doi.org/10.1186/s41182-020-00224-9

34. Phillips, Staszewski, Weber, et al. HIV Viral Load Response to Antiretroviral Therapy According to the Baseline CD4 Cell Count and Viral Load. JAMA 2001; 286: 2560–2567. https://doi.org/10.1001/jama.286.20.2560

35. Wisaksana, Hartantri, Hutajulu. Risk Factors Associated with Unsuppressed Viral Load in People Living with HIV Receiving Antiretroviral Treatment in Jawa Barat, Indonesia. HIV/AIDS - Research and Palliative Care 2024; 16: 1–7. https://doi.org/10.2147/HIV.S407681

Machine learning-based classification of HIV viral load suppression in low-resource settings
