Abstract
Background:
Although tuberculosis is highly prevalent in low- and middle-income countries, millions of cases remain undetected using current diagnostic methods. To address this problem, researchers have proposed prediction rules.
Objective:
We analyzed existing prediction rules for the diagnosis of pulmonary tuberculosis and identified factors with a moderate to high strength of association with the disease.
Methods:
We conducted a comprehensive search of relevant databases (MEDLINE/PubMed, Cochrane Library, Science Direct, Global Health for Reports, and Google Scholar) up to 14 November 2022. Studies that developed diagnostic algorithms for pulmonary tuberculosis in adults from low- and middle-income countries were included. Two reviewers performed study screening, data extraction, and quality assessment. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. We performed a narrative synthesis.
Results:
Of the 26 articles selected, only half included human immunodeficiency virus-positive patients. In symptomatic human immunodeficiency virus-positive patients, radiographic findings and body mass index were strong predictors of pulmonary tuberculosis, with odds ratios >4. However, in human immunodeficiency virus-negative individuals, the biomarkers showed a moderate association with the disease. In symptomatic human immunodeficiency virus-positive patients, a C-reactive protein level ⩾10 mg/L had a sensitivity and specificity of 93% and 40%, respectively, whereas a trial of antibiotics had a sensitivity of 43% and a specificity of 86%. In smear-negative patients, anti-tuberculosis treatment showed a sensitivity of 52% and a specificity of 63%.
Conclusions:
The performance of predictors and diagnostic algorithms differs among patient subgroups. In human immunodeficiency virus-positive patients, radiographic findings and body mass index were strong predictors of pulmonary tuberculosis, whereas in human immunodeficiency virus-negative individuals, the biomarkers showed a moderate association with the disease. Few models have reached the World Health Organization’s recommendation. Therefore, more work is needed to strengthen predictive models for tuberculosis screening, and they should be developed rigorously, considering the heterogeneity of the population in clinical work.
Introduction
Tuberculosis (TB) continues to be highly prevalent in low- and middle-income countries (LMICs) where resources for TB screening and diagnosis are limited. 1 Approximately 3.1 million people are missed by the current approach to TB case detection. 1 Therefore, reaching missing TB cases is critically important to save millions of lives and end TB. 2
A molecular test for TB, the GeneXpert Dx System (Cepheid, Sunnyvale, CA, USA), was co-developed by Cepheid, Inc. and the Foundation for Innovative New Diagnostics. 3 This test has improved TB diagnosis; however, access to it is limited in primary healthcare settings. 4 Truenat, a new molecular diagnostic tool for TB, has not been widely implemented in many developing countries. 5 Smear microscopy has low sensitivity and fails to detect nearly half of all TB cases. 6 The availability of radiography and specialist staff is limited. In such cases, a risk prediction rule helps identify patients who are at high risk of developing TB and thus require additional diagnostic testing or consideration of anti-TB treatment.7,8 In 2021, prediction rules, such as the chest radiograph abnormality score, were proposed by the World Health Organization (WHO) to increase the TB case detection rate. 9
Several risk factors, such as HIV infection, silicosis, diabetes mellitus, kidney disease, and low body weight, affect the development of TB. 10 Statistically combining their effects can produce a more powerful risk prediction than considering one risk factor at a time.11,12
Several clinical prediction rules are available for the diagnosis of TB.13–15 These rules include factors related to the patient’s clinical findings, history, and investigation results. Pinto et al. 16 conducted a systematic review to assess the diagnostic accuracy and reproducibility of chest radiograph scoring systems for the diagnosis of pulmonary TB and reported that the scoring system appears useful for ruling out pulmonary TB in hospital settings. However, such a scoring system cannot be applied to peripheral health facilities, where radiography is scarce. The quality and characteristics of the prediction models for pulmonary TB have been reviewed elsewhere. 17 However, studies evaluating the accuracy of the symptoms or combinations of symptoms were excluded. Such algorithms are important to identify easily applicable and readily available predictors for future model development and validation. A recent systematic review investigated the utility of a clinical scoring system in improving TB case findings. 18 It showed variable sensitivity and specificity but did not demonstrate the power of the predictors and the effectiveness of clinical scoring.
A standardized scoring system for pulmonary TB using readily available, measurable, and widely generalizable predictors is useful for improving TB case detection. Such tools should be robust and simple to implement in low-resource settings. 19 Synthesizing appropriate algorithms for different clinical and epidemiological settings is important for TB screening and diagnosis. In this study, we examined the performance of prediction rules for diagnosing pulmonary TB in adult patients attending health facilities. We also identified common predictors of pulmonary TB with moderate-to-strong predictive ability.
Methods
Study design and population
We conducted a systematic review of observational studies that developed clinical algorithms for diagnosing pulmonary TB. We classified the algorithms as model-based or criteria-based. Model-based algorithms were developed using two or more predictors independently associated with pulmonary TB in multivariable logistic regression, classification trees, neural networks, or support vector machines. 20 Criteria-based algorithms were developed empirically (without the use of a mathematical model) using signs and symptoms of TB, with some including laboratory and chest X-ray (CXR) findings.
This systematic review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 21 This study was registered in the Open Science Framework (OSF, https://osf.io/4y7sz/).
Data sources and search strategy
We conducted a comprehensive literature search in MEDLINE/PubMed, Cochrane Library, Science Direct, Global Health for reports, and Google Scholar for studies published from 1 January 1990 to 14 November 2022. The literature search was conducted from 15 April to 14 November 2022. We used optimized search filters developed to identify diagnostic prediction studies in PubMed. 22 The search filter had a sensitivity of 0.95, ranging from 0.94 to 0.97. To address our review question, we added search strings related to the diagnosis and pulmonary TB. We used the following search terms: (“Stratification” OR “ROC Curve”[Mesh] OR “Discrimination” OR “Discriminate” OR “c-statistic” OR “c statistic” OR “Area under the curve” OR “AUC” OR “Calibration” OR “Indices” OR “Algorithm” OR “Multivariable”) AND (“Diagnosis” OR “diagnosis” [MeSH]) AND (“Pulmonary TB” OR “PTB” OR “TB” [Mesh] OR “TB”).
The results were limited to the English language based on the study team’s ability. We conducted forward and backward citation searches in Google Scholar using the references of the included articles to find other potentially relevant studies, including gray literature. We searched clinical trial registers, conference proceedings, and gray literature databases for unpublished studies. All search results were managed using EndNote, version 5.
Inclusion and exclusion criteria
We used the population, index test, reference test, and diagnosis of interest framework for diagnostic accuracy studies 23 to guide the comparison of algorithms. Any original study that developed and/or validated a diagnostic score/algorithm for pulmonary TB in adults (aged ⩾ 15 years) seeking health care was included in this review. We only included studies conducted in LMICs as defined by the World Bank in 2022. 24 High-income countries were excluded because of the exceptionally low incidence of TB and the remarkably high detection rate of TB cases. Studies on extrapulmonary TB and drug-resistant TB were excluded. We also excluded surveys because the presumed TB cases in the community were different from those presented to the health facility.
Study selection
Using the inclusion and exclusion criteria, two authors independently screened the titles and abstracts of the retrieved studies. Disagreements were resolved through a consensus discussion. Duplicate articles were excluded from the analysis. Eligible citations were selected for full-text review. Full-text articles that met all criteria for this review were retained for quality assessment, data extraction, and subsequent analyses.
Assessment of study quality
GB Gebregergs and G Berhe independently assessed study quality using the Quality Assessment of Diagnostic Accuracy Studies-2, a revised tool comprising four domains: patient selection, index test, reference standard, and flow and timing. 25 As recommended, each study was graded as “high,” “low,” or “unclear” for risk of bias or concerns about applicability for each domain.
If all signaling questions for a domain are answered “yes,” the risk of bias is considered low. If any of the signaling questions are answered “no,” the risk of bias is considered high.
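The grading rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool used in the review: the function name `domain_risk_of_bias` is hypothetical, and grading any remaining mix of answers as "unclear" is an assumption, since the text states only the "low" and "high" rules.

```python
from typing import Iterable


def domain_risk_of_bias(answers: Iterable[str]) -> str:
    """Grade one QUADAS-2 domain from its signaling-question answers.

    Each answer is expected to be "yes", "no", or "unclear".
    """
    normalized = [a.strip().lower() for a in answers]
    if not normalized:
        raise ValueError("at least one signaling question is required")
    # Any "no" answer marks the domain as high risk of bias.
    if any(a == "no" for a in normalized):
        return "high"
    # All "yes" answers mark the domain as low risk of bias.
    if all(a == "yes" for a in normalized):
        return "low"
    # Assumption: every other combination (e.g. "yes" plus "unclear")
    # is graded "unclear".
    return "unclear"


if __name__ == "__main__":
    print(domain_risk_of_bias(["yes", "yes", "yes"]))
    print(domain_risk_of_bias(["yes", "no", "unclear"]))
    print(domain_risk_of_bias(["yes", "unclear"]))
```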
Disagreements between the two reviewers were resolved by discussion.
Data extraction and analysis
A data extraction format was created using an Excel spreadsheet. We adopted the CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) checklist 26 to develop the data extraction format. In case we were unable to obtain all the required information from the available reports, we contacted the corresponding authors.
Our primary interest was to identify parameters that significantly predicted active pulmonary TB along with their measure of effect. We classified the strength of association into three categories: weak (OR ⩽ 2.0), moderate (OR: 2.1–3.0), and strong (OR > 3.0). The secondary outcome of interest was the performance of the algorithm. The most commonly available algorithms were symptom screening, serial screening (symptom screening followed by chest radiography), and biomarker screening. A combination of these algorithms has also been studied. We also calculated the number needed to screen (NNS) to detect one person with TB disease as the reciprocal of the prevalence of newly diagnosed TB. 27 Case–control studies were excluded from NNS calculations. We presented the effect size (odds ratio) of predictors and the diagnostic accuracy of different algorithms for symptomatic individuals, for individuals regardless of symptoms, and for HIV-positive and HIV-negative individuals. In the presence of two or more studies, we used a median estimate to avoid small-study effects. 28
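The classification and the NNS calculation described above can be illustrated with a minimal Python sketch. The function names are hypothetical, the prevalence and the list of odds ratios in the usage lines are made-up values, and two points are assumptions not stated in the text: odds ratios between 2.0 and 2.1 are treated as moderate, and only odds ratios above zero are accepted.

```python
from statistics import median


def classify_strength(odds_ratio: float) -> str:
    """Map an odds ratio to the weak / moderate / strong categories."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    if odds_ratio <= 2.0:
        return "weak"
    # Assumption: values in the gap between 2.0 and 2.1 count as moderate.
    if odds_ratio <= 3.0:
        return "moderate"
    return "strong"


def number_needed_to_screen(prevalence: float) -> float:
    """NNS as the reciprocal of the prevalence of newly diagnosed TB."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be a proportion in (0, 1]")
    return 1.0 / prevalence


if __name__ == "__main__":
    # Hypothetical odds ratios for one predictor across three studies.
    study_odds_ratios = [2.5, 3.5, 6.0]
    summary_or = median(study_odds_ratios)
    print(summary_or, classify_strength(summary_or))
    # Hypothetical prevalence of 25% among those screened.
    print(number_needed_to_screen(0.25))
```

Using the median rather than the mean keeps a single small study with an extreme odds ratio from dominating the summary, which matches the stated aim of limiting small-study effects.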
We only performed a narrative synthesis. We did not pool the estimates because the scoring systems were heterogeneous in terms of predictors and derivation methods.
Results
Study selection and characteristics
Of the 4873 citations identified, 4860 unique articles remained after removing duplicates. After screening the titles and abstracts, 82 articles met the inclusion criteria, and their full texts were retrieved. After a full-text review, 56 articles were excluded for various reasons, and 26 articles7,29–53 were included in the systematic review (Figure 1).

Figure 1. PRISMA flow diagram for included and excluded studies.
In all, 16 (61.5%) studies7,30,31,33,35,37,38,40–42,44,46,47,50–52 were from middle-income countries, 8 studies29,32,34,39,43,48,49,53 were from low-income countries, and the remaining 2 studies were multi-country studies. In total, 24 (92.3%) articles7,29–35,37–52 were from high-burden countries (HBCs), and 2 studies36,53 were from both HBCs and non-HBCs.
All included studies were observational. Ascertainment of pulmonary TB was performed using culture in 20 studies,30,35–53 and GeneXpert in six studies according to the WHO standard. 54 In all, 13 studies29,30,34,35,37,38,40–43,45,46,49 focused solely on HIV-positive patients, while 12 studies7,31,32,36,39,44,47,48,50–53 included participants, regardless of their HIV status.
Six studies43,44,48,51–53 exclusively enrolled patients with smear-negative results. The median sample size was 575 (IQR: 345–1048). The details of the included studies are presented in Table 1.
Table 1. Characteristics of the included studies.
MC: multi-country; NA: not applicable; TB: tuberculosis; HIV: human immunodeficiency virus.
Quality assessment
The QUADAS-2 results for risk of bias and applicability concerns are summarized in Figure 2. Only seven studies (27%) had a considerable risk of bias for patient selection and two (7.7%) for flow and timing, indicating an overall good quality in these areas. All studies were considered to have low applicability concerns for the index test and reference standard.

Figure 2. Summary of risk of bias and applicability concerns with QUADAS-2.
The algorithm was derived using a mathematical model in 15 studies. Among the model-based prediction studies, seven39,41,46,48,49,51,53 focused on model development, one employed independent external validation, 38 and seven7,30,33,35,37,41,52 combined development with internal validation. Among the validation methods, two were temporal30,37; three were data splits31,41,52; and two were geographical.7,35 None of the prediction models considered expert opinions when selecting predictors. Five studies (33.3%) with multiple variables had an event-per-variable ratio <10. Six articles used the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) checklist.
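As an illustration of the event-per-variable (EPV) check mentioned above, the following Python sketch divides the number of outcome events by the number of candidate predictors and compares the result with the threshold of 10. The function names and the example counts are hypothetical, and using candidate predictors as the denominator is an assumption about how EPV was counted.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Number of outcome events (TB cases) per candidate predictor."""
    if n_events < 0:
        raise ValueError("the number of events cannot be negative")
    if n_candidate_predictors <= 0:
        raise ValueError("at least one candidate predictor is required")
    return n_events / n_candidate_predictors


def has_adequate_epv(n_events: int, n_candidate_predictors: int,
                     threshold: float = 10.0) -> bool:
    """True when the EPV ratio reaches the threshold (10 by default)."""
    return events_per_variable(n_events, n_candidate_predictors) >= threshold


if __name__ == "__main__":
    # Hypothetical study: 120 TB cases and 15 candidate predictors.
    print(events_per_variable(120, 15))
    print(has_adequate_epv(120, 15))
```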
Seven criteria-based algorithms reported accuracy tests and 12 model-based algorithms reported model classification (Table 2). Among the 15 model-based studies, 13 reported areas under the receiver operating characteristic curve (AUC), and the median AUC was 0.82 (IQR: 0.78–0.84). The highest AUC (0.92) was reported by Yu et al., 33 and the lowest (0.69) by Mello et al. 52
Table 2. Methods reported in prediction studies for diagnosis of pulmonary TB.
AUC: area under the receiver operating characteristic curve; EPV: event per variable; OR: odds ratio; IQR: interquartile range.
Predictors included in model-based predictions
To identify the most reliable and consistent predictors of pulmonary TB across different models, we focused on studies with at least 10 events per variable to ensure sufficient data for accurate analysis (Table 3).
Table 3. Strength of the predictors of pulmonary TB in diagnostic models.
OR: odds ratio; ART: anti-retroviral therapy; WHO: World Health Organization.
In symptomatic HIV patients, cavity, hilar lymphadenopathy, and pleural effusion were strong predictors (OR > 3.0), while fever and cervical lymphadenopathy 37 also showed significant predictive power. For HIV patients, regardless of symptoms, the number of WHO TB symptoms, body temperature, hemoglobin, 35 CD4 count < 200 cells/μL, 46 and body mass index35,46 were moderate to strong predictors.
Biochemical indicators predicted pulmonary TB in HIV-negative people. 33 These included mean corpuscular volume, erythrocyte sedimentation rate, albumin level, adenosine deaminase level, monocyte/high-density lipoprotein ratio, and high-sensitivity CRP/lymphocyte ratio, with moderate to high strength of association. Among smear-negative cases, the absence of sputum and the presence of a cavity estimated pulmonary TB with moderate (OR = 2.0, 95% CI: 1.1–3.3) to strong (OR = 6.3, 95% CI: 2.6–10.0) predictive power, respectively.
Lung cavities or a miliary pattern on chest radiography were particularly strong predictors of pulmonary TB in patients with unknown HIV status, with odds ratios of 8.0 and 5.6, respectively. 50 The number and duration of WHO TB symptoms, HIV infection, and diabetes mellitus were additional predictors of pulmonary TB. 7
Diagnostic accuracy of clinical algorithms
Table 4 describes the performance and NNS of a selected population, using culture as the gold standard.
Table 4. Diagnostic accuracy of clinical algorithms for pulmonary TB among a variety of patients using culture as a reference test.
CI: confidence interval; §: risk score; NA: not applicable because it is a case-control study; TBSS: tuberculosis sign and symptom; TBSSR: tuberculosis sign, symptom and risk factors; ESR: erythrocyte sedimentation rate; CRP: C-reactive protein; LAM: lipoarabinomannan assay; CXR: chest radiograph; PPV: positive predictive value; NPV: negative predictive value; NNS: numbers needed to screen.
For symptomatic HIV patients, a combined approach using a trial of antibiotics and a C-reactive protein (CRP) level ⩾10 mg/L demonstrated high sensitivity (94.0%, 95% CI: 86.0%–98%) in identifying pulmonary TB, but with lower specificity (37.0%, 95% CI: 28%–46%). However, a trial of antibiotics alone displayed lower sensitivity (43%) and higher specificity (86%). For people living with HIV, a CD4 count below 200 cells/mm3 yielded an NNS of 6 (0–26). Similarly, a trial of antibiotics had a low NNS of 2 (0–8). A risk score combining TBSSR and hemoglobin levels demonstrated high sensitivity (88%) and low specificity (55%) for identifying TB in both symptomatic and asymptomatic HIV patients. 35 Urine LAM positivity showed high specificity (82%, 95% CI: 89%–98%) and low sensitivity (31%; 95% CI: 23–39). Furthermore, the duration of cough for at least 2 weeks had an NNS of 19, ranging from 11 to 26.45,46
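For readers less familiar with the accuracy measures reported here, the following Python sketch shows how sensitivity, specificity, PPV, and NPV follow from a 2 × 2 table of index test results against the reference test. The class and function names are hypothetical, and the counts in the usage lines are invented solely so that they reproduce the 43% sensitivity and 86% specificity quoted for a trial of antibiotics; they are not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2 x 2 table: index test result versus reference test."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative


def _ratio(numerator: int, denominator: int) -> float:
    if denominator == 0:
        raise ValueError("measure is undefined for an empty row or column")
    return numerator / denominator


def accuracy_measures(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as proportions."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 culture-positive and 100 culture-negative patients.
    counts = ConfusionCounts(tp=43, fp=14, fn=57, tn=86)
    for name, value in accuracy_measures(counts).items():
        print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence in the sample, so the values printed for these invented counts should not be read as properties of the test.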
In HIV-negative individuals, a risk score combining multiple biomarkers demonstrated significantly higher diagnostic accuracy than that relying on CRP levels alone. While a CRP level of 10 mg/L offered moderate sensitivity (72.6%) and specificity (71.1%), 36 the multi-biomarker risk score increased sensitivity to 84.1% and specificity to 86.4%. 33
Discussion
This systematic review shows the strength of the predictors included in diagnostic models for pulmonary TB in a variety of patients. Different algorithms/risk scores have been developed for different clinical and epidemiological settings. Diagnostic accuracies and NNS are shown.
Literature indicates that TB symptoms vary between HIV-infected and non-infected individuals.27,55 For example, many HIV-infected and culture-positive TB patients do not cough, and some may even be asymptomatic. In this review, typical CXR findings and cervical lymphadenopathy were powerful covariates (OR > 3) among symptomatic HIV patients. However, when asymptomatic HIV patients were also included in screening, early ART (<3 months), high body temperature (>37.5°C), low BMI (<18.5 kg/m2), and low CD4 count (<200 cells/mm3) were associated with higher odds of pulmonary TB.
The WHO-recommended four-symptom screening for patients with HIV includes current cough, fever, weight loss, or night sweats. 56 However, these symptoms may not be sufficiently sensitive to detect all TB cases. In line with this, one study indicated that symptom-based screening missed approximately 25% of active TB cases among HIV-infected adults. 57
Smear-negative pulmonary TB is more common in HIV-infected patients, leading to a delayed diagnosis. 58 Among smear-negative presumptive cases, the presence of a cavity was a powerful diagnostic predictor. This might be the reason for the routine use of CXR as a screening method for smear-negative individuals before GeneXpert and culture, due to resource constraints. 56 In this population, age, respiratory rate, absence of sputum production, and eosinophilia were moderately strong predictors and should be considered candidates for future prediction models.
In this review, diabetes mellitus and HIV had moderate power in patients whose HIV status was unknown. Bacterial burden and silicosis influence the estimation of pulmonary TB. 10 However, none of the reviewed articles included them in the prediction models. Therefore, these risk factors should be considered in future studies.
In all the populations mentioned, the odds ratio from a series of studies could be different because of differences in the standard errors or sample sizes of the studies. 59 For example, the closer the odds ratio is to unity, the more likely it is to be explained by methodological difficulties such as confounding, misclassification, or other sources of bias. 60 Therefore, prediction rules must be rigorously developed to contribute to clinical practice and decision-making. 61 However, in the present review, more than half of the included studies did not report missing data and approximately 33% of the model-based studies had an events-per-variable ratio below 10. Therefore, their results had a higher risk of selection bias and overfitting. Adherence to methodological standards such as TRIPOD is essential to avoid such problems. 20
According to Porta M, an odds ratio alone is not sufficient for discrimination. 62 Even risk factors strongly associated with a disease may have a low ability to discriminate between cases and non-cases. Sensitivity and 1-specificity should be combined using AUC to assess the discriminatory accuracy of a variable. 62 In this review, the AUC for model-based predictions/algorithms showed a 5 percentage point increment over criteria-based prediction. This may be due to the synergistic value of the predictors in the estimation of pulmonary TB.
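To make the idea of discrimination concrete, here is a minimal Python sketch of a rank-based AUC estimate: the proportion of case/non-case pairs in which the case receives the higher score, with ties counted as one half. This quantity equals the area under the empirical ROC curve. The function name and the example scores are hypothetical and do not come from any included study.

```python
from typing import Sequence


def rank_based_auc(case_scores: Sequence[float],
                   noncase_scores: Sequence[float]) -> float:
    """Probability that a random case scores higher than a random non-case."""
    if not case_scores or not noncase_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0   # the case is ranked above the non-case
            elif case == noncase:
                wins += 0.5   # ties count as half a win
    return wins / (len(case_scores) * len(noncase_scores))


if __name__ == "__main__":
    # Hypothetical risk scores for TB cases and non-cases.
    cases = [3, 4, 5]
    noncases = [1, 2, 3]
    print(round(rank_based_auc(cases, noncases), 3))
```

An AUC of 0.5 means the score ranks cases no better than chance, and 1.0 means every case outranks every non-case; a predictor with a large odds ratio can still sit well below 1.0 when the score distributions of cases and non-cases overlap.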
According to WHO recommendations, any screening algorithm for TB should have a sensitivity of at least 90% and a specificity of 70%. 19 However, the urgency to improve case detection has led to flexibility in the performance criteria for new TB tests. For example, some stakeholders prefer high sensitivity as a prerequisite, whereas others require modest improvements if they are to be accessible to more people. 63
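The WHO target can be expressed as a simple check, sketched below in Python. The function name is hypothetical, and treating both thresholds as minimums is an assumption. The two example pairs are the sensitivity and specificity values reported in this review for CRP ⩾10 mg/L in symptomatic HIV patients (93%, 40%) and for the multi-biomarker risk score in HIV-negative individuals (84.1%, 86.4%).

```python
def meets_who_screening_target(sensitivity: float, specificity: float,
                               min_sensitivity: float = 0.90,
                               min_specificity: float = 0.70) -> bool:
    """Check an algorithm against the 90% sensitivity / 70% specificity target.

    All arguments are proportions between 0 and 1.
    """
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity and specificity must be in [0, 1]")
    return sensitivity >= min_sensitivity and specificity >= min_specificity


if __name__ == "__main__":
    # CRP >= 10 mg/L in symptomatic HIV patients: sensitive but not specific.
    print(meets_who_screening_target(0.93, 0.40))
    # Multi-biomarker risk score in HIV-negative individuals: specific,
    # but sensitivity falls short of 90%.
    print(meets_who_screening_target(0.841, 0.864))
```

Both examples fail the check for different reasons, which illustrates why an algorithm has to be judged on the two measures together.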
The target population for TB screening, that is, symptomatic, asymptomatic, smear negative, HIV positive, or HIV negative, should be identified. In symptomatic HIV patients, a CRP level ⩾10 mg/L increased sensitivity by 50 percentage points and decreased specificity by 46 percentage points compared to the antibiotic trial. However, the composite algorithms of the two did not improve. In addition, in terms of performance for diagnosing pulmonary TB, CXR findings and CD4 count had lower sensitivity than CRP, and the opposite is true when it comes to specificity.
In symptomatic and asymptomatic individuals with HIV, a risk score derived from TB signs and symptoms, risk factors, and hemoglobin levels was more sensitive and less specific. Conversely, urine LAM positivity showed the lowest sensitivity and specificity.
In this population, CRP level ⩾10 mg/L had a sensitivity and specificity of approximately 80% and 70%, respectively. The specificity of the test was similar in HIV-negative individuals, but the sensitivity was 6 percentage points lower in this group. However, a previous systematic review of the diagnostic accuracy of CRP found a high pooled sensitivity (93%) and moderate specificity (60%) for the detection of pulmonary TB, with no differences according to HIV status. 64 This may be due to differences in sample size. In smear-negative patients, anti-TB treatment had the lowest sensitivity and specificity, while a risk score derived from TB symptoms, risk factors, and typical CXR findings had moderate sensitivity (70%) and high specificity (82%). This suggests a need to move beyond sputum smear tests for diagnosing smear-negative patients. A risk score that includes CXR findings offers a potentially better alternative, although further research might be needed to improve its specificity.
The rationale for when to use a test requires judgment about whom to assess, the cost of the test, and the errors that occur if it does not accurately classify patients. 59 Currently available TB diagnostics do not have an “ideal algorithm.” However, an algorithm that achieves the lowest NNS and highest PPV, and whose validity is least affected by site-specific variation, is preferable. 65 In symptomatic HIV patients, lower and higher NNS were documented for antibiotic trials and risk scores from TBSSR, CD4 count, and ART, respectively. In symptomatic and asymptomatic HIV patients, urine LAM positivity had an NNS of 2. However, if prolonged cough is used as the initial screening rule for this population, 19 people should be evaluated to detect one person with active TB. This increases the number of healthy individuals undergoing diagnostic testing and exposes them to unplanned out-of-pocket costs.
This review provides a deeper insight into diagnostic prediction and demonstrates the wide range of predictors that influence the diagnosis of pulmonary TB in a resource-limited setting. However, this study has some limitations. The included studies focused on adults who sought healthcare; therefore, the results may not apply to people who do not seek healthcare. Finally, we restricted the inclusion of studies written in English. Therefore, some studies may have been overlooked.
Conclusions
The performance of predictors and diagnostic algorithms varies among patient subgroups. For example, in HIV patients, radiographic findings and body mass index were strong predictors of pulmonary TB. However, in HIV-negative individuals, the biomarkers showed a moderate association with the disease. The results showed that few models had reached the WHO’s recommendation, as any screening algorithm for TB should have at least a sensitivity of 90% and a specificity of 70%. Therefore, more work should be done to strengthen the predictive models for TB screening in the future, and they should be developed rigorously, considering the heterogeneity of the population in clinical work.
Supplemental Material
Supplemental material, sj-doc-1-smo-10.1177_20503121241243238 for Predictors contributing to the estimation of pulmonary tuberculosis among adults in a resource-limited setting: A systematic review of diagnostic predictions by Gebremedhin Berhe Gebregergs, Gebretsadik Berhe, Kibrom Gebreslasie Gebrehiwot and Afework Mulugeta in SAGE Open Medicine
Footnotes
Acknowledgements
We would like to thank the College of Health Sciences, Mekelle University, and all management staff for facilitating and providing an internet connection for the review. We are deeply grateful to Mr. Hailu Abraha for editing the manuscript and improving its clarity and readability.
Author contributions
GBG and GB designed and conceptualized the study. GBG, GB, and KGG searched for, screened, and extracted data. GBG analyzed the data and wrote the first draft of the manuscript. GBG, GB, KGG, and AM reviewed and edited the manuscript. All authors have approved the final manuscript for publication.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical approval
This study did not involve human participants and ethical approval was not required.
Informed consent
Not required.
Supplemental material
Supplemental material for this article is available online.
References
