
Abstract

Objective

Early and accurate diagnosis of pulmonary tuberculosis (PTB) is critical for disease control. This study aimed to develop and validate a clinical prediction model to differentiate adult inpatients with suspected PTB.

Methods

A retrospective cohort of 602 adults admitted to Changsha Central Hospital (January–October 2019) with suspected PTB was randomly divided into a modeling group (n = 421) and an internal validation group (n = 181). Univariate and multivariate logistic regression analyses in the modeling cohort identified independent clinical and radiological predictors. Model performance was assessed by the area under the receiver operating characteristic curve (AUC), calibration (Hosmer–Lemeshow test), and decision curve analysis. External validation was performed using 213 similar patients from Hunan Chest Hospital (January–October 2019).

Results

Five independent predictors were retained: age (years), erythrocyte sedimentation rate (mm/h), upper-lobe or dorsal-segment infiltration on chest imaging, presence of multifocal polymorphic pulmonary lesions, and pleural thickening (all p < 0.05). In the modeling cohort, AUC = 0.977 (95% confidence interval (CI): 0.964–0.989); at the optimal cutoff probability of 0.611, sensitivity was 95.9% and specificity 89.4%. Calibration was satisfactory in both internal (Hosmer–Lemeshow p = 0.815) and external (p = 0.973) validation cohorts. Decision curve analysis demonstrated net clinical benefit across relevant threshold probabilities.

Conclusions

The proposed five-variable model exhibited excellent discrimination, calibration, and clinical utility in both internal and external cohorts. Its simplicity and reliance on routinely available data make it a practical adjunct for clinicians in diagnosing PTB among adults with suspected disease.

Keywords

Tuberculosis, pulmonary; diagnosis; clinical prediction models; adults

Introduction

Tuberculosis is a chronic infectious disease caused by an infection with Mycobacterium tuberculosis (MTB), affecting approximately a quarter of the world's population.¹ In 2020, China ranked as the second-highest tuberculosis burden country worldwide.¹ Pulmonary tuberculosis, which refers to tuberculous lesions occurring in the lung tissue, trachea, bronchi, and pleura, is the most predominant type of tuberculosis, accounting for 80% to 90% of all tuberculosis in all organs.² Pulmonary tuberculosis is diagnosed based on clinicoradiological correlation with microbiological evidence of MTB or pathological findings.^1,3 However, in 2020, only 59% of the 4.8 million patients with pulmonary tuberculosis worldwide were diagnosed pathogenically.¹ Moreover, most patients are reluctant to undergo invasive examinations and further pathological investigations.^4–7 The use of molecular rapid diagnostic tests recommended by the World Health Organization to detect MTB is not yet widespread in primary care settings in China. Thus, a comprehensive model of multiple indicators such as demographic data, symptoms, signs, chest imaging, and laboratory findings could prove valuable in the early diagnosis of pulmonary tuberculosis.

A clinical prediction model leverages the relationship between various factors and subsequent outcomes to estimate, by means of a mathematical formula, the probability that an individual currently has a disease or will experience a future outcome.⁸ Depending on the problem being studied, models can be divided into diagnostic and prognostic types. They are widely used for screening high-risk groups and for individualized disease diagnosis, treatment, and prevention.^8–10 Previous studies used clinical prediction models to screen patients for suspected pulmonary tuberculosis and to rapidly perform respiratory isolation and further confirmatory tests, including molecular techniques.^11,12 However, systematic reviews have shown that the methodological quality and reporting of prediction studies remain deficient.¹³ Most existing prediction models for tuberculosis are used in specific settings, such as HIV clinics^14–16 and contact surveys.^17,18 Conversely, prediction models for estimating adult pulmonary tuberculosis prevalence remain poorly reported and validated, and are not useful for tuberculosis screening.^19,20 Moreover, the larger the variation across regions and populations, the more pronounced its impact on the predictive efficacy of a clinical prediction model. With the increasing interest in the development and validation of clinical prediction models, it is critical to consider how these tools can best be built and implemented in clinical practice to improve the treatment process and patient prognosis.^21,22

Thus, we aimed to develop a diagnostic clinical prediction model for suspected pulmonary tuberculosis and fully validate its predictive efficacy to rapidly identify patients with suspected pulmonary tuberculosis, shorten the diagnosis time, and improve the diagnosis rate, allowing for rapid diagnosis and effective treatment, reducing transmission, and providing a basis for referral and confirmatory testing in primary care settings.

Methods

Study participants and setting

We consecutively included 602 adult inpatients admitted to Changsha Central Hospital between January and October 2019 and randomly divided them into a modeling group (n = 421) and an internal validation group (n = 181) at a ratio of 7:3 (Figure 1). Additionally, 213 adult inpatients admitted to Hunan Chest Hospital between January and October 2019 were included as the external validation group.

Figure 1.

Flow chart of the study population. (a) Study population flow at Changsha Central Hospital; (b) study population flow at Hunan Chest Hospital. The exclusion criteria were as follows: (1) those who were already receiving antituberculosis drugs before admission, (2) those who were diagnosed with antituberculosis after admission, and (3) those with incomplete clinical information.

The inclusion criteria were as follows: (1) age ≥ 15 years; (2) suspected pulmonary tuberculosis, identified according to the 2017 China pulmonary tuberculosis diagnostic criteria (WS 288-2017)³ and the 2019 World Health Organization guidelines for tuberculosis.²³ Identification was based on a set of clinical symptoms, including cough for more than 2 weeks, coughing sputum, blood in sputum or hemoptysis, chest pain, fatigue, loss of appetite, weight loss, fever, and night sweats, together with imaging features of active pulmonary tuberculosis.

The exclusion criteria were as follows: (1) those who were already receiving antituberculosis drugs before admission, (2) those who were diagnosed with antituberculosis after admission, and (3) those with incomplete clinical information.

For the sample size estimation, a minimum of 10 positive outcome events per predictor variable was required for the logistic regression analysis.^24,25
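The events-per-variable rule can be checked with simple arithmetic. The following is a minimal sketch, not the authors' procedure: it assumes the 26 candidate predictors mentioned in the Discussion and the 317 confirmed pulmonary tuberculosis cases in the modeling group reported in the Results, and the function names are hypothetical.

```python
def min_events_required(n_predictors: int, events_per_variable: int = 10) -> int:
    """Minimum number of positive outcome events under the 10-events-per-variable rule."""
    return n_predictors * events_per_variable


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Observed number of positive outcome events per candidate predictor."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


# Values taken from the text (assumed to be the relevant counts for this rule).
N_CANDIDATE_PREDICTORS = 26   # candidate predictors named in the Discussion
N_POSITIVE_EVENTS = 317       # pulmonary tuberculosis cases in the modeling group

required = min_events_required(N_CANDIDATE_PREDICTORS)
observed = events_per_variable(N_POSITIVE_EVENTS, N_CANDIDATE_PREDICTORS)
print(f"required events: {required}, observed events per variable: {observed:.2f}")
```

Under these assumptions the modeling group satisfies the stated minimum, since 317 events exceed the 260 required for 26 candidate predictors.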

According to the diagnostic criteria for pulmonary tuberculosis (WS 288-2017),³ a patient is considered to be diagnosed if one of the following conditions is met: (1) two sputum smears positive for acid fast bacilli; (2) one sputum smear positive for acid fast bacilli in conjunction with imaging features of active pulmonary tuberculosis; (3) one sputum smear positive for acid fast bacilli in conjunction with one sputum positive for mycobacterial culture; (4) imaging features of active pulmonary tuberculosis in conjunction with at least two negative sputum smears with positive mycobacterial culture; (5) imaging features of active pulmonary tuberculosis in conjunction with positive MTB nucleic acid test; and (6) positive lung histopathology for tuberculosis.

Data collection methods

Clinical data were collected from the modeling, internal validation, and external validation groups and included: (1) demographic data: age, sex, and occupation; (2) risk factors: history of pulmonary tuberculosis or exposure, smoking, malnutrition, HIV infection, history of diabetes, history of pneumoconiosis, and use of glucocorticoids/immunosuppressants; (3) typical pulmonary tuberculosis symptoms: cough and coughing sputum for 2 weeks or longer (or blood in sputum or hemoptysis), chest pain, night sweats, fatigue, intermittent or persistent afternoon fever, loss of appetite, and weight loss; (4) laboratory tests: neutrophil-to-lymphocyte ratio (NLR), hemoglobin, albumin, erythrocyte sedimentation rate (ESR), and purified protein derivative (PPD) skin test; and (5) pulmonary imaging: infiltration of the upper lobe or dorsal segment of the lung, multifocal polymorphic lesions with multidensity in the lungs, pulmonary cavity, pleural thickening, and mediastinal and hilar lymphadenopathy. After variables with missing information were removed, the final predictive variables were as follows: age, sex, past history of pulmonary tuberculosis, history of exposure to pulmonary tuberculosis, smoking, history of diabetes, history of pneumoconiosis, use of glucocorticoids/immunosuppressants, cough and coughing sputum for 2 weeks or more, blood in sputum or hemoptysis, chest pain, night sweats, fatigue, intermittent or persistent afternoon fever, loss of appetite, weight loss, NLR, hemoglobin, albumin, ESR, infiltration of the upper lobe or dorsal segment of the lung, multifocal polymorphic lesions with multidensity in the lungs, pulmonary cavity, pleural thickening, and mediastinal and hilar lymphadenopathy.

Statistical analysis

EmpowerStats version 4.1 was used to analyze the clinical characteristics. R software version 4.1.2 was used to perform a single-factor unconditional logistic regression analysis to initially screen individual indicators that were predictive of the diagnosis of pulmonary tuberculosis. Subsequently, a multifactor unconditional logistic regression analysis was performed on indicators that were significant in the single-factor analysis or clinically indicated, with p < 0.05 considered statistically significant. The β coefficients of the independent predictors in the multifactor logistic regression analysis were used to build the prediction model, which was presented as a nomogram drawn with the rms package in R. Receiver operating characteristic (ROC) curves were plotted with the pROC package. The area under the ROC curve (AUC) was used to evaluate the discriminatory ability of the model, and the cut-off value and corresponding sensitivity and specificity were determined using the Youden index. Finally, the predictive efficacy of the model was evaluated through internal and external validation. Calibration curves were used to evaluate the agreement between observed and expected outcomes, and decision curve analysis was used to evaluate the clinical usefulness of the prediction model. The calibration curves were plotted using the rms and foreign packages, and the decision curves were plotted using the rmda, rms, and foreign packages.
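To make the discrimination step concrete, the sketch below shows how an AUC and a Youden-index cut-off can be computed from predicted probabilities. It is an illustration in plain Python rather than the pROC workflow used in the study; the function names and the toy data are assumptions introduced here, not study data.

```python
def auc(y_true, y_score):
    """AUC as the probability that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count concordant pairs; tied pairs count as one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(y_true, y_score):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is called positive when its score is >= threshold.
    Ties in J keep the lowest threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best = None
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Toy data for illustration only (1 = disease present, 0 = absent).
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
print("AUC:", auc(labels, scores))
print("Youden cut-off (threshold, sensitivity, specificity):", youden_cutoff(labels, scores))
```

The exhaustive scan over observed scores is quadratic in the sample size, which is acceptable for cohorts of a few hundred patients such as those described here.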

Results

Clinical characteristics of the three groups of patients

Measurement data with a normal distribution were expressed as mean ± standard deviation, and those with a skewed distribution were expressed as median (minimum, maximum). Enumeration data were described by the number of cases and constituent ratio. Table 1 shows the clinical characteristics of the three groups of patients. The modeling group (n = 421) comprised 298 males (70.78%) and 123 females (29.22%), with ages ranging from 15 to 93 years. Of these, 317 (75.30%) had pulmonary tuberculosis and 104 (24.70%) had nontuberculosis diseases, including 62 (59.62%) cases of pneumonia, 17 (16.35%) chronic obstructive pulmonary disease, 3 (2.88%) chronic bronchitis, 12 (11.54%) lung cancer, 5 (4.81%) bronchiectasis, 4 (3.84%) lung abscess, and 1 (0.96%) pulmonary embolism. The internal validation group (n = 181) comprised 118 males (65.19%) and 63 females (34.81%), with ages ranging from 16 to 94 years. Of these, 134 (74.03%) had pulmonary tuberculosis and 47 (25.97%) had nontuberculosis diseases, including 27 (57.45%) cases of pneumonia, 6 (12.76%) chronic obstructive pulmonary disease, 4 (8.51%) chronic bronchitis, 6 (12.76%) lung cancer, 2 (4.26%) bronchiectasis, and 2 (4.26%) lung abscess. The external validation group (n = 213) comprised 142 males (66.67%) and 71 females (33.33%), with ages ranging from 15 to 87 years. Of these, 109 (51.17%) and 104 (48.83%) were pulmonary tuberculosis and nontuberculosis cases, respectively. Of the nontuberculosis cases, 61 (58.65%) were pneumonia, 22 (21.15%) chronic obstructive pulmonary disease, 3 (2.88%) chronic bronchitis, 15 (14.42%) lung cancer, 2 (1.92%) lung abscess, and 1 (0.96%) interstitial lung disease.

Table 1.

Analysis of the clinical characteristics of the three groups of patients.

Predictive factors	Modeling group (n = 421)	Internal validation group (n = 181)	External validation group (n = 213)
	x̅ ± S or M (Min, Max)	x̅ ± S or M (Min, Max)	x̅ ± S or M (Min, Max)
Age	52.88 ± 18.87	50.01 ± 17.81	52.22 ± 18.38
NLR	4.189 (0.20, 122.00)	3.91 (0.37, 36.65)	3.11 (0.66, 39.38)
ESR	42.00 (1.00, 150.00)	38.00 (2.00, 143.00)	33.00 (1.00, 119.00)
Hemoglobin	118.98 ± 19.53	119.89 ± 20.65	120.19 ± 16.34
Albumin	35.40 ± 6.97	36.50 ± 7.16	38.43 ± 6.00
	N (%)	N (%)	N (%)
Age groups
1 15–29 years	76 (18.05%)	31 (17.13%)	39 (18.31%)
2 30–44 years	53 (12.59%)	36 (19.89%)	26 (12.21%)
3 45–59 years	108 (25.65%)	49 (27.07%)	62 (29.11%)
4 ≥ 60 years	184 (43.71%)	65 (35.91%)	86 (40.37%)
Sex
0 Male	298 (70.78%)	118 (65.19%)	142 (66.67%)
1 Female	123 (29.22%)	63 (34.81%)	71 (33.33%)
Past history of pulmonary tuberculosis
0 No	361 (85.75%)	151 (83.43%)	181 (84.98%)
1 Yes	60 (14.25%)	30 (16.57%)	32 (15.02%)
History of exposure to pulmonary tuberculosis
0 No	383 (90.97%)	162 (89.50%)	198 (92.96%)
1 Yes	38 (9.03%)	19 (10.50%)	15 (7.04%)
Smoking
0 No	233 (55.34%)	111 (61.33%)	127 (59.62%)
1 Yes	188 (44.66%)	70 (38.67%)	86 (40.38%)
History of diabetes
0 No	326 (77.43%)	149 (82.32%)	182 (85.45%)
1 Yes	95 (22.57%)	32 (17.68%)	31 (14.55%)
History of pneumoconiosis
0 No	411 (97.62%)	178 (98.34%)	210 (98.59%)
1 Yes	10 (2.38%)	3 (1.66%)	3 (1.41%)
Use of glucocorticoids/immunosuppressants
0 No	401 (95.25%)	173 (95.58%)	205 (96.24%)
1 Yes	20 (4.75%)	8 (4.42%)	8 (3.76%)
Cough and coughing sputum for ≥ 2 weeks
0 No	41 (9.74%)	17 (9.39%)	14 (6.57%)
1 Yes	380 (90.26%)	164 (90.61%)	199 (93.43%)
Blood in the sputum or hemoptysis
0 No	358 (85.04%)	135 (74.59%)	170 (79.81%)
1 Yes	63 (14.96%)	46 (25.41%)	43 (20.19%)
Fever
0 No	299 (71.02%)	126 (69.61%)	164 (77.00%)
1 Yes	122 (28.98%)	55 (30.39%)	49 (23.00%)
Night sweats
0 No	349 (82.90%)	154 (85.08%)	175 (82.16%)
1 Yes	72 (17.10%)	27 (14.92%)	38 (17.84%)
Fatigue
0 No	321 (76.25%)	141 (77.90%)	176 (82.63%)
1 Yes	100 (23.75%)	40 (22.10%)	37 (17.37%)
Loss of appetite
0 No	214 (50.83%)	104 (57.46%)	101 (47.42%)
1 Yes	207 (49.17%)	77 (42.54%)	112 (52.58%)
Weight loss
0 No	305 (72.45%)	146 (80.66%)	180 (84.51%)
1 Yes	116 (27.55%)	35 (19.34%)	33 (15.49%)
Chest pain
0 No	359 (85.27%)	161 (88.95%)	170 (79.81%)
1 Yes	62 (14.73%)	20 (11.05%)	43 (20.19%)
Infiltration of the upper lobe or dorsal segment of the lung
0 No	144 (34.20%)	56 (30.94%)	101 (47.42%)
1 Yes	277 (65.80%)	125 (69.06%)	112 (52.58%)
Multifocal polymorphic lesions with multidensity in the lungs
0 No	109 (25.89%)	44 (24.31%)	99 (46.48%)
1 Yes	312 (74.11%)	137 (75.69%)	114 (53.52%)
Pulmonary cavity
0 No	221 (52.49%)	108 (59.67%)	147 (69.01%)
1 Yes	200 (47.51%)	73 (40.33%)	66 (30.99%)
Pleural thickening
0 No	291 (69.12%)	129 (71.27%)	167 (78.40%)
1 Yes	130 (30.88%)	52 (28.73%)	46 (21.60%)
Hilar lymphadenopathy
0 No	360 (85.51%)	156 (86.19%)	182 (85.45%)
1 Yes	61 (14.49%)	25 (13.81%)	31 (14.55%)
Mediastinal lymphadenopathy
0 No	279 (66.27%)	128 (70.72%)	163 (76.53%)
1 Yes	142 (33.73%)	53 (29.28%)	50 (23.47%)
Pulmonary tuberculosis
0 No	104 (24.70%)	47 (25.97%)	104 (48.83%)
1 Yes	317 (75.30%)	134 (74.03%)	109 (51.17%)

ESR: erythrocyte sedimentation rate; NLR: neutrophil-to-lymphocyte ratio.

Single-factor analysis of predictors of pulmonary tuberculosis

This study employed a single-factor unconditional logistic regression analysis, and the results indicated that age, NLR, ESR, hemoglobin, albumin, infiltration of the upper lobe or dorsal segment of the lung, multifocal polymorphic lesions with multiple densities in the lungs, pulmonary cavity, and pleural thickening were predictors of pulmonary tuberculosis (p < 0.05), as shown in Table 2.

Table 2.

Results of single factor logistic regression analysis.

Predictive factors (No.)	OR (95% CI)	Z-value	p-value
Age (v1)	0.75 (0.60–0.93)	−2.68	0.007
Sex (v2)	0.60 (0.38–0.96)	−2.13	0.033
History of pulmonary tuberculosis (v3)	0.89 (0.48–1.65)	−0.38	0.703
History of exposure to pulmonary tuberculosis (v4)	1.83 (0.74–4.52)	1.32	0.187
Smoking (v5)	1.40 (0.89–2.20)	1.46	0.144
History of diabetes (v6)	1.53 (0.87–2.71)	1.47	0.141
History of pneumoconiosis (v7)	1.32 (0.28–6.32)	0.35	0.728
Use of glucocorticoids/immunosuppressants (v8)	3.07 (0.70–13.46)	1.49	0.137
Cough and coughing sputum for ≥ 2 weeks (v9)	0.84 (0.39–1.83)	−0.43	0.667
Blood in the sputum or hemoptysis (v10)	1.18 (0.62–2.23)	0.50	0.621
Fever (v11)	1.39 (0.84–2.32)	1.28	0.202
Night sweats (v12)	1.18 (0.64–2.16)	0.54	0.592
Fatigue (v13)	1.82 (1.02–3.23)	2.03	0.043
Loss of appetite (v14)	0.86 (0.56–1.35)	−0.65	0.517
Weight loss (v15)	2.54 (1.42–4.54)	3.13	0.002
Chest pain (v16)	0.59 (0.33–1.05)	−1.80	0.072
NLR (q1)	1.15 (0.06–0.21)	3.63	<0.001
ESR (q2)	1.03 (0.02–0.04)	6.58	<0.001
Hemoglobin (q3)	0.98 (−0.04 to −0.01)	−3.86	<0.001
Albumin (q4)	0.86 (−0.19 to −0.11)	−6.86	<0.001
Infiltration of the upper lobe or dorsal segment of the lung (v17)	15.33 (8.88–26.47)	9.79	<0.001
Multifocal polymorphic lesions with multidensity in the lungs (v18)	61.07 (31.50–118.39)	12.18	<0.001
Pulmonary cavity (v19)	7.61 (4.27–13.56)	6.89	<0.001
Pleural thickening (v20)	6.52 (3.17–13.39)	5.10	<0.001
Hilar lymphadenopathy (v21)	0.83 (0.45–1.52)	−0.62	0.536
Mediastinal lymphadenopathy (v22)	1.62 (0.99–2.66)	1.92	0.055

CI: confidence interval; ESR: erythrocyte sedimentation rate; NLR: neutrophil-to-lymphocyte ratio; OR: odds ratio.

Multifactor analysis of predictors of pulmonary tuberculosis

This study performed a multifactor unconditional logistic regression analysis on indicators that were significant in the single-factor analysis and on those that were clinically significant. The results indicated that age, ESR, infiltration of the upper lobe or dorsal segment of the lung, multifocal polymorphic lesions with multiple densities in the lungs, and pleural thickening were independent predictors of pulmonary tuberculosis (p < 0.05), as shown in Table 3.

Table 3.

Results of multifactor logistic regression analysis.

Predictive factors	β	OR (95%CI)	Z-value	p-value
Constant	−3.64		−4.25	<0.001
Age	−0.47	0.63 (0.40–0.95)	−2.14	0.032
ESR	0.04	1.04 (1.02–1.06)	4.38	<0.001
Infiltration of the upper lobe or dorsal segment of the lung	3.61	37.03 (13.65–124.67)	6.49	<0.001
Multifocal polymorphic lesions with multidensity in the lungs	4.20	66.39 (35.45–213.47)	7.86	<0.001
Pleural thickening	1.65	5.20 (1.64–19.81)	2.63	0.009

CI: confidence interval; ESR: erythrocyte sedimentation rate; OR: odds ratio.
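In logistic regression the odds ratio of a predictor is the exponential of its β coefficient, so the two columns of Table 3 can be cross-checked. The sketch below does this for the five predictors; the dictionary keys are shortened labels introduced here, and small discrepancies are expected because the published β values are rounded to two decimals.

```python
import math

# (beta, reported odds ratio) pairs copied from Table 3.
TABLE3 = {
    "Age": (-0.47, 0.63),
    "ESR": (0.04, 1.04),
    "Upper-lobe or dorsal-segment infiltration": (3.61, 37.03),
    "Multifocal polymorphic lesions": (4.20, 66.39),
    "Pleural thickening": (1.65, 5.20),
}


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def relative_difference(beta: float, reported_or: float) -> float:
    """Relative gap between exp(beta) and the reported odds ratio."""
    return abs(odds_ratio(beta) - reported_or) / reported_or


for name, (beta, reported) in TABLE3.items():
    print(f"{name}: exp(beta) = {odds_ratio(beta):.2f}, reported OR = {reported:.2f}, "
          f"relative difference = {relative_difference(beta, reported):.2%}")
```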

Development of diagnostic clinical predictive models for suspected pulmonary tuberculosis in adults

Based on the β coefficients of the independent predictors in the multifactor logistic regression analysis, a diagnostic prediction model was developed with the following formula: P = 1/{1 + exp[−(−3.64 − 0.47 × age + 0.04 × ESR + 3.61 × infiltration of the upper lobe or dorsal segment of the lung + 4.20 × multifocal polymorphic lesions with multidensity in the lungs + 1.65 × pleural thickening)]}. The model is presented as a nomogram (Figure 2). ROC curve analysis of the modeling group yielded an AUC of 0.977. The Youden index determined a cut-off value of 0.611; a predicted probability ≥ 0.611 was considered a high probability of pulmonary tuberculosis and a predicted probability < 0.611 a low probability. At this cut-off, the sensitivity was 0.959 and the specificity was 0.894 (Figure 3). The calibration curve fitted well (S: p-value = 0.842), as shown in Figure 4. Decision curve analysis (DCA) was used to evaluate the clinical usefulness of the prediction model and indicated a net clinical benefit (Figure 5).
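The formula above can be evaluated directly. The following minimal sketch is an illustration rather than the authors' software. It assumes that age enters as the four-level age-group code used in Table 1 (1 = 15–29, 2 = 30–44, 3 = 45–59, 4 = ≥ 60 years), which is an inference from the Discussion and the nomogram, that ESR is in mm/h, and that the three imaging findings are coded 1 when present and 0 when absent. All function and variable names are hypothetical.

```python
import math

# Coefficients from Table 3.
INTERCEPT = -3.64
BETA_AGE_GROUP = -0.47
BETA_ESR = 0.04
BETA_UPPER_LOBE = 3.61
BETA_MULTIFOCAL = 4.20
BETA_PLEURAL = 1.65
CUTOFF = 0.611  # Youden-index cut-off reported for the modeling group


def ptb_probability(age_group: int, esr: float, upper_lobe_infiltration: bool,
                    multifocal_lesions: bool, pleural_thickening: bool) -> float:
    """Predicted probability of pulmonary tuberculosis from the logistic model."""
    if age_group not in (1, 2, 3, 4):
        raise ValueError("age_group must be 1, 2, 3, or 4 (assumed Table 1 coding)")
    if esr < 0:
        raise ValueError("esr must be non-negative")
    linear_predictor = (
        INTERCEPT
        + BETA_AGE_GROUP * age_group
        + BETA_ESR * esr
        + BETA_UPPER_LOBE * int(upper_lobe_infiltration)
        + BETA_MULTIFOCAL * int(multifocal_lesions)
        + BETA_PLEURAL * int(pleural_thickening)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def is_high_probability(probability: float) -> bool:
    """Apply the reported cut-off: >= 0.611 is treated as high probability."""
    return probability >= CUTOFF


# Hypothetical patient: 25 years old, ESR 40 mm/h, all three imaging findings present.
p = ptb_probability(1, 40.0, True, True, True)
print(f"predicted probability = {p:.4f}, high probability: {is_high_probability(p)}")
```

Because the three imaging coefficients are large relative to the intercept, the imaging findings dominate the prediction, which matches the emphasis on imaging in the Discussion.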

Figure 2.

Nomogram of the diagnostic prediction model. v1 represents age, q2 represents erythrocyte sedimentation rate (ESR), v17 represents infiltration of the upper lobe or dorsal segment of the lung, v18 represents multifocal multidensity polymorphic lesions of the lung, and v20 represents pleural thickening. Points assigned for age categories were: 15–30 years (21 points), 30–45 years (14 points), 45–60 years (7 points), and ≥60 years (0 points); an ESR of 0–20 mm/h corresponded to 0–12.5 points, 20–40 mm/h to 12.5–25 points, 40–60 mm/h to 25–37.5 points, and so forth; the presence of infiltration of the upper lobe or dorsal segment scored 55 points, otherwise 0; the presence of multifocal multidensity polymorphic lesions scored 63 points, otherwise 0; the presence of pleural thickening scored 25 points, otherwise 0. Adding up the points for each predictor gives the total points, from which the predicted probability of pulmonary tuberculosis can be read.
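The point assignments in the caption can be tallied programmatically. This is a sketch under stated assumptions: the age boundaries follow the Table 1 groups (15–29, 30–44, 45–59, ≥ 60 years) because the caption's ranges overlap at their endpoints, ESR points are taken as linear at 12.5 points per 20 mm/h as the caption's "and so forth" suggests, and the function names are hypothetical. The mapping from total points to probability is read from the nomogram axis and is not reproduced here.

```python
def age_points(age_years: float) -> int:
    """Nomogram points for age, using the Table 1 age-group boundaries (assumed)."""
    if age_years < 15:
        raise ValueError("the model was developed for patients aged 15 years or older")
    if age_years < 30:
        return 21
    if age_years < 45:
        return 14
    if age_years < 60:
        return 7
    return 0


def esr_points(esr: float) -> float:
    """Nomogram points for ESR, assumed linear at 12.5 points per 20 mm/h."""
    if esr < 0:
        raise ValueError("esr must be non-negative")
    return esr * 12.5 / 20.0


def total_points(age_years: float, esr: float, upper_lobe_infiltration: bool,
                 multifocal_lesions: bool, pleural_thickening: bool) -> float:
    """Sum of nomogram points across the five predictors."""
    return (
        age_points(age_years)
        + esr_points(esr)
        + (55 if upper_lobe_infiltration else 0)
        + (63 if multifocal_lesions else 0)
        + (25 if pleural_thickening else 0)
    )


# Hypothetical patient: 25 years old, ESR 40 mm/h, all three imaging findings present.
print("total points:", total_points(25, 40.0, True, True, True))
```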

Figure 3.

ROC curve and cut-off value for modeling group. The horizontal and vertical coordinates indicate 1-specificity and sensitivity, respectively. The AUC indicates the area under the ROC curve, which is 0.977. An AUC of 0.50–0.70, 0.71–0.90, and >0.90 signifies low, moderate, and high discriminatory power, respectively. The cut-off value (specificity, sensitivity) determined by the Youden index is 0.611 (0.894, 0.959) at the upper left tangent. ROC: receiver operating characteristic.

Figure 4.

Calibration curve for the modeling group. The horizontal coordinate is the predicted probability and the vertical coordinate is the observed probability. The intercept is 0 and the slope is 1, which is consistent with good calibration. The dashed line represents perfect calibration and the solid line corresponds to the calibration of this model in the modeling group, showing that the calibration curve closely overlaps the perfect calibration curve. The S: p-value is 0.842; S: p > 0.05 indicates no statistically significant deviation from perfect calibration.

Figure 5.

Decision analysis curve for the modeling group. Decision curve analysis compares the net benefits of different treatment strategies. The horizontal coordinate indicates the high-risk threshold, that is, the probability at which the benefit of treatment equals that of no treatment, and the vertical coordinate indicates the net benefit, calculated as the total benefit (treatment of true pulmonary tuberculosis patients) minus the total harm (treatment of false-positive pulmonary tuberculosis patients). The gray line represents the net benefit of treating all patients as if they had the disease, the horizontal solid line represents the net benefit of treating no patients, and the red line shows the decision analysis curve for the model in the modeling group. The farther the model curve lies from the two reference lines, the better the clinical usefulness of the model.
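The net benefit plotted on the vertical axis can be written out explicitly. The sketch below uses the commonly used definition, net benefit = TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability; it is an illustration in plain Python, not the rmda implementation used in the study, and the function names and toy data are assumptions.

```python
def net_benefit(y_true, y_prob, threshold: float) -> float:
    """Net benefit of treating patients whose predicted probability is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    if n == 0:
        raise ValueError("no patients supplied")
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold: float) -> float:
    """Reference strategy: treat every patient regardless of the model."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


def net_benefit_treat_none() -> float:
    """Reference strategy: treat no one, which has zero net benefit by definition."""
    return 0.0


# Toy data for illustration only (1 = disease present, 0 = absent).
labels = [1, 1, 1, 0, 0]
probs = [0.9, 0.8, 0.3, 0.7, 0.1]
for pt in (0.2, 0.5, 0.75):
    print(f"pt = {pt}: model = {net_benefit(labels, probs, pt):.3f}, "
          f"treat all = {net_benefit_treat_all(labels, pt):.3f}, "
          f"treat none = {net_benefit_treat_none():.3f}")
```

A decision curve is obtained by evaluating these three quantities over a range of threshold probabilities and plotting them against the threshold.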

Validation of diagnostic clinical predictive models for suspected pulmonary tuberculosis in adults

The predictive efficacy of the model was evaluated through internal and external validation. The AUC, used to evaluate the discriminatory ability of the model, was 0.990 and 0.985 for the internal and external validation groups, respectively (Figures 6 and 7). The calibration curves for the internal and external validation groups fitted well (S: p-values = 0.815 and 0.973, respectively), as shown in Figures 8 and 9. DCA indicated good clinical usefulness of the prediction model in both groups, as shown in Figures 10 and 11.

Figure 6.

ROC curve for the internal validation group. The horizontal and vertical coordinates indicate 1-specificity and sensitivity, respectively. The AUC is the area under the ROC curve, which was 0.990. The upper left tangent indicates the cut-off value (specificity, sensitivity) determined by the Youden index, which was 0.842 (0.957, 0.963). ROC: receiver operating characteristic.

Figure 7.

ROC curve for the external validation group. The horizontal and vertical coordinates indicate 1-specificity and sensitivity, respectively. The AUC indicates the area under the ROC curve, which was 0.985. The upper left tangent indicates the cut-off value (specificity, sensitivity) determined by the Youden index, which was 0.878 (0.990, 0.890). ROC: receiver operating characteristic.

Figure 8.

Calibration curve for the internal validation group. The horizontal and vertical coordinates indicate the predicted and observed probability, respectively. The intercept was 0 and the slope was 1, which was consistent with good calibration. The dashed line represents perfect calibration and the solid line corresponds to the calibration of this model in the internal validation group. The S: p-value was 0.815 (p > 0.05), indicating no statistically significant deviation from perfect calibration.

Figure 9.

Calibration curve for the external validation group. The horizontal and vertical coordinates indicate the predicted and observed probability, respectively. The intercept was 0 and the slope was 1, which was consistent with good calibration. The dashed line represents perfect calibration and the solid line corresponds to the calibration of this model in the external validation group. The S: p-value was 0.973 (p > 0.05), indicating no statistically significant deviation from perfect calibration.

Figure 10.

Decision analysis curve for the internal validation group. Decision curve analysis compares the net benefits of different treatment strategies. The horizontal coordinate indicates the high-risk threshold, that is, the probability at which the benefit of treatment equals that of no treatment. The vertical coordinate indicates the net benefit, calculated as the total benefit (treatment of true pulmonary tuberculosis patients) minus the total harm (treatment of false-positive pulmonary tuberculosis patients). The gray line represents the net benefit of treating all patients as if they had the disease, while the horizontal solid line represents the net benefit of treating no patients. The red line shows the decision analysis curve for the model in the internal validation group; the farther the model curve lies from the two reference lines, the better the clinical usefulness of the model.

Figure 11.

Decision analysis curve for the external validation group. Decision curve analysis compares the net benefits of different treatment strategies. The horizontal coordinate indicates the high-risk threshold, that is, the probability at which the benefit of treatment equals that of no treatment, and the vertical coordinate indicates the net benefit, calculated as the total benefit (treatment of true pulmonary tuberculosis patients) minus the total harm (treatment of false-positive pulmonary tuberculosis patients). The gray line represents the net benefit of treating all patients as if they had the disease, and the horizontal solid line represents the net benefit of treating no patients. The red line shows the decision analysis curve for the model in the external validation group; the farther the model curve lies from the two reference lines, the better the clinical usefulness of the model.

Discussion

Tuberculosis is a preventable and curable disease, and approximately 85% of patients with tuberculosis can be successfully cured through 6 months of drug treatment,¹ which also reduces the continued spread of infection. However, current tuberculosis prevention and treatment remain far from satisfactory: tuberculosis still ranks among the leading causes of death worldwide, and its incidence continues to rise. This is largely attributed to the difficulty in diagnosing pulmonary tuberculosis, for the following reasons. First, most patients show atypical clinical symptoms, especially older patients.⁷ Second, the PPD skin test and γ-interferon release test are influenced by a variety of factors related to the immune status of the body and are mostly used to detect tuberculosis infection;^26,27 however, they can only indicate the presence of latent tuberculosis infection and have limited value in assessing the activity of pulmonary tuberculosis.²⁸ Third, there is a prevailing pessimism about, or outright rejection of, MTB antibody testing both domestically and internationally.²⁹ Fourth, atypical pulmonary tuberculosis is increasing and its imaging manifestations are becoming more diverse owing to the misuse of antibiotics and the increasing number of immunosuppressed patients.³⁰ Last, smear microscopy has low sensitivity and specificity, and MTB culture may take several weeks to provide results, while molecular techniques are relatively expensive and not yet widely accessible in low- and middle-income countries, especially in primary care hospitals. Moreover, pathological diagnosis usually requires invasive examination, which is not easily accepted by patients.¹ Hence, further studies are required to make significant progress in the diagnosis of pulmonary tuberculosis and to ensure early detection, diagnosis, and treatment.

Clinical prediction models have been widely used in the screening of high-risk groups, individualized disease diagnosis, and treatment and prevention. Thus, the establishment of clinical prediction models with good predictive efficacy is essential to improve the treatment process and patient prognosis. The current status of pulmonary tuberculosis diagnosis remains far from satisfactory, and a large proportion of patients with pulmonary tuberculosis are missed or misdiagnosed. There is an urgent need to develop a rapid, simple, and accurate clinical prediction model to guide clinicians in early detection, diagnosis, and treatment to reduce morbidity and mortality. Hence, several studies have been conducted to establish and validate diagnostic clinical prediction models for pulmonary tuberculosis. Using information collected from patients with respiratory symptoms in the emergency department of a hospital in an area with a high prevalence of pulmonary tuberculosis, Solari et al.³¹ found that age, history of pulmonary tuberculosis, weight loss, pulmonary cavity, upper lobe infiltration, and pulmonary miliary were independent predictors of pulmonary tuberculosis. In sub-Saharan Africa, Baik et al.³² developed a prediction model for active pulmonary tuberculosis comprising six variables: age, sex, HIV infection, history of diabetes, number of typical tuberculosis symptoms, and duration of symptoms for more than 14 days. Chen et al.³³ found that age, hemoglobin, lymphocyte count, γ-interferon release test, weight loss, night sweats, polymorphic lesions in the lungs, and foci of pulmonary calcification were associated with pulmonary tuberculosis. Among HIV-infected patients, eight statistically significant differences were observed between patients with pulmonary tuberculosis and those with nontuberculous lung infections, including fever, highest body temperature, ESR, cervical lymphadenopathy, hilar and/or mediastinal lymphadenopathy, pulmonary cavity, pleural effusion, and pulmonary miliary nodules.¹⁴ A study by Hanifa et al.³⁴ on HIV patients showed that predictors of pulmonary tuberculosis included antiretroviral treatment status, body mass index (BMI), CD4+ T cell count, and the number of typical tuberculosis symptoms. Saunders et al.¹⁸ found that among adult pulmonary tuberculosis contacts, the predictors of pulmonary tuberculosis included age, sex, BMI, history of previous pulmonary tuberculosis, index of continuous exposure, lower household socioeconomic status, indoor air pollution, and fewer windows in the room in which they lived. A systematic review found that infiltration of the upper lobe of the lung and pulmonary cavity were significantly associated with pulmonary tuberculosis.³⁵ Some studies have suggested that a history of pneumoconiosis,³⁶ NLR,³⁷ and pleural thickening³⁸ are significant predictors of pulmonary tuberculosis. Studies have also suggested that C-reactive protein level at the first visit is not an independent predictor of pulmonary tuberculosis.¹⁶

Our study considered 26 candidate predictors associated with pulmonary tuberculosis, and statistical methods were used to develop a diagnostic clinical prediction model for suspected pulmonary tuberculosis in adults comprising five independent predictors: age, ESR, infiltration of the upper lobe or dorsal segment of the lung, multifocal polymorphic lesions with multiple densities in the lungs, and pleural thickening. The results of the present study differ from those of the aforementioned studies. These five factors, three of which are imaging indicators, are simple and clinically accessible, underscoring the indispensable role of imaging in the early diagnosis of pulmonary tuberculosis. All typical pulmonary tuberculosis symptoms were eliminated in the single-factor analysis, suggesting that symptoms contributed little to the early diagnosis of pulmonary tuberculosis in this study. Two reasons may explain this. First, some patients were asymptomatic or had no obvious symptoms at the onset of the disease and were admitted because imaging had revealed pulmonary lesions. Second, most of the nontuberculosis patients included had pneumonia, acute exacerbation of chronic obstructive pulmonary disease, or lung cancer, and they exhibited symptoms such as chronic cough, sputum production, hemoptysis or blood in sputum, and fever, which were similar to those of patients with pulmonary tuberculosis. Some studies have suggested that malnutrition,¹ BMI,³⁴ PPD skin test,¹⁵ and HIV infection³² are independent predictors of pulmonary tuberculosis. Because height was not recorded in the medical record system of Changsha Central Hospital, PPD skin test results were not fully recorded, and the number of HIV cases was very small, BMI, PPD skin test results, and HIV infection were not included in this study. Moreover, age was not statistically significant when used as a continuous variable.
After age was classified into four groups and single-factor and multifactor logistic regression analyses were performed, age emerged as an independent predictor of pulmonary tuberculosis diagnosis, in line with the findings of Solari et al.³¹ and Baik et al.³² It is therefore worth considering whether continuous variables such as hemoglobin, albumin, and ESR should be transformed into categorical variables prior to statistical analysis, an approach that can be tested in subsequent studies to build new prediction models. In addition, a history of diabetes was not statistically significant in the single-factor analysis. However, because a history of diabetes is one of the risk factors for tuberculosis proposed by the World Health Organization and pulmonary tuberculosis is commonly associated with diabetes in clinical practice, diabetes was nevertheless entered into the multifactor analysis, and this led to better predictive performance than a model that included only indicators that were significant in the single-factor analysis.
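The recoding of a continuous predictor into groups, as described above for age, can be sketched as follows. This is a minimal illustration only: the cut points, the labels, and the function names are hypothetical, since the study does not report the age bands it used, and the indicator (dummy) coding shown here is one common way of entering a four-level variable into a logistic regression.

```python
from bisect import bisect_right

# Hypothetical cut points in years; the age bands actually used are not reported.
AGE_CUTS = [30, 45, 60]
AGE_LABELS = ["<30", "30-44", "45-59", ">=60"]


def age_group(age_years):
    """Return the index (0 to 3) of the age band that contains age_years."""
    return bisect_right(AGE_CUTS, age_years)


def dummy_code(group, n_groups=4):
    """Indicator coding with group 0 as the reference category."""
    return [1 if group == k else 0 for k in range(1, n_groups)]


# Illustrative ages, not study data.
ages = [24, 30, 52, 71]
groups = [age_group(a) for a in ages]
rows = [dummy_code(g) for g in groups]

for age, g, row in zip(ages, groups, rows):
    print(age, AGE_LABELS[g], row)
```

Each row of indicators would then replace the single continuous age column in the regression design matrix, with the youngest band acting as the reference level.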

The discriminatory ability of this simple clinical prediction model was evaluated by the AUC (AUC=0.982), which was similar or superior to those of previously published prediction models that incorporated parameters including HIV infection (AUC=0.82),³² diabetes (AUC=0.84),³⁹ BMI (AUC=0.79),³⁴ hemoglobin (AUC=0.89),⁴⁰ CD4+ T cell count (AUC=0.79),³⁴ PPD skin test (AUC=0.8751),¹⁵ and chest imaging features (AUC=0.84).³⁹ Notably, some prediction models have been explicitly designed for use in HIV-positive populations,^14,15 where pulmonary tuberculosis tends to be more difficult to diagnose. These models may therefore generalize poorly to HIV-negative populations. Other prediction models have been developed for tuberculosis contacts,^17,18 most of whom have no symptoms of pulmonary tuberculosis, which also limits generalization. Systematic reviews have highlighted that the existing clinical prediction models for pulmonary tuberculosis lacked validation or were inadequately validated.^19,20 Chen et al.⁴¹ established a diagnostic model for pathogen-negative first-treatment pulmonary tuberculosis patients, which showed good discriminatory ability; however, clinical usefulness evaluation and external validation were not performed. Our model was evaluated for accuracy in both the internal and external validation groups; the calibration curves fit well, with Hosmer–Lemeshow p-values of 0.815 and 0.973, respectively, and the model performed well in the clinical usefulness evaluation in both groups. The clinical prediction model showed good discriminatory power, accuracy, and clinical usefulness, and the external validation results indicated that the model had some generalizability.
Our ultimate goal was to develop a diagnostic clinical prediction model with good predictive performance that can rapidly identify pulmonary tuberculosis among patients with suspected disease, shorten the time to diagnosis, enable prompt and effective treatment, reduce transmission, and give primary care hospitals a basis for referral for confirmatory testing, or even for diagnostic antituberculosis treatment.
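The three evaluation steps discussed above (discrimination by AUC, classification at a probability cutoff, and clinical usefulness by decision curve analysis) can be illustrated with a short sketch. The data below are invented toy values, not study data, and the function names are hypothetical. The AUC is computed in its rank form (the proportion of case and non-case pairs in which the case has the higher predicted probability), and net benefit follows the standard decision curve definition, TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability.

```python
def auc(y_true, y_prob):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def confusion(y_true, y_prob, cutoff):
    """Count TP, FP, TN, FN when probabilities >= cutoff are called positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sens_spec(y_true, y_prob, cutoff):
    """Sensitivity and specificity at the given probability cutoff."""
    tp, fp, tn, fn = confusion(y_true, y_prob, cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, as used in decision curve analysis."""
    tp, fp, _, _ = confusion(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy outcomes (1 = pulmonary tuberculosis) and toy predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.80, 0.70, 0.40, 0.65, 0.30, 0.20, 0.10]

print("AUC:", auc(y_true, y_prob))
print("Sensitivity, specificity at 0.611:", sens_spec(y_true, y_prob, 0.611))
print("Net benefit at 0.5:", net_benefit(y_true, y_prob, 0.5))
```

The cutoff of 0.611 is the optimal cutoff probability reported for the model; everything else in the example is illustrative. A full decision curve would repeat the net benefit calculation over a range of thresholds and compare it with the treat-all and treat-none strategies.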

Limitations and prospects

Our study has certain limitations. Some factors that may be relevant to the prediction of pulmonary tuberculosis were not included owing to missing data. The data were collected at specialized tuberculosis hospitals, where the number of pulmonary tuberculosis cases was considerably higher than that of nontuberculosis cases, and no additional cases were collected from other populations to make the sample representative of the general population. Additionally, the positive outcome event of our study was restricted to a definite diagnosis of pulmonary tuberculosis, so clinically diagnosed pulmonary tuberculosis cases were excluded by design.

Future work could proceed in two directions. First, the model's predictive performance could be enhanced through prospective data collection. Second, continued data collection would support the development of new diagnostic prediction models that incorporate potentially important indicators not considered in this study. These include data from inpatients with clinically diagnosed pulmonary tuberculosis, the PPD skin test, BMI, and immune-related indicators, including lymphocyte count and biomarkers from transcriptomics, proteomics, and metabolomics.

Conclusions

Our study developed a prediction model comprising age, erythrocyte sedimentation rate, infiltration of the upper lobe or dorsal segment of the lung, multifocal polymorphic lesions with multiple densities in the lungs, and pleural thickening to predict pulmonary tuberculosis in adults suspected of having pulmonary tuberculosis. This model serves as a valuable tool to assist clinicians in the decision-making process.

Footnotes

Acknowledgements

We thank all patients and their families involved in this study. We would like to thank Mr Quan Zhou and Mrs Xueqing Zhang of the First People's Hospital of Changde City for their help in the revision process.

ORCID iDs

Daiyan Fu

Tian Luo

Zhiyi He

Ethics approval

This study was approved by the Ethics Committees of the Hunan Provincial People's Hospital (No. 2021-31), Changsha Central Hospital (No. 2022-S0132), and Hunan Chest Hospital (No. 2022-031).

Consent for publication

Not applicable.

Contributorship

ZH, DF, and TL conceived and designed the study. DF and TL drafted the manuscript. DF, TL, JC, MX, XY, and HT collected the data. TL, JC, MX, XY, and HT analyzed the data. XY and HT prepared figures and tables. ZH, AD, and RH edited and revised the manuscript. ZH approved the final version of this manuscript. All authors have read and approved the final manuscript. The study was conducted in strict compliance with the Declaration of Helsinki.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Changsha Science and Technology Plan Project, the Natural Science Foundation of Hunan Province, the Innovation Platform Project of the Department of Science and Technology of Hunan Province, and the Key Project of the Hunan Provincial Department of Education (grant numbers kq2004112, 2022JJ70094, 2023Sk4056, and 20A298).

Informed consent

Because of the retrospective nature of the study and the anonymization of data prior to analysis, the Ethics Committees of the Hunan Provincial People's Hospital, Changsha Central Hospital, and Hunan Chest Hospital approved the waiver of informed consent.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Availability of data and materials

The datasets used and/or analyzed during the study are available from the corresponding author upon reasonable request.

References

1. World Health Organization. Global tuberculosis report. 2021. https://www.who.int/publications/i/item/9789240037021. (Accessed 1 Nov 2021).

2. Chinese Medical Association, Chinese Medical Journals Publishing House, Chinese Society of General Practice, Infection Group of Chinese Thoracic Society, Editorial Board of Chinese Journal of General Practitioners of Chinese Medical Association, Expert Group of Guidelines for Primary Care of Respiratory System Disease. Guideline for primary care of pulmonary tuberculosis (2018). Zhonghua Quan Ke Yi Shi Za Zhi 2019; 18: 709–717.

3. National Health and Family Planning Commission of the People's Republic of China. Diagnosis for pulmonary tuberculosis (WS 288-2017). 2017. http://www.nhc.gov.cn/ewebeditor/uploadfile/2017/11/20171128164254246. (Accessed 20 Feb 2020).

4. Technical Guidance Group of the Fifth National Tuberculosis Epidemiological Survey, The Office of the Fifth National Tuberculosis Epidemiological Survey. The fifth national tuberculosis epidemiological survey in 2010. Zhongguo Fang Lao Za Zhi 2012; 34: 485–508.

5. Desalegn, Kitila, Balcha, et al. Misdiagnosis of pulmonary tuberculosis and associated factors in peripheral laboratories: a retrospective study, Addis Ababa, Ethiopia. BMC Res Notes 2018; 11: 291.

6. Tang, Lin, Guo, et al. Key determinants of misdiagnosis of tracheobronchial tuberculosis among senile patients in contemporary clinical practice: a retrospective analysis. World J Clin Cases 2021; 9: 7330–7339.

7. Hikone, Ainoda, Sakamoto, et al. Clinical characteristics of elderly pulmonary tuberculosis in an acute-care general hospital in Tokyo, Japan: a 12-year retrospective study. J Infect Chemother 2020; 26: 245–250.

8. Zhou, Wang, et al. In-depth mining of clinical data: the construction of clinical prediction model with R. Ann Transl Med 2019; 7: 796.

9. Van Smeden, Reitsma, Riley, et al. Clinical prediction models: diagnosis versus prognosis. J Clin Epidemiol 2021; 132: 142–145.

10. Dai, Yang, Shen. The power of clinical data empowered by clinical prediction model: an R tutorial. Ann Transl Med 2020; 8: 77.

11. Chitnis, Davis, Schecter, et al. Review of nucleic acid amplification tests and clinical prediction rules for diagnosis of tuberculosis in acute care facilities. Infect Control Hosp Epidemiol 2015; 36: 1215–1225.

12. Deo, Singh, Jha, et al. Predicting the impact of patient and private provider behavior on diagnostic delay for pulmonary tuberculosis patients in India: a simulation modeling study. PLoS Med 2020; 17: e1003039.

13. Cowley, Farewell, Maguire, et al. Methodological standards for the development and evaluation of clinical prediction rules: a review of the literature. Diagn Progn Res 2019; 3: 16.

14. Ouyang, Yuan, Chen, et al. The development and validation of a diagnostic scoring system to differentiate pulmonary tuberculosis from non-tuberculosis pulmonary infections in HIV-infected patients with severe immune suppression. BMC Infect Dis 2021; 21: 863.

15. Coimbra, Maruza, Mde, et al. Validating a scoring system for the diagnosis of smear-negative pulmonary tuberculosis in HIV-infected adults. PLoS ONE 2014; 9: e95828.

16. Boyles, Nduna, Pitsi, et al. A clinical prediction score including trial of antibiotics and C-reactive protein to improve the diagnosis of tuberculosis in ambulatory people with HIV. Open Forum Infect Dis 2020; 7: ofz543.

17. Nordio, Huang, et al. Two clinical prediction tools to improve tuberculosis contact investigation. Clin Infect Dis 2020; 71: e338–e350.

18. Saunders, Wingfield, Tovar, et al. A score to predict and stratify risk of tuberculosis in adult contacts of tuberculosis index cases: a prospective derivation and external validation cohort study. Lancet Infect Dis 2017; 17: 1190–1199.

19. van Wyk, Lin, Claassens. A systematic review of prediction models for prevalent pulmonary tuberculosis in adults. Int J Tuberc Lung Dis 2017; 21: 405–411.

20. Jensen, Rudolf, Wejse. Utility of a clinical scoring system in prioritizing TB investigations - a systematic review. Expert Rev Anti Infect Ther 2019; 17: 475–488.

21. Wallace, Johansen. Clinical prediction rules: challenges, barriers, and promise. Ann Fam Med 2018; 16: 390–392.

22. Han. Clinical prediction model construction and evaluation: a challenge for clinical researchers. Ann Transl Med 2020; 8: 74.

23. World Health Organization. WHO guidelines on tuberculosis infection prevention and control: 2019 update. Geneva: World Health Organization, 2019.

24. Wynants, Bouwmeester, Moons, et al. A simulation study of sample size demonstrated the importance of the number of events per variable to develop prediction models in clustered data. J Clin Epidemiol 2015; 68: 1406–1414.

25. Riley, Ensor, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. Br Med J 2020; 368: m441.

26. Migliori, Matteelli, et al. Clinical standards for the diagnosis, treatment and prevention of TB infection. Int J Tuberc Lung Dis 2022; 26: 190–205.

27. Zhou, Luo, et al. Interferon-γ release assays or tuberculin skin test for detection and management of latent tuberculosis infection: a systematic review and meta-analysis. Lancet Infect Dis 2020; 20: 1457–1469.

28. Clinical Research Center for Infectious Disease TTPsHoS, Editorial Board of Chinese Journal of Antituberculosis National. Expert consensus on a standard of activity judgment of pulmonary tuberculosis and its clinical implementation. Chin J Antituberc 2020; 42: 301–307.

29. Melkie, Arias, Farroni, et al. The role of antibodies in tuberculosis diagnosis, prophylaxis and therapy: a review from the ESGMYC study group. Eur Respir Rev 2022; 31: 210218.

30. Zeng, Zhai, Wáng YXJ, et al. Illustration of a number of atypical computed tomography manifestations of active pulmonary tuberculosis. Quant Imaging Med Surg 2021; 11: 1651–1667.

31. Solari, Acuna-Villaorduna, Soto, et al. A clinical prediction rule for pulmonary tuberculosis in emergency departments. Int J Tuberc Lung Dis 2008; 12: 619–624.

32. Baik, Rickman, Hanrahan, et al. A clinical score for identifying active tuberculosis while awaiting microbiological results: development and validation of a multivariable prediction model in sub-Saharan Africa. PLoS Med 2020; 17: e1003420.

33. Chen, et al. Screening of long non-coding RNAs biomarkers for the diagnosis of tuberculosis and preliminary construction of a clinical diagnosis model. Front Microbiol 2022; 13: 774663.

34. Hanifa, Fielding, Chihota, et al. A clinical scoring system to prioritise investigation for tuberculosis among adults attending HIV clinics in South Africa. PLoS ONE 2017; 12: e0181519.

35. Pinto, Pai, Dheda, et al. Scoring systems using chest radiographic features for the diagnosis of pulmonary tuberculosis in adults: a systematic review. Eur Respir J 2013; 42: 480–494.

36. Jin, Fan, Pang, et al. Risk of active pulmonary tuberculosis among patients with coal workers’ pneumoconiosis: a case-control study in China. Biomed Environ Sci 2018; 31: 448–453.

37. Berhane, Melku, Amsalu, et al. The role of neutrophil to lymphocyte count ratio in the differential diagnosis of pulmonary tuberculosis and bacterial community-acquired pneumonia: a cross-sectional study at Ayder and Mekelle Hospitals, Ethiopia. Clin Lab 2019; 65: 527–533.

38. Wang, et al. A preliminary investigation on a deep learning convolutional neural networks based pulmonary tuberculosis CT diagnostic model. Zhonghua Jie He He Hu Xi Za Zhi 2021; 44: 450–455.

39. Delory, Ferrand, Grall, et al. Score for pulmonary tuberculosis in patients with clinical presumption of tuberculosis in a low-prevalence area. Int J Tuberc Lung Dis 2017; 21: 1272–1279.

40. Liao, Bai, et al. Long noncoding RNA and predictive model to improve diagnosis of clinically diagnosed pulmonary tuberculosis. J Clin Microbiol 2020; 58: e01973–19.

41. Chen, Liu, Chen, et al. Establishment and preliminary evaluation of a diagnostic model for the new patients with pathogen-negative pulmonary tuberculosis. Chin J Antituberc 2020; 42: 266–271.

Development and validation of a diagnostic clinical prediction model for suspected pulmonary tuberculosis in adults: A retrospective analysis
