Abstract
This study sought to evaluate the interobserver agreement in the interpretation of the chest radiograph of patients with suspected acute pulmonary embolism (PE). The chest radiographs of 300 patients with clinically suspected acute PE were reviewed by 4 radiologists. Observers assessed the chest radiographic abnormalities and classified each chest radiograph as normal or abnormal. The overall interobserver agreement was good for the exclusion of any pleural or parenchymal abnormality (k = 0.6; 95% CI: 0.56-0.64), but it was only fair (k = 0.28; 95% CI: 0.17-0.40) between junior radiologists when evaluating supine chest radiographs. The level of interobserver agreement for the interpretation of the chest radiograph as consistent or not with PE was fair (k = 0.24; 95% CI: 0.19-0.29), regardless of observer experience. In conclusion, chest radiography may be reliably used for targeting patients with suspected acute PE for different subsequent diagnostic investigations.
Introduction
The positive yield rates in patients with suspected acute pulmonary embolism (PE) have diminished over the last decades because of poor selection of the patients undergoing computed tomography pulmonary angiography (CTPA). 1,2 The use of a pretest Bayesian approach including the interpretation of the chest radiographic findings has been increasingly encouraged to improve the risk stratification strategies for patients with suspected PE. 1,3,4 Despite its variable sensitivity and specificity for ruling PE in or out, the chest radiograph may suggest other diagnoses that can clinically mimic acute PE, such as pneumothorax, pneumonia, or rib fracture. 5 –7 There are also some cases in which the interpretation of the chest radiograph may corroborate the clinical suspicion of PE, particularly when typical signs are identified in patients with comparison chest radiographs without other underlying cardiorespiratory disease. 8 Importantly, the chest radiograph can be a valuable triage tool in deciding on an appropriate technique for imaging PE. 3,4,9
Thus, some clinical algorithms still include the chest radiograph to assess the pretest probability of acute PE in hemodynamically stable patients. A recent survey has also shown that the chest radiograph still often precedes CTPA when acute PE is suspected. 4,10 –12 The American College of Radiology recommended a posterior and lateral chest radiograph as an important initial study. 5,13 Therefore, it would be important to know how much clinicians may rely on the radiologist’s interpretation of the chest radiograph of patients with suspected PE.
This study sought to assess the interobserver agreement in the interpretation of the chest radiograph in patients with suspected PE.
Methods
Study Population
This study had institutional review board approval and informed patient consent was waived. We identified our study population by reviewing the clinic and radiology information database search program (Ebit-Aet) between December 2006 and January 2008. Among 537 patients who underwent CTPA for clinically suspected acute PE, a subgroup of 300 consecutive patients (including 217 inpatients and 83 outpatients) also evaluated by chest radiography within the same day constituted our study population (152 women and 148 men with a mean age of 72 years [range, 14-98 years]). Both anteroposterior supine (n = 239/300, 80%) and posteroanterior upright (n = 61/300, 20%) chest radiographs were included for this study. The presence or absence of PE was defined on the basis of the CTPA findings: positive for 110 (36.7%) of 300 patients of the study cohort and positive for 66 (28%) of 237 patients with no chest radiograph. No patients had history and CT findings of chronic PE.
The chest radiographs were retrieved from the PACS system and reviewed by the observers participating in this study.
Interpretation of the Chest Radiograph
Chest radiographs were independently evaluated by 4 observers: 2 senior radiologists (MZ and MD, with 30 and 10 years of experience in interpreting chest radiographs, respectively) and 2 junior radiologists (MQ and FF, with 4 and 3 years of experience in interpreting chest radiographs, respectively). The observers were told only that patients had been referred for chest radiography because of suspected acute PE; no other information and no previous chest radiographs for comparison were provided.
Observers were asked to rate each chest radiograph as normal or abnormal (ie, showing pleural or parenchymal abnormality). The chest radiographs with consolidation, infiltrates (including signs of interstitial lung disease or pulmonary oedema), atelectasis, pleural effusion, and pneumothorax were considered abnormal. The assessment of both cardiomegaly and signs of chronic obstructive pulmonary disease was not considered reliable as the majority of the chest radiographs were acquired in supine position. Observers were also asked to record other chest radiographic findings such as the enlargement of a major pulmonary artery (Fleischner sign), a focal oligemia (Westermark sign), or an elevated hemidiaphragm. No specific diagnostic criteria were provided to the observers, with the exception of the ones for the Fleischner and the Westermark signs, which had to be assessed according to the criteria reported by Miniati et al. 11 In addition, the observers were given the possibility of recording any other finding (eg, fibrothorax, postsurgical findings, etc) that could be of significance. Finally, the observers had to classify the chest radiographic findings as “consistent” or “not consistent” for acute PE.
Data Analysis
The Cohen unweighted kappa coefficient of agreement (k) was used to quantify the interobserver agreement. The confidence intervals for agreement among the 4 observers were estimated by the bootstrap percentile method. Interobserver agreement was classified as poor (k = .00−.20), fair (k = .21−.40), moderate (k = .41−.60), good (k = .61−.80), or excellent (k = .81-1.00). 14 Interobserver agreement was also assessed for supine and upright chest radiographs separately.
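As an illustration of the statistic described above, the following is a minimal sketch of the unweighted Cohen kappa for one pair of observers, together with the agreement bands quoted in the text. It is not the SAS or MedCalc procedure used in the study. The function names, the handling of negative kappa values, and the toy ratings are assumptions made for the example only.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two observers rating the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same label by both observers.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over labels of the product of the marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both observers used one identical label for every case
    return (p_observed - p_chance) / (1.0 - p_chance)


def agreement_band(kappa):
    """Map kappa to the bands quoted in the text (upper bounds inclusive).

    Assumption: negative kappa values, which the quoted bands do not cover,
    are reported as "poor".
    """
    bands = ((0.20, "poor"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "good"), (1.00, "excellent"))
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    raise ValueError("kappa cannot exceed 1")


if __name__ == "__main__":
    # Hypothetical ratings of 10 radiographs (A = abnormal, N = normal).
    observer_1 = list("AAAAAANNNN")
    observer_2 = list("AAAAANNNNA")
    k = cohen_kappa(observer_1, observer_2)
    print(round(k, 2), agreement_band(k))
```

With 4 observers, one such kappa can be computed for each of the 6 observer pairs; how the pairwise values are combined into an overall figure is not specified in the text.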
Sensitivity was defined as the proportion of patients with positive chest radiographic findings among patients with acute PE on CTPA. Specificity was defined as the proportion of patients with negative chest radiographic findings among patients without acute PE on CTPA.
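The two definitions above can be written out as a short sketch, with CTPA as the reference standard. The function name and the example values are hypothetical and do not reproduce the study data.

```python
def sensitivity_specificity(finding_present, pe_on_ctpa):
    """Return (sensitivity, specificity) of a chest radiographic finding.

    finding_present: one boolean per patient, True if the observer recorded the finding.
    pe_on_ctpa: one boolean per patient, True if CTPA showed acute PE.
    """
    if len(finding_present) != len(pe_on_ctpa):
        raise ValueError("one finding and one CTPA result are needed per patient")
    pairs = list(zip(finding_present, pe_on_ctpa))
    true_pos = sum(1 for finding, pe in pairs if finding and pe)
    false_neg = sum(1 for finding, pe in pairs if not finding and pe)
    true_neg = sum(1 for finding, pe in pairs if not finding and not pe)
    false_pos = sum(1 for finding, pe in pairs if finding and not pe)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one patient with PE and one without PE")
    # Sensitivity: positive radiographic finding among patients with PE on CTPA.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: negative radiographic finding among patients without PE on CTPA.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical example with 5 patients.
    finding = [True, False, False, True, False]
    pe = [True, True, True, False, False]
    print(sensitivity_specificity(finding, pe))
```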
A P value of less than .05 was taken to indicate statistical significance. Both SAS (Release 8.2) and MedCalc (version 9.5.2.0) were used for statistical calculation.
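The bootstrap percentile method mentioned in this section can be sketched as follows: cases are resampled with replacement, the agreement statistic is recomputed on each resample, and the interval is read from the percentiles of the resampled values. The text does not state how agreement among the 4 observers was summarized, so the mean of the pairwise kappa values is used here purely as an assumption. The number of resamples, the seed, and all function names are likewise illustrative.

```python
import math
import random
from collections import Counter
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two observers rating the same cases."""
    n = len(ratings_a)
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # degenerate resample: one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


def mean_pairwise_kappa(ratings):
    """Average kappa over all observer pairs (an assumed summary statistic).

    ratings: one list of labels per observer, all over the same cases.
    """
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def bootstrap_percentile_ci(ratings, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean pairwise kappa."""
    rng = random.Random(seed)
    n_cases = len(ratings[0])
    resampled_stats = []
    for _ in range(n_resamples):
        # Resample cases (not observers) with replacement, keeping each
        # case's ratings from all observers together.
        indices = [rng.randrange(n_cases) for _ in range(n_cases)]
        resample = [[observer[i] for i in indices] for observer in ratings]
        resampled_stats.append(mean_pairwise_kappa(resample))
    resampled_stats.sort()
    alpha = (1.0 - level) / 2.0
    lower_index = math.floor(alpha * (n_resamples - 1))
    upper_index = math.ceil((1.0 - alpha) * (n_resamples - 1))
    return resampled_stats[lower_index], resampled_stats[upper_index]
```

Resampling whole cases preserves the dependence between the observers' ratings of the same radiograph, which is why the indices are drawn once per resample and applied to every observer.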
Results
The prevalence of the chest radiographic findings, their sensitivity and specificity, and the corresponding levels of interobserver agreement are given in Tables 1, 2, and 3, respectively. Interobserver agreement was calculated only for the findings listed in the tables, as additional abnormalities such as pneumonectomy (1 case), fibrothorax (1 case), and aortic disease (1 case) were too infrequent to allow any specific statistical analysis. No pneumothorax was recorded by any observer.
Chest Radiographic Findings as Assessed by Each Observer in 300 Patients With Suspected Acute Pulmonary Embolism.
Abbreviation: PE, pulmonary embolism.
Sensitivity and Specificity of Each Chest Radiographic Finding.
Abbreviation: PE, pulmonary embolism.
Overall Interobserver, Intersenior Radiologists, and Interjunior Radiologists Agreement.a
Abbreviation: PE, pulmonary embolism.
a Numbers refer to kappa values and values within parentheses are the corresponding 95% confidence intervals.
The global judgment of the chest radiographic findings as “consistent with PE” was highly specific (mean specificity 90.7%) but poorly sensitive (mean sensitivity 30.4%) for all the observers. Similarly, the mean specificity of the Fleischner and the Westermark signs was 79.8% and 93.3%, respectively, whereas their mean sensitivity was 34.9% and 21.1%, respectively.
The overall interobserver agreement was good (k = .6; 95% CI: 0.56-0.64) for the interpretation of the chest radiograph as positive or negative for any pleural or parenchymal abnormality, whereas it was only fair (k = .24; 95% CI: 0.19−0.29) for the assessment of the chest radiographic findings as consistent or not with PE. The highest levels of agreement were recorded for the assessment of pleural effusion (k = .50; 95% CI: 0.45-0.54) and parenchymal infiltrates (k = .51; 95% CI: 0.46−0.57), whereas the lowest levels were found for the elevated hemidiaphragm (k = .35; 95% CI: 0.31−0.4), the Fleischner sign (k = .22; 95% CI: 0.18−0.27), and the Westermark sign (k = .24; 95% CI: 0.19−0.28).
The levels of interobserver agreement between paired senior (averaged k = .35) and paired junior (averaged k = .28) radiologists were similar. However, the greatest difference between paired senior and paired junior radiologists was observed for the evaluation of the chest radiograph as normal or abnormal (k = .62 vs k = .33). Such a poorer agreement between junior radiologists seemed linked to the patient position as it substantially increased when evaluating only upright chest radiographs (k = .50; 95% CI: 0.28-0.72) and further decreased for supine ones (k = .28; 95% CI: 0.17-0.40). The agreement between senior radiologists was similar for both supine (k = .61; 95% CI: 0.51-0.71) and upright (k = .65; 95% CI: 0.45-0.84) chest radiographs.
Discussion
Several studies have evaluated the accuracy of chest radiography in patients with suspected acute PE, but no studies have assessed the inherent interobserver variability. Properly estimating interobserver agreement provides an important insight into a test’s usefulness and may disclose strengths and expose weaknesses of the test that are not readily apparent from more conventional diagnostic accuracy studies. 6,15,16
Although chest radiography is the technique most commonly misinterpreted by observers in the emergency department, our findings show that the interobserver agreement for the exclusion of any pleural or parenchymal abnormality was good. 17,18 Such an exclusion may allow the use of ventilation/perfusion (V/Q) scintigraphy instead of CTPA for evaluating hemodynamically stable patients with suspected acute PE, as an abnormal chest radiograph would make the V/Q scan more difficult to interpret. 3,19 The use of the results of chest radiography as a simple triage mechanism has recently shown a number of advantages, such as a reduction in radiation exposure due to the increased number of V/Q scans (instead of CTPA) without a significant change in the rate of indeterminate interpretations. 4,9 It is also worth highlighting that perfusion (Q) scintigraphy alone, combined with chest radiography, can provide diagnostic accuracy similar to that of V/Q scintigraphy. Elimination of the ventilation scintigram reduces cost, further reduces radiation dose, and makes scintigraphy easier to provide on call by reducing the technical complexity. 3,20
However, our findings disclosed that even the simple exclusion of any chest radiographic abnormality may require evaluation by experienced radiologists when the chest radiograph has been acquired with the patient in the supine position. This factor indeed seemed to be the major contributor to the fair level of agreement between junior radiologists.
Of note, we observed a substantial discrepancy between the proportion of patients evaluated by supine chest radiography in our study and that recently reported by Stein et al (80% vs 42%). 5 This might be explained by differences in the study population itself (ie, less healthy in our study) or in the department protocols. Nevertheless, according to our results, every effort should be made to obtain an upright chest radiograph if it is to be used as a triage tool in deciding on the most appropriate subsequent technique for imaging patients with suspected PE.
It has recently been shown that radiologists looking for PE on CTPA may often be inattentive to other findings as a result of observer bias. Interpreters of CTPA are often biased to exclude PE and may overlook ancillary findings not responsible for chest pain or shortness of breath. Such biases were generally less frequent with interpretation of the chest radiograph. 5 In this regard, we found that interobserver agreement for the assessment of both pleural effusion and parenchymal infiltrates such as pulmonary oedema was good, regardless of the degree of observer experience. However, the interobserver agreement was poor for the assessment of the elevated hemidiaphragm, the Westermark sign, and the Fleischner sign, mirroring that recorded for the overall consistency with PE. The frequency of both the Westermark and the Fleischner signs was higher than expected among all the observers’ evaluations. However, it is possible that this frequency increased because of the study context, as the observers knew that they were evaluating the chest radiographs of patients with suspected PE. More surprisingly, we found that the assessment of either consolidation or atelectasis was subject to substantial interobserver agreement only between senior radiologists. We do not have any precise explanation for this, but such a finding should not be overlooked, since both abnormalities were shown to be associated with a final diagnosis of PE more strongly than other chest radiographic findings. 11
Diagnostic criteria for categorizing a chest radiograph as normal varied slightly across prior investigations. 21 For example, a chest radiograph with an enlarged heart was regarded as abnormal by the Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis (PISAPED) criteria, whereas such a finding was not scored in our study, as the majority of the chest radiographs were acquired in the supine position and the assessment of the heart size was therefore not considered reliable. 3 Nevertheless, our study included cases that are representative of those encountered in everyday clinical practice (including those evaluated only by supine chest radiography) and observers who are not all expert radiologists. However, a greater number of observers would be needed to properly fulfill the requisites of an observer agreement study.
Our study does have other limitations. First, a training session for the observers might have standardized the radiologic review process, increasing the interobserver agreement. Nevertheless, the lack of training in our study has likely allowed us to capture a true expression of the degree of interobserver agreement that can be encountered in routine practice. Second, the observers were only supplied with the general information that the patient was suspected of having acute PE, but knowledge of specific clinical data (eg, prolonged bed rest, history of fever, etc), as in routine practice, might have changed the levels of agreement. No prior chest radiographs were available for comparison, as we thought that such availability would have biased the interobserver agreement.
In conclusion, we have shown that there is good agreement among radiologists for the exclusion of chest radiographic abnormality in cases with clinically suspected acute PE. Supine decubitus may influence such an assessment when chest radiography is interpreted by less experienced observers, and this should be taken into account when using chest radiography in targeting hemodynamically stable patients with suspected acute PE for different subsequent investigations.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
