Abstract
Background and aims:
Chest high-resolution computed tomography (HRCT) is the central diagnostic tool in discerning idiopathic pulmonary fibrosis (IPF) from other interstitial lung diseases (ILDs). In 2018, new guidelines were published and the nomenclature for HRCT interpretation was changed. We sought to evaluate how clinicians’ interpretation would change based on reading HRCTs under the framework of the old
Materials and methods:
We collated HRCTs from 50 random cases evaluated in the Inova Fairfax ILD clinic. Six ILD experts were provided the deidentified HRCTs. They were all instructed to independently provide two reads of each HRCT, based on the old and the new guidelines.
Results:
The kappa statistic for concordance for HRCT reads under old guidelines was 0.5, while for the new guidelines it was 0.38. Under the framework of the old guidelines, there were 22 HRCTs with unanimous consensus reads, while only 15 with the new guidelines. There were 12 HRCTs read unanimously as usual interstitial pneumonia (UIP) pattern based on both the old and the new guidelines. Ten HRCTs were read as a possible UIP pattern based on the old guidelines and were classified in nine cases as probable UIP and one indeterminate based on the new guidelines. Of the 28 inconsistent UIP HRCTs (old guidelines), 25 were read as alternative diagnosis suggested, two were read as indeterminate and one as probable UIP.
Conclusion:
Implementation of the new guidelines to categorize HRCTs in ILD patients appears to be associated with greater inter-interpreter variability. How or whether new guidelines improve the care and management of ILD patients remains unclear.
Introduction
Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive, fibrosing interstitial lung disease of unknown etiology occurring predominantly in elderly patients.1 Accurately discerning IPF from other forms of interstitial lung disease (ILD) is exceedingly important as it has a different prognosis and treatment paradigm. The central diagnostic tool is high-resolution computed tomography of the chest (HRCT). In most cases, depending on the clinical circumstance, the HRCT can be diagnostic with no further work-up necessary. In some cases, when there is diagnostic uncertainty, a surgical lung biopsy may be required. When tissue is obtained, the histopathological pattern found in IPF is described as usual interstitial pneumonia (UIP). A UIP pattern can, however, be seen in other forms of ILD; therefore, a lung biopsy is not the gold standard for making the diagnosis.1 The gold standard for the diagnosis is the multidisciplinary meeting where a discussion of each case takes place with representation from Pulmonary, Thoracic Radiology and Pathology in order to attain a consensus diagnosis.2,3
The term “UIP” has now been adopted as radiologic nomenclature where an imaging UIP pattern has been described. This is characterized by honeycombing, sub-pleural basilar reticulation, with or without traction bronchiectasis/bronchiolectasis and absence of inconsistent features (cysts, consolidation, alveolar infiltrates).2 However, many patients do not demonstrate all of these features and various “shades” of a UIP pattern have been described. These descriptions were provided in a consensus statement from 2011, and updated in 2018.1,2 In the prior guidelines, there were three categories which were changed to four categories in the most recent iteration (see Table 1). The description of UIP is essentially unchanged between the two guidelines statements, while the prior “inconsistent UIP pattern” has been replaced with “alternative diagnosis suggested”. The categorization of a “possible” UIP has been dropped and two new categories of a “probable” and “indeterminate” UIP pattern have been added. How many patients in the prior categorization would change categories in the new characterization and how this might affect their further work-up and management remains to be determined.
The objective of this current study was to categorize the HRCTs of ILD patients under the framework of both the new and the old guidelines by a group of Pulmonologists and a Radiologist blinded to any clinical information. In addition to establishing a consensus read and determining changes in categorization, another goal was to determine the level of concordance among the readers for the old and new categorizations.
Methods
We performed a review of patients with ILD who were seen at the Inova Fairfax Hospital Interstitial Lung Disease Clinic from 2012 to 2019. Patients qualified for inclusion if they had a high-resolution computed tomography (CT) of the chest available and accessible within the electronic medical record (Epic) system and if a firm diagnosis had been established. The clinic maintains a database of all patients evaluated, with data entered prospectively at the time of their initial evaluation. Patients with ILD were diagnosed based on their clinical presentation, HRCT scan appearance, and, where there was any element of doubt, surgical lung biopsy. Patient demographics and data at or near the time of presentation were collected and analyzed. Data collated included age, sex, lung function tests and final diagnosis.
Deidentified representative images from eight levels of each patient’s axial HRCT images, equidistant from each other and encompassing the thorax from the apex to the lung bases, were captured and copied into individual PowerPoint presentations. Six clinicians (five Pulmonologists and one Thoracic Radiologist) were provided the deidentified HRCTs and entered their individual reads blinded to each other’s interpretation and to all clinical information. The six physicians were all ILD experts based at five ILD centers spanning three different countries (Brazil, France and the USA). Each participant interpreted the HRCTs and was asked to evaluate for the presence of the following elements: honeycombing, subpleural reticulation, traction bronchiectasis, traction bronchiolectasis, mosaicism, ground glass opacification and emphysema, as well as information pertaining to the distribution of the abnormalities. Each reader was then asked to categorize the HRCT pattern based on (i) the 2011 guidelines and (ii) the 2018 guidelines, with interpretation entered into an Excel-based data capture form. Consensus reads were decided by majority; in cases where there was an equal split, the Thoracic Radiologist’s read was used as the “tiebreaker”. Local institutional review board approval was obtained.
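The consensus rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' actual procedure: the function name `consensus_read`, the list layout (reader #6, the Thoracic Radiologist, stored last) and the treatment of "majority" as the most frequent read are all assumptions made for the example.

```python
from collections import Counter


def consensus_read(reads, tiebreaker_index=5):
    """Return the consensus category for one HRCT.

    reads: list of the six readers' categories, in reader order.
    tiebreaker_index: position of the tiebreaking reader (assumed here to be
    the Thoracic Radiologist, stored last).
    """
    counts = Counter(reads)
    top = max(counts.values())
    # Categories that share the highest number of votes
    leaders = [category for category, n in counts.items() if n == top]
    if len(leaders) == 1:
        return leaders[0]
    # Equal split: fall back on the tiebreaking reader's own read
    return reads[tiebreaker_index]


# Hypothetical reads for two cases (not data from the study)
clear_case = ["UIP", "UIP", "UIP", "UIP", "possible UIP", "possible UIP"]
split_case = ["UIP", "UIP", "UIP", "inconsistent", "inconsistent", "inconsistent"]
print(consensus_read(clear_case))
print(consensus_read(split_case))
```

With six readers, a three-versus-three split always places the tiebreaking reader in one of the two tied groups, so the fallback simply selects that group's category.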
Statistical considerations and data analysis
All demographic and pulmonary function data are presented as the mean ± SD or the median (range), depending on the distribution, if continuous, or as frequencies if categorical. Group comparisons were performed using Student’s
The kappa concordance among all readers was calculated for both the old and the new guidelines, as well as the individual elements, including honeycombing, subpleural reticulation, traction bronchiectasis, traction bronchiolectasis, mosaicism, ground glass opacification and emphysema. Concordance rates for type of UIP pattern are also reported, representing how many of the participants were in agreement for any particular read.
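For readers unfamiliar with multi-reader concordance, the sketch below computes Fleiss' kappa, one common statistic for agreement among a fixed number of raters. The text does not state which kappa variant was used, so the choice of Fleiss' kappa, the function name `fleiss_kappa` and the example data are assumptions for illustration only.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of cases, each rated by the same number of readers.

    ratings: list of lists; ratings[i] holds the categories assigned to case i.
    categories: list of all possible categories.
    Note: undefined (division by zero) if every read falls in a single category.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of readers assigning case i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Overall proportion of reads falling in each category
    p_j = [
        sum(counts[i][j] for i in range(n_cases)) / (n_cases * n_raters)
        for j in range(len(categories))
    ]
    # Observed agreement within each case
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_cases        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical reads for three cases under the three 2011 categories
cats = ["UIP", "possible UIP", "inconsistent"]
example = [
    ["UIP"] * 6,
    ["possible UIP"] * 4 + ["inconsistent"] * 2,
    ["inconsistent"] * 5 + ["UIP"],
]
print(round(fleiss_kappa(example, cats), 3))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.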
Results
We collated HRCTs from 50 random cases evaluated in the Inova Fairfax ILD clinic. Select baseline demographics, clinical characteristics, consensus CT reads and final diagnosis of the 50 cases are shown in Figure 1. The mean age of the cohort was 65.2 years (range: 26–86), with 29 males (58%). Twenty of the cases underwent a confirmatory video-assisted thoracoscopic lung biopsy (40%). The final diagnosis was IPF in 14 cases (28%), connective tissue disease-related ILD in 11 (22%), non-specific interstitial pneumonitis in five (10%), chronic hypersensitivity pneumonitis in five (10%), unclassifiable ILD in four (8%), sarcoidosis in two (4%) and acute interstitial pneumonia in two (4%); miscellaneous conditions constituted the remainder.

Select baseline demographics, clinical characteristics, consensus computed tomography reads and final diagnoses of the 50 cases.
Consensus reads
The kappa statistic for concordance for the HRCT reads under the old guidelines and the new guidelines, as well as the individual attributes of the HRCT, are shown in Table 2. Figure 2 demonstrates the reads of the six readers for each of the 50 cases interpreted under the framework of the old [Figure 2(a)] and the new [Figure 2(b)] guidelines. Under the old guidelines, there were 22 HRCTs with unanimous consensus reads (17 inconsistent, four possible and one UIP), while with the new guidelines, there were 13 HRCTs with unanimous consensus reads (11 alternative diagnosis suggested, one UIP and one probable UIP).
Kappa statistic for concordance of high-resolution computed tomography reads among the six readers.
LCL, low confidence level; SE, standard error; UCL, upper confidence level.

High-resolution computed tomography interpretations of the six readers for each of the 50 cases under the framework of the old (a) and the new (b) guidelines. Each row of the two columns represents the same sequence of the 50 individual cases.
In terms of overlap between the old and the new guidelines, there were 12 HRCTs with consensus reads as a UIP pattern based on both the old and the new guidelines. Ten HRCTs were read as a possible UIP pattern based on the old guidelines, of which nine were read as probable UIP and one indeterminate based on the new guidelines. Of the 28 inconsistent UIP HRCTs (old guidelines), 25 were read as alternative diagnosis suggested and two were read as indeterminate and one as probable UIP. There were only three cases that by consensus were recategorized as indeterminate UIP (one was a prior possible and two were from the inconsistent categories). Cross-over between the old and the new guidelines is shown in Figure 1.
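The cross-over counts reported in this paragraph can be laid out as a small cross-tabulation and checked for internal consistency. The sketch below uses only the numbers stated above; the dictionary layout and variable names are illustrative assumptions.

```python
from collections import Counter

# Consensus reads: old (2011) category -> new (2018) category -> number of cases
crossover = {
    "UIP": {"UIP": 12},
    "possible UIP": {"probable UIP": 9, "indeterminate": 1},
    "inconsistent with UIP": {
        "alternative diagnosis suggested": 25,
        "indeterminate": 2,
        "probable UIP": 1,
    },
}

# Row totals: how many cases fell in each old category
old_totals = {old: sum(new.values()) for old, new in crossover.items()}

# Column totals: how many cases fell in each new category
new_totals = Counter()
for new_counts in crossover.values():
    new_totals.update(new_counts)

print(old_totals)
print(dict(new_totals))
print(sum(old_totals.values()))
```

The row totals (12, 10 and 28) sum to the 50 cases, and the column totals show three indeterminate and ten probable UIP consensus reads under the new guidelines.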
Individual reads
Under the framework of the old guidelines, there were four cases with individual interpretations that covered the spectrum of all three possible reads; that is, interpreted independently as either a UIP pattern, a possible UIP pattern or an inconsistent UIP pattern by the six readers. Collation of the six independent interpretations under the framework of the new guidelines revealed 20 cases with at least three different reads and one case with four different reads.
For the old guidelines, there were five cases where our Thoracic Radiologist (reader #6) was in the minority and hence did not concur with the consensus reads (one UIP, two possible and two inconsistent by final consensus). For the new guidelines, there were nine cases where our Thoracic Radiologist was in the minority and hence did not concur with the consensus reads (one UIP, four probable and four alternative diagnosis suggested by final consensus). Interestingly, our Thoracic Radiologist categorized five of these nine as indeterminate (two probable and three alternative by consensus). For the old guidelines, there were nine cases where there was a split vote, decided upon by our Thoracic Radiologist’s read (five UIP and four inconsistent UIP by final consensus), while for the new guidelines, there were three cases with a split vote (two UIP and one indeterminate by final consensus).
CT feature reads
Interpretation of the individual HRCT features that drove the UIP categorizations are demonstrated through concordance heat maps in Figure 3.

Radiographic concordance heat maps of the individual high-resolution computed tomography features for each of the 50 cases. (a) Honeycombing, (b) traction bronchiectasis and (c) traction bronchiolectasis.
Honeycombing
There were three cases with honeycombing agreed upon unanimously, seven cases with 5/6 readers in agreement, five cases with four readers in agreement, one case of an even split, two cases where two readers felt that honeycombing was present and eight cases where honeycombing was interpreted by one reader [Figure 3(a)]. The total number of individual reads between all six readers for all 50 cases that did not concur with the consensus read of honeycombing was 32.
Traction bronchiectasis
With regard to traction bronchiectasis, there were four unanimous “no” and 22 unanimous “yes” reads. Where there was not unanimous consensus, there were eight cases where 5/6 readers agreed on the presence of traction bronchiectasis, seven where there were 4/6 readers in agreement and two cases where there was an equal split. There were one and six cases where two readers and one reader, respectively, felt there was traction bronchiectasis present [Figure 3(b)]. The total number of individual reads between all six readers for all 50 cases that did not concur with the consensus read of traction bronchiectasis was 36.
Traction bronchiolectasis
For traction bronchiolectasis, there were four unanimous “no” cases and three unanimous “yes” cases. There was agreement between 5/6 readers in 14 cases, 11 cases with 4/6 readers, 10 cases of an even split, five cases of 2/6 and three cases where one of six readers felt there was evidence of traction bronchiolectasis [Figure 3(c)]. The total number of individual reads between all six readers for all 50 cases that did not concur with the consensus read of traction bronchiolectasis was 79, consistent with traction bronchiolectasis having the lowest kappa statistic of these three parameters.
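The totals of non-concurring reads quoted for the three features (32, 36 and 79) follow directly from the vote distributions given in the text. The sketch below reproduces that arithmetic; the function name `discordant_reads` and the dictionary layout are assumptions, and for honeycombing the 24 cases not itemized in the text are assumed to be unanimous "no" reads.

```python
def discordant_reads(vote_distribution, n_readers=6):
    """Count individual reads that disagree with the per-case consensus.

    vote_distribution maps 'number of readers in one camp' -> number of cases.
    The minority camp in each case is min(k, n_readers - k), so the tally does
    not depend on which camp said "yes" or on how an even split is resolved.
    """
    total = 0
    for k, n_cases in vote_distribution.items():
        total += min(k, n_readers - k) * n_cases
    return total


# Vote distributions as reported in the text (key = readers in agreement or
# reading the feature as present, value = number of cases)
honeycombing = {6: 3, 5: 7, 4: 5, 3: 1, 2: 2, 1: 8, 0: 24}  # 0: 24 is assumed
traction_bronchiectasis = {6: 22, 5: 8, 4: 7, 3: 2, 2: 1, 1: 6, 0: 4}
traction_bronchiolectasis = {6: 3, 5: 14, 4: 11, 3: 10, 2: 5, 1: 3, 0: 4}

for name, dist in [
    ("honeycombing", honeycombing),
    ("traction bronchiectasis", traction_bronchiectasis),
    ("traction bronchiolectasis", traction_bronchiolectasis),
]:
    print(name, sum(dist.values()), discordant_reads(dist))
```

Each distribution accounts for all 50 cases, and the tallies match the reported totals of 32, 36 and 79 discordant reads.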
IPF diagnosis
There were 14 patients with a final diagnosis of IPF. Of these, four required a confirmatory video-assisted thoracoscopic surgical (VATS) biopsy. The consensus reads in these patients by the old and the new guidelines were UIP in seven, possible/probable UIP in three and inconsistent UIP/alternative diagnosis suggested in four.
Discussion
Our study lends insight into how a group of expert ILD physicians, including one Thoracic Radiologist, categorize patients based on the old and the new guidelines and how patients change categories. We demonstrate considerable variation in the interpretation of HRCTs of the chest, with fewer than half of the cases having unanimous consensus interpretation by both the old and the new guidelines. While the kappa statistic under the framework of the old guidelines fell into the moderate concordance range (0.5)
There are over 150 causes of interstitial lung disease and discerning one from the next represents a conundrum for many clinicians as well as expert ILD physicians. The HRCT is the central diagnostic study around which the diagnosis and further management often hinge. In order to homogenize HRCT reads as much as possible, the 2011 diagnostic guidelines delineated three categories denoting patterns of injury: a UIP pattern, a possible UIP pattern and an inconsistent UIP pattern.1 In the most recent (2018) guidelines, these three categories were replaced by four, two of which are essentially the same as the old: UIP remains UIP, while “inconsistent UIP” has been replaced by “alternative diagnosis suggested”.2 This latter category includes the same radiographic patterns as the old “inconsistent UIP” group, with the name change prompted by the recognition that some of these “inconsistent” UIP cases may demonstrate pathologic UIP when biopsied. Therefore, the major change with the categorization between the old and the new guidelines is the dissolution of the “possible UIP” group and the introduction of “probable” and “indeterminate” UIP categories. As per the 2018 guidelines statement, the indeterminate category “should be assigned when HRCT demonstrates features of fibrosis but does not meet UIP or probable UIP criteria and does not explicitly suggest an alternative diagnosis”.2
It is not too surprising that most of the consensus recategorizations under the new guidelines were predictable, with all the UIPs remaining UIP and most of the possible UIPs and inconsistent UIPs transitioning to probable UIP and alternative diagnosis suggested, respectively. It is interesting that the new indeterminate category emerged as the consensus read in only three cases, despite each of the individual readers reclassifying patients into this category in eleven, four, twelve, five, two and eight cases, respectively. The fact that most of the individual indeterminate reads were “outvoted” based on the consensus reads underscores the value of the multidisciplinary discussion. Most of the indeterminate reads were from prior possible UIP reads which were recategorized as indeterminate (three, three, eight, four, one and three for readers #1–6). Despite this, there was only one case where the consensus opinion resulted in recategorization from a possible to indeterminate category. This highlights some difficulties in the interpretation of this newly defined category. The one concern with this category is that a reading of indeterminate might foster more lung biopsies. For example, an HRCT pattern with subpleural reticulation and no traction bronchiectasis/bronchiolectasis by the old criteria would be regarded as “possible UIP”, but by the new criteria would be deemed an “indeterminate UIP” pattern.
Our kappa statistic of concordance is not too dissimilar to what has been described previously among a group of Thoracic Radiology experts.3–5 However, our study included predominantly Pulmonologists and just one Thoracic Radiologist. We feel that this replicates clinical practice more closely, where clinicians typically have access to just one Thoracic Radiologist (if at all) and are more inclined to discuss cases among themselves. Our study goes a step further in evaluating the kappa concordance of individual morphologic features that drive the categorization. The one radiographic finding where there appeared to be the most difficulty in obtaining consensus was the presence of traction bronchiolectasis as demonstrated by a kappa statistic of only 0.12. Our findings do underscore the considerable variation and how this might ultimately impact individual patient diagnoses. It also perhaps underscores the need for central adjudication of HRCTs in IPF clinical trials. In the emerging era of clinical trial inclusiveness for fibrotic lung disease, this might be more important for subgroup analyses rather than as an exclusionary hurdle.
There are a few limitations to our study. First, as a matter of ease of transfer and interpretation of data, only eight representative levels of each patient’s HRCT were provided to the readers. How interpretations might have changed with the ability to scrutinize all levels is uncertain, though previous studies have shown that non-contiguous sampling of the thorax on CT yields similar results to evaluation of contiguous images throughout the thorax.6,7 Moreover, when choosing the eight levels, we tried to be as certain as possible that no feature was being overlooked and we were comfortable that in all cases the salient features were captured.5 Second, no clinical information was provided for the readers to contextualize their interpretation. This was by intent so as not to bias their objective evaluation of the HRCTs. However, this does not necessarily reflect clinical practice, where invariably some or all clinical information is available. Third, the readers had the opportunity to provide their reads by the old and the new guidelines in the same sitting. An alternative approach might have been to ask them to provide reads by the old guidelines, and then send them the reblinded HRCTs a few weeks later. We chose the same-sitting approach by intent as we were not seeking to test their intraindividual concordance, but rather we sought to gauge their characterization based on the same interpretation.
In conclusion, through our independent blinded interpretations of 50 random cases referred to an ILD center, we have demonstrated considerable variation in how HRCTs are read by a group of Pulmonologists and a Thoracic Radiologist with expertise in ILD. We have lent insight into differences in interpretation based on the old
Supplemental Material
Author_Response – Supplemental material for HRCT evaluation of patients with interstitial lung disease: comparison of the 2018 and 2011 diagnostic guidelines
Supplemental material, Author_Response for HRCT evaluation of patients with interstitial lung disease: comparison of the 2018 and 2011 diagnostic guidelines by Steven D. Nathan, Jean Pastre, Inga Ksovreli, Scott Barnett, Christopher King, Shambhu Aryal, Kareem Ahmad, Cesar Fukuda, Vijaya Ramalingam and Jonathan H. Chung in Therapeutic Advances in Respiratory Disease
Reviewer_1_v.1 – Supplemental material for HRCT evaluation of patients with interstitial lung disease: comparison of the 2018 and 2011 diagnostic guidelines
Supplemental material, Reviewer_1_v.1 for HRCT evaluation of patients with interstitial lung disease: comparison of the 2018 and 2011 diagnostic guidelines by Steven D. Nathan, Jean Pastre, Inga Ksovreli, Scott Barnett, Christopher King, Shambhu Aryal, Kareem Ahmad, Cesar Fukuda, Vijaya Ramalingam and Jonathan H. Chung in Therapeutic Advances in Respiratory Disease
Reviewer_2_v.1 – Supplemental material for HRCT evaluation of patients with interstitial lung disease: comparison of the 2018 and 2011 diagnostic guidelines
Supplemental material, Reviewer_2_v.1 for HRCT evaluation of patients with interstitial lung disease: comparison of the 2018 and 2011 diagnostic guidelines by Steven D. Nathan, Jean Pastre, Inga Ksovreli, Scott Barnett, Christopher King, Shambhu Aryal, Kareem Ahmad, Cesar Fukuda, Vijaya Ramalingam and Jonathan H. Chung in Therapeutic Advances in Respiratory Disease
Reviewer_3_v.1 – Supplemental material for HRCT evaluation of patients with interstitial lung disease: comparison of the 2018 and 2011 diagnostic guidelines
Supplemental material, Reviewer_3_v.1 for HRCT evaluation of patients with interstitial lung disease: comparison of the 2018 and 2011 diagnostic guidelines by Steven D. Nathan, Jean Pastre, Inga Ksovreli, Scott Barnett, Christopher King, Shambhu Aryal, Kareem Ahmad, Cesar Fukuda, Vijaya Ramalingam and Jonathan H. Chung in Therapeutic Advances in Respiratory Disease
Footnotes
Conflict of interest statement
The following authors declare their conflict of interest: SDN is a consultant for Actelion, Bellerophon, Roche-Genentech, Boehringer-Ingelheim, Pliant, Merck, United Therapeutics and Bayer Pharmaceuticals; he is also on the Speakers’ Bureau for Roche-Genentech, Boehringer-Ingelheim, and Bayer Pharmaceuticals; JP has served as a consultant for Boehringer-Ingelheim; CSK has served on an advisory board for Boehringer-Ingelheim, Actelion and United Therapeutics; he is on the speakers’ bureau for Genentech; he has served as a consultant for the France Foundation. Other authors have no competing interest.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Supplemental material
The reviews of this paper are available via the supplemental material section.
References
