Abstract
Keywords
INTRODUCTION
Despite the established criteria for the clinical diagnosis of Parkinson plus syndrome, accurate diagnosis is often challenging. Accurate diagnosis of parkinsonian disorders is important not only for patient care but also for clinical research. For differential diagnosis among idiopathic Parkinson’s disease (IPD), progressive supranuclear palsy (PSP), and multiple system atrophy (MSA), a number of MRI measurements have been developed [1–8] based on pathological features, such as atrophy of the midbrain in PSP [9, 10]. However, these quantitative tools are often difficult to apply, and they are not always practical in clinical settings because they require special tools or instruments.
In PSP, the brainstem structures look like a hummingbird with a beak in midsagittal magnetic resonance (MR) images; this appearance is called the ‘hummingbird’ sign (HBS) or ‘penguin silhouette’ sign [3, 4]. The HBS has been recognized as useful in discriminating PSP from IPD or MSA [1, 11]. However, despite the wide recognition of the HBS in PSP, and except for a report in which the profile of the third ventricle floor was studied in IPD and PSP, there are no consensus criteria for the HBS. We developed a new visual rating scale for the HBS, in which four characteristic features of the HBS are parameterized. We compared our new rating scale to the traditional method (simple visual estimation of the HBS by physicians) with regard to inter-rater reliability (IRR) and diagnostic validity.
METHODS
Subjects
We reviewed MR images of 133 patients who were diagnosed with IPD (n = 93) or PSP (n = 40) in the movement disorders clinic at Hallym University Sacred Heart Hospital from November 2010 to May 2014. The clinical diagnoses of all of the patients were made by two movement disorders specialists (Kim YJ and Ma HI) according to established consensus criteria during the regular follow-up period. PSP was diagnosed according to the clinical research criteria for PSP provided by the National Institute of Neurological Disorders and Stroke (NINDS) and the Society for PSP (SPSP) [12]. IPD was diagnosed according to the UK Parkinson’s Disease Brain Bank’s clinical criteria for the diagnosis of Parkinson’s disease [13]. Both possible and probable diagnoses were included in this study. Demographic data including gender, age at onset, age at MRI, and disease duration were collected from medical records. This study was approved by the Institutional Review Board at Hallym University Sacred Heart Hospital.
Brain MR imaging
MRI was conducted in all participants using a 3-Tesla MRI system (Philips MR system Achieva, release 3.2.1.1, 3 T MRI). Our standardized brain MR imaging protocol includes T1-weighted 3D MPRAGE images (1 × 1 × 1 mm³ resolution), T2-weighted axial and sagittal images, axial FLAIR images, and axial GE images. The T2-weighted sagittal images were used for medical assessment before the patients participated in this study. To avoid any possible bias of each rater due to the use of the T2-weighted sagittal images included in our standardized MRI protocol, we used reconstructed T1-weighted midsagittal images. T1-weighted 3D MPRAGE image files in DICOM format were reconstructed to T1-weighted sagittal images using OsiriX medical imaging software, version 3.3.2 (www.osirix-viewer.com) [14] on a MacBook Air (1.8 GHz, Intel Core i5; Apple Computers, Inc., Cupertino, California). This software is designed for the navigation and visualization of multimodal and multidimensional images and offers multiplanar reconstruction. A midsagittal T1-weighted image from each patient was transferred to a PowerPoint presentation at the highest resolution, and this image was used for the visual rating of the HBS. The images of all of the patients were shuffled, and the raters were blinded to clinical information such as diagnosis and age.
Development of the HBS rating scale (Fig. 1)
We developed the HBS rating scale (HBS-RS), a visual rating scale for the shape of the midbrain along the midsagittal plane, which reflects characteristic features of the HBS and critical observations from previous literature [1, 15]. The HBS-RS comprises 4 items, and each item has a weighted score from 0 to 2 points. The 4 items include: item #1) contour of the third ventricle floor, 0 = convex, 1 = flat, or 2 = concave; item #2) shape of the beak, 0 = short, 1 = long but thick, or 2 = long and thin; item #3) shape of the midbrain along the midsagittal plane (alternatively, the shape of the hummingbird head), 0 = inverse trapezoid, 1 = parallelogram, or 2 = trapezoid; and item #4) degree of midbrain atrophy, as judged visually by the midbrain to pontine ratio, 0 = definitely over half, 1 = approximately half, or 2 = definitely less than half (Fig. 1).
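As an illustration only, the item definitions above can be written as a small lookup table from which item and composite scores are derived. The following is a minimal sketch in Python; the names `ITEM_LEVELS`, `item_score`, and `composite_score` and the example rating are hypothetical and were not part of the study workflow.

```python
# Levels of each HBS-RS item, ordered so that the list index equals the score (0, 1, 2).
# The labels are taken from the item definitions in the text.
ITEM_LEVELS = {
    1: ("convex", "flat", "concave"),                      # contour of the third ventricle floor
    2: ("short", "long but thick", "long and thin"),       # shape of the beak
    3: ("inverse trapezoid", "parallelogram", "trapezoid"),  # shape of the midbrain (head)
    4: ("definitely over half", "approximately half",
        "definitely less than half"),                      # midbrain to pontine ratio
}


def item_score(item, label):
    """Return the 0-2 score of one item for a rater's chosen label."""
    return ITEM_LEVELS[item].index(label)


def composite_score(ratings, items=(1, 2, 3, 4)):
    """Sum the scores of the selected items (default: total HBS-RS score, range 0-8)."""
    return sum(item_score(i, ratings[i]) for i in items)


# Hypothetical rating of a single midsagittal image.
example = {
    1: "concave",
    2: "long but thick",
    3: "trapezoid",
    4: "approximately half",
}
total = composite_score(example)                    # all four items
without_item3 = composite_score(example, (1, 2, 4))  # composite of items #1, #2, and #4
```

Passing a subset of items gives the composite scores used later in the analysis, such as the sum of items #1 and #2.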
Assessment of inter-rater reliability of simple visual identification of the HBS and the HBS-RS
Two movement disorders specialists (Kim YJ and Ma HI) participated in the visual rating and were blinded to clinical information. The assessment of the HBS and the HBS-RS was performed in two steps. First, the two raters assessed the existence of the HBS in a midsagittal T1 image of all of the patients based upon their personal experience, knowledge and rationale for the identification of the HBS, without any discussion. Second, 2 weeks after the first step, the two raters were instructed on the HBS-RS (Fig. 1) and rated the 4 items of the HBS-RS by reviewing the same midsagittal T1 images. To avoid the possible flaw of obtaining higher IRR scores after training, there was no training session for the HBS-RS. The two raters were not involved in the development of the items and scores of the HBS-RS.
Statistical analysis
The IRR for the existence of the HBS and for the HBS-RS was tested using Cohen’s kappa (κ) values. The κ values were interpreted as a measure of the strength of agreement, as follows: <0.20, poor; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, good; and 0.81–1.00, very good [16]. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and receiver operating characteristic (ROC) curve were used to measure diagnostic validity. The ROC curve was used to validate the usefulness of the HBS-RS in the evaluation of PSP diagnosis. The ROC curve is a plot of the test sensitivity vs. 1 − specificity. The area under the curve (AUC) indicates the accuracy of the test variable (HBS or HBS-RS as a continuous variable) in diagnosing PSP as a classification variable. The AUCs were used to compare the diagnostic accuracy of the cut-off values. Each AUC was reported with its 95% confidence interval. The pairwise comparison of ROC curves was tested with the method of DeLong et al. [17].
A P-value of <0.05 was considered significant in all tests. These statistical analyses were performed using the IBM SPSS statistics software ver. 21.0. The AUCs, the pairwise comparisons of the ROC curves and the weighted κ values were calculated using the MedCalc Statistical Software version 13.2.2 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2014).
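For readers unfamiliar with the agreement statistic, the following minimal Python sketch shows how an unweighted Cohen’s κ is obtained from two raters’ scores and how the interpretation bands above can be applied. It is not the SPSS or MedCalc procedure used in this study; the function names and the example ratings are hypothetical, and assigning a value of exactly 0.20 to the "poor" band is an assumption.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    # Observed proportion of images on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use a single category
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map kappa to the strength-of-agreement bands quoted in the text [16]."""
    # Assumption: boundary values (e.g. exactly 0.20) fall into the lower band.
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical presence (1) / absence (0) ratings of the HBS for ten images.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohen_kappa(rater_a, rater_b)
```

In this made-up example the raters agree on 8 of 10 images and chance agreement is 0.5, so κ is approximately 0.6. A weighted κ, as used for the ordinal HBS-RS scores, would additionally give partial credit to near-agreement and is not shown here.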
RESULTS
The demographic and clinical features of all of the patients are described in Table 1. The age at disease onset and the age at MR imaging were significantly older in the PSP group than in the IPD group (P < 0.001). However, the disease duration at MR imaging was not different between the patients with PSP and the patients with IPD (P = 0.586).
The IRR of the HBS showed moderate strength of agreement between the two raters (Cohen’s κ = 0.596, P < 0.001). The IRR of each item of the HBS-RS showed moderate-to-good agreement and κ values ranging from 0.479 to 0.766. The IRR was observed to be highest for item #1 (κ = 0.766) and lowest for item #3 (κ = 0.479). The composite scores for items #1 and #2; items #1, #2, and #4; and the total HBS-RS scores also showed good agreement between the two raters (κ = 0.735, 0.710, and 0.666, respectively, P < 0.001 for all) (Table 2). Although the κ values for each item and the composite scores for the items were higher for the HBS-RS than for when the HBS was evaluated based on personal experience, there were no statistically significant differences due to the overlapping 95% confidence intervals of the κ values.
We evaluated the diagnostic validity of the HBS-RS using sensitivity, specificity, PPV, NPV and AUC. As expected, the diagnostic validity varied according to the cut-off values of each item or of the composite scores of the HBS-RS. However, the overall trends in the differences in these indices between the two raters were similar (Table 3). Compared to the sensitivities of the HBS based upon the experience of the individual raters (65.0% and 77.5%), at a low cut-off (0 vs. 1 or higher), the sensitivities of each item of the HBS-RS were higher, ranging from 85.0% to 92.5% (Table 3). The specificities of the HBS based upon the individual raters’ experience were 67.7% and 59.1% for the two raters. However, in the HBS-RS, the specificities reached higher than 80% for both raters when using the composite scores (sum of items #1, #2, and #4 with a cut-off of 6 points; sum of all 4 items with a cut-off of 8 points). The ROC curve for the total HBS-RS score showed fair diagnostic accuracy for PSP in both raters (AUC = 0.76 and 0.72), and the difference in the AUC between the raters was not statistically significant (P = 0.217). The AUC values tended to be higher for the HBS-RS than for the HBS, and the difference was statistically significant for rater 2 (Fig. 2).
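To make the dependence on the cut-off concrete, the sketch below computes sensitivity, specificity, PPV, and NPV for a score dichotomized at a chosen cut-off, and the AUC as the probability that a randomly chosen PSP patient scores higher than a randomly chosen IPD patient (ties counted as one half). This is an illustrative Python sketch, not the SPSS or MedCalc analysis reported here; the function names and the eight scores are hypothetical and do not come from the study data.

```python
def _ratio(numerator, denominator):
    """Safe division: undefined ratios are returned as NaN instead of raising."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_indices(scores, is_psp, cutoff):
    """Treat score >= cutoff as HBS-positive and compare with the clinical diagnosis."""
    tp = fp = tn = fn = 0
    for score, psp in zip(scores, is_psp):
        positive = score >= cutoff
        if positive and psp:
            tp += 1
        elif positive and not psp:
            fp += 1
        elif not positive and psp:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auc(scores, is_psp):
    """AUC via pairwise comparison of PSP and non-PSP scores (ties count 0.5)."""
    pos = [s for s, psp in zip(scores, is_psp) if psp]
    neg = [s for s, psp in zip(scores, is_psp) if not psp]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Hypothetical composite scores (items #1 + #2 + #4, range 0-6) and diagnoses.
scores = [6, 5, 3, 2, 4, 1, 0, 2]
is_psp = [True, True, True, False, False, False, False, True]

low = diagnostic_indices(scores, is_psp, cutoff=2)   # permissive cut-off
high = diagnostic_indices(scores, is_psp, cutoff=4)  # stringent cut-off
area = auc(scores, is_psp)
```

In this toy example, raising the cut-off from 2 to 4 lowers the sensitivity from 1.0 to 0.5 and raises the specificity from 0.5 to 0.75, which mirrors the trade-off described above for the composite scores.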
We next applied the HBS-RS to investigate whether a positive diagnosis of the HBS is influenced by age in IPD and PSP because the size of the midbrain was found to be decreased in healthy elderly controls [18]. To determine the effect of age on the presence of the HBS in PSP or IPD, we performed multivariate analysis for the diagnosis of the HBS, as determined by the HBS-RS at three different cut-offs (item #1: ≥2 vs. 0 or 1; composite score #1+#2+#4: ≥5 vs. less than 5; composite score #1+#2+#3+#4: ≥6 vs. less than 6), using age, disease duration, and IPD/PSP diagnosis as covariates. The results showed that age (OR from 1.062 to 1.138) as well as the diagnosis of IPD/PSP (OR from 2.806 to 4.245) contributes to the identification of the HBS based on the HBS-RS (Esup 1).
DISCUSSION
To the best of our knowledge, this is the first time that the IRR of the HBS has been assessed. Using the HBS-RS, which reflects characteristic features of the HBS in midsagittal MR images of the brainstem, with weighted answers from 0 to 2, we could not only calculate the sensitivity and specificity for the diagnosis of PSP but also adjust the diagnostic validity using a cut-off score to achieve higher specificity. By applying the HBS-RS in our patients, we found that age at MRI study contributes to the identification of the HBS. For differential diagnosis among IPD, PSP, and MSA, a number of MRI measurements or indices have been developed and shown to be useful [1, 18]. However, these quantitative measurements are not always practical because significant effort and specialized tools are required. Therefore, simplified visual assessments with good IRR are expected to be more useful in clinical practice. Although the presence of the HBS in a midsagittal MR image supports the diagnosis of PSP, there has not been a clear consensus regarding the definition of the HBS. We used four characteristic features of the HBS as the items for the HBS-RS. Among these, item #1 (contour of the third ventricle floor) was adopted from a previous study. Midbrain atrophy is a well-known feature of PSP and has been reported to be independent of the profile of the third ventricle floor [1–5, 15]. We tried to make a simple visual assessment of midbrain atrophy in item #4. Item #2 (the length and thickness of the hummingbird beak) and item #3 (the head shape of the hummingbird) are newly developed rating items. For item #2, by using not only the length but also the thickness of the beak, we tried to avoid any possible redundancy with item #1. We believe there is no redundancy between items #1 and #2 because the sensitivity and specificity varied depending on the cut-off value after summation of the scores of items #1 and #2. 
The IRR of item #1 was the highest among the 4 items, and the IRR of item #2 was comparable to that of item #1. The IRR of item #3, although statistically significant, was lower than the IRRs of the other items because of the difficulty in discriminating between parallelogram and trapezoid shapes. For this reason, we did not add item #3 into the analysis of the composite scores. The IRRs of the composite scores were also as good as the IRR of item #1. We believe that the higher sensitivities of each item in the HBS-RS at the cut-off of 0 vs. 1 or 2, compared with those of the HBS as determined by individual experience, suggest that each item of the HBS-RS reflects characteristic features of the HBS. Although Cohen’s κ values were higher for item #1, item #2 and the composite scores of the HBS-RS than those of simple visual rating based on personal experience, we failed to show a statistically significant difference between them. These findings may suggest that the HBS-RS does not enhance IRR statistically and that item #1, which was derived from a previous study, may be sufficient for good IRR. However, the two raters have communicated more than is typical because they have worked together for 10 years, which may have led to higher IRR in the HBS by simple visual rating based on personal experience. A validation study of the HBS-RS by another research group is needed.
Using weighted scores, we could compare the diagnostic validity of the HBS-RS between the two raters. The overall trend toward a change in the diagnostic validity of each item of the HBS-RS at a particular cut-off score was similar between the two raters. Moreover, using ROC curve analysis of the HBS-RS total score, we could confirm that there was no difference between the two raters in identifying the HBS using the HBS-RS. In addition, we believe that the HBS-RS has advantages over the simple visual rating of the HBS because one can adjust the diagnostic validity, such as the sensitivity and specificity, by forming a composite score for each item. Furthermore, compared to simple assessment of the HBS, the HBS-RS could be used in research of various diseases and by different researchers as a parametric but easily measurable tool. We did not indicate the best cut-off point of the composite score of the HBS-RS to determine the existence of the HBS because the best cut-off point should be decided based on the evaluation and consensus of multiple researchers. The profile of the third ventricle floor in a midsagittal image was reported to have higher diagnostic validity for distinguishing PSP from IPD than other axial imaging features, such as midbrain atrophy, abnormal tegmental hyperintensity, or the AP diameter of the midbrain [15]. We also believe that among the HBS-RS items, item #1 (contour of the third ventricle floor) is the most important feature because it showed the highest IRR and good diagnostic validity at a cut-off point (concave vs. flat or convex). Given that the moderate sensitivity and specificity of item #1 are comparable to those of the composite score of items #1, #2, and #4 at a cut-off of 4 or 5, one can argue against the advantage of using the composite score of the HBS-RS.
However, using composite scores, we could increase the diagnostic specificity to higher than 80% at the highest cut-off for comparable IRRs and AUCs, which could not be achieved using the rating of any individual item. We think that the higher specificity using composite scores from the HBS-RS has important clinical implications. The HBS-RS can be used to more accurately communicate among researchers and to identify the HBS depending on the purpose of a study. In a multi-center clinical study of PSP, the determination of the HBS by the HBS-RS at a cut-off can be used as an adjunctive biomarker by customizing the diagnostic sensitivity and specificity. Midbrain atrophy has been reported in vascular parkinsonism and vascular dementia as well as in normal elderly persons. Normal pressure hydrocephalus patients can show findings similar to the HBS, and this needs to be investigated. Therefore, the HBS may not be as specific for PSP as expected. Based on stringent criteria (i.e., a cut-off with higher specificity), we could estimate the prevalence of the HBS not only in PSP but also in other neurodegenerative disorders.
Our study has some limitations. We used images reconstructed from 3D volumetric T1 MR images. In our hospital, routine MRI includes T2 sagittal images. Because both raters are experienced with T2 sagittal MR images, we used reconstructed T1 images to avoid personal bias. Although the reconstructed T1-weighted sagittal images provided us with the best midsagittal images, these images might have been distorted while being reconstructed. In item #2, the definition of a long or thin beak is not clear. Our study may be criticized because we included possible PSP. However, we do not think that this is critical because the purpose of this study was to evaluate whether the HBS-RS enhances the IRR and diagnostic validity and not to determine the prevalence of the HBS in PSP or IPD. With regard to the diagnostic validity of the HBS-RS, the critique of possible circularity cannot be completely avoided. In addition, the clinical diagnoses were not pathologically confirmed. Further study in pathologically confirmed cases may be needed. Future studies in different populations of patients by other groups are needed. We did not include MSA-parkinsonism patients because the number of MSA patients with MR images taken using the same protocol and with no olivopontocerebellar atrophy in our center was small. Diagnostic validity may have varied if MSA patients were included. In conclusion, the HBS-RS is a simple and measurable visual assessment tool to identify the HBS, with higher IRR and adjustable diagnostic validity for PSP.
FINANCIAL DISCLOSURE/CONFLICTS OF INTEREST
Nothing to disclose.
ACKNOWLEDGMENTS
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF-2013R1A1A4A01007783).
