Abstract

I was interested to read the paper by Miller and colleagues published in the April 2020 issue of Ear, Nose & Throat Journal.1 Telemedicine is an increasingly prevalent component of medical practice. In otolaryngology, telemedicine services can potentially be performed in conjunction with device use, such as a nasolaryngoscope. The authors aimed to evaluate the reliability of remote examination of the upper airway through a smartphone (iPhone) recording of nasopharyngolaryngoscopy (NPL), performed with a coupling device that attached the endoscope to the phone. A second, remote otolaryngologist then evaluated the recorded examination. Both otolaryngologists documented, through a survey, their findings at anatomic sites including the nasopharynx, oropharynx, base of tongue, and larynx (with the subsites epiglottis, arytenoids, aryepiglottic folds, false vocal cords, and true vocal cords), as well as airway patency and their diagnostic impression. Survey results were evaluated for inter-rater agreement using the κ statistic. Forty-five patients underwent NPL. The inter-rater agreement for overall diagnosis was 0.74 with 80% agreement, rated as "good." Other anatomic subsites with "good" or better inter-rater agreement were the nasopharynx (0.75), oropharynx (0.75), and true vocal cords (0.71), with strong percentage agreement of 89%, 91%, and 87%, respectively.
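As a brief illustration of this type of analysis, the following Python sketch computes Cohen's κ and raw percentage agreement between 2 raters. The ratings and category names below are invented for demonstration; they are not the study's data.

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical diagnostic impressions of the bedside and the remote rater
bedside = np.array(["normal", "lesion", "normal", "edema", "lesion",
                    "normal", "normal", "edema", "lesion", "normal"])
remote  = np.array(["normal", "lesion", "normal", "lesion", "lesion",
                    "normal", "edema",  "edema",  "lesion", "normal"])

kappa = cohen_kappa_score(bedside, remote)  # chance-corrected agreement
raw = np.mean(bedside == remote)            # raw percentage agreement

print(f"Cohen's kappa: {kappa:.2f}")
print(f"Percent agreement: {raw:.0%}")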
I congratulate the authors on this valuable article and would like to offer some contributions. The main purpose of my letter is to point out methodological limitations of the κ statistic for assessing reliability (agreement). First, the κ statistic depends on the prevalence in each category. Two pairs of raters can each have 90% concordant and 10% discordant cells yet yield very different κ coefficients (0.44, "moderate," vs 0.81, "very good," respectively), as shown in Table 1 and the computational sketch that follows it. The κ statistic also depends on the number of categories.2-8 I should mention that the weighted kappa would be a good choice to assess intra-rater agreement, whereas Fleiss' kappa is suggested for assessing inter-rater agreement among more than 2 raters. Briefly, for quantitative variables the intraclass correlation coefficient should be used, and for qualitative (ordinal) variables the weighted kappa should be used.2-8
The authors concluded that a telemedicine device for NPL demonstrates strong diagnostic accuracy across providers and good overall evaluation. It is crucial to recognize that accuracy (validity) and reliability (agreement) are 2 completely different methodological issues. In brief, any conclusion on accuracy or reliability should rest on the correct statistical approach; otherwise, misinterpretation may occur.
Table 1. Limitation of the κ Statistic to Assess Agreement Between 2 Observers With Different Prevalence in the 2 Categories.a
a Bold indicates the frequency of the concordant cells.
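To make the prevalence effect concrete, the following Python sketch computes Cohen's κ for two 2 × 2 agreement tables that both contain 90 concordant and 10 discordant cases out of 100. The cell counts follow the pattern of Table 1 but are illustrative rather than its exact values.

def cohen_kappa_2x2(a, b, c, d):
    # a and d are the concordant cells; b and c are the discordant cells
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance (expected) agreement computed from the marginal totals
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Skewed prevalence: 90/100 concordant, almost all cases in one category
print(cohen_kappa_2x2(85, 5, 5, 5))    # ~0.44, "moderate"

# Balanced prevalence: again 90/100 concordant cells
print(cohen_kappa_2x2(45, 5, 5, 45))   # 0.80, "very good"

For ordinal scales with more than 2 categories, sklearn.metrics.cohen_kappa_score with weights="linear" yields the weighted kappa, and statsmodels.stats.inter_rater.fleiss_kappa extends agreement assessment to more than 2 raters.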
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
