Abstract
Otitis media refers to inflammation of the middle ear. Diagnosing ear disease requires visual examination of the middle-ear tympanic membrane with a standard otoscope, and experts evaluate the membrane visually as part of their clinical examination. Visual inspection, however, is prone to human error and suffers from inter-observer variability. Accurate diagnosis is especially challenging for non-specialists and general practitioners, since otoscopy depends heavily on the observer's expertise. Our proposed approach uses a hybrid Inception-ViT model that incorporates multi-scale, fine-grained features into a vision transformer, and we evaluate its performance against several convolutional networks. This study presents a versatile, comprehensive architecture that automatically extracts fine-grained feature patches with an inception module, followed by a vision transformer encoder that classifies the tympanic membrane. A total of 956 otoscope images were used to train the vision-transformer-based neural network to classify the tympanic membrane into five categories covering the most common ear diagnoses (Normal, Acute Otitis Media, Chronic Suppurative Otitis Media, Earwax, Other). To address the limited dataset size, we apply image augmentations such as flips and rotations, and we use a weighted cross-entropy loss to mitigate class imbalance. Our proposed approach achieves an improved accuracy of 83.05% for tympanic membrane classification. This methodology could be incorporated into future clinical otological decision support systems to improve physicians' diagnostic accuracy and reduce the overall incidence of misdiagnosis. The final results are comparable to those of leading clinical specialists and provide operator-independent diagnosis of otitis media with higher accuracy.
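As context for the class-imbalance handling mentioned above, the following is a minimal NumPy sketch of a weighted cross-entropy loss, where rarer classes receive larger weights. The class counts and inputs below are illustrative assumptions, not values from the study:

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted cross-entropy: each sample's negative log-likelihood
    is scaled by the weight of its true class, so errors on rare
    classes contribute more to the loss."""
    # Numerically stable softmax over class logits
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Per-sample negative log-likelihood of the true class
    nll = -np.log(probs[np.arange(len(labels)), labels])
    # Weighted mean, normalized by the total weight
    w = class_weights[labels]
    return (w * nll).sum() / w.sum()

# Five classes as in the paper: Normal, AOM, CSOM, Earwax, Other.
# Hypothetical per-class counts; inverse-frequency weighting.
counts = np.array([500, 180, 120, 100, 56])
weights = counts.sum() / (len(counts) * counts)

logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1],
                   [0.1, 0.1, 2.0, 0.1, 0.1]])
labels = np.array([0, 2])
loss = weighted_cross_entropy(logits, labels, weights)
```

In practice a deep-learning framework's built-in weighted cross-entropy (e.g. a per-class weight argument to the loss function) implements the same idea; the sketch only makes the weighting explicit.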
