Abstract
Objective:
To evaluate an embedded artificial intelligence (AI) system for otoscopic image-based screening and prediagnostic triage of pediatric otitis media, with an emphasis on point-of-care deployment on a microcontroller.
Methods:
We retrospectively analyzed 19 522 tympanic membrane images labeled by otolaryngologists as acute otitis media, otitis media with effusion, or normal. A lightweight convolutional neural network derived from AlexNet was trained using a patient-level split. Two posttraining INT8 variants (per-channel and per-tensor quantization) were deployed on an STM32H7S78-DK board using STM32Cube.AI. We evaluated accuracy on a held-out test set (n=600) and measured activation memory, total random access memory (RAM) usage, and average inference latency on the target device.
Results:
The full-precision model achieved 97.67% test accuracy. Per-channel INT8 preserved accuracy (97.67%). Per-tensor INT8 showed a small decrease (97.50%) but reduced activation memory by 16%, total RAM by 8%, and latency by 13%.
Conclusion:
On-device otoscopic image analysis is feasible on an STM32H7-class microcontroller and may support screening and triage workflows (eg, repeat examination, tympanometry, monitoring, or referral). The tool is not intended to provide a final diagnosis or to direct treatment decisions; clinical diagnosis and management remain clinician-led.
Introduction
Otitis media is among the most common pediatric conditions evaluated in routine practice and remains a major driver of antibiotic use and follow-up visits.1-5 Differentiating acute otitis media (AOM), otitis media with effusion (OME), and normal tympanic membranes can be difficult, particularly in high-volume outpatient clinics and telemedicine workflows where image quality and operator technique vary.6-10 In settings with limited access to otolaryngology expertise, decision support that improves consistency of image interpretation may help prioritize appropriate next steps, while preserving clinician oversight.11,12
Deep-learning models for otoscopic image classification have shown promising results, including evidence from systematic reviews and meta-analyses.12-19 However, many implementations rely on cloud or workstation resources, which can introduce latency, connectivity requirements, and deployment constraints.20-22 Tiny machine learning enables on-device inference under strict memory and compute budgets and is well suited to handheld or portable workflows.21,22
In this study, we present an embedded, point-of-care otoscopic image screening system designed to support prediagnostic triage for pediatric otitis media. Using a specialist-labeled dataset of 19 522 tympanic membrane images, we trained a lightweight AlexNet-derived CNN23 to classify AOM, OME, and normal findings and then deployed INT8-quantized variants on an STM32H7-class microcontroller. Our main contribution is a workflow-oriented evaluation that links model outputs to practical next steps (eg, repeat examination, tympanometry, monitoring, or referral) and quantifies the on-device trade-offs that determine feasibility in real clinics, including random access memory (RAM) footprint and inference latency. The system is intended as decision support for screening and triage rather than a standalone diagnostic device; final diagnosis and management remain clinician-led and should incorporate symptoms, history, and objective testing when indicated.
Materials and Methods
Dataset and Labeling
Otoendoscopic tympanic membrane images were retrospectively collected at Shenzhen Children’s Hospital (January 2016 to December 2019) from pediatric patients younger than 18 years, including infants as young as 2 months. The study was approved by the Ethics Committee of Shenzhen Children’s Hospital, and all images were de-identified prior to analysis.
Images were acquired using multiple clinical systems, including Otocope 0° φ2.7 × 105 (Zhejiang Tiansong Medical Instrument Co., China), EndoSTROBD (XION GmbH, Germany), and EPK-i5000 VNL-1070STK (PENTAX, RICOH Imaging Company, Japan). Board-certified otolaryngologists labeled each image as AOM, OME, or normal tympanic membrane. The dataset contained 19 522 images: 6210 AOM, 7548 OME, and 5764 normal. Figure 1 shows representative pediatric otoendoscopic images for each category.

Figure 1. Representative otoendoscopic tympanic membrane images for AOM, OME, and normal. AOM, Acute otitis media; OME, Otitis media with effusion.
We prespecified a held-out test set of 600 images (approximately 3%), constructed with 200 images per class (AOM, OME, and normal) to support balanced per-class evaluation while preserving patient-level separation from the development set. With an expected overall accuracy of 0.97 to 0.98, a test set of 600 yields a 95% confidence interval with an approximate half-width of 1% for overall accuracy and, given class-balanced sampling, supports per-class estimates with acceptable precision for a screening/triage tool evaluation. The remaining 18 922 images were used for model development.
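As a quick check of the stated precision, the normal-approximation half-width of a 95% confidence interval for a proportion p estimated from n images is 1.96 × sqrt(p(1 − p)/n). A minimal sketch, using the test-set sizes above for illustration:

```python
import math

def ci_half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation 95% CI half-width for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Overall accuracy on the n=600 held-out test set
for p in (0.97, 0.98):
    print(f"p={p:.2f}, n=600 -> +/- {ci_half_width(p, 600):.4f}")
# p=0.97 -> +/- 0.0137; p=0.98 -> +/- 0.0112 (roughly 1%)

# Per-class estimates with 200 images per class are necessarily wider
print(f"p=0.97, n=200 -> +/- {ci_half_width(0.97, 200):.4f}")  # ~0.0236
```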
Model Architecture and Training
We trained a lightweight convolutional neural network derived from AlexNet23 for 3-class classification. The model followed the standard AlexNet layout, consisting of 5 convolutional layers followed by 3 fully connected layers. The numbers of output channels in Conv1-Conv5 were 96, 256, 384, 384, and 256, respectively. ReLU activation was applied after each convolutional layer and after the first 2 fully connected layers, and max-pooling was used after Conv1, Conv2, and Conv5. Dropout (P = .5) was applied to the first 2 fully connected layers. For the present study, the final classification layer was adapted to output 3 classes (AOM, OME, and normal). No pruning was applied. The full layer-by-layer specification is provided in Table S1.
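A minimal PyTorch sketch of this architecture follows. The channel counts, pooling positions, and dropout match the description above; the kernel sizes, strides, input resolution (224 × 224), and fully connected widths are assumptions taken from the standard AlexNet configuration, with Table S1 as the authoritative specification.

```python
import torch
import torch.nn as nn

class OtoscopyAlexNet(nn.Module):
    """AlexNet-derived 3-class classifier (AOM, OME, normal).

    Conv channel counts (96, 256, 384, 384, 256) follow the text;
    kernel sizes, strides, and FC widths assume standard AlexNet
    with 224x224 RGB input (see Table S1 for the exact layers).
    """
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                 # pool after Conv1
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                 # pool after Conv2
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                 # pool after Conv5
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(p=0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # 3 classes: AOM, OME, normal
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))
```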
Training ran for 800 epochs with a batch size of 64. We used the Adam optimizer24 and applied standard data augmentation (random rotation, scaling, and horizontal flipping) to improve robustness.25 Early stopping was enabled based on validation loss, and the checkpoint with the best validation performance was selected for final testing. The training and validation learning curves are shown in Figure 2.
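A condensed sketch of this training loop is shown below; the learning rate, augmentation ranges, patience, and data loaders are illustrative assumptions, and OtoscopyAlexNet refers to the hypothetical sketch above.

```python
import torch
from torch import nn, optim
from torchvision import transforms

# Augmentation as described in the text; the specific ranges are assumptions.
train_tf = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

model = OtoscopyAlexNet(num_classes=3)
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # learning rate assumed
criterion = nn.CrossEntropyLoss()

@torch.no_grad()
def validation_loss(loader) -> float:
    model.eval()
    total, n = 0.0, 0
    for images, labels in loader:
        total += criterion(model(images), labels).item() * labels.size(0)
        n += labels.size(0)
    return total / n

best, bad_epochs, patience = float("inf"), 0, 20   # patience assumed
for epoch in range(800):                           # 800 epochs, batch size 64
    model.train()
    for images, labels in train_loader:            # DataLoader over dev split (assumed)
        optimizer.zero_grad()
        criterion(model(images), labels).backward()
        optimizer.step()
    val = validation_loss(val_loader)              # validation DataLoader (assumed)
    if val < best:                                 # keep the best checkpoint
        best, bad_epochs = val, 0
        torch.save(model.state_dict(), "best.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                 # early stopping on validation loss
            break
```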

Figure 2. Training and validation accuracy and loss over 800 epochs. (a) Training accuracy, (b) training loss, (c) validation accuracy, and (d) validation loss.
Quantization and Embedded Deployment
For microcontroller deployment, we implemented post-training INT8 quantization using 2 granularities: per-channel and per-tensor.26,27 Models were deployed to an STM32H7S78-DK board (ARM Cortex-M7) using the STM32Cube.AI toolchain (X-CUBE-AI).28,29 Model weights were stored in external Octo-SPI flash on the STM32H7S78-DK, while intermediate activations were allocated in on-chip static random access memory under a fixed firmware configuration. The key steps of deployment are shown in Figure 3. We measured activation memory, total RAM usage, compute-node count in the generated inference graph, and average inference latency under a consistent firmware configuration.
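STM32Cube.AI can import quantized ONNX or TensorFlow Lite models. Whichever exchange format is used, the granularity choice is the key knob; as one concrete option, a sketch using ONNX Runtime's static post-training quantization, where the per_channel flag switches between the 2 granularities (the calibration reader, input name, and file names are illustrative):

```python
import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantType,
                                      quantize_static)

class OtoscopyCalibrationReader(CalibrationDataReader):
    """Feeds preprocessed training images to the calibrator, one at a time."""
    def __init__(self, images: np.ndarray, input_name: str = "input"):
        # images: float32 array of shape (N, 3, 224, 224), already normalized
        self._iter = iter([{input_name: img[None, ...]} for img in images])

    def get_next(self):
        return next(self._iter, None)

# Per-channel INT8 weights (one scale per output channel)
quantize_static("alexnet_fp32.onnx", "alexnet_int8_per_channel.onnx",
                OtoscopyCalibrationReader(np.load("calibration_images.npy")),
                per_channel=True,
                weight_type=QuantType.QInt8, activation_type=QuantType.QInt8)

# Per-tensor INT8 weights (a single scale per weight tensor)
quantize_static("alexnet_fp32.onnx", "alexnet_int8_per_tensor.onnx",
                OtoscopyCalibrationReader(np.load("calibration_images.npy")),
                per_channel=False,
                weight_type=QuantType.QInt8, activation_type=QuantType.QInt8)
```

Each quantized model is then imported into STM32Cube.AI, which generates the C inference graph and reports the per-node activation and RAM footprints benchmarked below.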

Embedded deployment pipeline and software stack.
Results
After deployment, classification performance was assessed on the held-out test set using overall accuracy and confusion matrixes. Embedded benchmarks were performed on the same test set to ensure comparability across quantization schemes.
On the held-out test set (n = 600), the full-precision model achieved 97.67% overall accuracy (Figure 4). Most errors occurred between AOM and OME, reflecting overlapping otoscopic appearances in borderline presentations.
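Overall accuracy and the confusion matrix were computed from the pooled test-set predictions; a minimal sketch using scikit-learn (the label order is an assumption):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

CLASSES = ["AOM", "OME", "Normal"]  # assumed label order

def summarize(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Print overall accuracy and return the confusion matrix
    (rows = true labels, columns = predictions) for the n=600 test set."""
    print(f"overall accuracy: {accuracy_score(y_true, y_pred):.4f}")
    return confusion_matrix(y_true, y_pred, labels=range(len(CLASSES)))
```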

Figure 4. Confusion matrix of the original unquantized model on the held-out test set.
We benchmarked per-channel and per-tensor INT8 AlexNet deployments on the STM32H7S78-DK to quantify accuracy–efficiency trade-offs. As summarized in Table 1, per-tensor quantization reduced activation memory by 16% and total RAM by 8%, simplified the execution graph (9 vs 13 compute nodes), and lowered average latency by 13%. These gains come at the cost of coarser quantization granularity; per-channel quantization better preserves numerical fidelity but carries a higher resource demand.
Table 1. Resource Utilization Metrics for Per-Channel and Per-Tensor Models.
Abbreviation: RAM, Random access memory.
Confusion matrixes for both models (Figure 5) show only subtle differences. Per-channel INT8 preserved test accuracy (97.67%; Table 2), whereas per-tensor INT8 produced a small decrease (97.50%; Table 2) while improving RAM usage and latency. Given the small absolute difference, the results are presented descriptively and are best interpreted in the context of deployment trade-offs rather than as evidence of superiority of 1 quantization strategy over the other.

Figure 5. Confusion matrixes for the INT8 per-channel and per-tensor models on the held-out test set. (a) Per-channel quantized model, (b) per-tensor quantized model.
Table 2. Accuracy Comparison Between Per-Channel and Per-Tensor Models.
To provide a clinically relevant evaluation of this 3-class classification task, we further computed per-class sensitivity, specificity, positive predictive value, and negative predictive value using a one-vs-rest formulation, with 95% confidence intervals estimated by the Wilson score method, as shown in Table 3.
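A sketch of the one-vs-rest computation with Wilson score intervals, implementing the standard formula from a confusion matrix (rows = true labels, columns = predictions):

```python
import math
import numpy as np

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score 95% CI for k successes out of n trials."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = k / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

def one_vs_rest_metrics(cm: np.ndarray, cls: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV (each with a Wilson CI)
    for one class treated as positive against the rest."""
    tp = cm[cls, cls]
    fn = cm[cls, :].sum() - tp
    fp = cm[:, cls].sum() - tp
    tn = cm.sum() - tp - fn - fp
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "ppv":         (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv":         (tn / (tn + fn), wilson_ci(tn, tn + fn)),
    }
```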
Table 3. Per-Class Clinical Performance Metrics for the Per-Channel and Per-Tensor Quantized Models, Computed Using a One-vs-Rest Formulation.
Note. Values are reported with 95% CI (Wilson score intervals).
Abbreviations: AOM, Acute otitis media; CI, Confidence interval; NPV, Negative predictive value; OME, Otitis media with effusion; PPV, Positive predictive value.
Discussion
We evaluated an embedded, on-device otoscopic image screening system for pediatric otitis media and quantified practical trade-offs between 2 INT8 quantization strategies.26,27 Per-channel quantization preserved accuracy relative to the float reference, whereas per-tensor quantization reduced activation memory and latency at the cost of a small decrease in accuracy. Although the per-channel model showed a numerically higher accuracy, the difference was small. Therefore, the practical choice between the 2 INT8 variants is better interpreted as a trade-off between fidelity to the reference model and deployment efficiency, with per-tensor quantization offering lower RAM usage and faster inference on the target hardware.
Prior work has reported high performance for deep learning-based analysis of otoscopic images, but most systems assume smartphone, cloud, or workstation inference.12-19 Embedded inference offers a complementary path that can reduce reliance on connectivity and simplify privacy-preserving deployment. In our implementation, quantization reduced the end-to-end deployment footprint from approximately 163 MiB to 13.6 MiB; the 163 MiB figure, as measured in our toolchain, includes network parameters, intermediate activation buffers, and the runtime/graph metadata required by the embedded inference engine. This reduction enabled fully on-device inference under tight memory budgets.
This work is positioned as screening and prediagnostic triage support, not autonomous diagnosis. Otitis media diagnosis depends on integration of symptoms, history, and objective assessments when indicated (eg, tympanometry and audiology).2,3,8 A realistic role for embedded AI is to flag images that may warrant repeat examination, tympanometry, closer follow-up, or referral—especially in primary care and telemedicine contexts where specialist review is not always available.9-11
Limitations include use of a single-center retrospective dataset and evaluation of a single model family. External validation across institutions and devices, including low-cost digital otoscopes, is needed. Future work will focus on generalizability, robustness under realistic degradations (blur, low illumination, specular reflection, cerumen occlusion), and prospective workflow evaluation to determine clinical impact. We did not perform a formal inferential statistical comparison between the 2 INT8 variants; future studies using larger and externally validated test sets should include such analyses in addition to descriptive deployment benchmarks.
Finally, we propose a possible clinical workflow for image-based screening and prediagnostic triage as shown in Figure 6. The system first performs an image quality check to determine whether the input is suitable for automated analysis. If acceptable, the AI model provides a screening/triage output to support recommended next steps (eg, repeat examination, follow-up, or referral). The clinician integrates symptoms/history and objective assessments to make the final diagnosis.
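As a sketch of this gating logic, the fragment below maps a quality check and a screening output to a suggested next step; the confidence threshold and quality criteria are illustrative assumptions that would require prospective validation, and the clinician makes the final diagnosis in every path.

```python
from enum import Enum

class Triage(Enum):
    RETAKE = "repeat examination (image quality insufficient)"
    ROUTINE = "no action; routine care"
    FOLLOW_UP = "monitoring / tympanometry suggested"
    REFER = "specialist referral suggested"

def triage(image_quality_ok: bool, label: str, confidence: float) -> Triage:
    """Map a screening output to a suggested next step.

    The 0.90 confidence cutoff is an illustrative assumption; outputs
    support, but never replace, clinician-led diagnosis and management.
    """
    if not image_quality_ok:
        return Triage.RETAKE
    if confidence < 0.90:
        return Triage.FOLLOW_UP          # uncertain output -> closer review
    if label == "AOM":
        return Triage.REFER
    if label == "OME":
        return Triage.FOLLOW_UP
    return Triage.ROUTINE                # normal tympanic membrane
```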

Figure 6. Proposed clinical workflow for image-based screening and prediagnostic triage.
Conclusions
A lightweight AlexNet-derived CNN can be quantized to INT8 and deployed on an STM32H7 microcontroller for on-device screening of pediatric otitis media. Per-channel INT8 preserved accuracy, whereas per-tensor INT8 improved latency and reduced activation memory with a minimal accuracy decrease. The system is intended to support screening and triage; final diagnosis and management remain clinician-led.
Footnotes
Ethical Considerations
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Shenzhen Children’s Hospital for studies involving humans.
Consent to Participate
Patient consent was waived due to the retrospective nature of the study and the use of anonymized clinical data.
Author Contributions
Conceptualization, Changwei Lv and Xuansheng Wang; methodology, Changwei Lv, Desheng Jia, and Zebin Wu; software, Bo Gao and Changwei Lv; investigation, Changwei Lv, Desheng Jia, and Xuansheng Wang; resources, Linzhong Xia and Xuansheng Wang; data curation, Desheng Jia, Zebin Wu, and Changwei Lv; writing—original draft preparation, Changwei Lv; writing—review and editing, Xuansheng Wang; funding acquisition, Changwei Lv and Xuansheng Wang. All authors have read and agreed to the published version of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by National Natural Science Foundation of China, grant number 92467204.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The original otoendoscope tympanic membrane images analyzed in this study are not publicly available due to ethical approval constraints and patient privacy protections. Access to de-identified data may be considered upon reasonable request to the corresponding author, subject to institutional approval and execution of an appropriate data use agreement. To facilitate reproducibility without disclosing protected clinical images, the authors will provide trained model weights, inference code, scripts used for robustness/degradation experiments, and anonymized aggregate statistics (eg, per-class performance metrics and confusion matrixes) via a controlled repository or upon request.
Supplemental Material
Supplemental material for this article is available online.
References