Abstract
Objective
This study investigated the effectiveness of remote administration of speech audiometry, an essential tool for diagnosing hearing loss and determining its severity. Utilizing two software tools for remote testing, the research aimed to compare these digital methods with traditional, in-person speech audiometry to evaluate their feasibility and accuracy.
Design
Participants underwent the Cantonese Hearing in Noise Test (CHINT) under three listening conditions—quiet, noise from the front, and noise from the right side—using three different administration methods: the conventional in-person approach, video conferencing software, and remote access software.
Study Sample
Fifty-six Cantonese-speaking adults residing in Hong Kong participated in this study.
Results
Analysis revealed no significant differences in CHINT scores among the three administration methods, indicating the potential for remote administration to yield results comparable to those of conventional methods.
Conclusions
The findings supported the feasibility of remote speech audiometry using the investigated digital tools. This study paved the way for the wider adoption of tele-audiology practices, particularly in situations where in-person assessments are not possible.
Introduction
The World Health Organization (WHO) estimates that approximately 900 million people worldwide suffer from disabling hearing loss, with unaddressed hearing loss projected to cost around USD 750 billion annually.1 Many of these individuals face substantial barriers to accessing comprehensive hearing care services, which include, but are not limited to, hearing aid (HA) provision. Tele-audiology has emerged as a component of the broader solution, providing access to audiological care for individuals in remote or underserved areas.2 The COVID-19 pandemic notably accelerated the shift from traditional in-person audiological services to remote delivery, markedly changing perceptions and usage of tele-audiology. Before the pandemic, the value of tele-audiology was recognized by 44.3% of respondents in a survey, a figure that rose dramatically to 87.1% during the pandemic.3 The adoption of tele-audiology services similarly increased from 41.3% pre-pandemic to 61.9% during it, indicating a robust appreciation for its benefits.3 The US Department of Health and Human Services observed steady telehealth usage rates from April 2021 to August 2022, noting that, despite a peak during the pandemic, telehealth use has remained significantly higher than pre-pandemic levels. This trend indicates a sustained adoption of telehealth by both patients and providers.4
Tele-audiology encompasses a broad spectrum of services, including screening (e.g. pure-tone audiometry, otoacoustic emissions, tympanometry), diagnosis (e.g. auditory brainstem response, video-otoscopy, tympanometry, pure-tone audiometry), and intervention (e.g. cochlear implant mapping, tinnitus management, education, counselling, and hearing aid programming), as detailed in previous studies.2 Despite the comprehensive capabilities that tele-audiology offers, there has been limited focus on tele-speech-audiometry, a core component of audiological assessments. Speech audiometry plays a crucial role in the evaluation of hearing loss, providing essential information on an individual's ability to understand speech under various conditions. This testing is indispensable for diagnosing the degree of hearing loss and for guiding the appropriate choice of interventions, such as hearing aids, cochlear implants, or other therapeutic measures.5–8
When speech audiometry is carried out, it is not limited to a single environmental condition; rather, it encompasses a variety of scenarios to mimic real-life situations as closely as possible. Testing speech perception in quiet conditions is foundational, serving as a benchmark for an individual's ability to process speech without background noise.9 This scenario is crucial for determining the most basic level of speech understanding and for identifying any primary difficulties with speech recognition.
However, real-world communication rarely occurs in perfectly quiet environments. Consequently, assessing speech perception in noise becomes essential. This aspect of testing reflects more realistic conditions where background sounds are present, which is a common challenge for individuals with hearing loss.6,10 Understanding how well a person can comprehend speech with noise from the front is particularly telling, as this situation is typical in many day-to-day interactions, such as conversations in busy settings where the speaker and the noise source are in the same direction.6,10,11
Equally important is the evaluation of speech perception with noise coming from the side. This condition tests the ability of individuals to focus on speech while ignoring lateral noise, a skill necessary for effective communication in noisy environments.12,13 Testing in these varied conditions—quiet, noise from the front, and noise from the side—provides a comprehensive picture of an individual's speech understanding capabilities. It offers critical insights into the complex nature of hearing loss and its impact on everyday communication. Through such thorough assessments, audiologists can tailor interventions more precisely, enhancing the individual's ability to engage in meaningful interactions across a broad spectrum of environments.
Despite speech audiometry's acknowledged importance in audiology, its integration into tele-audiology has been limited. An international survey of audiologists revealed that the greatest barriers to the use of tele-audiology were associated with clinical equipment, such as the multiple technologies across manufacturers that can be used to conduct measurements remotely.3 In addition, respondents suggested that research was needed to confirm the validity of telehealth services.3,14 In response to these challenges, the current study aims to examine the feasibility of administering speech perception tests remotely using two widely accessible software tools: video-conferencing and remote-control software. The hypothesis guiding this research posits that there will be no significant differences in the outcomes of speech perception tests whether administered in person, via video-conferencing, or through remote-control software.
Materials and methods
Participants
A total of 56 participants aged 18 to 50 years (mean age = 26.23, SD = 7.85) were recruited from the Education University of Hong Kong using online advertisements, with 67.9% of them being female (n = 38). Eligibility for participation required native proficiency in Cantonese, with no specific criteria set for hearing status. The homogeneity of the participant group helps to reduce variability potentially stemming from factors not central to the study's objectives, such as differences in technological proficiency, cognitive abilities, and auditory function. The scenarios devised for this research aim to replicate the conditions of speech perception evaluations typically encountered in routine physical examinations and hearing function screenings. Demonstrating the feasibility of conducting speech perception assessments remotely in this study opens the door to future research into their applicability among more diverse populations, especially older adults with hearing impairments. Written consent was obtained from all participants prior to the start of the study.
Measures
The Cantonese Hearing in Noise Test (CHINT)15 is designed to assess an individual's ability to perceive sentences under various acoustic conditions, emphasizing the adaptive nature of this assessment. Central to the CHINT is the evaluation of how well participants can recognize sentences when presented amidst noise or in quiet environments. This ability is quantified through the measurement of speech reception thresholds (SRTs), which represent the minimal presentation level in quiet conditions, or the signal-to-noise ratio (S/N) in noisy conditions, required for accurate recognition of 50% of the sentences.
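As a rough illustration of how an adaptive SRT track converges on the 50%-correct point, the sketch below implements a simplified 1-down/1-up rule with a fixed 2 dB step. This is an assumption-laden toy: the actual CHINT uses the step sizes and scoring rule specified in its manual, so the function and its parameters here are illustrative only.

```python
def adaptive_srt(responses, start_level=65.0, step=2.0):
    """Track presentation level with a simplified 1-down/1-up rule.

    `responses` is a sequence of booleans (True = sentence repeated
    correctly). Each correct response lowers the level by `step` dB and
    each error raises it, so the track oscillates around the level at
    which 50% of sentences are recognized. The SRT estimate returned
    here is the mean of the presented levels (illustrative only; the
    CHINT manual defines its own averaging rule).
    """
    level = start_level
    levels = []
    for correct in responses:
        levels.append(level)
        level += -step if correct else step
    return sum(levels) / len(levels)
```

For example, a participant who alternates correct and incorrect responses keeps the track hovering near the starting level, which is exactly the behaviour expected at threshold.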
Incorporating principles from the English Hearing in Noise Test,16 the CHINT consists of 12 lists, each with 20 sentences. These sentences are used to evaluate speech perception capabilities by masking them with speech spectrum-shaped noise during the test. The assessment is conducted through insert earphones, delivering both the target sentences and masking noise under three distinct listening scenarios: in quiet, with noise coming from the front (NF), and with noise coming from the right side (NS).
Moreover, the CHINT leverages head-related transfer functions (HRTFs) to simulate sound-field listening conditions through insert earphones, enhancing the realism of the test environments.17 HRTFs describe the complex spatial filtering effects of the listener's body, including the torso, head, and pinnae, on sounds arriving from different directions, thereby providing a comprehensive assessment of a participant's ability to perceive speech in varied acoustic scenarios using insert earphones.12 The CHINT was administered using the three methods described below.
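Conceptually, HRTF-based simulation amounts to convolving the monaural test signal with the left- and right-ear head-related impulse responses (HRIRs) for the desired direction and presenting the resulting stereo signal over the earphones. The sketch below uses made-up two- and four-sample HRIRs purely for illustration; real HRIRs are measured responses hundreds of samples long.

```python
import numpy as np

def spatialize(signal, hrir_left, hrir_right):
    """Render a monaural signal at a virtual direction by convolving it
    with that direction's left- and right-ear head-related impulse
    responses (HRIRs); the stereo result is played over insert earphones."""
    return np.stack([np.convolve(signal, hrir_left),
                     np.convolve(signal, hrir_right)])

# Toy HRIRs for a source on the right: the right ear receives the sound
# immediately and at full level; the left ear receives it two samples
# later (head shadow delay) and attenuated. Values are invented.
hrir_right_ear = np.array([1.0, 0.3, 0.0, 0.0])
hrir_left_ear = np.array([0.0, 0.0, 0.5, 0.15])
stereo = spatialize(np.array([1.0, 0.0, 0.0]), hrir_left_ear, hrir_right_ear)
```

The interaural time and level differences produced this way are the cues that let a listener perceive the noise as coming from the side even though playback is through earphones.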
Conventional method
An experimenter operated the computer and delivered instructions directly to the participants. The CHINT was administered face-to-face, following the guidelines outlined in the CHINT manual.
Using remote-access software
TeamViewer version 15.27.3, a remote-access software, was employed to facilitate the administration of the CHINT from a different location through the internet. The selection of TeamViewer was based on its capabilities for remote control and communication, which are essential for conducting tests without the physical presence of an experimenter.
The process began with the experimenter, situated in a separate room, taking control of the computer located within the sound booth. This remote setup allowed the CHINT to be administered as if the experimenter were physically present, without actually being in the same room. The advantage of this approach was that the audio was generated directly on the participant's side, ensuring it was delivered without delay or compression.
Furthermore, the use of TeamViewer's video conferencing feature was instrumental in delivering instructions to participants. This real-time communication ensured that participants fully understood the test procedures and what was expected of them, thereby facilitating a smoother testing process. Additionally, the demonstration of equipment usage, particularly the proper way to use insert earphones, was crucial for the accuracy of the CHINT. The experimenter demonstrated this using an additional pair of insert earphones, effectively showing participants the correct usage through the video conference.
Using video-conference software
In this condition, speech audiometry was administered remotely using the Zoom video-conferencing software, version 5.9.1. This method does not require participants to install the CHINT administration software on their computers. Importantly, it also raises fewer concerns regarding privacy and data protection. For instance, the host organization—where the participants are tested—might be wary of allowing computers from other organizations to control its systems, a scenario that arises with remote-access software. However, while video-conferencing software can streamline communication, it may compromise audio quality due to the application of audio compression techniques.18 These techniques remove redundant or non-essential information from the audio signal to improve transmission efficiency.18
To closely mimic the audio quality of face-to-face testing conditions, we leveraged Zoom's advanced audio settings to transmit high-quality stereo audio from the administering computer. This was accomplished by selecting ‘Share Screen,’ proceeding to ‘Advanced,’ and choosing ‘Computer Audio.’ Such a configuration was vital for conducting speech-perception-in-noise tests, allowing sentences and noise to be delivered on separate channels at varying intensities. Moreover, Zoom's ‘High-Fidelity Music Mode’ was activated in the audio settings by enabling ‘Original Sound for Musicians.’ This specialized audio setting, designed to optimize the platform for music and high-quality audio, enhances audio quality via higher-bitrate transmission, resulting in clearer, more detailed sound. It also reduces audio compression and disables post-processing to preserve the original sound quality. Additionally, this mode lowers audio latency, which is crucial for synchronizing live performances or tests by minimizing the delay between sound production and reception. This setup ensures that the audio fidelity in speech perception tests is on par with clinical settings.
Similar to the remote-access software administration (RASA) method, this procedure was conducted without facilitators present in the same room as the participants, with all operations managed by an experimenter in a separate room via Zoom. The experimenter could neither see nor hear the participants without utilizing Zoom. In contrast to the RASA, sentences and noise from the CHINT were generated on the experimenter's computer and then transmitted to the participant's computer.
Procedures
Before the administration of the CHINT, participants underwent a pure-tone audiometry test in person. The CHINT was administered using three distinct approaches: the conventional method (CM), video-conference software administration (VCSA), and RASA, using two ThinkPad E15 computers. To determine whether variations in outcomes across different administration techniques (CM, VCSA, and RASA) were due to the use of different CHINT lists, the CHINT was administered twice via the conventional method in 34 randomly selected participants out of the total 56 participants (designated as CM1 and CM2). The testing employed a pair of ER-3A insert earphones, which were calibrated beforehand using a Larson Davis System 824 sound level meter connected to an AEC201-A ear simulator and an AEC 304 occluded ear simulator. Specifically, the calibration involved playing a calibration sound through the CHINT software and adjusting the computer volume until the meter indicated a sound level of around 80 dB SPL. For the duration of the study, all participants used the same set of insert earphones, with ear tips being replaced for each participant to maintain hygiene and consistency.
The study encompassed 12 test scenarios, integrating three listening settings (quiet, NF, and NS) with four administration strategies (CM1, CM2, VCSA, and RASA). In each scenario, one of the 12 CHINT lists, each consisting of 20 sentences, was selected at random for every test condition.17 The equivalency of these 12 lists has been previously established to ensure consistency in test difficulty and content.17 Moreover, the sequence of these test conditions was randomized using a Latin Square design to ensure a balanced distribution. Prior to employing each administration method, a trial test was conducted to confirm participants' comprehension of the instructions.
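Latin Square counterbalancing of this kind can be sketched with a simple cyclic construction, in which each participant receives a rotated ordering of the conditions so every condition occupies every serial position equally often. This is one of many valid Latin Squares and not necessarily the exact design used in the study.

```python
def cyclic_latin_square(conditions):
    """Build a cyclic Latin square: row i is the condition list rotated
    by i positions, so each condition appears exactly once per row and
    once per column (i.e. equally often in every serial position)."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

orders = cyclic_latin_square(["CM1", "CM2", "VCSA", "RASA"])
# Participant p would follow the ordering orders[p % 4].
```

Assigning successive participants to successive rows balances order effects (e.g. practice or fatigue) across the four administration strategies.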
The duration of the testing procedure was approximately 1 hour. The participant was granted a 10-minute intermission roughly halfway through the session. Additional breaks were available upon request to accommodate participant needs. All tests were performed in a certified sound booth by an audiologist (i.e. the experimenter) at the Integrated Centre for Wellbeing (I-WELL) of the Education University of Hong Kong. A Wi-Fi connection characterized by a latency of 5.2 ms, a jitter of 1.6 ms, a download speed of 108 Mbps, and an upload speed of 93 Mbps was established for the tests. These connection metrics were verified through the OFCA Broadband Performance Test (https://speedtest.ofca.gov.hk/).
Statistical analysis
To evaluate differences in SRTs across the administration methods (CM, VCSA, RASA) and listening conditions (quiet, NF, NS), a two-way repeated-measures analysis of variance (ANOVA) was employed. Additionally, to investigate the disparities in SRTs among the administration methods, we calculated the differences in SRTs between the first (CM1) and second (CM2) conventional method testing, denoted as ΔSRT (CM1-CM2); the difference between the first conventional method testing and the Zoom administration, denoted as ΔSRT (CM1-VCSA); and the difference between the first conventional method testing and the TeamViewer administration, denoted as ΔSRT (CM1-RASA). Given that the reported list equivalence for the CHINT is within ±1 dB,17 it was expected that ΔSRT (CM1-CM2), ΔSRT (CM1-VCSA), and ΔSRT (CM1-RASA) would also fall within this range. This hypothesis would be corroborated by a two-way repeated-measures ANOVA revealing no significant differences among ΔSRT (CM1-CM2), ΔSRT (CM1-VCSA), and ΔSRT (CM1-RASA), confirming the consistency and reliability of the test results across the administration methods. The significance level was set at 5%, and SPSS (v29.0.2) was used to perform all the statistical analyses.
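The analyses were run in SPSS, but for the simpler one-way repeated-measures case (e.g. comparing the three ΔSRT scores within participants) the F statistic can be computed directly from the sums of squares. The sketch below is an illustrative reimplementation, not the study's analysis code.

```python
import numpy as np

def rm_anova_f(data):
    """One-way repeated-measures ANOVA F statistic.

    `data` is an (n_subjects, k_conditions) array, e.g. each row holding
    one participant's ΔSRT under CM1-CM2, CM1-VCSA, and CM1-RASA.
    Partitions total variance into condition, subject, and residual
    components; returns (F, df_effect, df_error)."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()   # between conditions
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_total = ((data - grand) ** 2).sum()
    ss_err = ss_total - ss_cond - ss_subj                    # residual
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_err / df_err)
    return f, df_cond, df_err
```

Removing the between-subject sum of squares from the error term is what distinguishes the repeated-measures design from an independent-groups ANOVA and gives it its power for within-participant comparisons.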
Results
Pure-tone audiometry results revealed that 54 of the 56 participants demonstrated normal hearing, with their hearing thresholds reaching 20 decibels hearing level (dB HL) or lower at frequencies of 500, 1000, 2000, and 4000 Hz in both ears. Meanwhile, two participants showed signs of mild hearing impairment characterized by a decrease in hearing sensitivity to a range between 21 and 40 dB HL at these frequencies in the better-performing ear. The SRTs using different administration methods are shown in Table 1.
Table 1. Speech reception thresholds (SRTs) across different listening conditions and administration methods: [range] (standard deviation).
A repeated-measures ANOVA showed a significant main effect of listening condition (quiet, NF, NS), F(2, 110) = 3891.74, p < 0.05.
The analysis of SRT differences between the administration methods, as highlighted in Table 2, underscores the consistency in SRTs across the conventional, Zoom, and TeamViewer modalities. This consistency is encapsulated by the mean difference scores across all test conditions, which remained within the ±1 dB range.17 Such a range aligns with the list equivalence previously established for the CHINT,17 affirming the reliability of remote audiometry methods compared to traditional in-person assessments. Further confirmation of these findings is provided by the repeated-measures ANOVA, which revealed no significant differences among ΔSRT (CM1-CM2), ΔSRT (CM1-VCSA), and ΔSRT (CM1-RASA), F(2, 66) = 0.07, p > 0.05.
Table 2. Difference in SRTs obtained using different administration methods: mean (standard deviation).
Discussion
The present study investigated the feasibility of remote administration of speech audiometry using two different digital tools and found no significant differences in SRTs across the administration methods. The results closely align with those of prior research comparing pure-tone audiometry thresholds obtained through remote testing with those from conventional face-to-face assessments. For instance, Swanepoel et al.19 utilized the RASA method to evaluate remote pure-tone audiometric testing and found no clinically significant differences between the remote and conventional face-to-face audiometry outcomes. Similarly, Botasso assessed the efficacy and feasibility of remotely administering pure-tone audiometry screening among elementary school children, confirming the reliability and feasibility of tele-audiometry for hearing screening in this demographic. These findings are corroborated by additional studies, such as Choi's PC-based tele-audiometry20 and the research of Krumm et al.21
However, these prior studies focused solely on pure-tone audiometry. Speech audiometry, in contrast, requires not only the ability to hear sounds but also to understand speech, which may necessitate a higher-quality acoustic environment. Hughes et al.14 compared tele-speech-audiometry with traditional methods among cochlear implant users and reported significantly poorer speech perception in remote settings. This disparity was attributed to the absence of soundproof booths at the remote locations, exacerbated by higher background noise levels and longer reverberation times—factors known to adversely affect speech perception in individuals with hearing loss.22,23
In this study, all speech audiometry was completed in a sound booth. In the RASA method (i.e. TeamViewer), the audio was produced by the same computer as in the conventional method, so the audio quality did not change when speech audiometry was administered remotely. The variations in SRTs between the conventional and RASA methods may therefore stem mainly from the interactions between participants and administrators (e.g. how the instructions were delivered online) and from the use of different CHINT sentence lists. However, these factors did not significantly affect the CHINT results, as no significant differences in SRTs were found between the two administration methods.
During video calls in the VCSA, audio data, alongside video, are transmitted over the internet. Initially, this involves converting and compressing the analog audio signal into a digital format using audio codecs.18 This helps minimize the data size while preserving quality. Once encoded, the data are packaged and sent to the recipient, where they are decompressed and reverted to their analog form. However, variable network conditions can cause packet delays or losses, potentially impacting audio clarity and leading to distortions.18 The codec selected is vital, as it adjusts the acoustic bandwidth and bitrate in response to network changes, significantly influencing audio transmission quality.24
Zoom uses the Opus codec, which adapts to various audio types, from narrow-band speech to full-band stereo music, employing two main techniques: linear prediction (LP) and the modified discrete cosine transform (MDCT).25 The LP technique reduces bitrate and enhances compression effectiveness, while the MDCT optimizes for higher frequencies or music by segmenting the audio spectrum into bands that align with the Bark scale, which mimics the human ear's frequency resolution.25 Opus can dynamically adjust its mode based on network conditions, supporting everything from low-bitrate speech to high-quality music transmission.18,25 We found that Zoom's audio processing, as described above, does not significantly affect SRTs, consistent with the findings of Perepelytsia and Dellwo,18 who reported that acoustic compression in Zoom audio does not compromise voice recognition performance.
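The Bark scale mentioned above can be approximated with Zwicker's commonly cited formula, sketched below. Note that Opus's internal band layout only roughly follows the Bark scale, so this conversion is illustrative rather than a description of the codec's implementation.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical-band rate (in Bark) for a frequency in Hz,
    using Zwicker's formula: z = 13*atan(0.00076 f) + 3.5*atan((f/7500)^2).
    Codecs that partition the spectrum on a Bark-like scale allocate
    coding resolution per critical band rather than per Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)
```

Because the mapping is compressive, equal steps in Bark correspond to ever wider frequency ranges at high frequencies, mirroring the ear's coarser resolution there.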
Additionally, the dynamics of interactions between participants and administrators during testing can influence the SRT results. Specifically, remote administration involves communication through digital platforms, which can alter the rapport and clarity of instructions compared to in-person interactions.26 For instance, technical issues, such as latency or audio quality, can affect the participant's understanding of test instructions, potentially impacting their performance.18 Moreover, the absence of physical presence might influence the participant's engagement and concentration during the test. In conventional settings, administrators can adapt their communication strategies based on real-time feedback from participants, such as body language or facial expressions, to ensure comprehension and comfort. This dynamic adjustment is more challenging to replicate with remote administration.26 Nonetheless, the observed variations in SRTs are within a 1 dB range, aligning with the SRT fluctuations encountered when using different CHINT sentences (i.e. ΔSRT (CM1–CM2)). This indicates that the nuanced dynamics of participant-administrator interaction in remote assessments did not significantly affect SRT outcomes.
Limitations and suggestions
This study demonstrated that there were no significant differences among the three administration methods, indicating that the two non-physical-contact methods are viable for conducting speech audiometry. However, all participants in this study were relatively young and had normal hearing or, at most, mild hearing impairment. Older adults and those with more severe hearing impairments might encounter difficulties understanding instructions online due to unfamiliarity with the technology. Nevertheless, the instructions for CHINT administration were straightforward, requiring no participant operation of the software, and a trial test was conducted to ensure comprehension of the instructions. To alleviate potential communication barriers, especially among older adults, future studies could benefit from integrating facilitators or visual aids, such as detailed pictorial instructions.
In addition, insert earphones were used in the current study, although they are rarely used when measuring hearing aid benefit; sound field testing using loudspeakers is more commonly employed when evaluating HA fitting outcomes. Compared to insert earphones, however, sound field speech audiometry is more easily affected by ambient noise. Therefore, further studies are warranted. It is important to note, however, that HA fitting encompasses much more than just speech audiometry. Speech audiometry has other significant clinical applications: it can better predict real-life communication skills; it can assist in diagnosing conditions, such as auditory neuropathy and central auditory processing disorder; it can determine candidacy for cochlear implants or HAs; and it can monitor treatment progress, such as in cases of sudden deafness.27,28 We hope the results of this study will contribute to these fields.
Furthermore, this study employed Zoom's video-conference function to represent VCSA and TeamViewer's remote-access function to represent RASA. While a variety of similar remote-access software options exist, such as Zoom's own remote-access capabilities, the choice of software appears to have a minimal impact on the results. However, due to potential differences in audio data processing and transmission, such as compression and noise reduction by different video-conferencing platforms, conducting additional validity tests is advised. These tests are essential to verify the efficacy of using alternative video-conferencing software for the administration of the CHINT test.
Implications
Despite its limitations, this study highlights several critical implications. First, the two remote administration software tools used are freely available and user-friendly, requiring no specialized equipment or additional personnel. These attributes are particularly beneficial in situations necessitating social distancing, such as evaluating speech perception among individuals with infectious diseases.
Second, tele-speech-audiometry proves vital for residents in remote or underserved areas, where access to audiological services is often scarce. For instance, in comparison to the large number of individuals with hearing loss, the availability of audiologists is quite restricted in China.29 Speech audiometry is not routinely provided by some audiological service providers, requiring patients to travel across provinces to receive care.29,30 However, finding a sound booth in hospitals or at hearing aid distributors is typically feasible. Therefore, developing region-specific tele-speech-audiometry protocols could significantly enhance accessibility and reduce the need for long-distance travel for audiological evaluations.
Lastly, contemporary studies typically require multicenter collaboration. Ensuring sufficient participant enrollment and maintaining compliance are crucial for the success of these studies. For example, our ongoing multicenter study investigates the effectiveness of online auditory training in enhancing speech perception in noise for adults with untreated mild hearing loss. Administering the speech perception test remotely has substantially increased the number of participants who can be enrolled and reduced travel demands on researchers.
Footnotes
Acknowledgments
I am grateful to all the participants who took the time to participate in this study.
Contributorship
CY solely undertook the conceptualization of the research topic, collection and analysis of data, and the composition of the written manuscript.
Data availability statement
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the General Research Fund (GRF 18608423), University Grants Committee, Hong Kong SAR, China, and the Knowledge Transfer Seed Fund for 2023–2024 (DTSF03109), Department of Special Education and Counselling, The Education University of Hong Kong, Hong Kong SAR, China.
Guarantor
CY.
Informed consent from participants
The study has been approved by the Faculty Human Research Ethics Committee (HREC), the Education University of Hong Kong (FHREC:23/24-ER030). Written consent was obtained from participants prior to the study.
