Abstract
Previous research has demonstrated that remote testing of suprathreshold auditory function using distributed technologies can produce results that closely match those obtained in laboratory settings with specialized, calibrated equipment. This work has facilitated the validation of various behavioral measures in remote settings that provide valuable insights into auditory function. In the current study, we sought to address whether a broad battery of auditory assessments could explain variance in self-report of hearing handicap. To address this, we used a portable psychophysics assessment tool along with an online recruitment tool (Prolific) to collect auditory task data from participants with (n = 84) and without (n = 108) self-reported hearing difficulty. Results indicate several measures of auditory processing differentiate participants with and without self-reported hearing difficulty. In addition, we report the factor structure of the test battery to clarify the underlying constructs and the extent to which they individually or jointly inform hearing function. Relationships between measures of auditory processing were found to be largely consistent with a hypothesized construct model that guided task selection. Overall, this study advances our understanding of the relationship between auditory and cognitive processing in those with and without subjective hearing difficulty. More broadly, these results indicate promise that these measures can be used in larger scale research studies in remote settings and have potential to contribute to telehealth approaches to better address people's hearing needs.
Introduction
Although telehealth services and remote testing platforms have been of interest to audiologists and researchers for many years, clinical service limitations and social distancing restrictions imposed during the COVID-19 pandemic led to increased interest in these tools (Almufarrij et al., 2022; Peng et al., 2022). Recent technological advances in consumer-grade electronics have also made it possible to use low-cost, personally owned devices to create and present accurate and controlled audio signals, further advancing the development of new portable and remote auditory testing platforms (Almufarrij et al., 2022; Bright & Pallawela, 2016; Gallun et al., 2018; Irace et al., 2021). Such tools can expand the reach of hearing health services to rural settings with fewer available resources or a lack of options for hearing healthcare, and to patients lacking easy access to traditional clinical services due to limited mobility or other health conditions. Even mild untreated hearing loss is linked with higher rates of unemployment, social isolation, depression, and anxiety, as well as a higher risk of dementia (Kannan et al., 2024; Livingston et al., 2024). Increasing the ease of access to hearing screening and assessment through remote testing platforms may help address this important public health issue through earlier detection of hearing difficulties and provision of care to at-risk patient populations. Even basic research studies may benefit from remote testing platforms, with larger and more diverse participant samples improving the generalizability of research findings to the larger population.
While a number of app- and web-based tools for remote auditory testing currently exist, information on the functionality, validity, and reliability of these tools is often not well reported (Almufarrij et al., 2022; Bright & Pallawela, 2016; Irace et al., 2021; Peng et al., 2022). Our lab has been developing and testing a freely available app called PART (Portable Automated Rapid Testing; https://ucrbraingamecenter.github.io/PART_Utilities/) that supports work in this area with the goal of making assessments that have traditionally been confined to laboratory settings more accessible to researchers and clinicians (Gallun, 2020). To date, a variety of audiological and psychophysical assessments have been created in PART, including suprathreshold measures of speech understanding in quiet and in noise, measures of spectral and temporal processing, measures of cognition, and self-report of experience (e.g., surveys). Recent studies that used PART have aimed to address the validity and reliability of these tests in different settings and with different groups of people. For instance, Lelo de Larrea-Mancera et al. (2020) validated a battery of PART assessments across a range of environmental noise conditions and confirmed consistency with previous laboratory tests (Cherri et al., 2024a; Lelo de Larrea-Mancera et al., 2020). Additionally, several studies confirmed that data collected via PART are in many cases reliable even when administered on participant-owned, uncalibrated devices (Lelo de Larrea-Mancera et al., 2022; Rink et al., 2022). Translations of assessments into languages other than English (Lelo de Larrea-Mancera et al., 2023a; 2023b; Padilla-Bustos et al., 2025) and successful tests of PART in clinical populations such as those living with Parkinson's disease, mild cognitive impairment, or dementia (Lelo de Larrea-Mancera et al., 2024a, 2025) further underscore the capabilities of this platform to investigate the needs of different participant samples.
Recently, work has focused on making PART measurements more efficient through methods that collect more data in a shorter amount of time (Lelo de Larrea-Mancera et al., 2023c, 2024b). This research demonstrates functionality and usability of the PART testing platform and indicates a potential opportunity to address the needs of research participants and clinic patients.
While PART allows for accessible and efficient remote auditory testing, to date research with the PART application has mostly addressed participants with normal or near-normal hearing thresholds. The primary aim of the current study was to establish the validity of a remote testing procedure using PART, identifying measures that can be useful to discriminate between individuals with and without self-reported hearing difficulties. This aim is specific to further establishing PART as a clinically relevant research tool. As a secondary aim, we sought to better understand the factor structure of the testing battery and the extent to which it could inform specific constructs that could individually or in combination help us understand aspects of self-reported hearing difficulties. This aim addresses an unresolved complexity in auditory clinical science (not specific to PART) that concerns the inadequacy of any one measure to fully establish whether someone will experience hearing difficulty (Humes, 2021; Humes et al., 2012). Overall, results of this work speak directly to the feasibility of conducting large-scale studies remotely through an online platform, with a structured hierarchy of auditory processing measures, accommodating individuals with varying ages and hearing sensitivities.
Methods
Participants
One hundred and ninety-two (192) participants were recruited through the online platform Prolific (www.prolific.com). They all provided signed informed consent as approved by the University of California, Riverside Human Research Review Board. Participants were preselected by Prolific to balance the sample based on self-report of hearing status following the prompt: “Do you have any hearing loss or hearing difficulties?”. The accuracy of this preselection question was cross-checked by an additional question specific to the current study: “Do you have hearing loss, reduced hearing, or hearing difficulties (such as difficulty understanding when multiple people are talking or when there is background noise)?”. The Kendall correlation between these two dichotomous variables was τ = .58, indicating that responses to the two questions did not always agree. To achieve a more valid classification of the sample into hearing difficulty (HD) and no difficulty (ND) groups, final group assignment was based on a cutoff score on the Hearing Handicap Inventory for the Elderly–Screening Version (HHIE-S) of ≥8, as suggested by ASHA guidelines (1989). A comparison of the final group assignment to the Prolific preselection and to another single-question assessment of hearing status collected for this study is provided in the Supplemental materials (Table S1). Demographic information divided by final group assignment is presented in Table 1. All participants provided informed consent and received $12 an hour for their participation.
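As a sketch for readers implementing a similar cross-check, Kendall's τ-b (the coefficient reported above) can be computed from pair counts; the function and example data below are ours, not part of the study's materials.

```python
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b from pair counts; suited to dichotomous variables."""
    concordant = discordant = ties_x = ties_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            ties_x += 1          # pair tied on both variables
            ties_y += 1
        elif dx == 0:
            ties_x += 1          # tied on x only
        elif dy == 0:
            ties_y += 1          # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    return (concordant - discordant) / ((n0 - ties_x) * (n0 - ties_y)) ** 0.5
```

τ-b is appropriate here because, with two dichotomous variables, most pairs are tied and the τ-b denominator corrects for those ties.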
Demographic Information Split by Hearing Difficulty Group.
HD=hearing difficulty; HHIE-S=Hearing Handicap Inventory for the Elderly–Screening Version; ND=no difficulty.
Construct Framework
We used a collection of measures thought to be sensitive to different stages along the hierarchy of auditory processing (see Figure 1). The selection of measures was based upon previous models in cognitive and auditory science (Bronkhorst, 2015; Cowan, 1999; Stecker & Gallun, 2012; Vetter et al., 2024). While this framework is not intended as a comprehensive view of all human auditory activity, it is meant to elucidate potential constructs that can inform different dimensions of hearing (Cherri et al., 2024b; Lelo de Larrea-Mancera et al., 2020, 2022). We place at the base of this hierarchy the basic ability to detect sound, measured using a pure tone detection task. Building upon this, measures of central auditory processing are included, such as sensitivity to temporal fine structure (TFS, which guides sound spatialization) and spectrotemporal modulation (STM, which guides aspects important for speech intelligibility) (Gallun & Best, 2020; Humes et al., 2012; Stecker & Gallun, 2012). At a higher level we consider tasks that involve speech, such as the speech reception threshold (SRT) and the ability to discriminate speech in competition (SiC) with other sources of sound. While all auditory tasks require attentional control and working memory to guide listeners' behavior according to the task instructions, the speech-related tasks are known to rely significantly on these cognitive constructs (Cowan, 1999; Gallun & Jakien, 2019). Our primary hypothesis is that all of these tests can contribute to accounting for self-reported hearing difficulties. We place self-reported hearing at the top of the hierarchy as it involves a meta-cognitive subjective reflection on, or an integrated narrative episodic recollection of, past perceptual performance, guided by attention and working memory. In the following section, we provide more detail on each task utilized to understand these different constructs.

Construct Framework Organizing Instrument Selection and Grouping of Assessments Into Constructs. The Color Indicates Graded Relationships Between Constructs Rather Than Strict Independence. SRT=Speech Reception Threshold; STM=Spectrotemporal Modulation; TFS=Temporal Fine Structure; WM=Working Memory.
Instruments
The measures employed are detailed below. In all cases, testing was performed on personally owned devices, following the methods of Lelo de Larrea-Mancera et al. (2022), who demonstrated high test–retest reliability as well as good correspondence between results obtained with calibrated equipment in the laboratory and with personally owned equipment at home. Because of the lack of precise level measurement in remote conditions with uncalibrated equipment, sound levels are reported as nominal dB throughout (as reported by PART).
Self-Reported Hearing Handicap
Self-reported hearing difficulty was evaluated using the HHIE-S, developed by Newman et al. (1990). This inventory includes 10 questions addressing the social and emotional consequences of hearing-related challenges, with participants responding on a three-point scale: “no” (0 points), “sometimes” (2 points), or “yes” (4 points). Total HHIE-S scores range from 0 to 40, where higher scores indicate a greater self-perceived hearing handicap. According to ASHA guidelines (1989), a cutoff score of 8 or above suggests the presence of hearing handicap.
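The scoring rule described above reduces to a simple mapping and sum; the following is a minimal sketch (function and constant names are ours):

```python
# Response labels and point values as described for the HHIE-S.
POINTS = {"no": 0, "sometimes": 2, "yes": 4}
CUTOFF = 8  # ASHA (1989): a total of 8 or above suggests hearing handicap

def score_hhie_s(responses):
    """Return the total HHIE-S score (0-40) and a handicap flag
    for a list of 10 item responses."""
    assert len(responses) == 10, "the HHIE-S has 10 items"
    total = sum(POINTS[r] for r in responses)
    return total, total >= CUTOFF
```

For example, four "sometimes" responses already reach the cutoff used for group assignment in this study.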
Adaptive Tracking
The adaptive scan (AS) algorithm, which combines progressive and adaptive methods of psychophysical testing (Lelo de Larrea-Mancera et al., 2023c, 2024b), was used to vary the magnitude of an adaptive parameter (e.g., dB) specified for each of the auditory processing tasks described below, except the pure tone detection and working memory tasks. Every task included two consecutive runs of AS with three scans of nine steps each. Each scan ended after the ninth trial or after three errors. Details about the parameters used in the first scan are provided below for each task. Placement of the second and third scans was dependent (adaptive) on participant performance and the cumulative estimate of threshold per scan. Threshold was calculated with a simple heuristic.
Sound Detection
Pure tones were presented to the left and the right ear separately at 1000 Hz, and diotically at frequencies of 500, 2000, and 4000 Hz, to estimate detection thresholds, representing an abbreviated digital version of the standard pure-tone audiogram. This measure is intended to provide information about peripheral hearing, not to provide diagnostic measures of hearing loss as the clinical audiogram would. This is because testing standards (ANSI, 2004) are not easily met with uncalibrated participant-owned devices, which can increase variance overall; the greatest difficulty is the high risk of inaccurately elevated thresholds due to external factors such as environmental noise (see Almufarrij et al., 2022).
Participants were presented with a series of three-to-four 100 ms tones initially presented at 70 dB. The question “Did you hear the tones?” was displayed on screen along with a green button with a check-mark that indicated “yes” and a red button with an “X” on it that indicated “no.” Following three consecutive “yes” responses, the tone level was reduced in steps: first by 20 dB, then by 10 dB down to 10 dB, and finally by 5 dB, using an adapted Hughson-Westlake algorithm (Carhart & Jerger, 1959). If three consecutive “no” responses occurred or 0 dB was reached, the last correctly detected level was recorded as the threshold. Each frequency-specific section of the test lasted about 2 min, totaling approximately 10 min. The primary outcomes were the estimated thresholds at each frequency (average of left and right 1000 Hz). The peripheral hearing construct-level outcome was taken as the pure-tone average (PTA) detection threshold across all four frequencies.
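One plausible reading of the descending level schedule above can be sketched as follows; note that the actual procedure also requires three consecutive "yes" responses before each reduction and ends early after three consecutive "no" responses, which this sketch does not model.

```python
def descending_levels(start=70):
    """Candidate level sequence (dB) under one reading of the adapted
    Hughson-Westlake schedule: one 20 dB step, then 10 dB steps down
    to 10 dB, then 5 dB steps until 0 dB."""
    levels = [start]
    level = start - 20            # first reduction: 20 dB
    levels.append(level)
    while level > 10:             # then 10 dB steps down to 10 dB
        level -= 10
        levels.append(level)
    while level > 0:              # finally 5 dB steps until 0 dB
        level -= 5
        levels.append(level)
    return levels
```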
Temporal Fine Structure
Sensitivity to the fine structure of sound was assessed with sinusoidal tones in three different tasks: (1) the detection of a silent gap placed between two tone bursts, and (2–3) the detection of frequency modulations (FM) in the two conditions described below. These assessments employed a 4-interval, 2-cue, 2-alternative forced-choice (2AFC henceforth) paradigm in which the first and last intervals are presented as cues that contain the standard stimuli without the target. It took participants around 12 min to complete both AS runs on all three TFS assessments (described in detail below).
The Gap detection task adapted from Gallun et al. (2014) and Hoover et al. (2015, 2019) was used to measure auditory temporal sensitivity. While both TFS and temporal envelope cues are involved in this task, here it is considered as part of the TFS construct. Each interval contained two diotically presented tone bursts (2 kHz) at 80 dB, and target intervals contained a silent gap placed between the two tone-bursts. We adjusted the duration of the gap using AS (described above). The first scan started at 128 ms and halved its value on an exponential scale until reaching .5 ms on the last step of the first scan unless interrupted.
Two FM detection tasks, adapted from Grose and Mamo (2010, 2012), were used to measure phase-related temporal sensitivity. Target stimuli involved detecting a sinusoidal FM at 2 Hz. Every interval presented pure tones with center frequencies randomized between 460 and 550 Hz presented at 75 dB. Target intervals contained FM that was either presented diotically to produce a monaural cue (monaural frequency modulation, MFM) or presented dichotically with an antiphasic relationship between the ears to produce a binaural cue (binaural frequency modulation, BFM). The range of the 2 Hz FM was adapted using AS. The MFM task started at 256 Hz and halved its value on an exponential scale until reaching 1 Hz unless interrupted. The BFM task started at 64 Hz and halved its value on an exponential scale until reaching .25 Hz unless interrupted.
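All three TFS first scans follow the same nine-step halving rule, which makes the stated endpoints easy to verify (the helper below is our sketch, not PART code):

```python
def halving_scan(start, steps=9):
    """Nine-step first scan that halves the adaptive parameter at each step."""
    return [start / 2 ** k for k in range(steps)]

# Endpoints stated in the text all fall out of the same nine-step rule:
#   Gap: 128 ms -> 0.5 ms;  MFM: 256 Hz -> 1 Hz;  BFM: 64 Hz -> 0.25 Hz
```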
Spectrotemporal Modulation Sensitivity
Three modulation detection tasks, also employing the 2AFC paradigm, were used to measure sensitivity to spectral, temporal, and spectrotemporal sinusoidal amplitude modulations in broad-band noise: (1) the detection of temporal modulation (TM) at 4 Hz; (2) the detection of spectral modulation (SM) at 2 cycles per octave; and (3) the detection of the combined STM at 4 Hz and 2 cycles per octave (see similar stimuli in Bernstein et al., 2013; Lelo de Larrea-Mancera et al., 2020; Sabin et al., 2012). The up/down direction of the modulation was generated randomly for each interval. Stimuli were presented at 65 dB for 500 ms. The adaptive parameter was modulation depth, measured mid-to-peak on a logarithmic amplitude scale (dB) as described in Stavropoulos et al. (2021), starting at 6 dB and decreasing by .7 dB until reaching .4 dB on the last step of the first scan unless interrupted. These three assessments took around 12 min to complete, including both AS runs on each.
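In contrast to the exponential TFS scans, the modulation-depth scan steps linearly; the stated endpoints are consistent with nine steps of 0.7 dB (the helper below is our sketch, not PART code):

```python
def linear_scan(start=6.0, step=0.7, steps=9):
    """Nine-step first scan decreasing modulation depth (dB) linearly;
    rounding keeps the nominal one-decimal dB values."""
    return [round(start - step * k, 1) for k in range(steps)]
```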
Speech Reception Threshold
The Coordinate Response Measure corpus (Bolia et al., 2000) was used to calculate SRT. Participants listened to sentences spoken by a single speaker following the structure: “target code name, go to color, number, now.” (e.g., “Ready Charly, go to red one now”). Participants were instructed to identify the color-number combination they heard on a 4 × 8 grid displayed on the screen with four possible colors (white, green, blue, and red) and eight possible numbers (1–8). The signal level adapted, starting at 65 dB and decreasing in 5 dB steps to 25 dB in the first scan unless interrupted.
Speech in Competition
The Spatial release from masking task was used to evaluate the ability to understand speech under competition with highly similar speech maskers (Gallun et al., 2013; Lelo de Larrea-Mancera et al., 2020, 2022; Marrone et al., 2008) and could be considered a measure of auditory selective attention. Masker talkers spoke code names other than Charly, which remained exclusive to target sentences. The target sentence was presented at 65 dB in all trials. Masker sentences were presented either colocated in simulated space or separated by 45 degrees to each side. The maskers in the colocated condition started at 13 dB target-to-masker ratio (TMR) and decreased 2 dB on every step until reaching −5 dB TMR in the first scan unless interrupted. The maskers in the separated condition started at 8 dB TMR and decreased 2 dB on every step until reaching −10 dB TMR unless interrupted. It took about 6 min to complete both runs of both conditions.
The Dichotic sentence identification test, adapted from Fifer et al. (1983), was used to assess the ability to understand speech in dichotic conditions where two speakers need to be attended simultaneously, and could be considered a measure of auditory divided attention. In each trial, two sentences were presented simultaneously at 50 dB, with a different sentence presented to each ear. Sentence pairs were drawn pseudorandomly from a closed set of six seven-word, low-predictability sentences from the Synthetic Sentence Identification test (Speaks & Jerger, 1965), ensuring no repetition within pairs. Participants selected the two sentences they heard by choosing from six written sentence options on a response list. Five trials were presented, and if participants did not achieve 100% accuracy, five additional trials were given. The primary outcome was the percentage of correct responses, with the test taking about 2 min to complete.
Working Memory Capacity and Attentional Control
The auditory visual divided attention task (AVDAT) was used to measure auditory and visual attentional control and working memory capacity. This task has been previously shown to account for significant variance in auditory tests (Gallun & Jakien, 2019). Participants started with a cued visual (V) span task in which they were briefly presented with an image of an eye followed by a visual sequence of letters and were asked to recall the letters. Then, participants completed a cued auditory span (A) task in which they were briefly presented with an image of an ear followed by an auditory sequence of numbers. In the final block, both auditory and visual stimuli were presented simultaneously, and participants were cued either in advance (Cued_AV) or at the end of the sequence (Divided_AV) as to whether to respond to the auditory or visual stream. Each section (A, V, Cued_AV, Divided_AV) started at a sequence length of 3 stimuli and advanced to a sequence length of 5 stimuli irrespective of performance. Sequence length then adapted, up to a maximum of 9 stimuli, depending on each participant's performance. Each sequence length contained three trials of each type. A span score was calculated for each condition, defined as the last sequence length at which there was at least one correct trial. Modality-specific attentional control scores were calculated as the span scores of the first two blocks (single-sensory conditions) minus the span scores obtained in the cued conditions of the third block. Higher scores indicate that dual streams of sensory information worsen the capacity to store information, due to the increased attentional control required to ignore one stream of information. Modality-specific WM capacity scores were taken from the average span score of either visual or auditory target trials in the third block (cued and uncued).
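The span and attentional-control scoring described above can be sketched as follows; the data layout (a mapping from sequence length to per-trial correctness) is our assumption, and the study's actual implementation in PART may differ:

```python
def span_score(trials):
    """Span = last (longest) sequence length with at least one correct trial.
    `trials` maps sequence length -> list of per-trial booleans."""
    correct_lengths = [length for length, results in trials.items()
                       if any(results)]
    return max(correct_lengths) if correct_lengths else 0

def attentional_control(single_span, cued_dual_span):
    """Single-modality span minus cued dual-modality span; higher values
    indicate a larger cost of ignoring the competing stream."""
    return single_span - cued_dual_span
```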
Composite Scores
The primary outcomes at the construct level of TFS, STM, SRT, SiC, attentional control, and WM capacity were computed by normalizing each of their constituent measures to a z-score (considering the entire sample, including both HD and ND groups) and averaging across tasks for each individual. The pure tone tests (four frequencies) were first averaged into a pure-tone average and then converted to a z-score (PTA). Visual and auditory scores were averaged together for each of the attentional control and WM capacity constructs.
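A minimal sketch of the composite computation follows; whether the population or sample standard deviation was used is not stated in the text, so the population SD is assumed here:

```python
def z_scores(values):
    """Standardize a list of scores to z-scores over the full sample
    (population standard deviation assumed)."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]

def composite(measures):
    """Average each participant's z-scores across a construct's tasks.
    `measures` is a list of per-task score lists, one value per participant."""
    standardized = [z_scores(task) for task in measures]
    return [sum(vals) / len(vals) for vals in zip(*standardized)]
```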
Procedure
Participants were recruited through Prolific, which provided a pseudorandom preselection of participants into balanced groups of self-reported HD and ND, including balanced demographics of sex, age, and education. After initial contact through email, participants were assigned dates for three testing sessions supervised through Zoom by a research assistant. In the first session, informed consent and demographic forms were completed. Testing proceeded in a between-participant counterbalanced manner across four possible orders of tasks. Participants completed all tests using their own equipment (i.e., headphones and mobile device or tablet) across three remote sessions within 14 days of each other. Participants were instructed to complete their sessions in a quiet area with no distractions, to turn the volume of their devices to the maximum value, and to ensure proper headphone placement. Data were uploaded to Amazon Web Services HIPAA-compliant servers. Data were collected as part of a larger study that also included: (1) usability questionnaires about each task; (2) an additional run of each task with alternative stimulus delivery algorithms (up-down staircases and progressive tracking) aimed at further validating the adaptive scan algorithm used here; and (3) a Letter + Number WM task to evaluate concurrent validity for the AVDAT task. Evaluation of these additional data is beyond the scope of this study and will be reported elsewhere.
Statistical Analysis
Descriptive statistics for each individual measure (Table 2) are presented and discussed in reference to consistency (mean alignment) with previous studies. Further, to address the primary aim, we evaluated the ability of each construct to discriminate between the self-reported HD and ND groups using robust multiple regression models that tested for the effect of participant group (HD vs. ND) and age on each construct (Table 3). We used a robust regression approach with a Huber weighting function (Huber, 1981) to account for widespread violations of the normality assumption and the presence of outlying values in the variables under study. This method downweights the influence of extreme values without reducing them to zero. We deem this adequate because extreme values may be testing failures or real indices of poor perceptual performance. Significance for this family-wise set of eight comparisons was controlled for false discovery rate (Benjamini & Hochberg, 1995), as these analyses aimed to confirm the utility of each construct to discriminate self-reported HD. Similar analyses for each individual measure are also provided in the supplemental materials without correction, to facilitate exploration and avoid inflating Type II error (Supplemental Table S2).
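Two pieces of this analysis are easy to make concrete: the Huber weighting function (the tuning constant k = 1.345 shown here is the conventional default and is our assumption, as the text does not state it) and the Benjamini–Hochberg step-up procedure for false-discovery-rate control:

```python
def huber_weight(residual, k=1.345):
    """Huber weighting: weight 1 inside +/-k, k/|r| outside (never zero),
    so extreme values are downweighted but not discarded."""
    r = abs(residual)
    return 1.0 if r <= k else k / r

def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected under FDR control (BH step-up):
    find the largest rank k with p_(k) <= (k/m) * alpha, reject the
    k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])
```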
Descriptive Statistics for All Measures Collected for This Study.
Att=attention; Aud=auditory; BFM=binaural frequency modulation; Col=colocated; DSI=dichotic sentence identification; HHIE-S=Hearing Handicap Inventory for the Elderly–Screening Version; kHz=kilo-Hertz; Max=maximum; MFM=monaural frequency modulation; Min=minimum; ms=milliseconds; n=sample size; PT=pure tone; PTL=pure tone left (monaural); PTR=pure tone right (monaural); Sep=separated; SRT=speech reception threshold; SM=spectral modulation; STM=spectrotemporal modulation; TM=temporal modulation; TMR=target-to-masker ratio; Vis=visual; WM=working memory.
Robust Beta Coefficients for HD and the Covariate of Age on Each Multiple Regression Conducted for Each Construct (z-Scores).
Significant effects are indicated in bold and indicate group differences or age effects.
Att=attentional control; HHIE-S=Hearing Handicap Inventory for the Elderly–Screening Version; PTA=pure-tone average; SiC=speech in competition; SRT=speech reception threshold; STM=spectrotemporal modulation; TFS=temporal fine structure; WM=working memory capacity.
To address the secondary aim of understanding the test battery at the construct level, we examined the bivariate relationship between measures (Table 4) (raw correlations for each individual measure appear in Table S3 of the Supplemental materials). False discovery rate was also considered for this family of 36 correlations. Further, a principal component analysis (PCA) was conducted to evaluate the consistency of our theory-driven dimensionality reduction into constructs with a data-driven approach. All individual measures described above were included in the PCA. The number of components was selected based on a criterion of eigenvalue > 1. To increase interpretability an oblique promax rotation method based on the correlation matrix was employed to allow components to be correlated. This analysis allowed for an evaluation of whether the rotated components and their variable loadings (Table 5) were similar to the construct framework originally proposed (Figure 1).
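The eigenvalue > 1 component-selection step can be sketched with a correlation-matrix eigendecomposition; the promax rotation itself requires additional machinery and is omitted from this sketch:

```python
import numpy as np

def n_components_kaiser(corr):
    """Number of principal components with eigenvalue > 1 (the criterion
    used above), computed from a correlation matrix."""
    eigvals = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    return int(np.sum(eigvals > 1))
```

For example, three equally intercorrelated variables (r = .5) yield eigenvalues 2.0, 0.5, and 0.5, so a single component is retained.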
Spearman Correlations (r and in Parenthesis Below, p) Between Construct Level Variables.
Statistical significance is indicated in bold considering a correction for false-discovery rate (corrected alpha < .023).
Att=attentional control; HHIE-S=Hearing Handicap Inventory for the Elderly–Screening Version; PTA=pure-tone average; SiC=speech in competition; SRT=speech reception threshold; STM=spectrotemporal modulation; TFS=temporal fine structure; WM=working memory capacity.
Rotated Components (Oblique Promax) Obtained From the Individual Measures Under Study With Eigenvalues > 1.
Att=attention; Aud=auditory; BFM=binaural frequency modulation; Col=colocated; DSI=dichotic sentence identification; HHIE-S=Hearing Handicap Inventory for the Elderly–Screening Version; MFM=monaural frequency modulation; PT=pure tone; Sep=separated; SRT=speech reception threshold; SM=spectral modulation; STM=spectrotemporal modulation; TM=temporal modulation; Vis=visual; WM=working memory.
Finally, to address whether the principled characterization of the test battery at the construct level could discriminate HD, logistic regression was used, including participant age and performance on each construct to predict participant group (HD vs. ND). The accuracy, sensitivity, and specificity of the model to discriminate HD, as well as the area under the curve (AUC) of a receiver operating characteristic (ROC) curve, are also reported. Accuracy, sensitivity, specificity, and AUC for the individual constructs are provided in Table S4 of the Supplemental materials.
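The reported classification metrics follow directly from the model's predicted probabilities; a minimal sketch, with HD coded as the positive class (function names are ours):

```python
def classification_metrics(probs, labels, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a probability threshold,
    with label 1 = HD (positive class)."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def roc_auc(probs, labels):
    """ROC AUC as the probability that a random positive case receives a
    higher predicted probability than a random negative case."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```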
Results
Auditory processing measures largely overlapped with those reported by Lelo de Larrea-Mancera et al. (2022), who collected data across both laboratory and remote conditions (see Table 2). Importantly, the proportion of outlying cases (> 2 SD) was under 8.5% for every test, which is similar to the reports in Lelo de Larrea-Mancera et al. (2020) for young people without hearing difficulties in laboratory conditions. However, there were a few participants whose pure tone detection and SRT scores were worse than 45 dB (20 participants), raising concern about the audibility of the test stimuli for those participants. Figure S1 in the Supplemental materials shows the distribution of scores for each measure, highlighting the performance of people with possible audibility issues. The scores of these participants on other measures, importantly on the SiC assessments, were evenly spread across performance levels, indicating that some of these participants were able to perform well on other auditory and cognitive tasks.
Bivariate Discriminability of Self-Reported Hearing Difficulty
The ability of each construct to discriminate between the self-reported HD and ND groups is illustrated in Figure 2. Because participant age was distributed unevenly across the groups (mean difference of 6.69 years), and because it is widely known to impact peripheral and central audition (Humes et al., 2012), the effect of age was controlled for in group comparisons. Table 3 shows a summary of the robust beta coefficients for HD (average group differences) and the covariate of age (change in the construct for every SD change in age) on each construct. All lower-level auditory composite measures (PTA, TFS, and STM) were able to discriminate between groups, with small-to-medium decrements for the HD group, significant even after a correction for false-discovery rate (corrected alpha = .0051). In contrast, measures of SRT, SiC, attentional control, and WM capacity failed to discriminate between groups after correction for multiple comparisons, although SRT and attentional control showed trending effects. Analyses were conducted on standardized scores for every construct and for the age variable. All constructs except TFS and WM capacity showed a significant effect of age. Group comparisons for all individual measures can be found in the Supplemental materials (Table S2).

Construct Level Scores for Both the Self-Reported HD and the ND Groups. The First Panel Indicates the Cutoff Value for Group Assignment According to HHIE-S Scores With a Dotted Line. Robust Beta Coefficients of the z-Scores are Provided for Each Construct. The False-Discovery Rate Corrected Alpha Level is 0.0051. HD= Hearing Difficulty; HHIE-S= Hearing Handicap Inventory for the Elderly–Screening Version; ND= No Difficulty.
Bivariate and Multivariate Relationships Between Measures
Bivariate Relations at the Construct Level
To gain a first intuition about the multivariate structure of the collected measures, Spearman correlations between constructs were conducted and are shown in Table 4. Significant correlations were corrected for false-discovery rate (corrected alpha < .023). Auditory performance constructs correlated moderately with one another from sound detection to SiC (r = 0.29–0.708), such that better scores in one construct related to better scores in the others. The cognitive measures of attentional control and WM capacity were also correlated (r = −0.32). Of the auditory constructs, only SiC correlated with the cognitive measures of WM capacity and attentional control (r = 0.26 and −0.35, respectively). The correlations to WM are negative due to the scoring metrics, as better WM capacity is represented by a higher value while better performance on the rest of the measures is represented by lower values. All of the observed statistically significant correlations followed the expected direction. SiC correlated with the other auditory tasks at r = 0.39–0.47 and with self-report (HHIE-S) at r = 0.202. The HHIE-S scores were correlated with all auditory (r = 0.202–0.37) but not cognitive tasks. STM showed the highest correlations to other auditory constructs (r = 0.37–0.702) and TFS the lowest (r = 0.21–0.42). Participant age also correlated significantly with most constructs (r = 0.11–0.68), except for TFS and WM capacity. The Spearman correlations between all individual measures are presented in Table S3 of the Supplemental materials.
Multivariate Relations and Construct Framework Alignment
A PCA using all individual measures was conducted to further probe the multivariate relationships between measures and to provide a data-driven alternative to the theory-driven composite scores. Table 5 shows the rotated component loadings > .5 on each of the four resulting components with eigenvalues > 1. Component 1 consisted of the higher order central auditory processing (CAP) tasks: STM, speech reception with and without competition, and auditory attention. The remaining (lower-order) CAP tasks, which constitute temporal fine structure sensitivity, loaded onto component 4. Component 2 comprised the pure-tone detection tasks, and component 3 the cognitive measures of auditory and visual attention and WM. The self-reported measure (HHIE-S) did not load > .5 onto any of the selected components (RC1 = .44; RC2 = .26; RC3 = −.01; RC4 = −.16), suggesting that it is only weakly related to the other variables and does not fit well into the other constructs under study. In sum, this unsupervised dimensionality reduction accounted for 62% of the total variance in the dataset (RC1 = .239; RC2 = .186; RC3 = .102; RC4 = .102). This data-driven approach grouped measures in a way that largely confirmed the construct framework under study (see Figure 1).
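For readers who want to run this kind of analysis themselves, the sketch below implements a PCA on the correlation matrix, retains components with eigenvalues > 1 (the Kaiser criterion), and applies a varimax rotation to obtain interpretable loadings. It runs on simulated data with two latent factors; it is a generic illustration under those assumptions, not the authors' pipeline.

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-6):
    """Varimax rotation of a loading matrix (standard SVD-based algorithm)."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(n_iter):
        L = loadings @ R
        # Gradient of the varimax criterion with respect to the rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - L * (L**2).sum(axis=0) / p)
        )
        R = u @ vt
        new_var = s.sum()
        if new_var < var * (1 + tol):
            break
        var = new_var
    return loadings @ R

def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1,
    followed by varimax rotation of the retained loadings."""
    corr = np.corrcoef(X, rowvar=False)
    evals, evecs = np.linalg.eigh(corr)
    order = np.argsort(evals)[::-1]          # sort descending
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > 1.0                       # Kaiser criterion
    loadings = evecs[:, keep] * np.sqrt(evals[keep])
    return varimax(loadings), evals

# Hypothetical data: two latent factors, each driving three of six measures
rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 500))
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=500) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=500) for _ in range(3)]
)
rotated, evals = pca_kaiser(X)
```

On this simulated structure the procedure recovers two components, with each measure loading strongly (> .5) on exactly one of them, which is the same simple-structure pattern the rotated components in Table 5 exhibit.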
Multivariate Discriminability of Self-Reported Hearing Difficulty
Table 6 summarizes the beta coefficients and associated p-values from the multiple logistic regression model designed to assess the ability of the full battery of tests to discriminate HD. The model's discrimination of HD was also evaluated in terms of accuracy, sensitivity, specificity, and ROC AUC. Using a 0.5 threshold on the model's predicted probabilities for binary classification of HD, the model correctly identified 87 participants as ND and 36 as HD (accuracy = 0.64), while missing 48 cases of HD (sensitivity = 0.42) and falsely identifying 21 ND cases as HD (specificity = 0.805). Overall, the model showed a ROC AUC of 0.71 and explained a small proportion of HD variance (adjusted pseudo-R2 = 0.11). In sum, the multivariate approach does no better than the individual constructs in predicting HD classification. Accuracy, sensitivity, specificity, and ROC AUC for each construct are reported in the Supplemental materials (Table S4).
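These classification metrics follow directly from the reported confusion counts. The short sketch below recomputes them; the metric definitions are standard, and only the four counts come from the text.

```python
# Confusion counts reported for the logistic model at a 0.5 probability
# threshold (HD = positive class, ND = negative class).
tp = 36   # HD participants correctly classified as HD
fn = 48   # HD participants missed (classified as ND)
tn = 87   # ND participants correctly classified as ND
fp = 21   # ND participants falsely classified as HD

accuracy = (tp + tn) / (tp + tn + fp + fn)   # proportion correct overall
sensitivity = tp / (tp + fn)                 # true-positive rate among HD
specificity = tn / (tn + fp)                 # true-negative rate among ND
```

With these counts, accuracy is 123/192 ≈ 0.64, sensitivity 36/84 ≈ 0.43, and specificity 87/108 ≈ 0.81, matching the values reported above to rounding.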
Beta Coefficients for a Multivariate Logistic Model Predicting HD From Each Construct Under Study and the Covariate of Age (z-Scores).
Att = attentional control; Coeff = coefficient; HHIE-S = Hearing Handicap Inventory for the Elderly–Screening Version; PTA = pure-tone average; SE = standard error; SiC = speech in competition; SRT = speech reception threshold; STM = spectrotemporal modulation; TFS = temporal fine structure; VIF = variance inflation factor; WM = working memory capacity.
We also assessed multicollinearity among the predictors by computing variance inflation factors (VIFs) from the diagonal of the inverse correlation matrix, as described by Belsley et al. (1980). All VIFs were below 2.58, well under the commonly used thresholds of 5 or 10 for problematic multicollinearity (for discussion see Vatcheva et al., 2016). Specific values are included in Table 6.
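The VIF computation described above (diagonal of the inverse correlation matrix) can be sketched as follows, using hypothetical predictors in place of the study's construct scores.

```python
import numpy as np

def vif_from_corr(X):
    """Variance inflation factors as the diagonal of the inverse
    correlation matrix of the predictors (Belsley et al., 1980).
    Each VIF equals 1 / (1 - R^2) for that predictor regressed on the rest."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

# Hypothetical predictors: x2 is partly redundant with x1, x3 is independent
rng = np.random.default_rng(2)
x1 = rng.normal(size=1000)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=1000)
x3 = rng.normal(size=1000)
vifs = vif_from_corr(np.column_stack([x1, x2, x3]))
```

Here the redundant pair produces VIFs near 2.8 while the independent predictor sits near the floor of 1, illustrating why the study's maximum of 2.58 signals only modest shared variance among predictors.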
Discussion
In this study, data from a principled set of auditory and cognitive processing measures were collected from an online pool of participants (Prolific), roughly half of whom self-identified as having difficulty hearing (HD). The auditory measures used in previous studies on PART (Gap, MFM, BFM, TM, SM, STM, Col, Sep, and spatial release) were within one SD of previous reports at the group level. This indicates that data quality in online testing with PART, including for participants who self-report HD, is similar to that obtained in laboratory settings. Additionally, measures that discriminated at the group level between those who self-report HD and those who do not (ND) were identified: a pure-tone detection average, temporal fine structure, and STM sensitivity tasks. Contrary to our expectations, the speech-based measures such as the SRT and the SiC construct did not show group differences. Within the SiC construct, only the separated condition of the speech-on-speech masking task (Supplemental Table S2) showed significant group differences that survived correction for false discovery rate (also see auditory WM capacity). However, even where significant group differences indicate potential utility for discriminating HD, the small differences observed also indicate that the score distributions largely overlap and may support only poor discrimination.
Although the cognitive measures of attentional control and working memory capacity did not discriminate between groups at a statistically significant level, their usefulness in accounting for variance in other constructs and measures in the correlation and PCA analyses is notable. In particular, the measure of auditory WM capacity showed significant group differences that survived correction for false discovery rate. Interestingly, the cognitive constructs correlated only with the SiC construct, whereas the lower order auditory constructs correlated only amongst themselves, including with SiC. In this regard, SiC is an interesting construct in that it engages all levels of processing, yet it was not useful for discriminating HD in this sample (but see Lelo de Larrea-Mancera et al., 2023b). It is possible that the relationship between what is considered perceptual and what is considered cognitive is highly dependent on task selection, sample characteristics, and grouping criteria (for discussion and review see Vetter et al., 2024).
In this study, we selected a hierarchy of auditory and cognitive processing measures following the structure of the construct framework presented in Figure 1. This framework was implicit in previous work (Gallun et al., 2022; Lelo de Larrea-Mancera et al., 2020, 2022) but was explicitly under study here. Results indicate that our theory-driven grouping of measures into constructs is consistent with a data-driven, unsupervised dimensionality reduction technique: the first four rotated components of the PCA grouped the individual measures in a manner very similar to the construct framework (Figure 1). Pure-tone tasks grouped together in one component, TFS tasks in another, STM and SiC tasks in another, and cognitive measures in a fourth. Auditory attentional control loaded > .5 on both the cognitive and the higher order auditory processing components. These results are taken as confirmation that the observed data are distributed in a way consistent with the construct framework.
Self-reported hearing ability stood out as the most challenging construct to understand, as even the simple unitary questions used to divide groups did not produce identical results (Supplemental Table S1). To establish a consistent and validated criterion, final group assignment was based on the HHIE-S cutoff value of 8 for hearing handicap (ASHA, 1989). Although the whole range of auditory processing measures was significantly correlated with the HHIE-S scores, the latter were not strongly correlated with the first four rotated PCs. Also, the multivariate model predicted only a small portion of the variance in the binary HD classification (pseudo-R2 = 0.11), with an accuracy of 0.64, low sensitivity (0.42) that missed more than half the HD cases, and moderate specificity (0.81) that mislabeled relatively few ND participants as HD. In general terms, this indicates that self-reported measures tap a different source of variance that, although complex, remains indispensable for understanding an individual's hearing profile. In other words, no other measure or combination of measures was able to substitute for self-report.
One aspect of self-report measures that is inherently different from behavioral test results is that self-report requires individuals to reflect on their own abilities and evaluate whether those self-perceptions are positive or negative. The HHIE was initially developed as a tool for assessing “the effects of hearing impairment on the emotional and social adjustment of elderly people” (Ventry & Weinstein, 1982), an introspective dimension of hearing loss that is not inherently reflected in the audiogram. In the case of the simple dichotomous question used here (“Do you have any hearing loss or hearing difficulties?”), responses indicate which participants would describe themselves as people with hearing difficulties. These self-perceptions were not always reflected in their performance data, which may have contributed to the divergence of the self-report measure from the other factors in the multivariate analysis.
In this work we were mainly interested in evaluating the inter-relationships of a hierarchy of auditory and cognitive performance tests as they relate to self-reported classification of HD. We did not intend to introduce a battery of tests that would substitute for the HHIE, and given the low accuracy attained with the various classification models for HD, we are not yet at the stage of refining a discriminative model, including cross-validation to avoid overestimating its performance. Overall, there seems to be little relation between self-report of HD and the performance measures under study here. Future work will be needed to close the gap between “objective measures” of hearing and cognition and self-report of hearing function. One of the main insights of this study is that self-reported and performance measures might explain quite different aspects of a person's ability to hear.
Study Limitations and Future Directions
PART has been proposed as an instrumental development for addressing the increasing need for prevention and care in aging adult populations around the world, which may face increased risk of sensory and cognitive decline (see Kannan et al., 2024). Indeed, in this dataset we see significant effects of age on almost all measures under study (except TFS and WM capacity). However, it is important to note areas that must be developed in future studies. For example, despite growing evidence that perceptual and cognitive assessment is the first step toward prevention and care (see Humes et al., 2012), PART still lacks the large-scale normative datasets that would support a link to treatment in terms of providing care options or professional advice. To address this, a few steps need to be taken.
As a first step, there is a question of the extent to which the auditory measures in PART provide convergent or complementary information that informs diagnostics and treatments. While other research groups have made progress in this direction by connecting remote assessment with standard care (see Potgieter et al., 2016; Swanepoel et al., 2019; Wasmann et al., 2022; Zou et al., 2024), further research with PART will be required to create larger scale normative datasets in clinical populations that address both detection and monitoring of conditions. Further, it will be important to understand the extent to which these tests help inform management strategies for auditory and cognitive processing difficulties. To this end, there is growing research indicating that interventions with digital video games may provide benefits in understanding speech in competition (Lelo de Larrea-Mancera et al., 2021; Whitton et al., 2014, 2017). Connecting assessment with training and care is an essential future direction of this work.
In addition, it will be important to better determine the settings and supervisory structures required for test systems like PART to provide informative results. Although participants in the current study were tested remotely, testing was supervised during a video call. While this was appropriate to ensure that participants understood task instructions and that the testing environment was suitable for the study (e.g., without excessive ambient noise or other distractions), it potentially limits the applicability of the current set of results. It will be important for future work to evaluate the limits of acceptable testing situations, as well as to explore technologies such as ambient sound monitoring or online data analysis approaches that flag potentially unreliable results. Of note, for these types of tests it is always possible that poor performance reflects confusion during self-administered testing. In such cases, it would be appropriate to follow up in more controlled scenarios to confirm potential indications of clinically relevant conditions.
In sum, this work advances current understanding of the relationship between auditory and cognitive processing in those with and without subjective hearing difficulties. Overall, the study further validates remote testing using online pools of participants as a feasible general methodology. At the same time, however, our data indicate that challenges remain in explaining self-report of hearing difficulties with psychophysical tests of hearing function. It will be important to consider the extent to which other tests will be needed to explain self-report of hearing difficulty and, more broadly, which testing instruments will be most valuable for understanding different hearing conditions. This may inform different intervention approaches that could benefit people with hearing needs.
Supplemental Material
Supplemental material, sj-docx-1-tia-10.1177_23312165251397373, for “At-Home Auditory Assessment Using Portable Automated Rapid Testing (PART) to Understand Self-Reported Hearing Difficulties” by E. Sebastian Lelo de Larrea-Mancera, Tess K. Koerner, William J. Bologna, Sara Momtaz, Katherine N. Menon, Audrey Carrillo, Eric C. Hoover, G. Christopher Stecker, Frederick J. Gallun and Aaron R. Seitz in Trends in Hearing.
Acknowledgments
The authors thank Laura Coco and Michelle Molis for their advice in formulating the self-reported hearing difficulty question.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Institute on Deafness and Other Communication Disorders (grant numbers R01DC015051 and R01DC018166-01A1).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
