Characterizing Perception of Impulse Sounds Through Subjective Ratings and Pupillometric Responses

Abstract

Impulse sounds such as clinking dishes or slamming objects are often perceived as particularly intense or uncomfortable, yet their perceptual characterization remains insufficiently understood. The present study systematically examined subjective and physiological responses to ecologically valid impulse sounds in young normal-hearing adults. Twenty-seven participants rated nine impulse sounds presented at peak levels between 80 and 120 dB SPL in three acoustic conditions: anechoic, reverberant (room-convolved), and anechoic embedded in the International Speech Test Signal (ISTS) at 65 dB SPL. Loudness and discomfort were assessed using categorical rating scales, and pupil dilation was recorded as an index of autonomic arousal. Both perceptual scales followed Stevens-type growth functions. Loudness increased gradually with level, whereas discomfort showed a delayed onset, but steeper growth once activated. Test–retest reliability was excellent for both scales (ICC ≈0.88). Acoustic condition significantly influenced perception: reverberant stimuli yielded higher perceived intensity and lower 50% thresholds than anechoic presentations for most impulse types, while embedding impulses in speech produced comparatively small effects. Mean pupil dilation increased with presentation level and was significantly associated with both loudness and discomfort ratings. Linear mixed-effects modeling demonstrated that subjective ratings explained pupil dilation more strongly than physical level once both were included in the model. Neither uncomfortable loudness levels nor self-reported sound sensitivity significantly predicted ratings or physiological responses. These findings provide a systematic characterization of impulse sound perception in a normal-hearing population and demonstrate a close correspondence between subjective intensity judgments and autonomic responses under controlled laboratory conditions.

Keywords

impulse sound perception, discomfort and loudness ratings, pupillometry across sound levels

Introduction

Among the many acoustic events encountered in everyday life, impulse or transient sounds, such as clinking dishes, slamming doors, or dropping objects, are characterized by a steep temporal envelope and high peak levels (International Organization for Standardization, 1997). Their sudden onset can attract attention and often elicit strong perceptual reactions. While such sounds can convey useful environmental cues, they are also frequently described as unpleasant or startling (Husstedt et al., 2023; Radun et al., 2022). Depending on their intensity and spectral characteristics, impulsive sounds can be perceived as distracting, particularly when they occur unexpectedly or interfere with ongoing speech signals. Such events may evoke discomfort and have been discussed as potentially affecting speech processing under certain conditions, for example, when impulses occur repeatedly or at high intensity (DiGiovanni et al., 2011; Krueger et al., 2017; Picou & Ricketts, 2018).

To improve the listening comfort for such short, high-level acoustic events, modern hearing aids employ impulse noise reduction (INR) algorithms. Typically, these algorithms rely on rapid transient detection in the temporal envelope and apply short-term gain reduction with fast attack and release times to suppress the impulse while attempting to preserve ongoing speech (Keidser et al., 2011; Keshavarzi et al., 2021; Korhonen et al., 2013; Liu et al., 2012). Although their technical efficacy has been demonstrated repeatedly by showing a clear attenuation of the impulse peak in the hearing-aid output (Husstedt et al., 2023; Keidser et al., 2011; Korhonen et al., 2013), the perceptual consequences of INR, such as changes in naturalness, listening comfort, or effort, remain insufficiently explored. Moreover, evidence indicates that individual factors such as hearing loss, cognitive ability, and noise sensitivity can influence preference for noise-reduction settings and directional processing (Neher et al., 2016), yet such relationships have not been examined for impulse noise reduction. Although early work in the 1970s and 1980s looked at perceptual responses to impulsive sounds, mainly in the context of environmental noise metrics such as C-weighted exposure levels (Fairfax, 1987; Schomer, 1978), little recent research has focused on how impulsive sounds should be assessed in a perceptual and psychoacoustic context. Despite some historical efforts, no contemporary, standardized method exists for evaluating individual perception of impulsive sounds. A systematic comparison of perceptual scaling approaches may therefore help to clarify how impulsive sounds are experienced and quantified in controlled settings.

The present work is intended as a first step towards a more systematic characterization of discomfort and loudness. Future research may build on these findings to develop an assessment method and to evaluate its potential relevance for clinical applications. The present study addresses this gap by exploring both subjective and physiological measures of impulse sound perception in normal-hearing listeners. Specifically, subjective ratings and pupillometric responses were recorded for a set of ecologically valid impulse sounds originally presented and characterized by Husstedt et al. (2023). Two scales were used for the subjective ratings: discomfort, rated on a nine-step scale ranging from “Not Uncomfortable” to “Extremely Uncomfortable” (Husstedt et al., 2023), and loudness, rated on an 11-step scale ranging from “Can’t Hear” to “Too Loud” (Brand & Hohmann, 2002; International Organization for Standardization, 2003). In addition, participants’ self-reported sound sensitivity was assessed using the Sound Sensitivity Symptoms Questionnaire Version 2.0 (SSSQ2) (Aazh & Kula, 2024).

Pupillometry has become an established method in hearing science to quantify cognitive processing and listening effort beyond traditional behavioral measures (Naylor et al., 2018; Winn et al., 2018; Zekveld et al., 2018). The task-evoked pupil dilation response reflects activity of the locus coeruleus–noradrenergic system and is commonly interpreted as an index of autonomic arousal and attentional allocation. Importantly, pupil dilation does not measure loudness or discomfort per se, but rather the overall salience and processing demands elicited by a stimulus. Previous studies have demonstrated close associations between pupil dilation, perceived loudness, and subjective salience of sounds (Kemper et al., 2025; Liao et al., 2016), suggesting that physiological responses may track perceptual intensity even if they arise from partially distinct neural mechanisms. In the present study, pupillometry was employed as an additional, objective measure to complement subjective ratings of impulse sounds. This approach allows pupillometry to be evaluated as a tool for identifying individual sensitivity to impulsive acoustic events. It also allows examining whether a physiological reaction correlates with the perceived discomfort of impulse sounds. Although other physiological measures, such as skin conductance, heart rate, or respiration, may provide additional information and should be considered in future work, the present study focuses exclusively on pupillometry as an initial physiological marker. Thus, the present work should be understood as a feasibility-oriented first step in normal-hearing listeners, providing a basis for future studies that may work toward an assessment approach.

Methods

Study Design and Experimental Procedure

All measurements were completed during a single session of ∼2.5 hr. The session comprised information about the study and the signing of a consent form; otoscopy; pure-tone air-conduction audiometry (warble tones at 0.25, 0.5, 0.75, 1, 2, 4, and 8 kHz); determination of the uncomfortable loudness level (UCL) at 0.5, 1, 2, and 4 kHz; completion of a medical history form; the Sound Sensitivity Symptoms Questionnaire 2.0 (SSSQ2); and subjective rating tasks, during which pupillometric responses were also recorded. UCLs were measured using pure tones that were gradually increased in level; participants were instructed to indicate immediately when the sound became uncomfortably loud, at which point the presentation was stopped and the corresponding level was recorded as the UCL. For stimulus presentation and rating, participants were seated in a listening room and exposed to the acoustic stimuli through a loudspeaker across four experimental runs. Each run contained 27 stimuli, comprising nine impulse sounds presented in three acoustic conditions.

Participants rated these 27 stimuli on one of two scales per run: a discomfort scale and a loudness scale. Participants with odd ID numbers started with the discomfort scale in runs 1 and 2, while those with even IDs started with the loudness scale. In runs 3 and 4, the other scale was used, resulting in one test–retest pair for each scale. This approach was chosen because both scales were similarly structured; administering them in separate consecutive blocks reduced the risk of confusion between loudness and discomfort and allowed for a clear, scale-specific instruction before each pair of runs. The presentation order was randomized in advance using a Latin-square design (Jacobson & Matthews, 1996; Sagastume, 2014) to balance out sequence effects. Each stimulus series started at 80 dB SPL peak, defined as the absolute sample-based peak of the waveform, and the level was then increased in 10 dB steps until the highest category of the respective rating scale was reached. In cases where the maximum category was selected before the nominal maximum level, the presentation level was subsequently decreased by 5 dB. After completing one series, the next stimulus presentation began again at 80 dB SPL. To reduce predictability, each trial began with a randomized right-skewed prestimulus pause, following the same probability function used in previous audiological pupillometry studies (Husstedt et al., 2025; Kemper et al., 2025). An overview of the trial timing and pupil response structure for all three acoustic conditions is shown in Figure 1. Pause durations t were drawn between 2.9 and 4 s (mean = 3.281 s, −0.348/+0.593 s; 1st–99th percentile spread). Following each stimulus, a fixed 2-second poststimulus pause was inserted, after which the rating interface appeared.
For the anechoic with speech condition, the trial timeline differed because the duration of the speech segment was added to the overall trial length: the randomized pause was identical in principle but was extended by the fixed speech duration (2 s), and the postimpulse interval before the rating screen was correspondingly longer (1 s). The next trial began only after the participant had completed the rating (self-paced). A 10-minute break separated runs. Pupil diameter was recorded continuously throughout the entire experiment.
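The level series described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function name, the `rate` callback standing in for the participant's response, and the assumption that a series ends after a single 5 dB step-down trial are all hypothetical.

```python
def level_series(rate, start_db=80, max_db=120, step_up=10, step_down=5,
                 top_rating=100.0):
    """Return the (level, rating) pairs of one stimulus series.

    `rate` is a hypothetical callback that maps a peak level in dB SPL to a
    rating in percent; in the experiment this was the participant's response.
    """
    trials = []
    level = start_db
    while level <= max_db:
        rating = rate(level)
        trials.append((level, rating))
        if rating >= top_rating:
            # Highest category reached before the nominal maximum level:
            # present one additional trial 5 dB lower (assumed to end the series).
            if level < max_db:
                lower = level - step_down
                trials.append((lower, rate(lower)))
            break
        level += step_up
    return trials
```

With a simulated listener whose rating reaches the top category at 100 dB SPL, the sketch yields the levels 80, 90, 100, and 95 dB SPL; a listener who never selects the top category receives all five levels from 80 to 120 dB SPL.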

Figure 1.

Example trial structure for the three acoustic conditions (top to bottom: “anechoic,” “reverberant,” and “anechoic with speech”). Each trial began with a randomized right-skewed prestimulus pause (2.9–4.0 s), followed by the baseline window (−0.25 to 0 s) used to compute pupil change immediately preceding the stimulus. A fixed 2-second poststimulus pause was inserted before the rating screen. In the anechoic with speech condition, the speech segment extended the trial duration. MPD denotes mean pupil dilation relative to baseline; the yellow trace illustrates an example pupil response time course.

Facilities and Hardware

Experiments were carried out in a sound-insulated and acoustically optimized booth (2.6 × 3.6 × 2.5 m, T₂₀ = 0.1 s), meeting the specifications for free-field audiometric testing as defined in ISO 8253-2 (International Organization for Standardization, 1994). Stimuli were presented via a three-way coaxial Genelec 8351A loudspeaker (Genelec Oy, Finland), positioned 1 m in front of the participant at 0° azimuth. Visual stimuli and instructions were displayed on a FlexScan EV2451 monitor (EIZO, Japan) mounted above a Tobii Pro Spectrum eye tracker (Tobii AB, Sweden) at a viewing distance of 0.65 m. The eye tracker recorded pupil diameter in millimeters. The loudspeaker, eye tracker, and control computer (placed outside of the booth) were connected via an RME Fireface 802 soundcard (RME, Germany). Experiments were controlled using MATLAB R2023b with the Audio Toolbox v24.2. Ambient illumination was maintained at 100 ± 10 lx (Voltcraft LX-1108 luxmeter) with the monitor displaying a mid-gray background (RGB [0.55, 0.55, 0.55]). Eye-tracking data were recorded at 300 Hz. Stimulus recording, processing, and playback were conducted at a sampling rate of 96 kHz in order to improve the reconstruction of peak sound pressure levels.

Participants

Twenty-seven young normal-hearing adults (17 female, 10 male; age range: 19–29 years, M = 24.6, SD = 2.8) participated in the study. Participants were recruited via the volunteer database of the German Institute of Hearing Aids (Lübeck, Germany) and announcements among university students. Inclusion criteria comprised normal hearing thresholds, defined as air-conduction thresholds ≤15 dB HL between 0.5 and 6 kHz, no acute ear or respiratory infections, and unobstructed ear canals with at least 50% of the tympanic membrane visible upon otoscopic inspection. All participants provided written informed consent and received financial compensation. The study was approved by the Ethics Committee of the Technical University of Applied Sciences Lübeck, Germany (vote from July 17, 2025).

Impulse Sound Playback

Nine impulse sounds were selected from an open dataset published by Husstedt et al. (2023). The set comprised the following sounds: bursting balloon, rattle of cutlery, slamming of a book, rattle of dishes, glass clink, glass put on table, knife on plate, keychain dropped, and starter clapper. All sounds had been recorded at 1 m distance in an anechoic chamber using a ¼″ free-field microphone (Type 4939, Brüel & Kjær, Denmark). Peak sound pressure levels had been determined according to IEC 61672, that is, as the sample-based peak value. To emulate everyday listening, each of the nine impulses was rendered in three acoustic conditions:

Anechoic [original free-field recordings from Husstedt et al. (2023)].

Reverberant via convolution with a binaural room impulse response from the AIR database (Jeub et al., 2009): office room, source–listener distance 1 m, 0° azimuth, dummy-head recording; the two channels were down-mixed to mono and resampled to 96 kHz. The reverberant rendering was presented via a single loudspeaker at 0° azimuth. This condition represents a controlled but simplified approximation of natural reverberation, which in real environments is typically diffuse and arrives from multiple spatial directions.

Anechoic with speech, created by embedding each impulse into the International Speech Test Signal (ISTS; Holube et al., 2010) at 65 dB SPL with a randomly selected 2-second lead-in and a 1-second continuation segment.
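As a rough illustration of the reverberant rendering and the sample-based peak scaling, the following Python sketch down-mixes a two-channel impulse response, convolves it with a dry impulse, and scales the result to a target peak level. It is a simplified sketch under stated assumptions: the down-mix is taken as the channel average, resampling to 96 kHz is omitted, and all function names are hypothetical. The reference pressure of 20 µPa is the standard reference for dB SPL.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 µPa)

def downmix_to_mono(left, right):
    """Average two channels (assumed down-mix rule, not stated in the text)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def peak_db_spl(signal_pa):
    """Sample-based peak level in dB SPL of a waveform given in pascals."""
    peak = max(abs(s) for s in signal_pa)
    return 20.0 * math.log10(peak / P_REF)

def scale_to_peak(signal, target_db):
    """Scale a waveform so that its absolute sample peak equals target_db."""
    peak = max(abs(s) for s in signal)
    target_pa = P_REF * 10.0 ** (target_db / 20.0)
    return [s * target_pa / peak for s in signal]
```

For example, `scale_to_peak(convolve(dry, downmix_to_mono(rir_left, rir_right)), 80.0)` returns a waveform whose absolute peak is 0.2 Pa, since 20 µPa × 10^(80/20) = 0.2 Pa.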

These three acoustic conditions were chosen to represent a baseline stimulus set and two ecologically motivated extensions. The anechoic condition corresponds to the original free-field recordings as published and previously used by Husstedt et al. (2023), providing a direct reference. The reverberant condition was included because impulsive sounds are rarely encountered under anechoic conditions in everyday life. A binaural impulse response from the publicly available AIR database was therefore selected, using the office room setting (RT60 ≈0.37 s). Compared with other AIR room options, this office response represents a moderately reverberant environment and a common listening situation for many participants, whereas alternatives such as a studio booth or a lecture room reflect more specialized acoustic settings (Jeub et al., 2009). Finally, the anechoic-with-speech condition was designed to test whether embedding impulses in ongoing speech changes perception. This is particularly relevant for future studies with hearing aid users, as modern devices detect the listening situation and adapt their signal processing accordingly. As a result, impulse noises presented in speech or in quiet may be processed and perceived differently. The ISTS was chosen because it is language-independent, freely available, and provides a standardized speech-like masker without introducing semantic content that could differentially distract participants.

This yielded 27 stimuli (9 impulses × 3 conditions) in total. Figure 2 illustrates the time waveforms for the “anechoic” and “reverberant” conditions. The “reverberant” versions show prolonged decays relative to the “anechoic” recordings. All signals were equalized and scaled to ensure reproducible peak sound pressure levels. To this end, the initial anechoic 10 ms of the impulse response was equalized in the time domain to represent a minimum-phase bandpass response with edge frequencies of 80 Hz and 20 kHz based on a regularized least-squares approach (Kirkeby & Nelson, 1999). To verify the target presentation levels, free-field measurements were conducted for target peak levels between 80 and 120 dB SPL. These measurements confirmed that playback levels closely matched the intended targets across all stimuli, remaining within ±3 dB up to 110 dB SPL. At 120 dB SPL, calibration accuracy remained within ±2 dB for nearly all signals, except for rattle of cutlery (−8.4 dB/−7.1 dB) and knife on a plate (−4.3 dB/−4.2 dB), where reproduction limits of the loudspeaker system led to reduced peak levels. Slamming of a book in the “reverberant” condition (−3.3 dB) and starter clapper in the “anechoic” condition (−3.1 dB) also slightly exceeded the ±3 dB tolerance.

Figure 2.

Time-domain waveforms (0.15 s windows) of the nine impulse sounds used in the study, comparing “anechoic” (orange) and “reverberant” (green) conditions at a reference peak amplitude of 0.2 Pa (≈80 dB SPL peak), with both conditions shown after absolute peak-based temporal alignment.

Discomfort and Loudness Rating

Two main rating conditions were assessed, utilizing a discomfort scale and a loudness scale. Each rating comprised nine stimulus types across the three acoustic conditions described above. The scales were presented in German. The English translation of both scales is shown in Figure 3. The discomfort scale consisted of nine categorical steps ranging from “not uncomfortable” to “extremely uncomfortable,” following the procedure from Husstedt et al. (2023). The loudness scale featured eleven steps from “not audible” to “too loud,” based on ISO 16832 (International Organization for Standardization, 2003) in the German version (Brand & Hohmann, 2002). For both scales, the dashed line labels indicate selectable intermediate response categories between the verbally defined anchors. Both scales were presented via an interactive on-screen interface. Rating buttons were initially disabled and became active 2 s after stimulus onset. Participants responded by selecting the corresponding category via mouse click, after which the next stimulus was presented automatically.

Figure 3.

English versions of the two rating scales used in the experiment: (a) the discomfort scale comprised nine categories ranging from “Not Uncomfortable” to “Extremely Uncomfortable” (Husstedt et al., 2023) and (b) the loudness scale comprised 11 categories from “Can’t Hear” to “Too Loud” (Brand & Hohmann, 2002; International Organization for Standardization, 2003).

Ratings from both scales were converted to a percentage scale from 0% (lowest category) to 100% (highest category) for better comparability. To describe the psychophysical relationship between the physical input level L, given as peak sound pressure level, and the perceived intensity P, ratings were subsequently fitted with power functions of the form

P = k (L - L_{0})^{n},

following Stevens’ law of psychophysics (Stevens, 1957). The parameters k, L_{0}, and n of the model were estimated using bounded nonlinear least-squares optimization applied to all non-ceiling data points. This approach minimized the squared deviation between predicted and measured ratings while enforcing physiologically plausible parameter ranges. These functions were not only used to characterize the observed ratings but also to extrapolate perceived discomfort and loudness beyond the tested range, allowing estimation of perceived magnitude at reference levels below 80 dB SPL and above 120 dB SPL.
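The conversion to percent and the fit of P = k (L - L_{0})^{n} can be sketched as follows. This is a dependency-free illustration, not the authors' procedure: a bounded grid search over L_{0} and n with a closed-form least-squares estimate of k stands in for a bounded nonlinear least-squares solver, and the parameter bounds are hypothetical because the text does not report them. Inverting the fitted function gives the level at a given percentage, for example the 50% level.

```python
def category_to_percent(index, n_categories):
    """Map a zero-based category index to a 0-100 % scale."""
    return 100.0 * index / (n_categories - 1)

def fit_power(levels, ratings, l0_bounds=(40.0, 80.0), n_bounds=(0.5, 4.0),
              steps=141):
    """Fit P = k * (L - L0)**n by grid search over (L0, n).

    For each candidate (L0, n), the optimal k has the closed form
    sum(P * x) / sum(x * x) with x = (L - L0)**n. Bounds are assumptions.
    """
    best = None
    for i in range(steps):
        l0 = l0_bounds[0] + (l0_bounds[1] - l0_bounds[0]) * i / (steps - 1)
        for j in range(steps):
            n = n_bounds[0] + (n_bounds[1] - n_bounds[0]) * j / (steps - 1)
            x = [max(lvl - l0, 0.0) ** n for lvl in levels]
            sxx = sum(v * v for v in x)
            if sxx == 0.0:
                continue
            k = sum(p * v for p, v in zip(ratings, x)) / sxx
            sse = sum((p - k * v) ** 2 for p, v in zip(ratings, x))
            if best is None or sse < best[0]:
                best = (sse, k, l0, n)
    _, k, l0, n = best
    return k, l0, n

def level_at(percent, k, l0, n):
    """Invert the power function: level at which P equals `percent`."""
    return l0 + (percent / k) ** (1.0 / n)
```

On the nine-step discomfort scale, `category_to_percent(4, 9)` maps the middle category to 50%. For ratings generated from k = 0.02, L_{0} = 60 dB, and n = 2, the fit recovers these parameters and `level_at(50, k, l0, n)` returns 110 dB SPL, because 0.02 × (110 − 60)² = 50.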

Self-Reported Sound Sensitivity

Self-reported sound sensitivity was assessed using the Sound Sensitivity Symptoms Questionnaire, Version 2.0 (SSSQ2), developed by Aazh and Kula (2024). For the present study, the questionnaire was translated into German and administered prior to the experimental session. The German translation, including all response options, can be found in Supplemental Table 2. The SSSQ2 comprises six items addressing the frequency of auditory discomfort, emotional reactions, and avoidance behavior associated with everyday sounds. Participants indicated how often they had experienced each symptom within the last 2 weeks on a 4-point scale (0–1 day(s), 2–6 days, 7–10 days, and 11–14 days). Each response option was assigned to a numerical value from 0 to 3 (Supplemental Table 3). A total score ranging from 0 to 18 can be obtained, with higher scores reflecting greater sound sensitivity.
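The scoring rule can be written compactly. The sketch below assumes that the response options are scored in ascending order of frequency (0 for 0–1 days up to 3 for 11–14 days); the exact assignment is given in Supplemental Table 3, and the option labels and function name used here are hypothetical.

```python
# Assumed mapping of response options to item scores (see Supplemental Table 3).
RESPONSE_SCORES = {
    "0-1 days": 0,
    "2-6 days": 1,
    "7-10 days": 2,
    "11-14 days": 3,
}

def sssq2_total(responses):
    """Sum the six item scores into a total between 0 and 18."""
    if len(responses) != 6:
        raise ValueError("the SSSQ2 comprises exactly six items")
    return sum(RESPONSE_SCORES[r] for r in responses)
```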

Pupillometry

Pupil dilation was recorded continuously to examine physiological correlates during the perception of impulse sounds. For each combination of level, signal, and presentation type, data from both rating scales and their respective test–retest runs were combined, resulting in four measurements per condition. The data were acquired at a sampling rate of 300 Hz, with only the left eye included in the analysis. The system was recalibrated between sessions as needed to ensure stable tracking accuracy. The timing of the trial is described in “Study Design and Experimental Procedure” and visualized in Figure 1. For conditions 1 and 2 (“anechoic” and “reverberant”), the prestimulus pause directly preceded the impulse; for condition 3 (impulse embedded in speech), the pause occurred before the onset of the speech segment. During stimulus playback, the rating buttons were temporarily disabled, which automatically rendered them in a dimmed, low-contrast appearance (MATLAB default for inactive UI elements) against the uniformly dark-gray background, thereby indicating playback while keeping changes in screen luminance minimal. Participants were instructed to maintain visual fixation near the center of the rating scale throughout the recording and to minimize blinking during sound playback. Instead of presenting a fixation cross, the mid-scale label (e.g., “Medium”/“Uncomfortable”) served as the fixation point to avoid additional on-screen changes; participants were asked to start searching for the response option only 1 s after the impulse. The recorded data were processed using a custom MATLAB (R2024b, The MathWorks Inc., USA) toolbox, following established recommendations for robust and replicable pupillometric analysis (Geller et al., 2020; Husstedt et al., 2025; Kret & Sjak-Shie, 2019; Naylor et al., 2018; Seropian et al., 2022; Winn et al., 2018; Zekveld et al., 2010, 2018). Gaze data were recorded but not further analyzed.
Blinks and signal dropouts were automatically detected by the eye tracker. To account for eyelid closure effects, gaps were extended by 100 ms before and after the missing segment. Trials containing more than 20% missing data were excluded. Based on this criterion, 4,272 of 15,072 trials (28.3%) were excluded, leaving 10,800 trials (71.7%) for further analysis. Remaining gaps, including the extended blink-related segments, were linearly interpolated, and the pupil signal was low-pass filtered at 4 Hz (fourth-order Butterworth). Transient artifacts were additionally detected using a median–absolute–deviation criterion applied to the pupil-velocity signal. To this end, sample-wise changes in pupil size were transformed into a velocity trace, the distribution of which was used to compute the median absolute deviation. All samples exceeding a threshold of 16 times the median absolute deviation were subsequently flagged as physiologically implausible rapid changes for use in trial plausibility control (Geller et al., 2020; Kret & Sjak-Shie, 2019). Trials were then segmented from −4 to +2 s relative to the impulse onset for conditions 1 and 2; for condition 3, the duration of the speech segment was added (−6 to +2 s). These trials were then downsampled to 50 Hz and averaged per condition and participant. A baseline from −0.25 to 0 s (immediately before the impulse) was used to express pupil diameter as a percent change from baseline. For statistical analyses, a poststimulus analysis window was defined around the global maximum of the averaged pupil response across conditions. The peak occurred at ∼1.3 s after stimulus onset, and a 0.4 s window (1.1–1.5 s) was used to extract mean pupil dilation (MPD).
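The velocity-based artifact criterion can be sketched in a few lines of Python. The sketch assumes the threshold form used by Kret and Sjak-Shie (2019), that is, the median velocity plus 16 times the median absolute deviation; the text does not state whether the median is added, so this is an assumption. The function names and the synthetic demonstration trace are hypothetical.

```python
import math
import statistics

def flag_velocity_outliers(pupil, fs=300.0, n_mad=16.0):
    """Flag samples whose absolute pupil velocity exceeds median + n_mad * MAD."""
    velocity = [abs(b - a) * fs for a, b in zip(pupil, pupil[1:])]
    med = statistics.median(velocity)
    mad = statistics.median([abs(v - med) for v in velocity])
    threshold = med + n_mad * mad  # assumed threshold form
    # The first sample has no preceding sample and is never flagged.
    return [False] + [v > threshold for v in velocity]

def demo_trace():
    """Synthetic pupil trace (mm) with one implausible 0.5 mm jump at sample 25."""
    trace = [3.0 + 0.01 * math.sin(i) for i in range(60)]
    trace[25] += 0.5
    return trace
```

Applied to `demo_trace()`, the function flags only samples 25 and 26, which border the artificial jump.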
Prior to finalizing the analysis window, several alternative definitions were explored, including (i) a broader 1 s window centered at 1.3 s, (ii) a 1 s window centered on the subject-specific absolute peak, and (iii) a 0.4 s window centered on the earlier local maximum (∼0.8 s). Although these alternatives yielded comparable overall result patterns, they showed increased interindividual variability.
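The gap handling, baseline correction, and window-based MPD extraction can be illustrated as follows. This is a simplified sketch with hypothetical function names: low-pass filtering and downsampling are omitted, and whether the 20% criterion was applied before or after gap extension is not stated in the text. The `window` argument allows the alternative analysis windows to be compared with the same helper.

```python
def extend_gaps(valid, fs=300.0, pad_s=0.1):
    """Mark samples within pad_s seconds of any invalid sample as invalid."""
    pad = int(round(pad_s * fs))
    out = list(valid)
    for i, ok in enumerate(valid):
        if not ok:
            for j in range(max(0, i - pad), min(len(valid), i + pad + 1)):
                out[j] = False
    return out

def missing_fraction(valid):
    """Fraction of invalid samples in a trial (exclusion criterion: > 0.2)."""
    return 1.0 - sum(valid) / len(valid)

def interpolate(pupil, valid):
    """Linearly interpolate invalid samples; hold the nearest value at edges."""
    good = [i for i, ok in enumerate(valid) if ok]
    if not good:
        raise ValueError("no valid samples")
    out = list(pupil)
    for i, ok in enumerate(valid):
        if ok:
            continue
        left = [g for g in good if g < i]
        right = [g for g in good if g > i]
        if left and right:
            a, b = left[-1], right[0]
            out[i] = pupil[a] + (i - a) / (b - a) * (pupil[b] - pupil[a])
        else:
            out[i] = pupil[left[-1]] if left else pupil[right[0]]
    return out

def percent_change(trace, times, baseline=(-0.25, 0.0)):
    """Express a trace as percent change from its mean baseline value."""
    base = [p for p, t in zip(trace, times) if baseline[0] <= t < baseline[1]]
    b = sum(base) / len(base)
    return [100.0 * (p - b) / b for p in trace]

def mean_pupil_dilation(rel_trace, times, window=(1.1, 1.5)):
    """Average the baseline-corrected trace inside the analysis window."""
    vals = [p for p, t in zip(rel_trace, times) if window[0] <= t <= window[1]]
    return sum(vals) / len(vals)
```

For instance, `mean_pupil_dilation(rel, times, window=(0.8, 1.8))` corresponds to the broader 1 s window centered at 1.3 s, and `window=(0.6, 1.0)` to the 0.4 s window around the earlier local maximum.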

Statistical Evaluation

Normality was assessed using the Shapiro–Wilk test for variables entering parametric t-tests. For linear mixed-effects models (LMMs), normality assumptions were evaluated at the level of model residuals rather than predictors. Comparisons were conducted using paired or unpaired t-tests. For multiple comparisons, Bonferroni correction was applied. Statistical significance was evaluated at the levels p < .05, p < .01, and p < .001. Subjective ratings were fitted individually with power functions according to Stevens’ law (Stevens, 1957) to describe the relationship between presentation level and perceived intensity. In addition to the Stevens power-law model, two alternative growth models were evaluated: a categorical loudness growth function based on Brand and Hohmann (2002) and a psychometric function (Knoblauch & Maloney, 2012). Each model was fitted individually to the rating data using bounded nonlinear least-squares estimation. Because the models differ in complexity (two vs. three free parameters), goodness of fit was assessed not only with R² but also with RMSE, Akaike's information criterion (AIC), and Bayesian information criterion (BIC), which penalize model complexity. Test–retest reliability was assessed with a two-way random-effects intraclass correlation coefficient (ICC; Shrout & Fleiss, 1979) including 95% confidence intervals, as well as a paired t-test at the subject level. Associations between subjective and physiological measures were examined using LMMs. Random-effects structures were first compared using restricted maximum likelihood (REML), starting from a random-intercept model and evaluating by-participant random slopes for repeated-measures predictors (presentation level and subjective rating). Random slopes for presentation level were considered but were not retained because they did not improve model fit (likelihood-ratio tests and AIC/BIC) while increasing model complexity.
Fixed-effects structures were subsequently compared using maximum likelihood (ML) estimation and nested likelihood-ratio tests, supplemented by AIC and BIC for model selection (Gordon, 2019; Meteyard & Davies, 2020). The final model was refitted using REML for parameter estimation. In total, seven candidate random-effects structures and six fixed-effects models were evaluated for each dependent variable.
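For the comparison of the growth models, the goodness-of-fit measures can be computed from the residuals of each least-squares fit. The sketch below uses the common Gaussian least-squares forms AIC = N ln(RSS/N) + 2p and BIC = N ln(RSS/N) + p ln(N), with N observations and p free parameters; the text does not state which form was used, so this choice and the function name are assumptions.

```python
import math

def fit_criteria(observed, predicted, n_params):
    """Return R^2, RMSE, AIC, and BIC for a least-squares model fit.

    AIC and BIC use the Gaussian least-squares form (an assumption here).
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if rss == 0.0:
        raise ValueError("perfect fit: log-likelihood terms are undefined")
    mean = sum(observed) / n
    tss = sum((o - mean) ** 2 for o in observed)
    log_term = n * math.log(rss / n)
    return {
        "r2": 1.0 - rss / tss,
        "rmse": math.sqrt(rss / n),
        "aic": log_term + 2 * n_params,
        "bic": log_term + n_params * math.log(n),
    }
```

With identical residuals, a three-parameter model receives higher (less favorable) AIC and BIC values than a two-parameter model, which is the complexity penalty referred to above.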

Results

Subjective Rating

To quantify the psychophysical growth of perceived loudness and discomfort, ratings were analyzed as a function of presentation level. The power-function fits with boxplots for the actual ratings for loudness and discomfort are shown in Figure 4, based on ratings obtained at the predefined 10-dB presentation steps between 80 and 120 dB SPL. The additional 5-dB step-down trials were excluded from the analyses due to their conditional nature; a descriptive overview including these observations is provided in the Supplemental Material. Values outside this range were derived through extrapolation of the fitted functions (gray areas in Figure 4). Loudness increased gradually from near 0% at 60 dB SPL to saturation at 129 dB SPL, whereas discomfort rose steeply above 95 dB SPL, reaching 100% at 134 dB SPL. Although all candidate models yielded high coefficients of determination, comparison using RMSE, AIC, and BIC revealed systematic differences between model formulations. For discomfort, the power model clearly outperformed both the psychometric and the Brand–Hohmann model, whereas for loudness, the power and Brand–Hohmann models showed comparable fit quality (see Supplemental Table 4).

Figure 4.

Power-function fits of subjective loudness (violet) and discomfort (gold) ratings averaged across participants. Thin curves represent individual participant fits, whereas bold curves indicate the group-level mean fit. Ratings were obtained at sound pressure levels between 80 and 120 dB SPL in 10-dB increments. Boxplots depict the distribution of empirical ratings at each level, with the central line indicating the median, the box spanning the interquartile range (25th–75th percentiles), and whiskers extending to 1.5 times the interquartile range. Outliers are shown as small circular markers in the corresponding condition color and are defined as values lying more than 1.5 times the interquartile range below the 25th percentile or above the 75th percentile. Sample sizes per level (N) are shown below the loudness boxplots and above the discomfort boxplots.

For the power-function model depicted, discomfort ratings demonstrated slightly lower prediction error and more favorable information-criterion values than loudness. This indicates a marginally better overall model fit for discomfort across participants (see Supplemental Table 5).

Linear regression (Figure 5) revealed a very strong association between loudness and discomfort ratings (R² = 0.852, p < .001), indicating that ∼85% of the variance in discomfort ratings was explained by perceived loudness. The fitted model (Discomfort = −28.94 + 1.12 × Loudness) showed a highly significant slope, t(133) = 27.64, p < .001, reflecting a near-linear coupling across the tested intensity range.

Figure 5.

Relationship between loudness and discomfort ratings averaged across participants and level (27 × 5 observations). Colored markers represent the five presentation levels (80–120 dB SPL in 10 dB steps). Faint points indicate individual participant-by-level responses.

Importantly, however, the regression parameters indicate a systematic shift between the two scales. The negative intercept and the slope greater than one suggest that discomfort emerges later but increases more steeply once activated, consistent with the delayed onset and stronger growth pattern observed in the power-function fits (Figure 4).
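The size of this shift can be read directly from the reported coefficients. The following short sketch only evaluates the fitted line Discomfort = −28.94 + 1.12 × Loudness; the function names are hypothetical, and the values describe the group-level regression, not individual listeners.

```python
INTERCEPT = -28.94  # reported regression intercept (% discomfort)
SLOPE = 1.12        # reported regression slope

def predicted_discomfort(loudness_pct):
    """Discomfort rating (%) predicted from a loudness rating (%)."""
    return INTERCEPT + SLOPE * loudness_pct

def loudness_at_discomfort_onset():
    """Loudness rating (%) at which the predicted discomfort crosses 0 %."""
    return -INTERCEPT / SLOPE
```

According to the fitted line, predicted discomfort crosses 0% at a loudness rating of about 25.8% and reaches about 83.1% when loudness is rated at 100%.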

Test–retest reliability was evaluated separately for loudness and discomfort across all participants, signals, and levels (Figure 6). Reliability was assessed using a two-way random effects ICC (ICC [2,1]). The analysis showed excellent repeatability for both perceptual scales, yielding ICC = 0.877 [95% CI 0.870–0.884] for loudness and ICC = 0.875 [95% CI 0.868–0.882] for discomfort (p < .001 for both). Each point in the figure represents one test–retest pair, with marker size increasing when multiple participants provided identical ratings. The diagonal identity line indicates perfect agreement between test and retest. The fitted functions provide a quantitative description of perceived intensity across and around the tested level range and illustrate the stability of both perceptual judgments across sessions. In addition, potential systematic shifts between sessions were examined using paired t-tests at the subject level. Although discomfort ratings showed a small numerical decrease at retest (mean difference = 2.14%), this effect did not remain significant after Bonferroni correction (p = .056), and no significant difference was observed for loudness (p = .74), supporting the overall stability of both perceptual scales.
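For reference, ICC(2,1) as defined by Shrout and Fleiss (1979) can be computed from the two-way ANOVA mean squares as (MS_R − MS_E) / (MS_R + (k − 1) MS_E + k (MS_C − MS_E) / n), where n is the number of rated targets and k the number of measurements per target (here, test and retest). The sketch below is a plain implementation of this formula with a hypothetical function name; confidence intervals are omitted.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, single measurement, absolute agreement.

    `data` is a list of n rows (targets), each holding k measurements.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Applied to the six-target, four-judge example table in Shrout and Fleiss (1979), the function reproduces the published value of ICC(2,1) ≈ 0.29; identical test and retest ratings yield a value of 1.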

Figure 6.

Test–retest comparison of loudness (left) and discomfort (right) ratings across all participants, signals, and levels. Each marker represents a single test–retest observation; larger markers reflect overlapping identical values. The dashed line indicates perfect agreement (x = y).

The 50% levels derived from the fitted power functions are shown in Figure 7 for both rating scales across the three acoustic conditions. Each boxplot represents the distribution of participant averages for the “anechoic” (orange), “reverberant” (green), and “anechoic + speech” (blue) conditions. The upper panel illustrates the 50% level for the rating “uncomfortable,” and the lower panel the 50% level for “medium loudness,” both expressed in peak dB SPL.

To assess differences in perceived discomfort across impulse types and acoustic conditions, a two-way repeated-measures ANOVA was conducted with the within-subject factors impulse type and acoustic condition. The analysis revealed significant main effects of impulse type, F(8, 672) = 34.81, p < .001, and acoustic condition, F(2, 672) = 4.34, p = .013, as well as a significant impulse type × acoustic condition interaction, F(16, 672) = 2.71, p < .001. Posthoc comparisons for the discomfort scale showed significant differences between the “anechoic” and “reverberant” conditions for six of the nine impulse types and for the overall mean (p < .001). A similar pattern emerged for the comparison between “reverberant” and “anechoic + speech” (seven of nine), whereas no significant differences were found between the two anechoic conditions. For the loudness scale, both anechoic conditions differed significantly from the reverberant condition across all stimuli and for the mean (p < .001). A significant difference between the “anechoic” and “anechoic + speech” conditions was observed for only one impulse type and for the mean (p = .01).
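To illustrate how a 50% level can be read off a fitted power function, the sketch below assumes a plain Stevens-type form, rating = k · p^n with p = 10^(L/20), which becomes a straight line when log10(rating) is plotted against level in dB. This parameterization, the function names, and the example ratings are assumptions made for illustration; they are not the fitting procedure or data of the present study.

```python
import math


def fit_power_law(levels_db, ratings):
    """Least-squares fit of log10(rating) = a + b * level_dB.

    Equivalent to rating = k * p**n with p = 10**(level/20),
    where k = 10**a and n = 20 * b. Ratings of zero cannot be
    log-transformed and are skipped (a simplification of this sketch).
    """
    pairs = [(lv, math.log10(r)) for lv, r in zip(levels_db, ratings) if r > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive ratings to fit")
    m = len(pairs)
    mean_x = sum(x for x, _ in pairs) / m
    mean_y = sum(y for _, y in pairs) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def level_at_rating(a, b, rating=50.0):
    """Level in dB at which the fitted function reaches the given rating (%)."""
    return (math.log10(rating) - a) / b


# Hypothetical ratings (%) at the five presentation levels (peak dB SPL)
levels = [80, 90, 100, 110, 120]
ratings = [2.0, 6.0, 18.0, 45.0, 90.0]
a, b = fit_power_law(levels, ratings)
print(f"exponent n = {20 * b:.2f}, 50% level = {level_at_rating(a, b):.1f} dB SPL")
```

A lower 50% level means that the midpoint of the scale is reached at a lower peak level, which is how the reverberant condition differs from the anechoic conditions in Figure 7.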

Figure 7.

Fifty-percent levels (midpoints of the fitted power functions) for discomfort (top) and loudness (bottom) for the three acoustic conditions (“anechoic,” “reverberant,” and “anechoic + speech”) across the nine impulse types: bursting balloon (S1), rattle of cutlery (S2), slamming of a book (S3), rattle of dishes (S4), glass clink (S5), glass put on table (S6), knife on a plate (S7), keychain dropped (S8), and starter clapper (S9), as well as the mean over all signals. In each boxplot, the central line indicates the median, boxes span the interquartile range (25th–75th percentiles), whiskers extend to the most extreme values within 1.5 × the interquartile range, and values beyond this range are plotted individually as outliers.

Pupil Dilation

Pupil responses relative to baseline are shown in Figure 8 as a function of presentation level (left) and acoustic conditions (right). Each trace represents the mean of the pupil dilation (PD) relative to the baseline across 27 participants, time-locked to the onset of the impulse sound. The shaded gray area marks the analysis window used to extract the mean pupil dilation (MPD) around a maximum.

Figure 8.

Pupil dilation (PD) relative to baseline as a function of presentation level (left panels) and acoustic condition (right panels). The upper panels show time courses of PD; shaded areas mark the analysis window used for mean pupil dilation (MPD) extraction, and lines represent participant averages. The lower panels show boxplots of MPD values averaged per participant and level (bottom left) and per participant and acoustic condition (bottom right), illustrating the distribution of MPD across listeners. “Onset ISTS” marks the onset of the International Speech Test Signal used in the “anechoic + speech” condition. In each boxplot, the central line indicates the median, boxes span the interquartile range (25th–75th percentiles), whiskers extend to the most extreme values within 1.5 × the interquartile range, and values beyond this range are plotted individually as outliers.

For the level-dependent responses (left panel), pupil dilation increased with higher sound levels, while responses at 80 and 90 dB SPL showed little separation. Linear regression confirmed a weak but significant positive association between level and MPD, R² = 0.15, F(1, 133) = 23.6, p < .001. Additional exploratory analyses were conducted to examine the influence of the temporal extraction window. Specifically, broader fixed windows around 1.3 s and subject-specific peak-centered windows were compared. Across these approaches, the overall pattern of results remained unchanged. The 0.4 s window centered on the global peak showed the lowest between-subject variability while yielding effect sizes comparable to alternative window definitions. Although an earlier peak was visible around 0.8 s in the time course (Figure 8), analyses centered on this earlier component did not strengthen the level-dependent effect. The later peak around 1.3 s, therefore, served as the primary measure, as it provided a stable and consistent estimate of the task-related pupil response.
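The window-based extraction described above can be sketched as follows. The 0.4 s width and the 1.3 s center follow the text; the function name, the sampling grid, and the example trace are hypothetical, and the trace is assumed to be baseline-corrected and time-locked to impulse onset already.

```python
def mean_pupil_dilation(times, trace, center=1.3, width=0.4):
    """Mean of a baseline-corrected pupil trace inside a fixed analysis window.

    times:  sample times in seconds relative to impulse onset
    trace:  pupil dilation relative to baseline at those times
    center, width: window definition in seconds (defaults follow the text)
    """
    lo, hi = center - width / 2, center + width / 2
    window = [v for t, v in zip(times, trace) if lo <= t <= hi]
    if not window:
        raise ValueError("no samples fall inside the analysis window")
    return sum(window) / len(window)


# Hypothetical trace sampled at 20 Hz with a broad response peaking near 1.3 s
times = [i / 20 for i in range(61)]                       # 0.0 ... 3.0 s
trace = [max(0.0, 4.0 - 6.0 * abs(t - 1.3)) for t in times]

mpd_late = mean_pupil_dilation(times, trace)                # window around 1.3 s
mpd_early = mean_pupil_dilation(times, trace, center=0.8)   # window around 0.8 s
print(f"MPD (1.3 s window) = {mpd_late:.2f}, MPD (0.8 s window) = {mpd_early:.2f}")
```

Changing `center` or `width` corresponds to the exploratory window comparisons mentioned in the text; a subject-specific variant would first locate each participant's peak and pass its latency as `center`.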

For the condition comparison (right panel), the repeated-measures ANOVA revealed a significant main effect of condition, F(2, 52) = 5.47, p = .007; Greenhouse–Geisser p = .014. Posthoc tests showed that “reverberant” stimuli evoked higher MPD values than “anechoic” (p < .001), whereas no significant differences were found between “anechoic” and “anechoic + speech” (p = .38) or between “reverberant” and “anechoic + speech” (p = .28).

Self-Reported Sound Sensitivity

Self-reported sound sensitivity was assessed using the German version of the Sound Sensitivity Symptoms Questionnaire 2.0. Individual total scores ranged from 0 to 17 (M = 3.3, SD = 3.5, Median = 2), indicating generally low symptom frequency across participants. While most participants reported minimal symptoms, four individuals obtained scores above 10 points, indicating more frequent occurrences of sound sensitivity symptoms within the two-week reference period. One participant reached a score of 17, representing a high expression of general sound sensitivity.

Interrelation of Measurement Results

To relate the subjective ratings to the physiological response, MPD relative to baseline was pooled across all impulse types and acoustic conditions and plotted as a function of the categorical ratings for discomfort and loudness (Figure 9). For the discomfort scale, a linear regression across all participant-by-level data points revealed a robust positive association between rating and MPD, slope = 0.0364% MPD per rating %, R² = 0.207, F(1, 133) = 34.7, p < .001. For the loudness scale, a similar pattern emerged. A regression including all participant-by-level data points yielded a significant positive association between loudness rating and MPD, slope = 0.0429% MPD per rating %, R² = 0.201, F(1, 133) = 33.5, p < .001. For both discomfort and loudness, inclusion of a quadratic term did not significantly improve model fit (all p > .13; ΔAIC < 2), and the linear model was, therefore, retained.
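The reported R² and F values are linked by a fixed identity for ordinary least-squares regression, R² = F · df1 / (F · df1 + df2), which makes it easy to check that the statistics quoted in this section are mutually consistent. The sketch below applies this identity to the values given in the text; the function name is illustrative.

```python
def r_squared_from_f(f_value, df_model, df_resid):
    """Coefficient of determination implied by an overall regression F statistic."""
    return (f_value * df_model) / (f_value * df_model + df_resid)


# Simple regressions with one predictor and 27 x 5 = 135 observations -> df = (1, 133)
reported = {
    "MPD ~ discomfort rating": 34.7,
    "MPD ~ loudness rating": 33.5,
    "MPD ~ presentation level": 23.6,
}
for label, f_value in reported.items():
    print(f"{label}: R^2 = {r_squared_from_f(f_value, 1, 133):.3f}")

# With a single predictor, F equals t squared (discomfort ~ loudness, t(133) = 27.64)
print(f"discomfort ~ loudness: R^2 = {r_squared_from_f(27.64 ** 2, 1, 133):.3f}")
```

The recomputed values agree with the reported R² of 0.207, 0.201, 0.15, and 0.852 to the stated precision.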

Figure 9.

Baseline-corrected mean pupil dilation (MPD) as a function of subjective ratings for discomfort (left) and loudness (right). Small colored symbols show individual participant data for each of the five presentation levels (80–120 dB SPL), with color indicating level. Large symbols denote the group means at each presentation level; horizontal caps indicate ±1 standard deviation (SD) of the subjective ratings across participants, and vertical caps indicate ±1 SD of MPD across participants. Black lines represent the linear regression fits across all single-subject data points.

To assess the relationship between subjective and physiological measures, LMMs were fitted, including presentation level, subjective rating, UCL, and SSSQ2 as fixed effects and participant as a grouping factor. Model building followed a stepwise procedure using ML estimation for comparisons of nested fixed-effects structures, whereas the final model was refitted using REML for parameter estimation. The full model comparison process for discomfort and loudness is reported in Supplemental Tables 6 and 8, and the final model estimates are provided in Supplemental Tables 7 and 9.

For MPD Discomfort, subjective rating significantly predicted MPD (β = 0.04, SE = 0.01, t = 5.11, p < .001; Supplemental Table 7). The final model included a random intercept and a random slope for rating by subject. The random intercept variance was 2.553 (SD = 1.598), the random slope variance was 0.001 (SD = 0.035), and the intercept–slope correlation was 0.674, indicating substantial between-participant variability in baseline MPD and in the strength of the rating–MPD association.

Model comparisons showed that adding presentation level improved model fit relative to the intercept-only model (Δχ² = 7.85, p = .005; Supplemental Table 6), but its effect became non-significant once subjective rating was included (Δχ² = 0.002, p = .964), indicating that MPD was more closely associated with subjective evaluation than with physical level per se. The rating-only model provided the best balance of fit and parsimony (AIC = 467.25, BIC = 484.68; Supplemental Table 6).
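The nested-model comparisons above are likelihood-ratio tests, in which Δχ² is referred to a chi-square distribution. As a minimal sketch, and assuming one degree of freedom per comparison (one added fixed effect, which is an assumption of this example rather than a detail stated in the text), the p-value follows directly from the complementary error function.

```python
import math


def lrt_p_value_df1(delta_chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)), because X is the square of a
    standard normal variable.
    """
    if delta_chi2 < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(delta_chi2 / 2.0))


# Delta chi-square values reported for the discomfort models
print(f"level vs. intercept-only: p = {lrt_p_value_df1(7.85):.3f}")
print(f"level added to rating:    p = {lrt_p_value_df1(0.002):.3f}")
```

Under this assumption the computed p-values match the reported .005 and .964, which is consistent with each step adding a single fixed-effect parameter.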

For MPD Loudness, a similar pattern emerged (Supplemental Tables 8 and 9). Subjective rating significantly predicted MPD (β = 0.05, SE = 0.01, t = 5.11, p < .001; Supplemental Table 9), whereas presentation level did not contribute additional explanatory power once rating was included (Δχ² = 0.04, p = .841; Supplemental Table 8). Random-effects estimates were comparable to those observed for discomfort (intercept variance = 2.617; slope variance = 0.001; intercept–slope correlation = 0.682; Supplemental Table 9).

Across both scales, presentation level predicted subjective ratings far more strongly than MPD (Δχ² = 175.04, p < .001), whereas MPD added only a small but significant portion of variance to rating models (Δχ² = 11.60, p < .001). The slopes of the rating–MPD relationships did not differ significantly between discomfort and loudness (p = .27).

Neither UCL nor SSSQ2 contributed significantly to explaining variance in MPD or subjective ratings. In the MPD models, UCL (p = .63) and SSSQ2 (p = .51) were nonsignificant predictors; similarly, UCL did not significantly predict discomfort (p = .165) or loudness (p = .54) ratings (Supplemental Tables 6 and 8).

In addition to the mixed-effects modeling, simple linear regressions relating UCL and SSSQ2 to subjective ratings and MPD were conducted. UCL did not significantly predict MPD (β = −0.028, p = .266), discomfort ratings (β = −0.378, p = .226), or loudness ratings (β = −0.098, p = .708). Similarly, SSSQ2 did not significantly predict MPD (β = −0.062, p = .213), discomfort ratings (β = −0.460, p = .463), or loudness ratings (β = 0.117, p = .824).

Across all regression models, explained variance was minimal (R² ≤ 0.012), indicating that neither UCL nor SSSQ2 meaningfully accounted for individual differences in subjective ratings or pupil dilation.

Discussion

Rating of Loudness and Discomfort for Impulse Sounds

Both the loudness and discomfort scales followed Stevens’ law (Stevens, 1957) but differed markedly in their growth patterns. Loudness increased gradually from low values, showing a relatively shallow and almost linear rise across the measured range and reaching its maximum slightly earlier. For reference, broadband noise at 80 dB SPL typically corresponds to loudness values of ∼35–45 categorical units (CU) in standard loudness models (Brand & Hohmann, 2002), whereas the impulse stimuli used in the present study produce values of only around 10 CU on the ACALOS categorical loudness scale. Discomfort, by contrast, remained near zero up to about 90–95 dB SPL and then increased much more steeply once it emerged. Although the discomfort function showed a higher slope within the measured range, its delayed onset prevented it from reaching the same upper asymptote as loudness within the extrapolated region. Consistent with the regression analysis (Figure 5), this indicates that discomfort represents a response with a later onset but stronger growth in the perceptually relevant level range. While each fit is based on only five categorical rating points per condition, using a power-function model remains meaningful because listeners did not reach the upper end of the scale at the highest presentation levels. Therefore, the present power fit provides a way to capture the continuing growth of loudness and discomfort beyond the measured range, which cannot be inferred directly from the raw ratings alone. In interpreting these growth functions, it is important to note that the additional 5 dB step-down presentations, administered only after participants selected the highest category, were excluded from the quantitative analyses.

Although different fitting functions and stimuli were used, the overall shape of the loudness function observed here corresponds relatively closely to that reported in previous categorical loudness scaling studies (Brand & Hohmann, 2002; Rasetshwane et al., 2015). In principle, the adaptive categorical procedure and corresponding fit proposed by Brand and Hohmann (2002) could also have been applied to the present data. However, because that approach was specifically designed for estimating loudness rather than discomfort, we opted for the more general power-function formulation based on Stevens’ law. This choice was empirically supported by model-comparison analyses. In addition, a sigmoidal psychometric function was evaluated. However, this formulation did not capture the observed intensity progression as consistently as the Stevens power model across scales and conditions. Taken together, these findings justify the use of the Stevens-based power formulation as a consistent and parsimonious modeling approach for both loudness and discomfort growth within the present dataset.

The stimuli used are short, ecologically valid impulse sounds normalized by their sample-based peak level, whereas categorical scaling typically uses 1–3 s long noise stimuli and overall level calibration. Consequently, our model starts at higher thresholds (60 dB SPL) and extends up to 140 dB SPL to capture the perceptual range relevant for impulsive sounds, in contrast to the 10–100 dB range commonly encountered in categorical loudness scaling (Brand & Hohmann, 2002; Rasetshwane et al., 2015). Moreover, the present experiment employed a monotonically increasing level sequence rather than an adaptive procedure. When averaged across participants, the level corresponding to medium loudness was clearly below that of medium discomfort, confirming that listeners tolerated substantially higher levels before rating a sound as uncomfortable.

However, the discomfort values observed here are not directly comparable to typical uncomfortable loudness levels (UCLs) reported in the literature, which often occur around 100 dB HL for continuous or stationary stimuli (Moore, 2012). The use of short impulse sounds with steep temporal envelopes likely accounts for these higher tolerance ratings in the present study. Ratings varied significantly across impulse types, indicating that certain categories elicited stronger reactions than others. These differences were further modulated by acoustic condition, as reflected in the significant impulse type × acoustic condition interaction. Loudness, in contrast, was primarily determined by presentation level, with additional modulation by acoustic context. These results closely match the unaided normal-hearing data reported by Husstedt et al. (2023), who presented the same recorded impulses at their original recording levels, corresponding to the lower end of playback intensities in the current study. While the overall growth pattern was dominated by presentation level, the present data nonetheless revealed robust signal-specific differences (see Figure 7), indicating that some impulse types are consistently perceived as more discomforting than others. This is also reflected in the distance between the 50% fit values (midpoints) of the two most widely separated signals: S5 and S9 differ by ∼13 dB on both the loudness and the discomfort scale. As illustrated in Figure 4, the discomfort scale also showed greater interindividual variability than loudness, reflected in the larger standard deviations across participants. This broader spread suggests that discomfort may capture more pronounced individual differences in the perception of impulsive sounds within this normal-hearing sample. Whether such variability translates into clinically meaningful distinctions requires systematic investigation in hearing-impaired populations.

Both scales demonstrated high test–retest reliability (ICC ≥ 0.875), confirming that fitted categorical scaling yields internally consistent measures of impulse sound perception in young normal-hearing listeners. As shown in Figure 6, the majority of test–retest pairs for both scales cluster closely around the identity line, indicating generally high consistency across sessions. The discomfort ratings, however, exhibit slightly greater scatter than the loudness ratings, with one notable exception: the 0–0 category, which forms a dense cluster due to many stimuli being rated as “not uncomfortable” in both repetitions. This pattern motivated an additional ICC analysis excluding these 0–0 pairs to obtain reliability estimates that are less affected by this disproportionate accumulation. This adjustment particularly affected the discomfort scale, whereas loudness was only minimally impacted. Even under these more conservative conditions, reliability (ICC = 0.802 for discomfort; 0.881 for loudness) remained within the “good” (Koo & Li, 2016) or “excellent” (Cicchetti, 1994) range, depending on the framework.

Influences of Reverberation and Speech Embedding

After accounting for the effects of signal and level, the acoustic conditions further shaped the perceptual outcomes, with the “reverberant” condition yielding higher perceived intensity than both anechoic conditions. This likely reflects an enhanced salience of transient energy resulting from temporal smearing and additional spatial cues, accompanied by an overall increase in signal energy. The existing body of evidence on how reverberation influences perceived loudness, however, remains rather limited, as most psychoacoustic studies have focused on speech intelligibility or spatial perception rather than loudness itself. Ellis and Zahorik's (2021) findings suggest that reverberation and loudness perception share similar underlying mechanisms, which may also account for the elevated loudness ratings observed in the present “reverberant” condition. It should be noted, however, that the reverberation in this experiment was delivered from a single loudspeaker position and therefore constitutes only a coarse approximation of a natural, diffuse reverberant field. For loudness, a small but consistent effect of the speech background was observed: ratings in the “anechoic + speech” condition were slightly higher than in the purely “anechoic” presentation, indicating that listeners may have incorporated the ongoing 65 dB SPL speech signal into their overall loudness judgment, despite instructions to ignore it. Discomfort ratings, in contrast, did not differ between the two anechoic conditions, suggesting that, within the present level range and population, discomfort judgments were largely independent of the speech background once the impulse was clearly audible. Pupil dilation followed the same pattern, increasing with level and reaching its maximum in the “reverberant” condition, while differences between the two “anechoic” conditions were not significant. The relation between ratings and physiological data suggests that both reflect heightened perceptual and autonomic responses to more salient acoustic environments.

The overall higher pupil dilation in the “reverberant” condition corresponds to the higher subjective ratings and is consistent with the level-dependent growth of perceived loudness reported by Liao et al. (2016). Only a few studies have explicitly examined how reverberation affects pupil responses or perceived loudness, but similar effects have been observed for speech stimuli, where reverberation increased both listening effort and pupil dilation compared to “anechoic” conditions (Kuusinen et al., 2023).

Relation Between Different Subjective and Physiological Measures

Pupil dilation increased systematically with subjective intensity and closely followed both loudness and discomfort ratings. However, pupil responses should not be interpreted as a direct measure of loudness or discomfort. Rather, they likely reflect a shared underlying dimension of perceptual salience and autonomic arousal. Loudness primarily reflects sensory magnitude coding, whereas discomfort involves additional affective-evaluative components. The pupil response, as an index of arousal and attentional engagement, may therefore represent a converging output of these processes. In this sense, loudness, discomfort, and pupil dilation are not identical measures but partially overlapping expressions of stimulus-driven salience within different physiological and perceptual domains (Liao et al., 2016). Across all data points, pupil dilation rose significantly with higher rating values. Both scales predicted the pupil response equally well, suggesting that the physiological and perceptual measures are systematically associated and may reflect overlapping processes related to perceptual salience and arousal. Linear mixed-effects modeling further demonstrated that subjective ratings, rather than the physical level alone, best explained the variance in pupil dilation. Although level was a significant predictor when entered on its own, its effect vanished once the corresponding rating entered the model, indicating that the pupil response is more tightly linked to perceived than to physical intensity. Conversely, level remained the dominant factor for the ratings themselves, while the addition of pupil dilation accounted for only a small but significant proportion of residual variance. Within the present experimental setting and sample, pupillometric responses did not provide substantially different information beyond subjective ratings. However, whether other physiological measures may reveal additional aspects under more demanding or clinically relevant conditions remains to be determined.

Neither individual UCLs nor self-reported sound sensitivity was related to either the ratings or the pupil data. The lack of correlation between UCLs and discomfort ratings highlights an important conceptual distinction: UCLs, typically obtained with continuous sinusoidal tones, reflect a sensory tolerance to sustained loudness, whereas discomfort to impulse sounds involves transient, startle-like components that engage additional affective and autonomic processes. The mean UCL PTA across all normal-hearing participants was ∼100 dB HL, aligning with established normative values (Moore, 2012), yet the uncomfortable level for impulsive sounds occurred at higher effective levels, suggesting that distinct perceptual mechanisms may be involved. These levels are, however, difficult to compare directly, because sample-based peak levels in dB SPL are set against levels of continuous signals in dB HL. The absence of a systematic effect of SSSQ2 scores suggests that the observed effects reflect general auditory responses rather than individual hypersensitivity. Overall, these findings show that pupil dilation provides a converging physiological correlate of perceived intensity within the present paradigm. As impulsive sounds become louder, they evoke both higher subjective ratings and stronger autonomic activation, illustrating a tightly coupled perceptual–physiological response to short, high-level acoustic events.

Limitations

Calibration deviations between impulse types and acoustic conditions introduced small discrepancies between target and actual playback levels. Power-function fits based on either set of values produced only marginal parameter differences; therefore, target levels were used throughout to ensure a uniform reference and to enable consistent pooling in the pupillometric analysis. Although the 5 dB step-down presentations following the maximum rating were excluded from the analysis and were, by design, the last level played back for a given signal, they might have influenced the perception of the following signal. Responses to these step-down presentations likely reflect two converging factors: (a) participants’ expectation of a further level increase, given that all preceding presentations followed a strictly monotonic progression, and (b) the fact that these highest perceptual ratings were given only by the most sensitive listeners, resulting in sparse data with disproportionate influence on fitted curves. Consequently, these observations were treated as systematic deviations from the intended rating structure rather than as perceptual variability within the primary measurement range. The precision of the fitted growth functions was limited by the 10 dB level spacing, which constrains resolution. Although the non-adaptive, strictly ascending presentation worked well, an adaptive procedure may ultimately prove superior because it reduces the inherent predictability of the level progression. Ecological validity was also limited: the reverberant condition reproduced only a single-source approximation rather than a diffuse sound field.

Despite controlled illumination and display conditions, pupil dilation remains susceptible to small variations in arousal or visual adaptation, and thus might include residual variance unrelated to the acoustic stimuli (Geller et al., 2020; Winn et al., 2018). Additionally, the SSSQ2 does not optimally capture sensitivity to transient sounds specifically, which limits conclusions regarding the role of individual sound sensitivity in the present context.

Another limitation of the present study is that all data were obtained from young, normal-hearing adults under unaided listening conditions. The findings, therefore, cannot be generalized to hearing-impaired listeners or to hearing-aid users, for whom impulse perception may differ substantially due to altered auditory dynamics and signal processing. In addition, no impulse noise reduction algorithms were evaluated directly. Consequently, the present results should not be interpreted as evidence for the clinical utility or optimization of specific hearing-aid settings, but rather as a baseline characterization that may inform future investigations in clinical populations.

Considerations for Future Assessment Approaches

The present findings may inform future efforts aimed at developing streamlined approaches for assessing perception of impulsive sounds. Although subjective ratings and pupillometric responses showed strong correspondence in the present dataset, this does not preclude situations in which perceptual judgments and physiological markers may diverge, for example, in more complex everyday environments, under fatigue, or during demanding speech-in-noise tasks. In such contexts, pupillometric measurements might offer additional insight into how impulsive sounds are processed under increased cognitive load. Moreover, pupillometry represents a potential complementary approach for evaluating responses to impulsive sounds without requiring active rating, which may be relevant in populations unable to provide consistent behavioral ratings. Within the present laboratory setting and normal-hearing sample, subjective ratings captured the primary perceptual effects observed. Under these controlled conditions, behavioral measures accounted for the main variance in perceived impulse intensity, without requiring additional physiological indices. Future research should determine whether a shortened version of the present paradigm can reliably capture impulse sound perception in hearing-impaired listeners, for whom testing time and cognitive load represent important practical considerations, and whether such an approach remains informative without the need for additional physiological measurements such as pupillometry.

Among the tested acoustic conditions, the reverberant condition yielded the largest perceptual and physiological effects within the present experimental framework. From a technical perspective, it yielded higher sensitivity at lower presentation levels, so that the relevant perceptual differences were captured without approaching the upper dynamic range limits of typical playback hardware. These lower effective output levels may simplify implementation in laboratory settings, even though the required peaks remain high and call for appropriate caution. While the calibration offsets resulted in small deviations, they did not alter the overall pattern of results; nevertheless, this should be verified systematically in a setup capable of consistently producing peaks of 120 dB SPL, for example, by using headphones. Beyond these practical advantages, the reverberant condition was, within the constraints of the present setup, the condition that most closely approximated everyday listening environments. Based on the current results, a single rating scale captured the primary perceptual effects observed within this sample. This conclusion is supported by the strong association between loudness and discomfort ratings, indicating substantial shared variance across stimuli and levels. The discomfort scale exhibited steeper growth and greater interindividual variability than loudness in the present dataset, which may render it particularly sensitive to differences in perceived tolerability of impulsive sounds. Its steeper slope also allows reliable function fitting across a narrower level range (starting around 90 dB SPL).

In addition, discomfort ratings were less affected by speech embedding than loudness ratings, suggesting that this measure generalizes better across acoustic contexts. While the test can be streamlined to rely on a single perceptual scale, maintaining a limited yet diverse set of impulse types may be beneficial, as the observed signal-specific differences suggest that stimulus diversity captures important perceptual variance. Such diversity can enhance ecological validity without adding notable measurement time. For practical application, future protocols should omit the 5 dB step-back once the maximum rating is reached, as postmaximum trials produced unreliable data in the present study. Although the test–retest reliability was already high, repeating the complete measurement multiple times per condition (nine impulses, 90–120 dB SPL, optionally up to 130 dB SPL) may further improve precision. Future work could systematically examine impulse perception both without hearing aids and under aided conditions, with and without speech, to explore potential interactions with signal processing (Husstedt et al., 2023). Including speech in the aided condition may be informative because several hearing-aid algorithms, such as noise reduction and, particularly relevant in this context, impulse-noise reduction, may adjust their behavior in response to speech cues. These algorithms are typically optimized to maintain speech audibility and comfort and could therefore also influence how impulsive sounds are handled when speech is present. Assessing impulse perception under such mixed conditions might offer additional insight. In preliminary laboratory testing, the proposed optimized protocol required ∼10–15 min per participant.

Conclusion

This study provides a systematic characterization of impulse sound perception using subjective ratings and pupillometric measures under controlled laboratory conditions in a normal-hearing population. Both loudness and discomfort followed predictable psychometric growth functions, with discomfort exhibiting steeper growth and greater interindividual variability across stimuli and levels in the present sample. Within the present experimental framework, impulse sound perception was reliably captured using subjective behavioral ratings of discomfort and loudness. The close correspondence between behavioral and physiological measures indicates that pupil dilation primarily mirrored subjective judgments in this paradigm. Reverberation increased discomfort, loudness, and pupil dilation at comparatively lower peak levels, indicating enhanced perceptual salience under this acoustic condition. Embedding impulse sounds in speech did not change the discomfort ratings in the unaided condition but may become relevant in future studies examining hearing aid algorithms. Minor calibration offsets at the highest measured level did not alter the overall pattern of results. Moreover, no associations were found between pure-tone UCLs, discomfort ratings, and SSSQ2 scores. This absence of correlation suggests that these measures reflect distinct aspects of sound intolerance, with discomfort to impulsive sounds representing a distinct perceptual dimension. Taken together, the present findings suggest that discomfort ratings obtained under reverberant conditions captured key perceptual characteristics of impulse sounds in this sample. Whether these findings generalize to hearing-impaired listeners and aided listening conditions remains to be determined in future studies.

Supplemental Material

Supplemental material, sj-docx-1-tia-10.1177_23312165261446374 for Characterizing Perception of Impulse Sounds Through Subjective Ratings and Pupillometric Responses by Luca Wiederschein, Anna Schließ, Florian Denk and Hendrik Husstedt in Trends in Hearing

Footnotes

Acknowledgments

The authors thank the participants for their valuable time. English language corrections were assisted by ChatGPT (OpenAI, Inc.). The authors take full responsibility for the content.

ORCID iDs

Luca Wiederschein

Florian Denk

Hendrik Husstedt

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The corresponding author's position is partially funded by the William Demant Foundation.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online. The stimuli used in the present study, including the anechoic (ST) and reverberant (OF) conditions without ISTS, are publicly available as a MATLAB dataset on Zenodo (Zenodo record 19659610). The dataset contains the nine original signals scaled to peak levels between 80 and 120 dB SPL (5 dB steps), sampled at 96 kHz with 24-bit resolution, and organized by condition.
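As a rough illustration of what scaling a signal to a given peak level involves, the following Python sketch converts a peak level in dB SPL (re 20 µPa) to a peak pressure in pascal and rescales a waveform accordingly, using the 80 to 120 dB SPL range in 5 dB steps reported for the stimuli. It is not the authors' MATLAB code and does not describe the internal layout of the Zenodo dataset; the function names, the synthetic burst, and the assumption that samples represent sound pressure in pascal are hypothetical choices made for this example.

```python
import math

P_REF_PA = 20e-6  # reference sound pressure for dB SPL (20 micropascal)


def peak_pressure_pa(level_db_spl):
    """Convert a peak level in dB SPL to a peak sound pressure in pascal."""
    return P_REF_PA * 10.0 ** (level_db_spl / 20.0)


def scale_to_peak_level(samples, level_db_spl):
    """Rescale a waveform so that its absolute peak matches the target level.

    Assumption: the returned samples are interpreted as pressure in pascal.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        raise ValueError("silent signal cannot be scaled to a peak level")
    gain = peak_pressure_pa(level_db_spl) / peak
    return [s * gain for s in samples]


# Nine target peak levels: 80 to 120 dB SPL in 5 dB steps.
levels_db = list(range(80, 121, 5))

# Hypothetical stand-in for one impulse: a 5 ms decaying 2 kHz burst at 96 kHz.
fs = 96_000
impulse = [
    math.exp(-n / (0.001 * fs)) * math.sin(2 * math.pi * 2000 * n / fs)
    for n in range(int(0.005 * fs))
]

# One scaled copy of the impulse per target level.
scaled = {level: scale_to_peak_level(impulse, level) for level in levels_db}

for level in levels_db:
    peak_pa = max(abs(s) for s in scaled[level])
    print(f"{level:3d} dB SPL peak -> {peak_pa:.3f} Pa")
```

Each 5 dB step multiplies the peak pressure by a factor of about 1.78, so the 40 dB range spans a factor of 100 in pressure, from 0.2 Pa at 80 dB SPL to 20 Pa at 120 dB SPL.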

References

Aazh, & Kula, F. B. (2024). The sound sensitivity symptoms questionnaire version 2.0 (SSSQ2) as a screening tool for assessment of hyperacusis, misophonia and noise sensitivity: Factor analysis, validity, reliability, and minimum detectable change. Brain Sciences, 15(1), 16. https://doi.org/10.3390/brainsci15010016

Brand, & Hohmann (2002). An adaptive procedure for categorical loudness scaling. The Journal of the Acoustical Society of America, 112(4), 1597–1604. https://doi.org/10.1121/1.1502902

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284–290. https://doi.org/10.1037/1040-3590.6.4.284

DiGiovanni, J. J., Davlin, E. A., & Nagaraj, N. K. (2011). Effects of transient noise reduction algorithms on speech intelligibility and ratings of hearing aid users. American Journal of Audiology, 20(2), 140–150. https://doi.org/10.1044/1059-0889(2011/10-0007)

Ellis, G. M., & Zahorik (2021). Reverberation strength perceived by normal-hearing listeners predictable based on time-varying binaural loudness. Hearing Research, 409, 108316. https://doi.org/10.1016/j.heares.2021.108316

Fairfax, J. N. (1987). Rating methods for impulsive noise [Master’s Thesis]. University of Southampton.

Geller, Winn, M. B., Mahr, & Mirman (2020). Gazer: A package for processing gaze position and pupil size data. Behavior Research Methods, 52(5), 2232–2255. https://doi.org/10.3758/s13428-020-01374-8

Gordon, K. R. (2019). How mixed-effects modeling can advance our understanding of learning and memory and improve clinical and educational practice. Journal of Speech, Language, and Hearing Research, 62(3), 507–524. https://doi.org/10.1044/2018_JSLHR-L-ASTM-18-0240

Holube, Fredelake, Vlaming, & Kollmeier (2010). Development and analysis of an international speech test signal (ISTS). International Journal of Audiology, 49(12), 891–903. https://doi.org/10.3109/14992027.2010.506889

Husstedt, Hilgerdenaar, Frenz, Denk, & Tchorz (2023). Evaluation of impulse noise reduction in hearing aids with technical measurements and ratings of discomfort. Acta Acustica, 7, 47. https://doi.org/10.1051/aacus/2023042

Husstedt, Schmidt, Wiederschein, Wiedenbeck, Kemper, & Denk (2025). Listening effort for soft speech in quiet. Trends in Hearing, 29, 23312165251370006. https://doi.org/10.1177/23312165251370006

International Organization for Standardization. (1994). Acoustics—Audiometric test methods—Part 2: Sound field audiometry with pure tone and narrow-band test signals; identical with ISO 8253-2:1992 (ISO 8253-2:1994-10; p. 12).

International Organization for Standardization. (1997). Acoustics—Methods for the description and physical measurement of single impulses or series of impulses (ISO 10843:1997; pp. 1–23). https://www.iso.org/standard/1179.html#lifecycle

International Organization for Standardization. (2003). Acoustics—Loudness scaling by means of categories (ISO 226:2003). https://www.iso.org/standard/34222.html

Jacobson, M. T., & Matthews (1996). Generating uniformly distributed random Latin squares. Journal of Combinatorial Designs, 4(6), 405–437. https://doi.org/10.1002/(SICI)1520-6610(1996)4:6%3C405::AID-JCD3%3E3.0.CO;2-J

Jeub, Schafer, & Vary (2009). A binaural room impulse response database for the evaluation of dereverberation algorithms. 2009 16th International Conference on Digital Signal Processing, 1–5. https://doi.org/10.1109/ICDSP.2009.5201259

Keidser, Dillon, Flax, Ching, & Brewer (2011). The NAL-NL2 prescription procedure. Audiology Research, 1(1), e24. https://doi.org/10.4081/audiores.2011.e24

Kemper, Denk, Husstedt, & Obleser (2025). Acoustically transparent hearing aids increase physiological markers of listening effort. Trends in Hearing, 29, 23312165251333225. https://doi.org/10.1177/23312165251333225

Keshavarzi, Reichenbach, & Moore, B. C. J. (2021). Transient noise reduction using a deep recurrent neural network: Effects on subjective speech intelligibility and listening comfort. Trends in Hearing, 25, 23312165211041475. https://doi.org/10.1177/23312165211041475

Kirkeby, & Nelson (1999). Digital filter design for inversion problems in sound reproduction. Journal of the Audio Engineering Society, 47(7/8), 583–595. https://aes.org/publications/elibrary-page/?id=12098

Knoblauch, & Maloney, L. T. (2012). The psychometric function: Introduction. In: Modeling psychophysical data in R. Use R!, vol 32. Springer. https://doi.org/10.1007/978-1-4614-4475-6_4

Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012

Korhonen, Kuk, Lau, Keenan, Schumacher, & Nielsen (2013). Effects of a transient noise reduction algorithm on speech understanding, subjective preference, and preferred gain. Journal of the American Academy of Audiology, 24(9), 845–858. https://doi.org/10.3766/jaaa.24.9.8

Kret, M. E., & Sjak-Shie, E. E. (2019). Preprocessing pupil size data: Guidelines and code. Behavior Research Methods, 51(3), 1336–1342. https://doi.org/10.3758/s13428-018-1075-y

Krueger, Schulte, Zokoll, M. A., Wagener, K. C., Meis, Brand, & Holube (2017). Relation between listening effort and speech intelligibility in noise. American Journal of Audiology, 26(3S), 378–392. https://doi.org/10.1044/2017_AJA-16-0136

Kuusinen, Kondraciuk, & Lokki (2023). Effects of masker type and reverberation on speech-in-noise recognition thresholds and listening effort as indexed by pupil dilation responses. Journal of the Audio Engineering Society (154th Convention), Convention Paper 10650. https://aes2.org/publications/elibrary-page/?id=22057

Liao, H.-I., Kidani, Yoneya, Kashino, & Furukawa (2016). Correspondences among pupillary dilation response, subjective salience of sounds, and loudness. Psychonomic Bulletin & Review, 23(2), 412–425. https://doi.org/10.3758/s13423-015-0898-0

Liu, Zhang, Bentler, R. A., Han, & Zhang (2012). Evaluation of a transient noise reduction strategy for hearing aids. Journal of the American Academy of Audiology, 23(8), 606–615. https://doi.org/10.3766/jaaa.23.8.4

Meteyard, & Davies, R. A. I. (2020). Best practice guidance for linear mixed-effects models in psychological science. Journal of Memory and Language, 112, 104092. https://doi.org/10.1016/j.jml.2020.104092

Moore, B. C. J. (2012). An introduction to the psychology of hearing (6th ed.). Brill | Academic.

Naylor, Koelewijn, Zekveld, A. A., & Kramer, S. E. (2018). The application of pupillometry in hearing science to assess listening effort. Trends in Hearing, 22, 2331216518799437. https://doi.org/10.1177/2331216518799437

Neher, Wagener, K. C., & Fischer, R.-L. (2016). Directional processing and noise reduction in hearing aids: Individual and situational influences on preferred setting. Journal of the American Academy of Audiology, 27(8), 628–646. https://doi.org/10.3766/jaaa.15062

Picou, E. M., & Ricketts, T. A. (2018). The relationship between speech recognition, behavioural listening effort, and subjective ratings. International Journal of Audiology, 57(6), 457–467. https://doi.org/10.1080/14992027.2018.1431696

Radun, Maula, Rajala, Scheinin, & Hongisto (2022). Acute stress effects of impulsive noise during mental work. Journal of Environmental Psychology, 81, 101819. https://doi.org/10.1016/j.jenvp.2022.101819

Rasetshwane, D. M., Trevino, A. C., Gombert, J. N., Liebig-Trehearn, Kopun, J. G., Jesteadt, Neely, S. T., & Gorga, M. P. (2015). Categorical loudness scaling and equal-loudness contours in listeners with normal hearing and hearing loss. The Journal of the Acoustical Society of America, 137(4), 1899–1913. https://doi.org/10.1121/1.4916605

Sagastume, I. G. (2014). Generation of random Latin squares step by step and graphically.

Schomer, P. D. (1978). Growth function for human response to large-amplitude impulse noise. The Journal of the Acoustical Society of America, 64(6), 1627–1632. https://doi.org/10.1121/1.382128

Seropian, Ferschneider, Cholvy, Micheyl, Bidet-Caulet, & Moulin (2022). Comparing methods of analysis in pupillometry: Application to the assessment of listening effort in hearing-impaired patients. Heliyon, 8(6), e09631. https://doi.org/10.1016/j.heliyon.2022.e09631

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420

Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64(3), 153–181. https://doi.org/10.1037/h0046162

Winn, M. B., Wendt, Koelewijn, & Kuchinsky, S. E. (2018). Best practices and advice for using pupillometry to measure listening effort: An introduction for those who want to get started. Trends in Hearing, 22, 2331216518800869. https://doi.org/10.1177/2331216518800869

Zekveld, A. A., Koelewijn, & Kramer, S. E. (2018). The pupil dilation response to auditory stimuli: Current state of knowledge. Trends in Hearing, 22, 2331216518777174. https://doi.org/10.1177/2331216518777174

Zekveld, A. A., Kramer, S. E., & Festen, J. M. (2010). Pupil response as an indication of effortful listening: The influence of sentence intelligibility. Ear & Hearing, 31(4), 480–490. https://doi.org/10.1097/AUD.0b013e3181d4f251
