Abstract
Perceiving and reacting to multidimensional objects creates so-called event files via feature binding. Bogon, Thomaschke, and Dreisbach provided the first evidence for the integration of the feature
Keywords
To perceive and interact with our environment appropriately, we need to temporally bind the different features of an object/event to form a coherent representation. Such binding processes have already been demonstrated for various modalities (visual, auditory, and multimodal) and different feature combinations, including stimulus-stimulus (S-S) bindings, stimulus-response (S-R) bindings, and response-response (R-R) bindings (for reviews, see Frings et al., 2020; Spence & Frings, 2020). Although most studies have demonstrated the binding of static features, recent evidence suggests that dynamic features, such as duration, are also bound (Bogon et al., 2023; Bogon, Thomaschke, & Dreisbach, 2017; Köllnberger et al., 2023; Mocke et al., 2022; Pfister et al., 2022). However, it is still unknown whether duration as a stimulus feature is bound to another stimulus feature, to the response feature, or possibly to both. The following study aims to clarify this question.
Object perception and the binding problem
When we hear a smoke alarm beeping, this event consists of several different features, such as the pitch, volume, melody, and duration of the sound; its location (e.g., from the kitchen); and possible appropriate responses, such as evacuating the dwelling and calling the fire department. In our everyday life, we are able to perceive objects and events of different modalities as a coherent unit and, if necessary, react to them, although the features that form such objects and events are usually processed in different parts of the brain (Felleman & van Essen, 1991; Jeannerod, 1999; Mesulam, 1998). The binding problem addresses the question of how these different features become a temporary coherent representation (Treisman, 1996, 1998, 1999). Some evidence suggests that the various perceptual and action features of an event are temporarily bound by episodic bindings (Henson et al., 2014; Hommel, 1998, 2004; Kahneman et al., 1992; Treisman, 1996, 1999). Treisman refers to this temporary binding of perceptual features as an “object file,” a temporary episodic representation of the object that contains the traces of the distributed feature representations (
A commonly used measure to identify bindings between stimulus features (S-S binding) or between stimulus features and response features (S-R binding) is
S-S and S-R binding
Bindings between stimulus features have already been demonstrated for a variety of features. Partial repetition costs have been found for visual stimulus features such as shape and colour, word identity, letter and picture identity, and facial features, as well as for real and abstract objects (e.g., Colzato et al., 2006; Hommel, 1998; Kahneman et al., 1992; Keizer et al., 2008; Tipper & Cranston, 1985; Waszak et al., 2003). Partial repetition costs have also been shown for auditory features such as pitch and loudness, vocal features, and sound identity, as well as for spatial features of visual and auditory stimuli (e.g., Herwig & Waszak, 2012; Hommel, 1998; Maybery et al., 2009; Zmigrod & Hommel, 2009). Furthermore, binding effects are found for both task-relevant and task-irrelevant stimulus features (e.g., Hommel, 1998; Horner & Henson, 2011; Mayr et al., 2009; Moeller et al., 2012; Waszak et al., 2003).
In addition to perceptual features, an event usually also consists of response-related features. Therefore, bindings can occur not only between stimulus features (S-S binding) but also between a stimulus and a response (S-R binding). Such S-R bindings have been demonstrated repeatedly over the years (see Spence & Frings, 2020, for a review). S-R bindings have been demonstrated for response locations, response features such as effector identity, valences of actions, and voice features, among others (e.g., Bogon, Eisenbarth, et al., 2017; Eder & Klauer, 2007; Hommel, 1998; Stoet & Hommel, 1999).
Integration of the feature duration
Given the impressive number of features for which binding processes have already been shown (see previous sections), it is striking that the feature duration has barely been studied so far. This may be because duration, unlike most features, is dynamic. The duration of a stimulus can only be fully defined when the presentation of the stimulus ends, i.e., the feature duration is continuously updated while the stimulus lasts. In contrast, a static feature, e.g., colour or pitch, can be defined upon its first appearance. Another characteristic that distinguishes time from other features is its anisotropy, which means that the direction of perceived time cannot be manipulated. We can change other features, such as the loudness of a sound, continuously in both directions, i.e., we can make a sound louder or softer, but we cannot influence the direction of the perceived flow of time (Riemer, 2015; Riemer et al., 2012).
In recent years, the feature
Bogon, Thomaschke, and Dreisbach (2017) showed that stimulus duration is temporarily integrated into auditory event files. More specifically, they investigated the binding of stimulus duration to pitch (Experiment 1) and to loudness (Experiment 2). For this purpose, they used classification tasks with one-to-one mappings; participants had to respond to a high-pitch tone (loud tone) with a right keypress and to a low-pitch tone (soft tone) with a left keypress. The stimuli could appear for either a short or a long duration. Both experiments showed partial repetition costs, indicating the integration of stimulus duration into auditory event files. However, and relevant to the study presented here, due to the one-to-one mappings in the classification tasks used by Bogon, Thomaschke, and Dreisbach (2017), every stimulus shift (pitch or loudness) was also a response shift, and every stimulus repetition (pitch or loudness) was also a response repetition. Therefore, one cannot distinguish whether
Present study
The purpose of this study is to fill this gap and investigate whether stimulus duration in auditory event files is bound to another stimulus feature (S-S binding), to the response (S-R binding), or to both. To consider the stimulus features and the response features independently of each other, we used a many-to-one mapping design (see Giesen & Rothermund, 2011; Moeller et al., 2016; Moeller & Frings, 2014); two stimuli were assigned to each of two response keys. In Experiment 1, participants completed a sine tone classification task by responding to the two different low tones with the left response key and to the two different high tones with the right response key. All four stimuli (sine tones) could be presented either for a short (50 ms) or long duration (200 ms; cf. Bogon, Thomaschke, & Dreisbach, 2017). The feature
Experiment 1
Material and methods
Participants
An a priori power analysis (MorePower 6.0.4, Campbell & Thompson, 2012) revealed that to detect a two-way interaction effect size of
Thirty students (age
Apparatus and stimuli
The experiment was run in E-Prime (Version 2.0, Psychology Software Tools, Sharpsburg, PA, USA). The stimuli consisted of four pure sine tones (200 Hz, 400 Hz, 800 Hz, and 1,000 Hz) with two durations (50 ms and 200 ms), resulting in a total of eight stimuli. Instructions and stimuli were presented on a 21.5-inch monitor (Dell Inc., Round Rock, TX, USA). The sine tones were played via headphones (HD 201 Sennheiser, Wedemark, Germany) at a constant volume of 78 dB throughout the experiment. Participants were instructed to respond to the “very low” and “low” tones using the left response key and to the “high” and “very high” tones using the right response key (“Y” and “M” keys on a standard QWERTZ keyboard, many-to-one mapping), positioned centrally in front of the participant. This fixed assignment was chosen based on the results of Rusconi et al. (2006) because low tones tend to be associated with the left, and high tones tend to be associated with the right. The fixation cross was a plus sign (black, 28 pt, Courier New, bold), and feedback was only given for errors (“Fehler” in red, 18 pt, Arial, bold). All stimuli were presented on a grey background (RGB: 192, 192, 192).
Procedure
Each trial started with a fixation cross presented for 300 ms. Then, the target stimulus was presented for either 50 ms or 200 ms, accompanied by a blank screen that remained visible until the response was given. Participants could respond from the onset of the target. After an inter-trial interval of 600 ms, the next trial started. If the response was incorrect, an error message appeared for 1,500 ms (see Figure 1). The experiment consisted of two practice blocks, the first with 20 trials and the second with 40 trials, followed by five experimental blocks of 128 trials each. The trial order was randomised, with the constraint that in the experimental blocks, each possible factor combination

Figure 1. Trial procedure in Experiment 1.
Design and planned statistical analyses
Our design included three within-subject factors, each with two levels: Pitch (repetition vs. shift), Duration (repetition vs. shift), and Response (repetition vs. shift). Any shift in pitch, whether within a response category or across response categories, was coded as a pitch shift (e.g., very low to low = shift and very low to high = shift). Because a repetition of the response-relevant feature necessarily required the same response (response repetition), the combination “pitch repetition and response shift” could not occur (see Table 1 for an overview of possible condition combinations). Therefore, instead of an overall three-factorial design, we used separate two-factorial designs to investigate the respective Feature × Feature and Feature × Response interactions. To investigate S-S binding, we conducted a 2 (
Table 1. Possible condition combinations in Experiments 1 and 2.
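The coding scheme of the design can be sketched in a few lines of code. The sketch assumes the Experiment 1 mapping (the two lower tones on the left key, the two higher tones on the right key); the function and variable names are illustrative and are not taken from the original analysis scripts. Enumerating all possible transitions between two consecutive trials shows which condition combinations can occur, and that pitch repetition never coincides with a response shift.

```python
from itertools import product

# Assumed encoding of the Experiment 1 stimulus set (values from the text).
RESPONSE_FOR_PITCH = {200: "left", 400: "left", 800: "right", 1000: "right"}
DURATIONS_MS = (50, 200)


def code_transition(prev, curr):
    """Code one trial transition as repetition/shift on each factor.

    prev and curr are (pitch_hz, duration_ms) tuples.
    Returns a (pitch, duration, response) tuple of labels.
    """
    def label(same):
        return "repetition" if same else "shift"

    return (
        label(prev[0] == curr[0]),  # Pitch: any change counts as a shift
        label(prev[1] == curr[1]),  # Duration
        label(RESPONSE_FOR_PITCH[prev[0]] == RESPONSE_FOR_PITCH[curr[0]]),  # Response
    )


# All eight stimuli (4 pitches x 2 durations) and all ordered pairs of them.
stimuli = list(product(RESPONSE_FOR_PITCH, DURATIONS_MS))
observed = {code_transition(p, c) for p, c in product(stimuli, repeat=2)}

for combo in sorted(observed):
    print(combo)
```

Of the eight conceivable combinations of the three factors, only six are produced, which is why the analysis is split into separate two-factorial designs.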
Furthermore, we calculated the
All aforementioned analyses were planned and determined before the experiment, i.e., a priori. Raw data files associated with this article can be found online (https://doi.org/10.5283/epub.58056).
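Partial repetition costs in a 2 × 2 repetition design are commonly summarised as a single interaction contrast. The sketch below shows one such contrast. The exact formula used for the binding effect in this study is not specified here, so the scaling and the function name are assumptions. Positive values indicate worse performance in partial repetitions than in complete repetitions and complete shifts. The example cell means are hypothetical and serve illustration only.

```python
def binding_effect(rep_rep, rep_shift, shift_rep, shift_shift):
    """Interaction contrast for a 2 x 2 repetition design.

    Arguments are cell means (RT in ms or error rate in %), named
    <factor A>_<factor B>, e.g. rep_shift = A repeats, B shifts.
    Assumed definition: mean of the two partial-repetition cells minus
    the mean of the complete-repetition and complete-shift cells.
    """
    partial = (rep_shift + shift_rep) / 2
    complete = (rep_rep + shift_shift) / 2
    return partial - complete


# Hypothetical cell means in ms, not data from the study.
print(binding_effect(rep_rep=500, rep_shift=540, shift_rep=560, shift_shift=530))  # 35.0
```

A value of zero would correspond to the absence of an interaction between the two factors.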
Results and discussion
We analysed data from the five experimental blocks, excluding the first trial of each block from the analysis. Error trials (6.71%), trials following an error trial (7.39%), trials with extreme RTs < 100 ms or > 8,000 ms (0.01%), and trials with RTs deviating more than three
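The explicitly stated exclusion criteria can be sketched as a filter over trial records. The record fields (`block`, `correct`, `rt_ms`) and the function names are hypothetical; the RT bounds follow the values given for Experiment 1. The deviation-based outlier criterion is omitted because its reference value is not specified here.

```python
def keep_trial(trial, previous):
    """Return True if a trial survives the stated exclusion criteria.

    trial and previous are dicts; previous is the preceding trial in the
    same block, or None for the first trial of a block.
    """
    if previous is None:                 # first trial of a block
        return False
    if not trial["correct"]:             # error trial
        return False
    if not previous["correct"]:          # trial following an error
        return False
    if trial["rt_ms"] < 100 or trial["rt_ms"] > 8000:  # extreme RTs
        return False
    return True


def preprocess(trials):
    """Filter a list of trial dicts ordered by block and trial position."""
    kept = []
    previous = None
    for trial in trials:
        if previous is not None and previous["block"] != trial["block"]:
            previous = None              # new block: no predecessor
        if keep_trial(trial, previous):
            kept.append(trial)
        previous = trial
    return kept
```

For Experiment 2, the same sketch would apply with the upper RT bound changed to 3,000 ms.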
S-S binding
RT data
We conducted a 2 (

Figure 2. Mean RTs and errors of Experiment 1 as a function of Pitch × Duration (S-S binding).
Error rates
An analogous ANOVA for errors yielded a significant main effect of Pitch,
S-R binding
RT data
We conducted a 2 (

Figure 3. Mean RTs and errors of Experiment 1 as a function of Response × Duration (S-R binding).
Error rates
An analogous ANOVA for errors yielded a significant main effect of Response,
Binding effect
Post hoc one-sample

Figure 4. Mean binding effects for RTs and errors of Experiment 1.
A similar pattern was observed for the error rates (see Figure 4, right panel). Post hoc one-sample
The results of Experiment 1 suggest that the task-irrelevant stimulus
The conclusion that the duration of stimuli is in general not bound to other stimulus features, however, seems premature. We, therefore, decided to conceptually replicate Experiment 1 by replacing the sine tones with more meaningful sounds of musical instruments to create a context of music. The reasoning is that in the domain of music, duration is a critical and informative feature because the duration of single sounds is the essence of rhythm. Even though
Experiment 2
Experiment 2 replicated Experiment 1 exactly, except that the four sine tones were replaced by the sounds of four different musical instruments. If the (potential) relevance of duration for a given stimulus set modulates its binding with other stimulus features, then we should observe binding effects between duration and the response and, this time, also between duration and the other stimulus feature.
Material and methods
Participants
Thirty students (age
Stimuli and procedure
The procedure of Experiment 2 mirrored that of Experiment 1, with the following modifications: the stimuli consisted of four instrument sounds (violin, clarinet, guitar, piano; all at pitch C4) with two durations (70 ms and 300 ms), resulting in a total of eight stimuli. Participants were instructed to respond to two instruments with the left key and the other two with the right key. The assignment of instruments to response keys was balanced across all participants. In case of an error (participant pressed the wrong key), the word “error” appeared. If there was no response for more than 3,000 ms, the error feedback “too slow” appeared (both: white, 18 pt, Arial, bold; see Figure 5). The stimuli were created with the program GarageBand (Apple, 2018) and cut to the required length with the program Audacity (Audacity Team, 2021).
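The text states only that the instrument-to-key assignment was balanced across participants. As a sketch of what such balancing could look like, the snippet below enumerates every way of assigning two of the four instruments to the left key and the remaining two to the right key, and cycles participants through these assignments. The cycling scheme and all names are assumptions, not the authors' documented procedure.

```python
from itertools import combinations

INSTRUMENTS = ("violin", "clarinet", "guitar", "piano")

# Each assignment: (instruments on the left key, instruments on the right key).
assignments = [
    (left, tuple(i for i in INSTRUMENTS if i not in left))
    for left in combinations(INSTRUMENTS, 2)
]


def assignment_for(participant_index):
    """Assumed scheme: cycle through the assignments by participant index."""
    return assignments[participant_index % len(assignments)]


print(len(assignments))  # 6
print(assignment_for(0))
```

Under this assumed scheme, any sample size that is a multiple of six would use every assignment equally often.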

Figure 5. Trial procedure in Experiment 2.
Design and planned statistical analyses
We used the same design as in Experiment 1 but replaced the two low and two high sine tones with two different musical instruments each. Thus, our design again included three within-subject factors, each with two levels: Instrument (repetition vs. shift), Duration (repetition vs. shift), and Response (repetition vs. shift). Any instrument shift, regardless of key assignment, was coded as an instrument shift (e.g., instrument 1 to instrument 2 = shift and instrument 1 to instrument 3 = shift). Because a repetition of the response-relevant feature necessarily required the same response (response repetition), the combination “instrument repetition and response shift” could not occur (see Table 1 for an overview of possible condition combinations). Therefore, instead of an overall three-factorial design, we used separate two-factorial designs to investigate the respective Feature × Feature and Feature × Response interactions.
For investigating S-S binding, we conducted a 2 (
All aforementioned analyses were planned and determined before the experiment, i.e., a priori. Raw data files associated with this article can be found online (https://doi.org/10.5283/epub.58056).
Results and discussion
Preprocessing was the same as in Experiment 1, except that the upper cut-off for extreme RTs was adapted to the response time limit. Error trials (7.57%), trials following an error trial (8.92%), trials with extreme RTs < 100 ms or > 3,000 ms (0.01%), and trials with RTs deviating more than three
S-S binding
RT data
We conducted a 2 (

Figure 6. Mean RTs and errors of Experiment 2 as a function of Instrument × Duration (S-S binding).
Error rates
An analogous ANOVA for errors yielded a significant main effect of Instrument,
S-R binding
RT data
We conducted a 2 (

Figure 7. Mean RTs and errors of Experiment 2 as a function of Response × Duration (S-R binding).
Error rates
An analogous ANOVA for errors yielded a significant main effect of Response,
Binding effect
Post hoc one-sample

Figure 8. Mean binding effects for RTs and errors of Experiment 2.
For the error rates (see Figure 8, right panel), the post hoc one-sample
Post hoc analysis
Given the discrepant results concerning the S-S interaction in the two experiments, the question arises whether the difference is genuine or due to a lack of power to detect a smaller effect in Experiment 1. Therefore, we conducted an additional ANOVA with
The results of Experiment 2 partially confirmed and extended the findings from Experiment 1. As observed in Experiment 1, partial repetition costs for duration-response binding occurred in both RTs and error rates. In contrast to Experiment 1, the Instrument × Duration interaction was also significant, indicating S-S binding. Together with the post hoc analysis, comparing S-S binding between experiments, this suggests that the potential relevance of duration for a given stimulus set (here, sine tones vs. musical instruments) has an effect on binding.
General discussion
This study aimed to determine whether stimulus duration is bound to another stimulus feature, to the response, or to both. The results suggest robust S-R binding of duration and response, whereas S-S binding of duration to another stimulus feature (indicated by an S-S interaction) was only found in Experiment 2 with four different musical instruments but not in Experiment 1 with four different sine tones. More precisely, in Experiment 1, we used four different sine tones as the stimulus set, resulting in a Duration × Response interaction, indicating partial repetition costs and, thus, duration-response binding (S-R). The Pitch × Duration interaction, on the contrary, did not reach significance and was associated with a negligible effect size estimate (
The robust results of S-R binding in both experiments are consistent with previous findings in the S-R binding literature, which have repeatedly shown that stimulus features, such as colour, shape, location, size, word identity, and sine tones, are bound to the response (e.g., Hommel, 2007; Hommel & Colzato, 2004; Horner & Henson, 2009, 2011; Moeller et al., 2015; Moeller & Frings, 2019a; Mordkoff & Halterman, 2008; Rothermund et al., 2005; Schöpper & Frings, 2022; Zehetleitner et al., 2012). Previous findings demonstrating the temporary integration of duration into auditory event files (see Bogon, Thomaschke, & Dreisbach, 2017) left open the question of whether duration is bound to the stimulus (S-S) or to the response (S-R). In the study of Bogon, Thomaschke, and Dreisbach (2017), each repetition/shift of the task-relevant stimulus feature (Experiment 1: pitch; Experiment 2: loudness) also involved a repetition/shift of the response, making it difficult to conclusively determine the binding of duration. The results of this study (S-R binding in both experiments) indicate that duration is reliably bound to the response, at least within auditory event files. Considering that in the experiments by Bogon, Thomaschke, and Dreisbach (2017), as well as in Experiment 1 of this study (S-R binding and no clear evidence for S-S binding), the stimulus set consisted of pure sine tones, it is reasonable to assume that the integration of stimulus duration occurred due to S-R binding.
One might argue that the lack of a significant interaction between pitch and duration (S-S binding) in Experiment 1 was due to grouping. Specifically, the two low tones and the two high tones could have been grouped into overarching categories of low versus high. This could explain the lack of S-S binding (see, e.g., Bogon, Eisenbarth, et al., 2017; Dreisbach & Haider, 2008; no binding for task-rule mapping). However, responses at pitch shifts (between low and very low and between high and very high) were significantly slower (by almost 100 ms) than responses at pitch repetitions. This is hard to reconcile with the view that low and very low (and high and very high) pitches were grouped into low- versus high-pitch tones. Nevertheless, we admit that grouping cannot be ruled out entirely and might have contributed to preventing a binding effect.
However, we propose an alternative explanation, namely that the binding of duration depends on the specific context (artificial vs. musical). This means that the integration of the stimulus duration to another stimulus feature depends on the type of stimulus set. More specifically, the results of Experiment 2 suggest a potential role of the relevance and informational value of the duration for that stimulus set, even if the duration itself is not task-relevant. For a stimulus set of pure sine tones (Experiment 1: no S-S binding), the stimulus duration contains no potentially relevant information, but for a stimulus set of musical instrument sounds (Experiment 2: S-S binding), it does. This is because, in the domain of music, the duration of individual sounds constitutes the essence of rhythm (Hauser & McDermott, 2003; Honing, 2013). One could argue that in Experiment 1, individual tones of different durations were already presented and, thus, the duration should already have had relevance for the stimulus set. However, it is essential that the presentation of such single artificial sine tones does not create a context of music. In contrast, the sounds of real musical instruments do create a musical context and, thus, a stimulus set for which duration is potentially relevant and informative. In Experiment 1, such a context was not present and, thus, no S-S binding emerged or was not strong enough to be detected in the results. This would mean that the binding of duration as a stimulus feature to another stimulus feature, at least in the auditory context, depends on the potential relevance of duration to the stimulus set and, thus, the indirect task-relevance attributed by the individual. At first glance, our findings seem to correspond to Colzato et al. (2006), who showed that bindings between colour and object are stronger for naturally occurring feature combinations like yellow banana and red strawberry. 
Unlike in their study, however, we did not use musical sounds that have long-learned associations with either long or short durations. Instead, we suggest that some feature categories might bind to each other more easily than others (such as durations to musical sounds as opposed to durations to single artificial sine tones).
To sum up, we found robust S-R binding between duration and response in both experiments, independent of the stimulus set. The findings regarding S-S binding are not yet clearly interpretable but suggest that binding between different stimulus features can be modulated by the potential relatedness of the features involved. This is indicated by the lack of S-S binding in Experiment 1 (where duration has no informational value for artificial sine tones) and the occurrence of S-S binding in Experiment 2 with musical sounds, for which duration is always informative. This shows once again that context plays a role in binding processes, not only as a possible feature that can be integrated (e.g., Benini et al., 2023; Frings & Rothermund, 2017) but also as a moderator that modulates S-S binding. Either way, the results of this study provide new impetus to investigate binding mechanisms in the musical context in more detail.
Acknowledgements
The authors thank Anna-Lena Münz and Ellen Burkhardt for their help in collecting data.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Research involving human participants and/or animals
All procedures performed in studies involving human participants were in accordance with the ethical standards of the National Research Committee and with the 1964 Helsinki Declaration and its later amendments.
Informed consent
Informed consent was obtained from all individual participants included in the study.
