Abstract
The cross-modal double flash illusion (DFI) refers to an interaction of multisensory signals in which a single visual flash accompanied by two cross-modal inducers (auditory or tactile pulses) leads observers to report two flashes. This phenomenon has become an exemplary case of the strong modulatory effect possible between cross-modal signals. However, it remains unclear precisely what about the different signals is interacting. The prevailing interpretation typically invokes interactions between different singular sensory representations, an explanation consistent with what would be expected in detection tasks. Here we investigated whether such a simple interaction could in fact account for the DFI. Using a paradigm similar to the original DFI paradigm, we manipulated the apparent similarity of the cross-modal inducers such that they differed in attribute (1000 Hz pure tone or Gaussian noise) or modality (auditory or tactile). When the two inducers were the same, a DFI was found. However, when the two inducers differed, the DFI was strongly mitigated, if not abolished. These results demonstrate that the DFI depends critically on the apparent similarity of the cross-modal inducers, a factor that would not matter if the illusion were based on direct interactions of discrete sensory representations. We propose that the DFI is the result of interactions between multi-event temporal structures, rather than discrete sensory events. These structures are determined through within-mode grouping and are then compared and combined supra-modally to generate event representations. This type of interaction likely underlies other multisensory timing phenomena such as temporal ventriloquism.
