Abstract
A movie taken from the front window of a running train, with zooming in and out, has been widely regarded as a perceptual illusion in which the train appears to move much more slowly when the view is zoomed in. This is, however, not a real illusion, because the image speed actually varies with the focal length of the lens. It may instead be a meta-illusion, that is, an illusory sense of experiencing an illusion, which might reflect a lack of understanding of how zooming changes the geometrical structure of the image.
A recent series of video clips posted by the second author, Akiyoshi Kitaoka, to social networks in 2022 has been welcomed by a popular audience as an intriguing visual illusion. The videos were taken through the front window of a running train. They give the impression that the train slows down when the view is zoomed in and speeds up when it is zoomed out, even though the train speed was roughly constant (Figure 1 and Supplemental Movie S1). Here we show, however, that this is not a real visual illusion, because the image speed actually changes with the focal length of the lens. We suggest that it could be more of a meta-illusion, that is, an illusory sense of experiencing a visual illusion, which might reflect people's lack of understanding of how the zoom lens works.

Figure 1. Two captured frames from Supplemental Movie S1 (taken by Akiyoshi Kitaoka). The train seems to be running faster in the wide-shot video (left) than in the close-up video (right). The catenary poles look denser on the right, which implies slower image motion in the video at a constant train speed. The right image has been scaled and pasted into the approximately corresponding part of the left image (red rectangle), demonstrating the smaller field of view of the close-up picture.
In a close-up picture, the 2D projections of objects that are equally spaced in the 3D scene, such as the catenary poles in Figure 1, are denser than in a wide-shot picture. If the poles are equally spaced in the scene, the shorter distance between the projections of two poles in the right picture implies a slower image speed in the right picture than in the left picture, although the actual spacing of the poles in the scene is difficult to confirm.
To understand the image speed when the train travels at a constant speed, we consider how the image size of an object changes as a function of distance, which can be assessed directly from the lens optics. Figure 2(a) shows the simulated image size of a 5 m object (approximately the height of the overhead wires above the ground) as a function of distance. The relationship between the image size (L′) and the object size (L) is L′ = [f/(u − f)]L, where f is the focal length of the lens and u is the distance from the lens to the object. This relationship is derived from the simple pinhole model L′ = (v/u)L, where v is the distance from the lens to the image, by incorporating the focal length through the lens equation 1/f = 1/u + 1/v. More precise models of the actual lens, whose exact specifications we do not know, would not change the results drastically.
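The derivation described above can be written out step by step. The last line, the derivative with respect to the object distance u, corresponds (up to sign) to the quantity plotted in Figure 2(b). By the chain rule, multiplying it by the train speed gives the rate at which the image size grows for an object approaching at constant speed. The symbol s for the train speed is introduced here only for illustration.

```latex
\begin{aligned}
\frac{1}{f} &= \frac{1}{u} + \frac{1}{v}
  \;\Longrightarrow\; v = \frac{fu}{u-f} \\
L' &= \frac{v}{u}\,L = \frac{f}{u-f}\,L \\
\frac{dL'}{du} &= -\frac{fL}{(u-f)^{2}},
\qquad
\left|\frac{dL'}{dt}\right| = \frac{fL\,s}{(u-f)^{2}}
\quad \text{for } \frac{du}{dt} = -s
\end{aligned}
```

For u much larger than f, L′ ≈ fL/u, so the image speed of an approaching object falls off roughly with the square of its distance and grows in proportion to the focal length at any fixed distance.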

Figure 2. (a) The image size of a 5 m object with lenses of f = 50 and 300 mm. (b) First derivative of the image size in (a), as a proxy for the image speed. Areas with colored backgrounds indicate an example of a comparable pair of distance changes for the two lenses in each plot (see text for more details).
Figure 2(b) plots the derivatives of the curves in Figure 2(a). Clearly, the image size does not change linearly with distance: the size change per unit distance becomes larger as the object comes closer. This readily explains why the image speed increases as the target approaches at a constant speed. A 300 mm lens yields larger changes than a 50 mm lens within the same physical space, but we should also note that a lens with a longer f has a smaller field of view, which clips the image at a farther distance (see Figure 1). Digital zooming directly clips and magnifies the original picture, but the effects are the same except for the loss of resolution. For example, an object yields the same image size with the 50 and 300 mm lenses at the left ends of the pale-colored areas in Figure 2(a) (10 and 60 m, respectively). If the object then recedes by 10 m, to the right ends of the colored areas, the change in image size is about 3.5 times larger with the 50 mm lens than with the 300 mm lens, as indicated by the horizontal dotted lines.
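The worked example above can be checked numerically. The sketch below assumes only the thin-lens relation L′ = [f/(u − f)]L (all lengths in metres) and the distances quoted in the text; the helper names image_size and image_speed are hypothetical. It also evaluates the instantaneous image speed (the derivative plotted in Figure 2(b), times the train speed) at the two matched distances.

```python
def image_size(f, u, L=5.0):
    """Thin-lens image size L' = f / (u - f) * L for an object of size L at distance u."""
    return f / (u - f) * L


def image_speed(f, u, L=5.0, train_speed=1.0):
    """|dL'/dt| = f * L * s / (u - f)**2 for an object approaching at speed s."""
    return f * L * train_speed / (u - f) ** 2


f_wide, f_tele = 0.050, 0.300  # 50 mm and 300 mm lenses
u_wide, u_tele = 10.0, 60.0    # distances giving equal image sizes (u / f = 200 for both)
step = 10.0                    # the object recedes by 10 m

# Equal image sizes at the left ends of the pale-colored areas.
print(image_size(f_wide, u_wide), image_size(f_tele, u_tele))

# Change in image size over the same 10 m for each lens.
change_wide = image_size(f_wide, u_wide) - image_size(f_wide, u_wide + step)
change_tele = image_size(f_tele, u_tele) - image_size(f_tele, u_tele + step)
ratio = change_wide / change_tele
print(f"change ratio over {step:.0f} m: {ratio:.2f}")  # about 3.5

# Instantaneous image speed at the matched distances.
speed_ratio = image_speed(f_wide, u_wide) / image_speed(f_tele, u_tele)
print(f"instantaneous speed ratio: {speed_ratio:.2f}")
```

Under these assumptions the ratio comes out at about 3.49, in line with the figure of about 3.5 in the text. The instantaneous ratio at the matched distances equals the ratio of the focal lengths (300/50 = 6), because both f and u − f scale by the same factor.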
Figure 3(a) shows a simulated image of a fixed rectangular frame (such as the frame formed by a catenary pole in Figure 1) as the train travels at a constant speed. Each panel explicitly shows the temporal changes of a single object, sampled at a constant time interval and superimposed onto a single image. Therefore, unlike in Figure 1, the separations between adjacent rectangles directly represent the image speed. Note that the physical distance between any two adjacent rectangles is the same in both panels, while the absolute distance is six times as long in the bottom panel as in the top panel. The 50 mm lens yields faster image motion than the 300 mm lens.
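A rough sketch of how the superimposed rectangles in Figure 3(a) could be generated, assuming the thin-lens relation and a 5 m frame. The train speed, sampling interval, number of samples, and nearest distances (10 and 60 m, borrowed from the Figure 2 example) are hypothetical choices, not the parameters used for the published figure. Only the frame height is tracked, since the width of the rectangle scales by the same factor.

```python
# Hypothetical parameters (not taken from the article).
FRAME_HEIGHT = 5.0  # m, roughly a catenary-pole frame
TRAIN_SPEED = 20.0  # m/s, assumed constant
DT = 0.1            # s between superimposed rectangles, assumed
N_SAMPLES = 6


def image_size(f, u, L=FRAME_HEIGHT):
    """Thin-lens image size L' = f / (u - f) * L, lengths in metres."""
    return f / (u - f) * L


def rectangle_sizes(f, nearest):
    """Image heights of one fixed frame sampled every DT seconds while the train
    approaches it, ordered from far to near and ending at distance `nearest` (m)."""
    step = TRAIN_SPEED * DT  # physical distance between adjacent samples (2 m here)
    distances = [nearest + step * k for k in reversed(range(N_SAMPLES))]
    return [image_size(f, u) for u in distances]


def separations(sizes):
    """Gaps between adjacent superimposed rectangles, a direct proxy for image speed."""
    return [b - a for a, b in zip(sizes, sizes[1:])]


wide = rectangle_sizes(0.050, nearest=10.0)  # 50 mm lens
tele = rectangle_sizes(0.300, nearest=60.0)  # 300 mm lens, six times as far away

for gap_wide, gap_tele in zip(separations(wide), separations(tele)):
    print(f"50 mm: {gap_wide * 1000:.3f} mm   300 mm: {gap_tele * 1000:.3f} mm")
```

With these assumptions the nearest rectangles have the same image size for both lenses, yet every gap is larger for the 50 mm lens, and the 50 mm gaps grow as the frame approaches.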

Figure 3. (a) Simulated size changes of a rectangular frame in the videos, taken with a 50 mm focal length lens (top) and a 300 mm focal length lens (bottom). (b) Rectangles that are enlarged exponentially from the center (top), and a magnified image of its central part (bottom). See also the movie versions (Supplemental Movies 2 and 3).
Perceptual speed constancy refers to the accurate perception of speed despite such variation in image speed with distance. Setting aside the question of how well we achieve speed constancy, which may be imperfect when a target is approaching (e.g., Rushton & Duke, 2009), accurate distance information is essential for speed constancy (e.g., Rock et al., 1968). Because the varifocal lens distorts distance information, it is natural that speed constancy, to whatever extent it operates, is not retained, so that the perceived speed depends on the image speed. In this sense, it is not really an illusion that we perceive speeds differently through lenses of different focal lengths.
As noted above, however, this phenomenon has been regarded as an illusion by many people. We speculate that an implicit assumption of constant scene structure, irrespective of zooming, could be the source of the illusory sense of having an illusion (or a “meta-illusion”). Figure 3(b) illustrates a hypothetical fractal-like condition in which the magnified image is identical to the original, because the pictured size is assumed to be an exponential function of distance. The central part of the bottom panel is missing, which illustrates the magnification. This figure does not represent the proper linear perspective of equally spaced rectangles, but we tend to feel that it does, and we might even feel that it is more natural than Figure 3(a). The assumption of a constant scene structure might be reasonable unless an artifact (a zoom lens) is introduced. A related phenomenon is the perceptual compression or expansion of depth in pictures taken with lenses of different focal lengths, as seen in the pole densities in Figure 1. This perceptual compression is caused by a failure to compensate for incorrect viewing distances (Banks, Cooper & Piazza, 2014). In either case, it is natural that such an artificial effect is not compensated for in perception. Interestingly, Banks et al. (2014) did not call the compression/expansion effect an “illusion.” The speed effect in videos has been enjoyed more as an illusion, perhaps simply because the speed difference in the video is less obvious than the depth compression.
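The difference between Figure 3(b) and proper perspective can be made concrete through the size ratio between adjacent rectangles. In the sketch below, the exponential size law, the ratio Q, and the distances are hypothetical values chosen for illustration; the perspective case uses the thin-lens relation from Figure 2 with an assumed 50 mm lens and a 5 m frame.

```python
import math

# Hypothetical fractal-like law (Figure 3(b)): the k-th rectangle has size S0 * Q**k,
# i.e. pictured size is an exponential function of distance. Not a real optical model.
S0, Q = 1.0, 0.8
N = 8
exponential = [S0 * Q**k for k in range(N)]

# Proper perspective for comparison: thin-lens image size of a 5 m frame through an
# assumed 50 mm lens, for rectangles equally spaced every 2 m starting at 10 m.
F, L = 0.050, 5.0
distances = [10.0 + 2.0 * k for k in range(N)]
perspective = [F / (u - F) * L for u in distances]


def adjacent_ratios(sizes):
    """Size of each rectangle divided by the size of its nearer neighbour."""
    return [b / a for a, b in zip(sizes, sizes[1:])]


exp_ratios = adjacent_ratios(exponential)    # constant (Q) at every step
persp_ratios = adjacent_ratios(perspective)  # rises toward 1 with distance

# Magnifying the exponential picture by 1/Q maps each rectangle onto its nearer
# neighbour, so the zoomed-in picture reproduces the original one.
magnified = [s / Q for s in exponential[1:]]
self_similar = all(math.isclose(m, e) for m, e in zip(magnified, exponential[:-1]))

print([round(r, 3) for r in exp_ratios])
print([round(r, 3) for r in persp_ratios])
print("exponential picture is self-similar:", self_similar)
```

Under proper perspective no single magnification maps the picture onto itself, because the ratio between adjacent rectangles changes with distance; only the assumed exponential law has that property.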
One could still call this phenomenon an illusion, as it involves a discrepancy between physical and perceptual reality, provided that the shape distortion in depth, or even the sense of close distance produced by a telephoto lens, is also called an illusion. In any case, however, no trick is involved other than the well-understood optical effect.
Supplemental Material
sj-docx-1-ipe-10.1177_20416695231187800 - Supplemental material for The zooming-speed illusion: A meta illusion?
Supplemental material, sj-docx-1-ipe-10.1177_20416695231187800 for The zooming-speed illusion: A meta illusion? by Hiroshi Ashida and Akiyoshi Kitaoka in i-Perception
Footnotes
Author contribution(s)
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by Japan Society for the Promotion of Science Grant-in-aid for Scientific Research (21H04426 to Akiyoshi Kitaoka and Hiroshi Ashida and 19K03367 to Hiroshi Ashida).
Supplemental Material
Supplemental material for this article is available online.
References
