Abstract
Earlier research has shown that seven-month-old infants prefer to look at real objects over pictures of those objects. Which visual cues determine that preference? Motivated by research on adult observers highlighting the significance of motion parallax over other depth cues in contributing to a sense of presence and place, we tested the hypothesis that motion parallax alone is sufficient to cause preferential looking to real objects in infants. We presented pairs of displays of toys in different formats: (a) the real three-dimensional toy; (b) a realistic image of that toy presented on a screen; (c) the same image, but with added depth from motion parallax. Infants preferred (a) over (b) (57% vs. 43%,
How to cite this article
Troje, N. F., Preißler, L., & Schwarzer, G. (2025). Motion parallax allows 7-8-month-old infants to distinguish pictures from their referents.
Introduction
Pictures, in the context of this study, are projections of real three-dimensional (3D) objects or scenes on a canvas, a piece of paper, a computer screen or similar two-dimensional (2D) surfaces. Human observers respond differently to real 3D objects than to pictures of such objects. Pictures are rarely confused with their referents, that is, the real object or scene they depict. They are perceived as depictions of a 3D object or scene, conveyed by some kind of medium which may adopt a planar shape, but nevertheless is itself an actionable object in 3D space. A defining feature of human picture perception is our ability to perceive medium and depiction simultaneously (Wollheim, 1998). The depiction may represent something that exists elsewhere, but it is perceived as a representation that is clearly distinct from the thing or scene that is being depicted.
That is different when we look out into the world itself. Here, perception is experienced as direct. We don’t perceive the process of mediation that continues inside our body and includes the 2D projections on the photoreceptor layers of our eyes, the transduction of light signals into graded neural responses, the subsequent computations in our retinae, and the generation of action potentials and their propagation into the visual cortex where they undergo further transformation, eventually giving rise to visual experience.
The experiential difference between perception of pictorial visual stimuli and the direct experience of objects in visual space is reflected in both behavioral and neuronal responses in the observer (Snow et al., 2023). For instance, real, tangible objects are more readily detected than their 2D images (Korisky & Mudrik, 2021), memory encoding and recall for solid objects is more efficient than for image displays (Snow & Culham, 2021), and real objects attract attention more than images do (Gomez et al., 2018). Neurophysiologically, we see differences in location and degree of haemodynamic responses as measured with fMRI, but also in how these adapt to repeated presentations of similar stimuli. Habituation effects that are common in response to pictures are absent or at least much weaker when participants are presented with real objects (Snow et al., 2011).
A question that arises from these observations is whether the observed behavioral differences and the associated physiological responses are primarily driven by differences in the sensory input, for example, different information provided by depth cues such as binocular disparity or motion parallax, or whether they are due to resulting differences in affordances such as graspability, actability, or edibility. Some recent findings suggest the latter. For instance, Bushong et al. (2010) employed an auction paradigm to measure how much monetary value participants assign to popular snack items that were presented as text, as pictures, or as real objects. Participants assigned about 60% more value to the real objects than to pictures. Interestingly, after a clear glass plate was placed over the 3D items, the perceived value of the glass-covered objects became similar to the value assigned to pictured objects. Müller (2013) repeated Bushong et al.'s experiment, but instead of placing a glass barrier between observer and food items, he added heavy weights to the participants’ wrists, impairing their ability to grasp the objects while keeping them visible in exactly the same way as in the unconstrained condition. Reducing graspability in that way also reduced the difference in values assigned to real versus depicted objects.
The conceptual distinction between pictures as representations and their real-world referents probably develops relatively late during human visual development. DeLoache et al. (1998) found that 9-month-old infants consistently tried to grasp objects in photos as if the photos were actual objects. In contrast, 18-month-olds behaved dramatically differently and almost never attempted to grasp the photographed object.
Grasping behavior of young infants towards photos doesn’t necessarily mean that they are unable to discriminate pictures from their referents. It may simply indicate that they have not worked out the semantic nature of the perceived difference and the resulting changes in reasonable approach behavior. It has been suggested, in fact, that even newborn babies demonstrate the sensory capability to discriminate between real objects and their pictures (Slater et al., 1984). Rose (1977) presented two studies indicating that by 6 months of age, infants can visually differentiate 3D stimuli from their pictorial representations, and can also transfer information between objects and their pictures. By 9 months of age, infants can robustly generalize from photographs and even from line drawings to real objects while maintaining the ability to discriminate between the two classes of visual stimuli (Jowkar-Baniani & Schmuckler, 2011; Shinskey & Jachens, 2014). Shinskey and Jachens (2014) used preferential reaching in their experiments, whereas Jowkar-Baniani and Schmuckler (2011) used a habituation paradigm.
The latter was also used by Gerhard et al. (2016). Here, two groups of 7-month-old and 9-month-old infants were habituated to either real toys or realistic photos of the same toys. They were then tested with either novel toys or pictures of novel toys. During the initial presentations, both age groups looked longer at real objects than at pictures, and they habituated faster to real objects than to pictures. Even after habituation, infants’ looking behavior maintained a strong preference for real objects, regardless of whether they had habituated to photos or real objects.
Ziemler et al. (2012) presented 9-month-old infants with objects and pictures of objects under a glass cover in a false-bottom table. Infants grasped more often at the real objects. The glass plate didn’t seem to suppress the real-object advantage, which differs from Bushong et al.'s (2010) observations with adult participants.
From the experiments described above it can be concluded that the sensory ability to assess similarities and differences between pictures and real objects precedes the ability to comprehend the representational nature of pictures. This sensory competence is likely a prerequisite for developing the cognitive ability to respond to affordances such as actability, graspability, or edibility that has been observed in adult participants (Bower, 1966; Bushong et al., 2010; Müller, 2013).
If infants can discriminate pictures from real objects, what kind of sensory cues are they using? The most prominent visual depth cues that distinguish pictures from their real referents are binocular disparity and motion parallax (Troje, 2019). Unlike pictorial depth cues, when we look at a picture these two cues respond to properties of the medium rather than conveying information about the structure of the depiction. They indicate to the visual brain the planar shape of the canvas or computer screen rather than the depth and distance of the depicted objects. In general, they are therefore in conflict with pictorial depth cues, which encode the depth of the depicted scene.
Young infants have been demonstrated to respond differentially to pictorial depth cues even at an age at which they are not able to use them to assign depth to a pictorial scene (Kavšek et al., 2012). There is also strong evidence that infants respond to both stereopsis and motion parallax at an early age when they are not yet able to use them to obtain distance, orientation, and general object constancy. Bower (1966) tested 6- to 8-week-old infants with stereoscopically and monocularly presented stimuli in a viewpoint generalization paradigm and concluded that motion parallax was the most effective cue to depth, much more effective than pictorial cues and also superior to binocular disparity. von Hofsten et al. (1992) demonstrated responses to motion parallax already in 2-month-old infants. Nawrot et al. (2009) showed that 6-month-old infants respond to random dot kinematograms where passive (rather than self-induced) motion parallax was the only cue available. The early predominance of motion parallax might mean that motion parallax, being a sensorimotor contingency (Gibson, 1988; Troje, 2019; Watson, 1966), is the initial depth cue in the visual development of infants.
We therefore hypothesize that the real-object preference observed at that age is due to more general exploratory behavior based on exploiting sensorimotor contingencies between movements that alter the observer's location in space and the resulting parallactic changes that such movements induce in the retinal projection. If that is the case, then infants are expected to show the real-object preference even in cases where the object isn’t in fact real but is rendered to simulate self-induced motion parallax.
In this study, we used a preferential looking paradigm similar to that of Gerhard et al. (2016) to test whether motion parallax is sufficient to elicit the real-object advantage in 7- and 8-month-old infants. Three groups of infants were each presented with a binary choice between two out of three types of stimuli: (a) a real toy object, (b) a computer graphics rendering of the same toy presented on an iPad screen, (c) the same computer graphics rendering, presented on the same screen, but this time rendered with MPDepth, a technique that simulates the effects of motion parallax by updating the rendered image in real time so that it coherently follows the infant's current viewpoint of a simulated 3D object (Troje, 2023).
Methods
Participants
The final sample consisted of 60 healthy 7- to 8-month-old infants (mean age = 7.57 months, SD = 0.065 months; 34 girls and 26 boys). Data from an additional 27 infants were excluded from the full sample due to crying (9), technical error (3), experimenter error (2), or failure to fixate on either object in at least 4 of the 16 test trials (7). Infants and their caregivers were recruited from local birth registers obtained from the city of Giessen and the surrounding area. Participants were predominantly of Caucasian ethnicity.
The study was conducted in accordance with the German Psychological Society (DGPs) Research Ethics Guidelines. The Office of Research Ethics at the University of Giessen approved the experimental procedure and the informed consent protocol. For each infant, written consent for participating in the study was obtained from the caregiver prior to their participation in the study.
Stimuli
The stimuli used in this study consisted of real objects and virtual images of these objects, presented on iPad screens (Figure 1). The real objects were similar to the ones used in preceding studies by Gerhard et al. (2016, 2021) and consisted of four different, age-appropriate toys (bear, car, frog, mouse). Each of these real objects was fixed to the bottom of a wooden box (width 19.5 cm, height 27 cm, depth 20 cm). The wooden boxes were closed on all sides except for the front side, which featured a narrow wooden frame (width: 1 cm) through which participants could see the objects. The boxes created the visual impression of the objects being presented in a 3D room. In addition to the real objects, we used the MPDepth application (Troje, 2023) on two 11-inch iPad Pro tablets to display photorealistic renderings of the objects and the inside of the boxes. The iPads were inserted into the wooden frame of boxes identical to the ones that contained the real objects. The iPad stimuli were created such that viewpoint, brightness, and lighting of the objects and the room were the same as for the real stimuli. In addition, the MPDepth application allowed us to present the stimuli either with motion parallax, creating the impression of a 3D object, or without motion parallax, creating the impression of a 2D image of the object. Motion parallax was simulated by updating the rendering on the screen in real time such that the inside of the box and the toy inside always appeared from a viewpoint that reflected the infant's current location in front of the screen.

Figure 1. Objects used in the study: car, bear, frog, and mouse. Left: iPad rendering; right: real object. When rendered on the iPad, we implemented two different viewing conditions. In one, the picture was stationary as shown in the figure. In the other, the rendering updated in real time to simulate the view onto a stationary 3D object from the current viewpoint of the infant.
This viewpoint-dependent rendering was achieved by exploiting the ability of the iPad's RGBD camera to obtain accurate estimates of the location and orientation of the infant's head and eyes in front of the screen. That location was then used to steer a virtual camera that rendered a computer graphics scene carefully modeled to match the wooden box and the toy, including material properties and illumination conditions.
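The head-coupled rendering described above can be illustrated with an off-axis (asymmetric) view frustum, a standard computer graphics technique for treating a screen as a window onto a virtual scene. The following is a minimal sketch under stated assumptions and is not the MPDepth implementation: the function name `off_axis_frustum`, the coordinate convention (origin at the screen center, z pointing toward the viewer), and the screen dimensions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frustum:
    """Bounds of an asymmetric view frustum on the near clipping plane."""
    left: float
    right: float
    bottom: float
    top: float
    near: float


def off_axis_frustum(eye, screen_w, screen_h, near=0.01):
    """Frustum for an eye at `eye` looking through a screen-sized window.

    Assumed convention (hypothetical, for illustration only):
    - the screen lies in the plane z = 0 and is centered at the origin,
    - `eye` = (x, y, z) in meters with z > 0 on the viewer's side,
    - `screen_w` and `screen_h` are the physical screen size in meters.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (z > 0)")
    # Project the screen edges, as seen from the eye, onto the near plane.
    scale = near / ez
    return Frustum(
        left=(-screen_w / 2 - ex) * scale,
        right=(screen_w / 2 - ex) * scale,
        bottom=(-screen_h / 2 - ey) * scale,
        top=(screen_h / 2 - ey) * scale,
        near=near,
    )


# Illustrative values: viewer roughly 60 cm from a 16 x 23 cm screen.
centered = off_axis_frustum((0.0, 0.0, 0.6), 0.16, 0.23)
shifted = off_axis_frustum((0.05, 0.0, 0.6), 0.16, 0.23)
print(centered)
print(shifted)
```

With the eye centered, the frustum is symmetric. When the eye moves to the right, the frustum becomes skewed to the left, so the rendered scene shifts on the screen as it would behind a real window. Re-computing these bounds for every tracked head position is what produces self-induced motion parallax in this kind of display.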
The final stimulus set consisted of 4 real objects, 4 matched virtual iPad objects without motion parallax, and 4 matched virtual iPad objects with motion parallax.
Apparatus
The preferential looking task was conducted in a rectangular cabin with white walls and an open front to accommodate a caregiver and the infant. The infant was seated on the caregiver's lap at a distance of approximately 60 cm from the stimuli, which were beyond the infants’ reach. Both experimenters were located outside of the cabin and hidden from the infants’ view. During testing, one experimenter controlled the trial length by opening and closing a curtain covering the rear wall of the cabin, while a second experimenter switched the wooden boxes with the stimuli. Opening the curtain revealed a 42.5 × 32 cm window in the rear wall of the cabin, which provided a view of two stimuli in the wooden boxes. Behind the boxes, a dark cloth blocked the view of the experimenters and the lab. A video camera (Everio) attached to a peephole in the back of the cabin recorded the entire session. The camera was connected to a screen on which the experimenters could watch the infant during the whole experiment.
Procedure
All infants were tested in individual sessions. To prevent parents from influencing their babies’ fixation times, they were asked to refrain from talking for the duration of the experiment and to cover their eyes with tinted sunglasses. The sunglasses additionally prevented the iPad's RGBD camera from tracking the parents’ head movements to generate motion parallax.
Every infant completed one out of three conditions, each consisting of 16 trials. In each trial, we exposed the infant to one stimulus pair, consisting of the same object in two different object formats, with the wooden boxes standing directly next to each other. In condition 1, one box was showing a real object, and the other an image of the same toy displayed on an iPad screen that now covered the front of the wooden box (Figure 1). Infants in condition 2 saw pairs of the screen-rendered images with motion parallax together with a screen-rendering without motion parallax. Infants in condition 3 saw pairs of the real objects together with a screen-rendered image with motion parallax. The order of the four objects (bear, car, frog, mouse) was counterbalanced across infants.
Each trial started with the opening of the curtain and ended after 15 s with the closure of the curtain. The experimenter then prepared the next stimulus pair, and the next trial started immediately. Each stimulus pair was presented four times while counterbalancing the positions of the two boxes involved. We analyzed infants’ looking behavior frame by frame by using the videotapes of the session. A second observer scored 50% of the data, showing an inter-observer reliability of ICC(2,2) = 0.85,
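The trial structure described above (four objects, each stimulus pair shown four times, box positions counterbalanced) can be sketched as a simple schedule generator. This is an illustration under assumptions rather than the procedure actually used: the function name `make_schedule`, the format labels, and the random shuffling of trial order are hypothetical, since the text states only that object order was counterbalanced across infants.

```python
import random

OBJECTS = ["bear", "car", "frog", "mouse"]  # the four toys named in the text


def make_schedule(format_a, format_b, repeats=4, seed=None):
    """Build one infant's trial list for a single condition.

    Each object appears `repeats` times; `format_a` is on the left in
    half of those trials and on the right in the other half.
    """
    if repeats % 2:
        raise ValueError("repeats must be even to balance left/right positions")
    rng = random.Random(seed)
    trials = []
    for obj in OBJECTS:
        for i in range(repeats):
            a_on_left = i % 2 == 0
            left, right = (format_a, format_b) if a_on_left else (format_b, format_a)
            trials.append({"object": obj, "left": left, "right": right})
    # Assumption: trial order is shuffled; the actual ordering scheme may differ.
    rng.shuffle(trials)
    return trials


# Hypothetical labels for the real object versus the static screen rendering.
schedule = make_schedule("real", "screen_static", seed=1)
for trial in schedule[:4]:
    print(trial)
```

With four objects and four repetitions this yields the 16 trials per infant reported above, with each format appearing equally often on each side for every object.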
Results
We first looked for effects of age and object identity in every condition but found no significant differences in looking time behavior between 7- and 8-months-old infants or between the four objects (bear, car, frog, mouse) (all

Results of the preferential looking task. Mean preference scores [%] for two object formats per condition. Each line connects the two preference values of one infant.
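A preference score of this kind is commonly computed as the share of total looking time directed at each of the two formats. The sketch below assumes that convention and pools looking times over an infant's trials before converting to percentages; both the pooling step and the function names are assumptions made for illustration, not a description of the analysis actually performed.

```python
def preference_scores(look_a, look_b):
    """Return the percentage of looking time spent on format A and format B."""
    total = look_a + look_b
    if total <= 0:
        raise ValueError("no looking time recorded for either format")
    pct_a = 100.0 * look_a / total
    return pct_a, 100.0 - pct_a


def infant_preference(trials):
    """Pool looking times over trials, then convert to percentages.

    `trials` is a list of (seconds_on_a, seconds_on_b) tuples, one per trial.
    Assumption: times are summed across trials before computing the score.
    """
    total_a = sum(a for a, _ in trials)
    total_b = sum(b for _, b in trials)
    return preference_scores(total_a, total_b)


# Made-up looking times (seconds) for two trials of one infant.
print(infant_preference([(6.0, 2.0), (3.0, 1.0)]))  # -> (75.0, 25.0)
```

Because the two percentages always sum to 100, each infant contributes one pair of values, and a score of 50% for both formats corresponds to no preference.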
Discussion
Our results replicate those reported by Gerhard et al. (2016). When given a choice between real objects and pictures of them, the 7- to 8-month-old infants tested in the current study preferred to look at the former. Going beyond the earlier study, we also show that motion parallax alone is sufficient to elicit this preference. Infants prefer to look at visual stimuli under viewing conditions that provide them with control over their viewpoint in the same way as experienced when looking at real 3D objects. Stereoscopic depth is not necessary for this preference.
We observe this behavior at an age at which infants are unlikely to make use of that distinction to control affordance-based behavior. In our study, we did not encourage infants to grasp the stimuli, nor did we record when they tried to grasp. However, we know from the work of others (e.g., DeLoache et al., 1998) that infants at that age, and even older, still grasp at pictorial presentations. Affordance-based differentiation has not yet fully developed at that age.
Differential affordance-based behavior toward pictures and objects of course requires that the infant already possesses a sensory mechanism that can distinguish the two stimulus classes. Early learning might well be driven by sensorimotor exploration directly as has been suggested (and shown in many contexts) by others (Gibson, 1988; Jacquey et al., 2020; Piaget, 1952; Watson, 1966). The preference for real objects, but also for the MPDepth displays in our study, could then be explained by the enhanced sensory feedback infants receive in the form of motion parallax in response to their own movement as they explore them.
We don’t know how infants would behave if we presented them with renderings of the toys displaying stereoscopic depth without motion parallax. Stereoscopic depth alone does not provide obvious sensorimotor contingencies. We didn’t engage in such experiments because we lack the means to present artificial, stereoscopic-only stimuli to infants in a non-intrusive way that would be comparable to our technique for creating parallax-only stimuli. However, assuming that sensorimotor contingencies drive exploratory behavior in infants and that this is what makes the MPDepth stimuli as interesting as real objects, we predict that stereoscopic stimuli without motion parallax would not elicit the real-object advantage. We hope that future research will test this hypothesis, although this might have to wait until powerful light-field display technologies or other convincing autostereoscopic technologies reach market maturity.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Canada First Research Excellence Fund (Vision: Science to Application), Natural Sciences and Engineering Research Council of Canada (Discovery Grant), Deutsche Forschungsgemeinschaft (IRTG: The Brain in Action).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
