Abstract
Background and aims
Although young children's gaze behaviors in experimental task contexts have been shown to be potential biobehavioral markers relevant to autism spectrum disorder (ASD), we know little about their everyday gaze behaviors. The present study aims (1) to document early gaze behaviors that occur within a live, social interactive context among children with and without ASD and their parents, and (2) to examine how children's and parents’ gaze behaviors are related for ASD and typically developing (TD) groups. A head-mounted eye-tracking system was used to record the frequency and duration of a set of gaze behaviors (such as sustained attention [SA] and joint attention [JA]) that are relevant to early cognitive and language development.
Methods
Twenty-six parent–child dyads (ASD group = 13, TD group = 13) participated. Children were between 3 and 8 years old. We placed head-mounted eye trackers on parents and children to record their parent- and child-centered views, and we also recorded the interactive parent–child object play from both a wall-mounted and a ceiling-mounted camera. We then annotated the frequency and duration of gaze behaviors (saccades, fixations, SA, and JA) directed to different regions of interest (objects, faces, and hands), as well as attention shifts between them. Independent-samples t-tests and ANOVAs were used for group comparisons, and linear regression was used to test whether parent gaze behaviors predicted JA.
Results
The present study found no differences in visual experiences between children with and without ASD. However, significant group differences were found in parent gaze behaviors. Compared with parents of ASD children, parents of TD children focused more on objects and shifted their attention more often between objects and their children's faces. In contrast, parents of ASD children were more likely to shift their attention between their own hands and their children. Predictors of JA also differed by group: among parents of TD children, attention to objects predicted JA, whereas among parents of ASD children, attention to their children predicted JA.
Conclusion
Although no differences were found between the gaze behaviors of autistic and TD children in this study, there were significant group differences in parents’ looking behaviors. This suggests that the scaffolding effect of parental gaze may operate through different pathways for ASD children than for TD children.
Implications
The present study revealed the impact of everyday, socially interactive contexts on early visual experiences and points to potentially different pathways by which parental looking behaviors guide the looking behaviors of children with and without ASD. Identifying the parental social input relevant to early attention development (e.g., JA) among autistic children has two implications: it may clarify mechanisms that support socially mediated attention behaviors, which have been documented to facilitate early cognitive and language development, and it may inform the development of parent-mediated interventions for young children with or at risk for ASD.
Note: This paper uses a combination of person-first and identity-first language, an intentional decision aligning with comments put forth by Vivanti (2020), recognizing the complexities of known and unknown preferences of those in the larger autism community.
Keywords
Introduction
Young children with autism spectrum disorder (ASD) show the social communication difficulties characteristic of this neurodevelopmental disability in a variety of ways. Compared with peers without ASD, diagnosed children may engage in less social play and less exploratory behavior during play and may have more difficulty recognizing social cues (e.g., facial expressions and body language) (Falck-Ytter et al., 2013; Jellema et al., 2009; Jordan, 2003; Ozonoff et al., 2008). Studies of autistic children's gaze behaviors have proposed atypical visual attention as a key contributor to these social communication difficulties (Murias et al., 2018), for example, limited rates of sustained attention (SA) to referential cues such as social gaze, gaze and point following, gestures, and joint attention (JA). JA refers to socially shared attentional experiences, such as a parent and child visually attending to the same referent.
Because research on differences in gaze behaviors between autistic children and typically developing (TD) children has often (1) used stationary eye tracking during computer viewing tasks or (2) relied on observing researchers to infer the direction of looking, we know little about the everyday visual experiences of children with ASD and how these experiences are organized in a social context. Human gaze behaviors are constantly shaped and supported by incoming input from the environment, including social partners’ movements and changes generated by one's own body movements. Wearable head-mounted camera and eye-tracking systems now allow us to study moment-to-moment gaze behaviors while parent and child interact in a naturalistic play context, and recent work indicates that TD children's visual experiences are tightly coupled with their parents' social input (Burling & Yoshida, 2019; Smith et al., 2015).
More specifically, previous head-camera studies of parent–child interaction have shown that parents of young TD children actively hold objects close to their child's head and eyes, creating a clear view of the object. We call these moments “optimal viewing moments” (OVM), and when parents also name the objects during these moments, we call them “visually optimal naming moments” (VONM). A further study documented that 18-month-old TD infants were more likely to learn the object names provided in these moments when tested immediately after the object play (Yu & Smith, 2012). Increases in children's SA have also been attributed to this early scaffolding, in which parents generate clear viewing experiences (Burling & Yoshida, 2019; Sun & Yoshida, 2022). We also know that JA is essential for language and cognitive development (Brooks & Meltzoff, 2005; Tomasello & Farrar, 1986) and that caregivers, through their actions, provide effective nonverbal cues to intended referents (Gogate et al., 2006; Gogate & Bahrick, 1998, 2001; Koterba & Iverson, 2009). To understand how these critical visual experiences are generated for children with ASD, it is essential to characterize how parental input organizes the gaze behaviors of children with ASD.
The few studies that have investigated these gaze behaviors in a social context have documented similarities in children's gaze behaviors across ASD and TD groups, such as in attention to the parent's face and toy objects, SA, and JA (Yoshida et al., 2019; Yurkovic et al., 2021). However, one of these studies, which focused on 36–60-month-old children and their parents, also suggested group differences in parental input: parents of ASD children used more gestures, monitored their child's face more closely, and provided more scaffolding for their child's visual experience (Yoshida et al., 2019). Another study, which used a head-mounted eye-tracking system to directly measure attentional behaviors in 24–48-month-old children, found no significant differences in how autistic and TD children distributed their visual attention, generated manual actions, or coordinated their visual and manual experiences during play with their parents (Yurkovic et al., 2021).
These initial attempts to document the everyday visual experiences of young children with ASD in a social context suggest no significant differences between autistic and TD children, in contrast to the ASD-specific gaze behaviors documented in computer screen-based studies. Why do these group differences not emerge in a live social interactive context? Previous observations of parent–child social interactions have documented that parents of autistic children and parents of TD children scaffold their child's play differently (Freeman & Kasari, 2013; Yoshida et al., 2019). For example, during a 10-min free play observation with children between 20 and 60 months old, parents of children with ASD were more actively engaged with their children than were parents of TD children (Freeman & Kasari, 2013). In this study, parents of autistic children were more likely to initiate play schemes (sequences of connected play-acts with connected toys), suggest and command more play-acts, and respond to their children's play-acts with a higher-level play-act. In the same vein, another study with children 3–5 years old found that mothers of children with ASD used more physical contact, more high-intensity behaviors, and fewer social verbal approaches (Doussard-Roosevelt et al., 2003).
Children's communication abilities may also shape parents’ behaviors. Kasari and colleagues (1988) found that among children aged 42–65 months, parents of children with ASD spent more time physically orienting their children to tasks and adjusted their own behavior to their children's specific ASD symptoms (i.e., regulating their child's behavior less often and engaging in more positive feedback and mutual play when the autistic child had higher communication abilities). Similarly, in 10-min semistructured play interactions with children aged 4–14 years, more severe ASD symptoms predicted more adversely affected parent–child interactions, including decreased coordination, communication, and responsivity (Beurkens et al., 2013). Finally, among children aged 36–60 months, parents of autistic children may provide more scaffolding than parents of children without this diagnosis, producing more gestures and more closely monitoring their children's faces during play (Yoshida et al., 2019).
The current literature raises the possibility that the early visual experiences of children with ASD might be supported differently in a naturalistic social context than those of their TD counterparts. The present study explored this possibility by examining similarities and differences in the moment-to-moment gaze behaviors of children (ASD and TD) and their parents during parent–child interactive play, including micro-gaze behaviors such as saccades, fixations, and sustained attention. Studying early gaze behaviors during parent–child interactive play, where everyday communication and learning occur, and testing how parents’ social input organizes the early visual experiences of autistic children may provide a foundation for developing novel interventions that ultimately support social communication outcomes.
Current aims and research questions
Our overarching aim in this exploratory study was to document the visual experiences of autistic and TD children during live social interaction with their parents and to explore how parents, through their looking behaviors, may differentially scaffold their children's visual experiences as a function of the child's diagnostic group (ASD vs. TD). The present study recorded observations of parent–child play using head-mounted camera systems. These methods allowed precise, moment-by-moment measurement of visual experiences while still allowing the child to move actively within the environment and interact with a variety of stimuli and with their parents. The specific questions investigated were: 1) How are visual experiences different and similar between children with ASD and TD children during live social interaction with their parents? Consistent with prior head-camera findings, we hypothesized that visual experiences, including the rates of SA and JA as well as the fine-grained gaze behaviors considered in computer viewing tasks (saccades, fixations, and attention shifting), would be similar across the groups. 2) How are visual experiences different and similar between parents of children with ASD and parents of TD children during live social interaction? We hypothesized that parents of ASD children would pay more attention to their children, while parents of TD children would pay more attention to objects. 3) Do parental visual experiences affect the rate of their child's JA? We hypothesized that parental visual experiences would predict JA experiences for both groups.
Methods
Participants
Observation data from 26 parent–child dyads, with children between 3 and 8 years old, were collected by our team and included in the present study. Thirteen children had parent-reported diagnoses of ASD (M = 5.69 years, SD = 1.81 years, 3 females). The other 13 were reported by their parents to have no history of language or developmental delays (M = 4.20 years, SD = 0.92 years, 7 females) and were considered TD children. The racial and ethnic backgrounds of the parents in the ASD group (12 mothers, 1 father) included White (5), Black or African American (1), Hispanic (5), and Asian (2). The racial and ethnic backgrounds of the parents in the TD group (13 mothers) included White (6), Black or African American (2), Hispanic (3), and Asian (2). All families provided informed consent, and the study and its procedures were approved by the Institutional Review Board of the university where the research took place.
Five of the children in the ASD group had their ASD diagnosis confirmed with the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2000) before their lab visit, and the other eight had their diagnosis confirmed with the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al., 2012) during their lab visit by research-reliable administrators. Both the ADOS and ADOS-2 are semistructured assessments that allow trained clinicians to observe symptoms related to ASD during social, communicative, and play interactions. The ADOS-2 is a revision of the ADOS with updated coding items, item descriptions, and algorithm items; its general structure, administration guidelines, and goals remain largely unchanged. Children were administered ADOS/ADOS-2 Module 1, 2, or 3 based on their expressive language ability. All 13 participants’ ADOS/ADOS-2 scores exceeded the cutoff for inclusion in the ASD group for this study (i.e., their scores were consistent with those of children diagnosed with ASD) (Lord et al., 2012).
We used three vocabulary measures to further characterize our sample: the MacArthur Communicative Development Inventory: Words and Sentences (MCDI; Fenson et al., 1993, 2007), the Peabody Picture Vocabulary Test Fourth Edition (PPVT-4; Dunn & Dunn, 2007), which measures receptive vocabulary, and the Expressive Vocabulary Test Second Edition (EVT-2; Williams, 2007), which measures expressive vocabulary. The MCDI is a parental checklist of words and sentences learned early by TD children and has been used with older ASD children to report expressive language ability (Plesa Skwerer et al., 2016). Although parents of all 26 children completed the MCDI, not all children completed the PPVT-4 and EVT-2 because of changes in the experimental protocol (six children from the ASD group and eight from the TD group completed the PPVT-4, and five children from the ASD group and eight from the TD group completed the EVT-2). These measures were collected only to characterize the sample and were not used as variables in the current analyses. Child age, gender, ADOS/ADOS-2 Module, and language scores for individual participants are reported in Table 1.
ADOS: Autism Diagnostic Observation Schedule. ADOS-2: Autism Diagnostic Observation Schedule, Second Edition.
MCDI: MacArthur Communicative Development Inventory: Words and Sentences (reported as the percentage of words known by each child as reported by the parent using a normative sample of words that are known by 50% of children by 30 months). PPVT-4: Peabody Picture Vocabulary Test Fourth Edition (reported as a standard score based on age). EVT-2: Expressive Vocabulary Test Second Edition (reported as a standard score based on age).
Bolded numbers represent the Mean (SD) as noted by the first column.
Equipment
Two generations of head-mounted camera systems were used to collect the data analyzed in this study. A small head-mounted camera (Watec WAT-230A, a miniature 25 g color video camera) was initially used to capture child-centered views, alongside a wall-mounted camera that captured an overall view of the scene. The lens focal length is 2.1 mm, the number of effective pixels is 512 (H) × 492 (V), and the camera's visual field is 84° (V) by 115° (H). This camera system was used for nine of the twenty-six parent–child dyads (five ASD and four TD). The second head-mounted system included two small cameras and an infrared light-emitting diode (LED) and weighed 51 g in total. One camera faced the child's or parent's right eye and recorded pupil movements and corneal reflection, while a head-mounted camera placed on the forehead recorded the visual field from the child's or parent's perspective (FPV: 54.4° horizontal by 42.2° vertical) (Figure 1). This head-mounted eye-tracking system was used for the other seventeen parent–child dyads (eight ASD and nine TD). In addition, wall-mounted and ceiling-mounted cameras captured an overall view of the scene for all parent–child dyads, and all of these videos were later synchronized with the videos from the head-mounted cameras. All videos from both camera systems were processed at 33 ms per frame; hence, the use of different camera systems does not affect our capacity to test the present hypotheses. On average, the first-generation (non-upgraded) devices generated 9,477 frames (SD = 290.9) and the upgraded devices generated 10,010 frames (SD = 155.6) of child gaze data during the live social interaction task. There was no difference in the number of frames for the play session between the two recording devices.
The upgraded devices also generated an average of 10,007 frames (SD = 160.3) of parent gaze data during the live social interaction task, allowing a more precise estimate of gaze patterns and JA. Long, lightweight cables were used with both device types to allow free, unrestrained engagement during the play session.
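Because every video was processed at 33 ms per frame, frame counts and durations convert directly into each other. The minimal Python sketch below illustrates that arithmetic; the helper names `frames_to_ms` and `ms_to_frames` are hypothetical, and rounding partial frames up is our assumption rather than a documented step of the coding protocol.

```python
import math

MS_PER_FRAME = 33  # processing rate stated above (roughly 30 frames per second)


def frames_to_ms(n_frames: int, ms_per_frame: int = MS_PER_FRAME) -> int:
    """Duration in milliseconds covered by n_frames consecutive frames."""
    return n_frames * ms_per_frame


def ms_to_frames(duration_ms: float, ms_per_frame: int = MS_PER_FRAME) -> int:
    """Smallest whole number of frames whose total duration reaches duration_ms."""
    return math.ceil(duration_ms / ms_per_frame)
```

For example, 9,477 frames correspond to 312,741 ms (about 312.7 s), and the nominal 320 s session spans 9,697 frames at this resolution. The same arithmetic shows that the 200 ms saccade threshold used in coding falls between 6 frames (198 ms) and 7 frames (231 ms).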

(A) Wall-mounted view of the parent–child play session. (B) The first-person view of the parent–child play session of the child. The blue calibration dot indicates the ROI to which the child looked during any given frame. (C) Ceiling-mounted view of the parent–child play session. (D) The first-person view of the parent–child play session of the parent. The blue calibration dot indicates the ROI to which the parent looked during any given frame.
Procedure
The small room in which the play session took place was located on the campus of the principal investigator's institution and contained a child-sized table with a child-sized chair on either side for the parent and child, as well as the computer and eye-tracking equipment. The walls, door, and two-way mirror were covered by black curtains to limit distraction. The computer and the research assistants monitoring the equipment were on the opposite side of the room, further screened by a black curtain for the duration of the play session. During the play session, the parent and child sat at the table with a box of objects and were instructed to play as they normally would at home. Play sessions were 320 s (5 min and 20 s) long, consistent with previous studies using head-mounted eye trackers with young children during play with their parents (Borjon et al., 2021; Deák et al., 2018; MacNeill et al., 2021; Yu & Smith, 2013, 2017; Yurkovic et al., 2021; Yurkovic-Harding et al., 2022). Each play session featured eight objects. Four were labeled by early-learned nouns (bottle, bunny, car, and cookie) that appear on the MacArthur Communicative Development Inventory: Words and Sentences (Fenson et al., 1993, 2007), a developmental vocabulary checklist for children under 30 months old. The other four were labeled by unfamiliar words (caliper, nylon, pipette, and strainer) that are not among the most frequently used English words. The unfamiliar objects were added to increase task complexity and thereby maintain the child's interest in the play session, as children are more likely to select and attend to unfamiliar objects (Henderson, 1981; Horst et al., 2011).
Data processing and coding
The recordings from the head-mounted camera systems and the scene videos (wall-mounted and ceiling-mounted camera views) were synchronized and imported into the Datavyu coding software (Datavyu Team, 2014), where each target behavior was annotated by trained research assistants who were blind to the hypotheses, had each completed over 70 h of prior coding training, and had achieved >84% coding reliability. The coders went frame by frame through both the child and parent head-camera views and annotated the frequency and duration of each instance in which the child or parent looked at a region of interest (ROI: objects, child hands, parent hands, child face, and parent face). These looks were further classified by duration as saccades (<200 ms), fixations (200–2000 ms), or SA (>2000 ms), and the frequency of shifts between ROIs was also annotated (Burling & Yoshida, 2019; Campbell et al., 2014; Devillez et al., 2020; van Beers, 2007; Yu et al., 2019). JA was defined as temporally aligned looking instances in which the child and parent focused on the same ROI. Operationalization of these variables was based on previous head-mounted eye-tracking studies in the developmental and eye-tracking literature (Deák et al., 2018; Yoshida & Smith, 2008; Yu & Smith, 2013).
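The coding scheme above can be sketched in Python, assuming each person's annotations are exported as one ROI label per 33 ms frame (`None` for non-ROI locations). A look is treated as a maximal run of consecutive frames on the same ROI, and JA as a run of synchronized frames in which child and parent share an ROI. Frame-level export, the run-based definition of a look, the label names, and the handling of values exactly at 200 and 2000 ms are all our assumptions; coding in the study was done in Datavyu, and this is not its export format or the authors' procedure.

```python
from itertools import groupby
from typing import List, Optional, Tuple

MS_PER_FRAME = 33  # frame duration stated in the Equipment section

# A look: (roi_label, first_frame_index, number_of_frames)
Look = Tuple[str, int, int]


def frames_to_looks(frames: List[Optional[str]]) -> List[Look]:
    """Collapse per-frame ROI labels into looks (maximal runs on one ROI).

    None marks frames on non-ROI locations (wall, table, ceiling); those runs
    are skipped, so a look ends whenever the label changes.
    """
    looks: List[Look] = []
    start = 0
    for roi, run in groupby(frames):
        n = sum(1 for _ in run)
        if roi is not None:
            looks.append((roi, start, n))
        start += n
    return looks


def classify_look(duration_ms: float) -> str:
    """Saccade < 200 ms, fixation 200-2000 ms, SA > 2000 ms.

    Assigning exactly 200 ms and exactly 2000 ms to fixation is an assumption.
    """
    if duration_ms < 200:
        return "saccade"
    if duration_ms <= 2000:
        return "fixation"
    return "SA"


def joint_attention(child: List[Optional[str]],
                    parent: List[Optional[str]]) -> List[Look]:
    """JA episodes: runs of synchronized frames where both look at the same ROI."""
    if len(child) != len(parent):
        raise ValueError("child and parent streams must be synchronized")
    shared = [c if c is not None and c == p else None
              for c, p in zip(child, parent)]
    return frames_to_looks(shared)
```

A look's duration is its frame count times `MS_PER_FRAME`. For instance, a child stream of 10 frames on `car`, 2 non-ROI frames, and 70 frames on `parent_face` yields a 330 ms fixation followed by a 2,310 ms SA look; if the parent looks at `car` for only the first 8 of those frames, `joint_attention` returns a single 8-frame (264 ms) JA episode on `car`.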
Data analysis
The first and second hypotheses, concerning group comparisons of children's and parents' visual experiences (e.g., saccades, fixations, SA, and JA) across ROIs, were tested with general linear modeling. The third hypothesis, concerning whether parents’ attention to ROIs predicts their child's JA, was tested with a set of linear regressions. An alpha level of 0.05 was used for all statistical tests.
Results
Children’s gaze behaviors: Gaze location
We first examined how children distributed their attention among ROIs during the play session, in terms of both frequency and duration; these measures are summarized by group in Table 2.
Means and standard deviations (SD) of frequency (number of discrete looks) and duration (ms) of observed child visual experiences by group, averaged across participants within the ASD and TD groups.
Autistic children, on average, spent 43.17% of the play session looking at ROIs rather than non-ROIs (e.g., the wall, table, and ceiling), while TD children spent 50.01%; this difference was not significant (t(24) = −0.67, p = .53).
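The reported df of 24 matches a Student's (pooled-variance) independent-samples t-test with 13 + 13 − 2 degrees of freedom. A minimal Python sketch of the percentage measure and the t statistic follows; the per-frame label format and the helper names are assumptions, and the study's per-participant values are not reproduced here.

```python
import math
from statistics import mean, variance


def percent_on_rois(frame_labels):
    """Percentage of frames coded as an ROI (None = non-ROI such as wall or table)."""
    return 100 * sum(label is not None for label in frame_labels) / len(frame_labels)


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance weights each group's sample variance by its df.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

With 13 percentages per group, `pooled_t` returns df = 24. The two-sided p value then comes from the t distribution with those degrees of freedom; SciPy's `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` is this same pooled test, returns both t and p.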
We examined the effects of group (TD vs. ASD) and ROI type (objects, parent's hands, child's hands, and parent's face) on both the frequency and the duration of children's attention using linear mixed models. The frequency model included fixed effects for group and ROI type and random effects for participants and for ROIs within participants to account for variability in dyads across ROIs. It revealed no group difference in attention throughout the play session, ß = 20.96, t = 0.63, p = .528 (see the full model in S1 in Supplementary Materials). Children with and without ASD shared a similar ROI distribution, from greatest to least: objects, parent's hands, child's hands, and parent's face. Specifically, post-hoc Tukey contrasts showed that both groups attended to the target objects significantly more than to other ROIs, followed by their parent's hands. In addition, children with ASD attended more to their parent's hands than to their own hands (ß = 80.39, t = 2.86, p = .028), whereas TD children showed no difference in attention between their parent's hands and their own hands (ß = 67.15, t = 2.30, p = .109; also see Figure 2A and S2).

The distribution of children’s gaze behaviors. (A) Children’s gaze by ROIs counted by frequency. (B) Children’s gaze by ROIs counted by duration (sec). (C) Children’s gaze by gaze type counted by frequency. (D) Children’s gaze by gaze type counted by duration (sec). Significance levels: marginal, * <.05, ** <.01, *** <.001.

The frequency distribution of children’s gaze shifting to objects (A) and parents’ gaze shifting to objects (B). A, There was no difference between children with and without ASD in the amount of gaze shifting; children in both groups shifted attention to objects significantly more often when the prior location was the parent’s hands than when it was the child’s hands, and least often when it was the parent’s face. B, Parents of TD children shifted gaze to objects significantly more often than parents of children with ASD did; parents in both groups shifted to objects most often when the prior location was the child’s hands, followed by comparable amounts of shifting from the child’s face and the parent’s hands.
We similarly ran a mixed model on gaze duration with fixed effects for group and ROI type and a random effect for participants. The two groups did not differ in the duration of attention to ROIs, ß = −6.32, t = −0.47, p = .639. Notably, there was an age effect on duration: older children spent significantly less time attending to ROIs, ß = −0.59, t = −2.94, p = .007 (see the full model in S3). Post-hoc Tukey tests revealed that children with and without ASD shared a similar ROI distribution, from greatest to least: objects, parent's hands, parent's face, and child's hands (see Figure 2B). Specifically, both groups attended to the target objects significantly longer than to the other ROIs, followed by their parent's hands. There was no difference in overall duration between children's attention to the parent's face and to their own hands.
Children’s gaze behaviors: Gaze type
We next focused on children's attention to ROIs only and classified it by gaze type: (1) saccades, (2) fixations, and (3) SA. First, S4 summarizes the mixed model with fixed effects for group and gaze type and a random slope of gaze type within participants. It revealed no group difference in the frequency of the various gaze behaviors, ß = 12.09, t = 0.21, p = .835. Gaze type, however, accounted for the frequency distribution, with significant fixed effects of SA (ß = −122.85, t = −2.21, p = .031) and saccades (ß = 156.00, t = 2.80, p = .007). Follow-up Tukey contrasts on gaze type also showed that in both groups, saccades were the most frequent, followed by fixations, with SA the least frequent; all comparisons were statistically significant (see Figure 2C).
Similarly, a mixed model with fixed effects for group and gaze type and a random effect for participants confirmed no group effect on attention duration, ß = 0.46, t = 0.03, p = .976; in other words, both groups had comparable durations across gaze types. Gaze type did account for attention duration, with significant fixed effects of SA (ß = −39.75, t = −2.73, p = .009) and saccades (ß = −54.60, t = −3.75, p < .001). Follow-up Tukey comparisons showed that children in both groups spent the most time in fixations, followed by comparable durations of SA and saccades (see Figure 2D). In addition, child age was controlled in the model, and older children attended for significantly less time across ROIs, ß = −0.53, t = −3.00, p = .005 (see the full model in S5).
Children’s gaze behaviors: Gaze shifting
We also counted children's gaze shifts among ROIs, specifically, how likely children were to shift attention to target objects from other ROIs. A generalized linear model with a quasi-Poisson distribution predicted the likelihood of gaze shifting to objects as a function of group and prior gaze location (i.e., parent's face, parent's hands, child's hands). There was no main effect of group on the likelihood of gaze shifting to objects (ß = 0.17, t = 1.15, p = .253), whereas the effects of shifting from the parent's face to objects (ß = −1.05, t = −3.79, p < .001) and from the parent's hands to objects (ß = 0.87, t = 5.20, p < .001) were significant (see the model in Table 3). The model was fit on the log scale, and estimates were transformed to incidence rate ratios for pairwise comparisons of prior locations: the incidence rate for parent's hands-to-object shifts was 2.39 times that for child's hands-to-object shifts (z = 5.20, p < .001); the incidence rate for child's hands-to-object shifts was 2.85 times that for parent's face-to-object shifts (z = 3.79, p < .001); and the incidence rate for parent's face-to-object shifts was 0.15 times that for parent's hands-to-object shifts (z = −7.54, p < .001; also see Figure 4A).
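Because the quasi-Poisson model uses a log link, each coefficient, or difference of coefficients, exponentiates to an incidence rate ratio; the quasi-Poisson dispersion parameter rescales the standard errors but not these point estimates. The Python sketch below recomputes the ratios above from the rounded coefficients, assuming child's hands is the reference level (coefficient 0), which is our reading of the model rather than something the text states.

```python
import math

# Rounded log-scale coefficients reported above for prior gaze location.
# Treating child's hands as the reference level is an assumption.
COEF = {"child_hands": 0.0, "parent_hands": 0.87, "parent_face": -1.05}


def irr(location: str, reference: str) -> float:
    """Incidence rate ratio of shifts to objects from `location` vs. `reference`."""
    return math.exp(COEF[location] - COEF[reference])


for loc, ref in [("parent_hands", "child_hands"),
                 ("child_hands", "parent_face"),
                 ("parent_face", "parent_hands")]:
    print(f"{loc} vs {ref}: IRR = {irr(loc, ref):.2f}")
```

The three ratios print as 2.39, 2.86, and 0.15. The second differs from the reported 2.85 by an amount consistent with the coefficient having been rounded to −1.05 before reporting.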

The distribution of parents’ gaze behaviors. (A) Parents’ gaze by ROIs counted by frequency. (B) Parents’ gaze by ROIs counted by duration (sec). (C) Parents’ gaze by gaze type counted by frequency. (D) Parents’ gaze by gaze type counted by duration (sec). Significance levels: * <.05, ** <.01, *** <.001.
The model summary of children's gaze shifting to target objects.
Parent’s gaze behaviors: Gaze location
We first examined how parents distributed their attention among ROIs during the play session, in terms of both frequency and duration; these measures are summarized by group in Table 4.
Means and standard deviations (SD) of frequency (number of discrete looks) and duration (ms) of observed parent visual experiences by group, averaged across participants within the ASD and TD groups.
Second, to examine how parents distributed their attention during the play session as a function of group and ROI type, we ran linear mixed models on both the frequency and the duration of parents’ gaze behaviors.
As with the models of children's gaze, the mixed model of parent gaze included a random slope by participant to account for the variability of each dyad. Table 2 presents the model summary, which reveals a significant interaction between the TD group and gaze on objects, ß = 191.29, t = 2.84, p = .007 (see the full model in S6). Follow-up group contrasts by location indicated that parents of TD children looked at target objects significantly more often than parents of ASD children did, ß = 189.55, t = 3.38, p = .002, with no group differences for other ROIs. In addition, parents in both groups had a similar ROI distribution, from greatest to least: objects, child's hands, child's face, and parent's hands. Specifically, post-hoc Tukey contrasts showed that both groups attended to the target objects significantly more than to the other ROIs, with comparable counts of attention to the child's hands, the child's face, and the parent's own hands (see Figure 4A).
Moreover, the duration model with fixed effects of group and ROI type and a random effect of participants also showed a significant interaction between TD group and gaze on objects, ß = 72.49, t = 3.33, p = .002 (see the model in S7). Specifically, post-hoc comparisons demonstrated that parents of TD children attended to target objects longer than parents of children with ASD did, ß = 59.83, t = 3.57, p < .001, with no group difference for other ROIs (see Figure 4B). Notably, the follow-up ROI contrasts revealed different distributions by group: (1) among parents of children with ASD, the duration of attention to their own hands was significantly shorter than to objects (ß = −58.14, t = −3.91, p = .002) and to the child's face (ß = −52.20, t = −3.51, p = .006), while attention to target objects and the child's face was comparable (ß = 5.94, t = 0.40, p = .978); (2) among parents of TD children, attention to objects was significantly longer than attention to each of the other three ROIs (see all contrasts in S8).
Parent’s gaze behaviors: Gaze type
We then examined the types of gaze behaviors parents directed to ROIs during the play session only; summary statistics for both frequency and duration are listed in Table 4. First, the frequency model with fixed effects of group and gaze type and a random effect of participant revealed no group difference in parents' attention, β = 105.45, t = 1.44, p = .159. Gaze type, however, accounted for gaze frequency, with a significant effect of SA, β = −175.37, t = −2.79, p = .010, and a significant effect of saccades, β = 285.25, t = 4.54, p < .001 (see also S9). As with the children's frequency distribution across gaze types, parents in both groups showed the same ordering: saccades were the most frequent, followed by fixations, with SA the least frequent, and all pairwise comparisons were significant (see Figure 4C).
Notably, the duration model with fixed effects of group and gaze type and a random effect of participant found two interaction effects: (1) a significant interaction between the TD group and SA, β = −64.10, t = −2.94, p = .007; and (2) a significant interaction between the TD group and saccades, β = −46.16, t = −2.12, p = .044 (see the full model in S10). To interpret these interactions, post-hoc Tukey contrasts revealed that parents of TD children produced significantly longer fixations than parents of children with ASD did, β = 54.80, t = 2.96, p = .006. In addition, both groups showed a similar duration distribution, from longest to shortest: fixations, saccades, and SA. Specifically, parents in both groups had significantly longer fixations than saccades and SA, whose durations were comparable (see Figure 4D).
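One way to read the two interactions is through the model equation. The block below is a sketch, not the authors' specification. It assumes treatment coding with the ASD group and fixations as reference levels (fixations are inferred as the reference only because the reported interactions involve SA and saccades), a random intercept $u_i$ for dyad $i$, and indicator variables $\mathrm{TD}_i$, $\mathrm{SA}_{ij}$, and $\mathrm{Sac}_{ij}$ for observation $j$. $\Delta$ denotes the TD minus ASD difference in mean duration for each gaze type.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{TD}_i + \beta_2\,\mathrm{SA}_{ij} + \beta_3\,\mathrm{Sac}_{ij}
        + \beta_4\,\mathrm{TD}_i\,\mathrm{SA}_{ij} + \beta_5\,\mathrm{TD}_i\,\mathrm{Sac}_{ij}
        + u_i + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),\\
\Delta_{\text{fix}} &= \beta_1, \qquad
\Delta_{\text{SA}} = \beta_1 + \beta_4, \qquad
\Delta_{\text{sac}} = \beta_1 + \beta_5 .
\end{aligned}
```

Under these assumptions, the negative values of $\beta_4$ and $\beta_5$ (−64.10 and −46.16) mean that the TD–ASD gap in duration is smaller for SA and saccades than for fixations. This reading fits the post-hoc finding that the group difference appeared for fixations.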
Parent’s gaze behaviors: Gaze shifting
We also counted parents’ gaze shifts to objects. A generalized linear model with a quasi-Poisson distribution was used to predict the likelihood of gaze shifting as a function of group and prior gaze location. Table 5 summarizes the model on the log scale and reveals a main effect of group: parents of TD children were significantly more likely to shift their gaze toward the target objects than parents of children with ASD were, β = 0.38, t = 2.35, p = .024. In addition, parents in both groups shared a similar shifting pattern. The incidence rate for child's hands-to-object shifts was 1.80 times that for child's face-to-object shifts (z = 3.14, p = .005). The incidence rate for child's hands-to-object shifts was 2.42 times that for parent's hands-to-object shifts (z = 4.27, p < .001). The incidence rate for child's face-to-object shifts was 1.34 times that for parent's hands-to-object shifts, a difference that was not significant (z = 1.28, p = .405; also see Figure 4B).
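Because Table 5 reports the model on the log scale, its coefficients and contrasts are easiest to read as incidence rate ratios (IRRs). The Python sketch below shows two pieces of arithmetic: exponentiating a log-scale coefficient, and combining two IRRs that share a comparison level. It uses only the β = 0.38 group effect and the 1.80 and 2.42 ratios reported above. The quasi-Poisson dispersion helper and its inputs are a hypothetical illustration of how that family rescales Poisson standard errors, not the authors' code.

```python
import math
from typing import Sequence


def irr_from_log_coef(beta: float) -> float:
    """Convert a log-link coefficient into an incidence rate ratio (IRR)."""
    return math.exp(beta)


def irr_via_shared_level(x_over_a: float, x_over_b: float) -> float:
    """Return the IRR of B vs. A from two IRRs that share numerator X.

    On the log scale, log(B/A) = log(X/A) - log(X/B), so the IRRs divide.
    """
    return x_over_a / x_over_b


def pearson_dispersion(observed: Sequence[float], fitted: Sequence[float],
                       n_params: int) -> float:
    """Quasi-Poisson dispersion estimate: Pearson chi-square / residual df.

    Quasi-Poisson standard errors equal the Poisson ones multiplied by sqrt(phi).
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)


# TD vs. ASD group effect reported on the log scale (beta = 0.38).
group_irr = irr_from_log_coef(0.38)
print(f"{group_irr:.2f}")  # 1.46

# Child's hands vs. parent's hands = 2.42; child's hands vs. child's face = 1.80.
# Implied IRR for child's face vs. parent's hands:
face_vs_parent_hands = irr_via_shared_level(2.42, 1.80)
print(f"{face_vs_parent_hands:.2f}")  # 1.34

# Hypothetical observed counts and fitted means, for illustration only.
phi = pearson_dispersion([2, 4, 6], [3, 3, 6], n_params=1)
```

Under a log link, pairwise contrasts from the same model combine additively on the log scale. The implied 1.34 therefore matches the third ratio reported above, which serves as a quick consistency check on the reported contrasts.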
The model summary of parents’ gaze shifting to target objects.
Parental impact on joint attention
We examined how parent attention to ROIs predicted JA. The frequency of parents' looking at all ROIs combined (β = 3.11, p = .04; see Figure 5A) and at hands (β = 1.04, p = .01; see Figure 5B) significantly predicted the frequency of JA experiences. Parents' object-looking duration (β = 4.36, p = .004) significantly predicted the duration of JA experiences (see Figure 5C).

Significant parent attention predictors of JA instances. (A) Frequency of parents’ attention to all regions of interest (objects, parent hands, child face, and child hands) as a significant predictor of JA frequency. (B) Parents’ hand-looking frequency as a significant predictor of JA frequency. (C) Parents’ object-looking duration as a significant predictor of JA duration.
For parents of ASD children, the frequency of looking at the child predicted the frequency of JA experiences (β = .2, p = .04; see Figure 6A), although this was not the case for parents of TD children. Instead, for parents of TD children, object-looking duration significantly predicted the duration of JA experiences, β = .18, p = .02 (see Figure 6B).
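Both sets of results come from simple linear regressions of a JA measure on a parent gaze measure, first across all dyads and then within each group. The sketch below fits an ordinary least-squares line separately for each group using the Python standard library. The records are invented, the tuple layout is hypothetical, and the sketch omits standard errors, p-values, and any covariates (for example, child age) that the authors' models may have included.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_by_group(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, Tuple[float, float]]:
    """Fit a separate simple regression (JA outcome on parent predictor) per group."""
    xs: Dict[str, List[float]] = defaultdict(list)
    ys: Dict[str, List[float]] = defaultdict(list)
    for group, predictor, outcome in rows:
        xs[group].append(predictor)
        ys[group].append(outcome)
    return {g: ols_fit(xs[g], ys[g]) for g in xs}


# Hypothetical per-dyad values: (group, parent gaze measure, JA measure).
records = [
    ("ASD", 10, 4), ("ASD", 20, 6), ("ASD", 30, 9), ("ASD", 40, 11),
    ("TD", 5, 3), ("TD", 15, 4), ("TD", 25, 8), ("TD", 35, 9),
]
fits = fit_by_group(records)
```

Fitting each group separately, as here, yields group-specific slopes. A single model with a group × predictor interaction term would be the usual way to test formally whether the two slopes differ.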
Discussion
In the current study, we used head-mounted camera techniques during live social interaction between children and their parents engaged in object play to explore the visual experiences of children with and without ASD, and how these children distribute their visual attention in relation to their parents' visual experiences. Overall, we found similar looking behaviors across children in play contexts where parents actively influence a child's visual experiences, but parents of ASD children and parents of TD children appear to do so through potentially different pathways.
First, despite previous findings from well-controlled experiments and observational studies of parent–child interaction, we did not find differences in children's viewing experiences between the ASD and TD groups. Children with and without ASD had similar gaze distributions across ROIs and gaze types, and both groups were most likely to follow their parent's hands when shifting attention to objects. These seemingly contradictory results suggest that social context may shape children's early viewing experiences through parents' corresponding scaffolding. We therefore addressed this potential impact of social context by observing parents' looking behaviors. Second, the results point to different attentional foci and gaze distributions between parents in the TD and ASD groups. Specifically, parents of TD children focused primarily on objects, showing more gaze shifting to objects and longer attention to objects than parents of children with ASD. Parents of children with ASD attended comparably to objects and to their child's face, which may reflect heightened sensitivity to their children's reactions (e.g., the child's gaze direction) during social interaction. These differences in parental gaze patterns, combined with the similar visual experiences of children in the TD and ASD groups, may indicate different processes by which parents scaffold children's visual experiences, that is, differential attention strategies used by parents in the TD and ASD groups. We speculate that parents in the two groups attend to different information to support their child's viewing experiences. As a result, the children's visual experiences were comparable, despite previous computer screen-based studies documenting JA differences between ASD and TD groups (Amso et al., 2014; Chita-Tegmark, 2016; Wang et al., 2015).
Third, we addressed this speculation by testing the relationship between parents’ looking behaviors and their JA experiences with their children. The results not only support a link between parents’ and children's gaze behaviors but also suggest that parents of children with and without ASD support JA experiences differently. We found that whereas TD parents’ attention to objects significantly predicted JA experiences, ASD parents’ attention to their children significantly predicted JA experiences. One may further speculate that parents of TD children focus on demonstrating the objects clearly to the child (e.g., which object to show when, and how the objects are oriented toward the child). Parents of ASD children, by contrast, may attend to whether and how the objects are perceived by the child (e.g., the child's direction of looking). Notably, the present findings show that children with and without ASD were both sensitive to parents’ hands rather than parents’ faces before shifting their attention to objects. This implies a correspondence between parents’ hand actions and gaze behaviors, and a variety of scaffolding strategies that support sustained, stable looking experiences. A semi-longitudinal study (from 5 to 24 months of age) using a head-mounted eye tracker in an object play context indicates that parents significantly change the way they support children's object viewing. It also indicates that TD children maintain effective viewing experiences and strengthen their SA over this developmental period (Burling & Yoshida, 2019; Sun & Yoshida, 2022). Taken together, these findings suggest that parents can use a variety of social cues, separately and together, to accommodate children's developmental changes and scaffold their visual experiences (Sun et al., 2022).
Establishing the importance of social scaffolding for early learning is only a beginning. Nevertheless, the present study reveals a potential mechanism by which task context affects early visual experiences. It also reveals unique pathways through which social scaffolding supports similar developmental experiences and/or achievements. It remains important to continue investigating how social scaffolding shapes attentional, linguistic, and cognitive development, and what developmental significance it carries. This work further illustrates that it is also critical to consider whether these experiences are equally important for typically developing children and for children diagnosed with ASD.
Limitations and future directions
Although this study offers novel information about the visual experiences of children with and without ASD and their parents, our findings must be considered in light of at least three important limitations. First, the sample was small and the age range was wide. Because this was an exploratory study of visual experiences, our required sample size was estimated from a few similar studies with TD children, and we used a relatively wide age range. Because play structures change with age, we controlled for age and found a significant effect of age on children's attention duration only. Therefore, the generalizability of the current findings is limited to similar groups. Future studies with larger samples and narrower age ranges will allow researchers to include precise measures of children's developmental level (e.g., scores reflecting IQ, language level, and ASD symptom severity). These measures would allow exploration of how such factors may or may not influence child and parent behaviors, and how those behaviors are related. For example, previous research has shown that parents of autistic children behave differently toward their children depending on the child's ASD symptom severity (Beurkens et al., 2013; Kasari et al., 1988; Strid et al., 2013). Further exploring this with respect to visual attention may reveal a graded effect, that is, variation among children with ASD. Nonetheless, the current study extended previous head-mounted camera studies using microlevel behavioral approaches to ASD samples. It successfully generated 7,000–12,000 frames from collected samples of 20–40 parent–child dyads (Deák et al., 2018; Falck-Ytter, 2015; Falck-Ytter et al., 2015; Wass et al., 2018; Yoshida et al., 2019; Yu et al., 2019; Yurkovic et al., 2021) and revealed potentially unique links between children's and parents' attention in two important groups.
Furthermore, we acknowledge that a number of factors may have influenced the current results. One factor is gender. Like most studies, we had limited participation of females diagnosed with ASD, and recent research suggests that there are differences in visual attention between females and males diagnosed with ASD that are not easily captured with predominantly male samples (Harrop et al., 2018, 2019, 2020). Another factor is treatment history. For example, prior enrollment in applied behavior analysis (ABA) therapies may influence children's behaviors because of the emphasis placed on attending to items of interest and other people (Martins & Harris, 2006; Murza et al., 2016). This may also explain the similarities in children's attention to different ROIs. Further, the tight coupling of parents’ behaviors to their child's visual experiences may differ as a function of background factors such as socioeconomic status and bilingualism (Abels & Hutman, 2015; Sun et al., 2022), which is also worthy of future exploration.
With these factors in mind, future studies in this vein will continue to increase the clinical relevance of research results and allow more individualized approaches to improving interventions for children with ASD. Nonetheless, the present study takes a critical first step toward understanding the mechanisms underlying the different ways in which parental scaffolding may affect visual experiences for children with typical and atypical developmental trajectories. Targeted parental scaffolding, such as parental gaze behavior, has been documented to have potentially far-reaching effects. This study adds to the current literature by documenting the similar nature of the everyday visual experiences of children with and without ASD, and by showing how tightly these experiences are linked to parental social scaffolding. Specifically, it provides insight into a potential training ground for early attention development at home, which is potentially highly relevant for developing new intervention approaches.
Conclusion
In the current study, we used head-mounted camera systems to study the child- and parent-centered views and gaze behaviors of parents and children (ASD and TD groups) during parent–child interactive play. We documented similarities and differences in the distribution of visual attention, and the impact of parental gaze behaviors on visual attention, among children with and without ASD. The results of this study on the shared visual experiences of children with and without ASD revealed that their parents use differential attention strategies to support their children's visual experiences. Our work sheds light on the impact of social context and parental involvement on children's attention, which is a foundation of early learning, and ultimately may contribute to the development and improvement of parent-mediated interventions that support developmental outcomes.
Supplemental Material
Supplemental material, sj-docx-1-dli-10.1177_23969415221137293 for What children with and without ASD see: Similar visual experiences with different pathways through parental attention strategies by Elizabeth Perkovich, Lichao Sun, Sarah Mire, Anna Laakman, Urvi Sakhuja and Hanako Yoshida in Autism & Developmental Language Impairments
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the University of Houston Division of Research/Provost Faculty Research Invigoration Programs: High Priority Area Research Seed Grants (Accessible Healthcare), for the project "Exploring Visual Fixation Patterns in Young Children as a Potential Biomarker for Autism"; and by a CLASS Research Progress Grant, University of Houston, for the project "Social-Embodied Attention and Language Development in Young Children at Risk for Autism Spectrum Disorder."
References
