Abstract
The cultural transmission of technical know-how has proven vital to the success of our species. The broad diversity of learning contexts and social configurations, as well as the various kinds of coordinated interactions they involve, speaks to our capacity to flexibly adapt to and succeed in transmitting vital knowledge in various learning contexts. Although often recognized by ethnographers, the flexibility of cultural learning has so far received little attention in terms of cognitive mechanisms. We argue that a key feature of the flexibility of cultural learning is that both the models and learners recruit cognitive mechanisms of action coordination to modulate their behavior contingently on the behavior of their partner, generating a process of mutual adaptation supporting the successful transmission of technical skills in diverse and fluctuating learning environments. We propose that the study of cultural learning would benefit from the experimental methods, results, and insights of joint-action research and, complementarily, that the field of joint-action research could expand its scope by integrating a learning and cultural dimension. Bringing these two fields of research together promises to enrich our understanding of cultural learning, its contextual flexibility, and joint action coordination.
Keywords
From knowing how to build tools and shelters to knowing techniques for hunting and sailing, the intergenerational transmission of technical skills has proven essential to the success of the human species in populating all but the most hostile ecosystems on the planet. The transmission of these vital skills 1 is made possible by cultural learning, 2 our capacity to seek out and acquire information from one another (Henrich, 2016; Richerson & Boyd, 2005). What makes this capacity special may not be the learning mechanisms themselves, as cultural learning may rely on the same general learning processes as nonsocial (individual) learning (i.e., learning from interacting with one’s environment; Behrens et al., 2008; Olsson et al., 2020; Perreault et al., 2012). Instead, it would be our ability to attend specifically and efficiently to information conveyed by other people (Heyes, 2012; Sterelny, 2009), an ability depending on social cognitive processes such as shared intentionality (Tomasello et al., 2005), mind reading (Apperly & Butterfill, 2009), and ostensive communication (Csibra & Gergely, 2009).
In the field of cultural evolution (Richerson & Boyd, 2005), the transmission of technical skills—complex actions that have an instrumental function of producing material changes in the environment (Charbonneau, 2015)—and of the form of their material outputs—artifacts, engineered environments, and technologies more broadly—have been understood to be the result of high-fidelity cultural learning (Boyd et al., 2013). Techniques and technologies, moreover, have been of special interest to cultural evolutionists because they are one of the most important means by which human populations adapt to their environment (Henrich, 2016). Moreover, they often exhibit a strong signal of improvement over generations, as documented by the incremental evolution of human material culture by archaeologists (Lipo et al., 2006). They have served as the key focus for the study of human cumulative cultural change, the process by which cultural traits are incrementally modified and improved over generations (Mesoudi & Thornton, 2018; Osiurak & Reynaud, 2020). Finally, although of lesser sophistication, technical traditions are also found in other species, providing the main evidence for the claim that culture is not uniquely human (Whiten, 2022).
Cognitive scientists have mainly attended to the informational aspect of cultural learning: which stimuli are used when learners acquire skills, whom learners choose to learn from, and how faithfully the skills are transmitted. From these, researchers have developed sophisticated typologies of cultural learning processes based on the kind of (a) social inputs learners are sensitive to (Hoppitt & Laland, 2013), (b) social learning strategies learners employ when selecting a model from which to learn (Kendal et al., 2018), and (c) mechanisms that can encourage or impede high-fidelity transmission (Dean et al., 2014). Less examined, however, are the diverse forms of interpersonal interaction dynamics and social configurations involved in learning episodes and our capacity to adapt and learn under various, often uncertain circumstances.
In contrast, ethnographic studies have shown that the means by which we acquire technical skills from one another and the forms of interindividual interactions involved vary tremendously, both within and between populations (Garfield et al., 2016; Gauvain, 2005; Lancy et al., 2010; Lew-Levy et al., 2017, 2019; MacDonald, 2007; Rogoff, 2003). Technical skills can be passed on through various institutional forms of education, in which the interaction structure may vary considerably, such as learning in decontextualized ways (e.g., through formal education) or in a more hands-on manner (e.g., apprenticeship), but in many cultures it is acquired informally and in situ (Lancy et al., 2010). Cultural learning can also be achieved with different degrees of involvement on the part of the learners and those they learn from. A learner can acquire a technical skill by directly engaging in participatory learning within the context of using the skill (Lave & Wenger, 1991; Paradise & Rogoff, 2009; Rogoff et al., 2003), through opportunistic observational learning (Gaskins & Paradise, 2010), or even more casually through play with peers (Boyette, 2016; Chick, 2010). Models, the knowledgeable individuals from whom skills are acquired, can themselves be involved to different degrees, from actively engaging with and guiding the learner to more limited forms of engagements, such as offering evaluative feedback through explicit assessments or even physical punishment, to merely providing learning opportunities by tolerating the presence of a naive observer (Kline, 2015).
This broad diversity of learning contexts and social configurations and the various kinds of coordinated interactions they involve speak to our capacity to flexibly adapt to and succeed in transmitting vital knowledge within varying learning contexts. Moreover, this flexibility is likely to be a major contributor to our success in adapting to novel and changing environments (Boyd & Richerson, 2005). These ethnographic observations therefore pose a challenge to the study of cultural learning: How is it that cultural learning can be deployed in so many different social setups, using different means of coordination between the learners and those they learn from, and how do these various forms of interaction fit with our current understanding of cultural learning? Coordination and the many fine-grained interaction dynamics between learners and models have yet to be addressed by cognitive scientists studying cultural learning. In fact, just how this flexibility is achieved has so far received little attention from cognitive scientists. Here, we argue that a key feature of the flexibility of cultural learning is that both the models and learners recruit cognitive mechanisms of action coordination to modulate their behavior contingently on the behavior of their partner, generating a process of mutual adaptation supporting the successful transmission of technical skills in diverse and fluctuating learning environments.
Within cognitive science, the field of joint action (Sebanz et al., 2006; Sebanz & Knoblich, 2009, 2021; Vesper et al., 2010) offers an extensive experimental literature focusing on how individuals coordinate their actions in space and time, elucidating the mechanisms that allow for successful coordination (Sebanz et al., 2006; Sebanz & Knoblich, 2021). Superficially, links can be drawn between joint action and cultural learning in that many forms of joint action involve highly complex technical skills that are the result of cumulative cultural learning across generations, such as playing music with another person, piloting a plane with a copilot, or performing surgery with a medical team. More fundamentally, however, joint action and cultural learning face a common problem: understanding another person’s actions and modifying one’s own behavior contingently on that understanding. Such understanding can come about from dividing actions into discrete motor units and applying a teleological framework to assign intentions or goals that likely give rise to that behavior (Buchsbaum et al., 2015; Byrne, 2003), and contingent responses are the result of an individual’s ability to control their own movements and to predict and monitor both their own actions and those of the other person. In the case of joint action, this understanding and response process facilitates interpersonal coordination, allowing partners to anticipate and adapt to each other in service of their joint goal. In the case of cultural learning, it provides the learner with access to some mental content that is held within the mind of the model, enabling them to perform the skill themselves as a result of that access.
There is a rich literature of joint-action research exploring the cognitive and behavioral mechanisms that support coordination to address this problem of understanding or reading other people’s minds through their actions. Researchers in this area, however, have devoted little attention so far to the role these mechanisms play in cultural learning (but see McEllin et al., 2018b). We propose that the study of cultural learning would benefit from the experimental methods, results, and insights of joint-action research and, complementarily, that the field of joint-action research could expand its scope by integrating a cultural learning dimension. Bringing these two fields of research together promises to enrich our understanding of both cultural learning and (joint) action coordination.
In the following section (Flexible Cultural Learning), we approach cultural learning by characterizing the flexible recruitment of supportive mechanisms, both cognitive and behavioral, facilitating cultural learning interactions in different learning contexts. In the next three sections, we review a curated selection of the literature on coordination and joint action, identifying some candidate mechanisms relating to attention (see Attention Mechanisms That Support Action Coordination), action prediction and monitoring (see Prediction, Planning, and Execution That Support Action Coordination), and strategies for informative signaling that we consider relevant to the study of cultural learning (Communicative Strategies That Support Action Coordination). In each of these sections, we compare how these mechanisms, both on the side of the learner and the model, interactively support skill transmission and identify promising research questions that a joint-action perspective opens for the study of cultural learning. In the final section (Cultural Learning and Coordination: Further Questions), we further argue that approaching the study of cultural learning through an action coordination framework opens up novel avenues of fruitful research on (a) the joint development of coordination and cultural learning capacities, (b) the strategies employed by models and learners to engage with one another, and (c) the longer-term cultural evolutionary impacts of coordination mechanisms.
Flexible Cultural Learning
Mechanisms and processes of cultural learning
A focal point of research in the field of cultural learning is to explain what makes humans’ ability to learn from others different from the social learning capabilities of nonhuman species (Tomasello, 1999). A widespread assumption is that answering this question will involve identifying a minimal set of human-specific cognitive mechanisms and processes involved in cultural learning that can function within most environments and for all kinds of content (Heyes, 2012; Tomasello, 2019; but see Sterelny, 2012). Social cognitive mechanisms, such as shared intentionality (Tomasello et al., 2005), mind reading (Apperly & Butterfill, 2009), and ostensive communication (Csibra & Gergely, 2009), are strong explanatory candidates for our species’ unique capacity for cultural learning. These would also explain our unique developmental trajectories (Tomasello, 2019), such as why we are expert imitators from a young age, leading us to copy one another by easily learning instrumentally opaque actions and conventions (Gergely et al., 2002).
Researchers have also developed sophisticated typologies of social learning processes based on the kind of informational inputs learners are sensitive to (see Hoppitt and Laland, 2013, for a review). These processes are understood (a) to be constitutive of our human-specific capability for cultural learning and (b) to be adaptive at least in part because they enable or increase our capacity to learn from others. Although they may be shared with other species, they would nonetheless participate in our unique capacity for cultural learning. They would also explain our capacity for cumulative cultural evolution (Dean et al., 2014; Muthukrishna et al., 2018) and the wide range of content we can transmit (Heintz & Scott-Phillips, 2023). Much experimental work accordingly focuses on our unique capacity to transmit different forms of knowledge with sufficiently high fidelity to allow us to increasingly complexify our traditions from one generation to the next (e.g., Caldwell & Millen, 2008, 2009).
Cultural learning processes such as imitation, emulation, and teaching speak of what we can learn and how well we can learn it (Charbonneau & Strachan, 2022; Heyes, 2012). In this way, they are broadly applicable across most or all environments. Because of this broadness, however, a focus on these processes alone cannot explain how cultural learning is deployed in any specific, local scenario and why some processes rather than others are used in these different scenarios. Consider, for instance, learning how to weave a basket through emulation. Although definitions of emulation vary, they usually converge in defining emulation as involving a learner reproducing some object or change in the environment produced by another individual—the end result—without copying the specific actions that the other individual used to produce the end result. Although a basket-weaving skill can be emulated by observing a model produce a basket, it can also be learned through reverse-engineering a basket left behind by a model. In the first case, the learner may be able to follow the gaze of the model and adopt their visuospatial perspective as they weave their basket. Using these cognitive mechanisms—gaze following, visuospatial perspective taking—would support the learner reconstructing the skill as it would help them identify which parts of the environment are useful in the fabrication process and thus scaffold their learning process, something that could not be done in the absence of a model. In contrast, reverse-engineering from the basket the technique used for its production would rely on mental simulations and trial-and-error hypothesis testing. In the latter scenario, technical reasoning would play a more important role than social cognition (Osiurak & Reynaud, 2020). Although both scenarios count as cases of emulation—the end result, not the specific actions of the model, is copied—they rely on different supportive cognitive mechanisms.
As shown by the ethnographic challenge presented in the introduction, learners and the models they learn from must cope with a range of interaction setups involving varying forms of coordination problems. Just how the transmission of technical knowledge is ensured within these dynamically varying contexts goes beyond the explanatory scope of cultural learning processes (e.g., imitation, emulation; Charbonneau & Strachan, 2022). Missing are those mechanisms and processes that support the flexible deployment of cultural learning in the diversity of scenarios within which ethnographers have shown it to operate. In the previous example, gaze following, visuospatial perspective taking, and technical reasoning supported this function. We call these supportive mechanisms because they are not learning mechanisms per se, nor did they likely evolve to partake in cultural learning. Instead, they are supportive in that they can be recruited ad hoc in different contexts to flexibly assist, scaffold, and enhance cultural learning processes (in the previous example, emulation).
The flexible recruitment of supportive mechanisms is not restricted to the learner’s side. Models also recruit different supportive mechanisms when they engage in cultural learning interactions. For instance, consider the case of teaching. Suppose a farmer wants their child to learn how to wield a machete. The farmer can use their own background knowledge of the task and that of the child to evaluate the learning requirements and determine the best way to address those needs. In some contexts, the model may direct the child’s gaze to relevant aspects of the action, perhaps modifying their own actions to be more ostensive as they demonstrate how to wield the machete, thereby relying on their capacity for gaze leading (Edwards et al., 2015; Schilbach et al., 2010) and visual perspective taking (Hawkins et al., 2021; Horton & Keysar, 1996). Alternatively, the model can help the child learn by physically positioning their grip on the tool and haptically guiding them in swinging the machete, thereby recruiting sensorimotor channels of communication focused on haptic perception and motor control (Ganesh et al., 2014; van der Wel et al., 2011). Or the model may recall how they themselves learned and simply provide a blunt machete to the child and let them play around with it, relying more on their cultural background and the material affordances provided by their environment (Ruddle & Chesterfield, 1977). Depending on the model’s choice, different sets of supportive mechanisms will be recruited, both on the side of the model and that of the learner, to adapt the teaching to the local circumstances.
This diversity in learning scenarios and the contextual recruitment of different supportive mechanisms suggests that we are highly adaptable learners, capable of answering the various cognitive challenges posed by the wide range of social interactions through which cultural learning operates. However, as the above examples show, by focusing only on learning mechanisms—in those examples, emulation and teaching—the differences in the specifics of the learning episode are lost, and the capacity for flexibly adapting to varying contexts remains unaddressed. The diversity of learning scenarios and social configurations highlight the necessity for a form of cognitive flexibility, one that allows us to contextually recruit various supportive cognitive mechanisms so as to ensure the successful transmission of cultural knowledge.
Joint action and flexible cultural learning
As documented by ethnographic studies, cultural learning is a diverse and interactive phenomenon that individuals approach with a great degree of flexibility. Much of the experimental literature on cultural learning, however, treats learning as discrete, episodic events of transmission, at a resolution that loses a great deal of the information about the dynamics of the local interaction (e.g., a knowledgeable model produces some behavior that leads to a learner learning how to produce some similar behavior; Charbonneau & Strachan, 2022; Heyes, 2012). Although this approach has become popular for methodological reasons—particularly the logistical practicalities of investigating transmission using chains of nonoverlapping generations (Caldwell & Millen, 2008; Mesoudi & Whiten, 2008)—it has been criticized (Miton & Charbonneau, 2018) and has led to recent calls to center the study of cultural learning on interactivity and coordination (Charbonneau & Strachan, 2022).
If we are to answer the ethnographic challenge, we need to understand how technical skill acquisition unfolds in different learning contexts, through various social configurations and interaction structures, each of which can present coordination problems of its own. We need to examine which cognitive and behavioral mechanisms allow us to learn in more or less dense interactive setups—from unidirectional observational learning with little if any involvement of the model to highly interactive teaching—and how learning interactions can take different temporal structures—from discrete turn taking to synchronic dynamics. This requires us to focus specifically on mechanisms of interindividual coordination and what role they play in supporting dynamic cultural learning interactions between learners and models during and within learning episodes.
In this article, we approach cultural learning by focusing on its observed flexibility by ethnographers. We focus on examining the supportive cognitive and behavioral mechanisms involved in the interactions and coordination occurring between model and learners within and during learning episodes. We argue that a specific class of supportive mechanisms play a major role in shaping these interactions, namely action coordination mechanisms, which allow for the prediction, monitoring, and planning of another person’s behavior during joint action (Vesper, Abramova, et al., 2017; Vesper et al., 2010). These mechanisms play a fundamental role in allowing for rich, complex social interactions between people, making them highly relevant to investigate as supportive mechanisms for cultural learning.
Action coordination mechanisms are the subject of experimental investigation by joint-action researchers (Sebanz et al., 2006; Sebanz & Knoblich, 2009, 2021; Vesper et al., 2010). In fact, the field of joint action is centered on the cognitive science of social interactions and works to address the need for investigation of real-time social encounters (Hadley et al., 2022; Schilbach et al., 2013). Joint-action research is thus particularly suited for the study of cultural learning as a form of social interaction, where learners and models interact in a variety of ways, and it offers an extensive literature providing new insights, experimental paradigms, and promising new research avenues for the cognitive study of cultural learning.
In the next three sections, we review a curated selection of the literature on coordination and joint action, identifying and describing some candidate mechanisms relating to attention (Attention Mechanisms That Support Action Coordination), action prediction and monitoring (Prediction, Planning, and Execution That Support Action Coordination), and informative signaling (Communicative Strategies That Support Action Coordination) that we consider relevant to the study of cultural learning. For each section, we first provide an overview of each relevant mechanism in the context of joint action and then discuss the insights and open questions their study offers to our understanding of the workings of cultural learning. We also address the ethnographic challenge by examining how these mechanisms support our capacity to learn flexibly in different interaction setups.
Attention Mechanisms That Support Action Coordination
Attention plays an important role at all stages of social interactions. Attention mechanisms, which control how a person orients to, filters, and processes information from their environment, allow individuals to distill an overwhelming amount of sensory input down to manageable relevant information that can be effectively integrated into ongoing actions. Monitoring the attention of a partner during social interaction, therefore, can offer insights into the opaque mental processes by which they parse information that may be relevant to the interaction (Elekes & Király, 2021). This is particularly important for joint actions, where coordination partners must establish or recognize shared goals and intentions.
Joint attention
Seeing where somebody is looking not only results in adopting their perspective but also affects how we allocate our own attention and process information about the environment. When seeing a person avert their gaze to a location in the environment, observers are faster and more accurate at identifying and processing target objects appearing in the looked-at rather than the ignored location (Driver et al., 1999; Friesen & Kingstone, 1998; Frischen et al., 2007). This attention reallocation, known as gaze cuing, is sensitive to some features of the cuing faces, with people being more likely to follow the gaze of trustworthy (Jessen & Grossmann, 2020; Süßenbach & Schönbrodt, 2014), socially dominant (Jones et al., 2010), and previously helpful faces (Dalmaso et al., 2016; for a review, see Dalmaso et al., 2020). Sensitivity to these features could reflect a disposition to follow the gaze of individuals who seem to be good social sources of information, which emerges as early as 8 months of age (Tummeltshammer et al., 2014).
Attending the same object as another person, known as joint attention, affects how that object is processed (Becchio et al., 2008). In gaze-cuing paradigms, objects that are looked at by a cuing face are liked more than objects that are ignored (Bayliss et al., 2006, 2007) and are remembered better (Gregory & Jackson, 2017) but only if the cuing face can see them (Gregory & Jackson, 2019). Seeing another person looking at an object can also trigger mental simulation of actions directed toward that object (Castiello, 2003), and this interference is attenuated in children with diagnoses of autism spectrum disorder (Becchio et al., 2007), suggesting that this effect reflects a specifically social propensity to anticipate intended motor actions by monitoring gaze. The effects of joint attention on object processing are also sensitive to group size. Objects that are looked at by more than one person are liked more than objects looked at by only one person, a tendency that may facilitate learning of cultural group practices, attitudes, or norms (Capozzi, Bayliss, & Ristic, 2021; Capozzi, Wahn, et al., 2021; Herrmann et al., 2013).
These studies examined effects of gaze on object processing in static scenarios, but in continuous and dynamic interactions, eye-tracking evidence indicates that gaze following is crucial for effective communication. Examining the coupling between where speakers and listeners look during monologues or dialogues shows that tighter coupling is associated with better comprehension (Richardson & Dale, 2005; Richardson et al., 2007), indicating that joint attention can play a key role in supporting communicative interactions.
Visuospatial perspective taking
The ability to take another person’s visuospatial perspective is an important part of coordination. It has recently been argued that altercentrism reflects a fundamental property of human cognition and early cultural learning. An altercentric perspective allows infants to adopt other people’s perspectives without the costly interference of their own egocentric perspective, with this egocentricity itself emerging later in ontogeny as a result of prolonged learning (Kampis & Southgate, 2020; Southgate, 2020). With adults, experimental evidence shows that others’ perspectives are computed rapidly and involuntarily (Samson et al., 2010) and that interference from another’s perspective is sensitive to top-down beliefs about the other’s intentions and what they can see (Freundlieb et al., 2016, 2017; Furlanetto et al., 2016). Others’ perspectives can also facilitate information processing, as people find it easier to read a rotated word if the word appears upright from another person’s perspective (Freundlieb et al., 2018).
Automatic computation of another’s perspective also allows people to form topographical mappings between one’s own body and a partner’s, which can facilitate synchronous coordination in cases where the spatial relationship is incongruent. Topographical mapping facilitates coordination of anatomically matching body parts that are spatially incongruent—for example, when copying another person’s movements, participants find it easier to copy a right-hand action with their right hand, even though that hand appears on the left when the model is facing them (Ramenzoni et al., 2015).
Perspective taking is also important when taking into account a coordination partner’s particular action constraints. If a partner cannot see a stimulus, they cannot act on it, thereby changing the social affordance landscape of the scene. During a director task—a communicative interaction where participants direct a partner to interact with particular objects—directors show sensitivity to what their partner can see when communicating about the spatial location of objects (Hanna et al., 2003) and will adapt more to their partner if the partner has a more difficult task (Mainwaring et al., 2003). Awareness of another person’s constraints and affordances are also important for action prediction and information signaling during coordination, which we discuss in more detail below.
Sensitivity to being observed
We are sensitive to the presence of faces, particularly those that are looking at us (the “stare in the crowd” effect; von Grünau & Anston, 1995), and evidence suggests that this is driven by sensitivity to direct eye gaze (Senju et al., 2005). Awareness of being observed, or feeling observed, can affect one’s behavior in different ways (Hamilton, 2016). For example, people are more likely to mimic the actions of a person who makes eye contact before initiating an action (Prinsen & Alaerts, 2019; Prinsen et al., 2017; Wang et al., 2011; Wang & Hamilton, 2014) and show higher fidelity imitation when being observed by the person they are imitating (Krishnan-Barman & Hamilton, 2019). On the other hand, awareness of being observed can also have adverse effects on performance (Colombatto et al., 2019). Knowing that one is being watched can lead to choking, in which highly skilled individuals fumble tasks that they would otherwise find routine. The presence of an audience can also lead to social facilitation on cognitive processes, improving performance on both laboratory tasks (Huguet et al., 1999; Sharma et al., 2010) and naturalistic tasks (Bowman et al., 2013), which suggests that coordinating partners may help each other by mere virtue of being present.
However, being watched can be a double-edged sword. Believing that one is being watched is an arousing stimulus that can be measured in skin conductance responses (Myllyneva & Hietanen, 2015; Nichols & Champness, 1971), and this heightened state of arousal can be detrimental as well as beneficial, depending on the context. Prolonged eye contact (staring) is typically considered an aversive stimulus in live interactions with strangers (Ellsworth et al., 1972), and awareness of being observed can lead to choking in experts, where highly skilled individuals fumble tasks that they would otherwise find routine as a result of overthinking or distraction (Colombatto et al., 2019). The cognitive and behavioral consequences of being observed are therefore highly context dependent and may help or hinder joint actions according to individual feelings of social pressure, responsibility, or commitment to the partner.
Although direct gaze may be particularly salient because it is interpreted as an initiator to interaction (e.g., such as a cultural learning interaction), people are also sensitive to configurations of bodies and faces that indicate an interaction is occurring. Two bodies facing each other are detected faster by a third party than two bodies facing away from each other (Papeo et al., 2019; Vestner et al., 2019, 2020), and advantages in processing a facing dyad over a nonfacing dyad are eliminated when the images are inverted (Papeo et al., 2017), suggesting that interacting partners are perceptually grouped as holistic units by the visual system. Furthermore, this perceptual grouping appears to be sensitive to the expressed emotion of these dyads (Strachan et al., 2019) and can also have consequences for how the emotional expression of faces is perceived (Gray et al., 2017).
Attention and flexible learning
The importance of attention in cultural learning is well established. Opportunities for social learning are proposed to be preferentially attended (Heyes, 2012), and the ability to share attention and follow where conspecifics are looking has been proposed as one of the foundational cognitive abilities driving human cultural evolution (Heyes & Moore, 2023; Tomasello et al., 2005). Furthermore, direct gaze is one of several ostensive cues proposed as key to infant cultural learning by natural pedagogy (Csibra & Gergely, 2009), as it allows infants to detect when adults are demonstrating something for them to learn. A common thread of these existing accounts recognizes attention as important for guiding learners toward learning opportunities.
While there is some debate as to how specific the perceptual grouping of facing dyads is—as similar effects have been shown with arrows and other nonhuman objects (Vestner et al., 2021), indicating that this attentional bias may not be entirely specialized for perceiving human dyads—an attentional bias that prioritizes visual processing of engaged dyads may prove useful for identifying opportunities for cultural learning. A novice who can easily detect another novice engaged in an interaction with a model may quickly determine that the model is tolerant to learners and could be a valuable source of information (given that they already have an attentive pupil), perhaps inciting the novice to also attend to the model. Such a bias could thus effectively support different social learning strategies (Heyes, 2016; Kendal et al., 2018). For instance, a bias toward socially engaged dyads or direct-gaze faces can help novices in search of a learning opportunity, effectively acting as a model-based social learning strategy. The extent to which these attentional biases promote the detection and initiation of cultural learning opportunities specifically, and in which contexts, remains an open question.
What is less established, however, is how attention mechanisms can affect the learning process within interactions. Social learners may derive great benefits from being able to see not only what models do but also what models attend to as they are acting. For example, training a novice tennis player to use the same visual search strategy as expert players in order to anticipate upcoming dynamics of their opponents’ movements leads to marked improvements in performance, compared with the performance of a control group of novices who have equal physical experience but are trained with a placebo visual search strategy (Williams et al., 2002). In cases of teaching, this can be exploited further through stimulus or local enhancement, where a model may direct a learner’s attention around a scene in order to expose them to efficient information search strategies in a flexible manner. For example, a model cook demonstrating how to prepare a recipe may use their own gaze as a pointing device to direct the learner’s attention to each ingredient in turn, in the order that each should be added to the pot. Directing attention sequentially may offer benefits to the learner’s retention of both the component ingredients and the sequence in which they should be added, both of which are integral parts of the recipe being transmitted.
Given the importance of observational learning for acquiring motor skills in early infancy, it may be that the ability to take other people’s perspectives plays an important role in how learners relate what they see to their own developing motor repertoire (Moll & Kadipasaoglu, 2013). Topographical mapping likely plays an important role in this because it allows learners to map the kinematics of observed movements onto their own motor systems during imitation (Heyes, 2001). In addition, by adopting the model’s visuospatial perspective, learners can process information from a model flexibly and pragmatically based on what they understand the model to see and believe about the world around them.
In turn, topographical mapping can be exploited by the model to facilitate learning, for instance by having the model match the learner’s bodily orientation to make it easier for the learner to map the observed movements onto their own body (Downey, 2008). It is also important for models to adopt the visuospatial perspective of the learner in order to provide perceptual access to learning opportunities. For instance, a caregiver can adjust the way they hold an infant on their lap so that the infant can see demonstrations and actions of other group members within the broader environment (Hewlett & Roulette, 2016).
Attention mechanisms may play an important role in guiding learners’ information processing and helping them develop ancillary tools for successful performance (such as visual search strategies). However, it remains an open question whether being the object of attention may have any effect on the model’s behavior. Being observed may encourage more prosocial behavior (the audience effect; Bateson et al., 2006; Rotella et al., 2021), which may affect the kinds of behaviors that models produce when observed by a learner. For example, models may exaggerate a prescribed “correct” way of performing a behavior that they would not usually adhere to if they are being watched by a learner in order to set a good example, such as waiting for a traffic signal before crossing an empty street when a child is present. Alternatively, models may be more susceptible to choking under pressure, either by distraction or by overthinking their actions as a result of deconstructing a skilled technique for ease of acquisition. Although previous work has investigated how learners choke under the pressure of observation from an authority figure (Belletier et al., 2015), the impact of observation by a learner on models’ behavior remains an avenue for future research.
Prediction, Planning, and Execution That Support Action Coordination
When coordinating with other people, in order to efficiently adapt their behavior to that of a partner, people need to anticipate what their partners are going to do while preparing their own response. Because of the temporal and physical constraints involved in coordination, the interaction between coagents cannot rely only on passive, delayed processes but instead requires reliable, on-line predictions about the outcomes of a coagent’s movements that can be used to plan their own actions in turn (Sebanz & Knoblich, 2009).
Representing and predicting the actions of other people
Coordination between two or more agents poses two kinds of cognitive problems. First, people need to understand what actions their partners are going to perform. Second, people need to select, prepare, and execute an adequate response with a level of spatial and temporal precision high enough to ensure successful coordination. This is particularly challenging when coordination requires agents to perform actions simultaneously.
The sophisticated computations required to solve these problems likely rely on action–perception matching processes (Jeannerod, 1999; Prinz, 1997), supported by the morphological similarities of the two agents’ bodies and a shared motor repertoire. By virtue of the common representational domain shared by observed and planned actions, we understand the goals of the actions we observe by simulating them in our own sensorimotor system (but see Csibra, 2008; Gallese & Goldman, 1998; Jacob & Jeannerod, 2005; Kilner, 2011). This sensorimotor coding feeds forward models of observed actions (Kilner et al., 2007), allowing realistic predictions about actions goals—what a partner is about to do—and very precise information about the sensorimotor consequences and real time kinematics of the actions—where and when exactly the action will unfold (Wolpert et al., 2003). Moreover, by comparing divergences between anticipated and actual outcomes (errors), the predictive models can be updated and, in turn, support better learning (Wolpert et al., 2003).
Predictions about other people’s actions are not triggered exclusively by the observation of unfolding actions. Predictions can also be based on knowledge concerning the task at hand and which actions should be performed in response to specific events. In a coordination context, this can involve entertaining a complex set of action representations about not just what one will do but also what a coagent will need to perform next. Such task co-representation occurs when an agent’s awareness of another’s task activates a corresponding task representation. Representing the partner’s task can trigger predictions about the actions that the coagent will be performing (Rocca & Cavallo, 2020; Schmitz et al., 2017, 2018; Sebanz et al., 2003). In many joint activities, prediction and coordination are supported by the knowledge about a partner’s role in the task. For example, by knowing that your partner is leading the interaction, you can expect their actions to be performed in a stable and predictable fashion. In turn, by knowing that your partner has no knowledge about the task, you can predict that certain modulations of your own actions—such as slowing down or performing predictable trajectories—will help them coordinate with you (Curioni et al., 2019). Knowledge about the partner’s task can support coordination even in the absence of direct perceptual feedback about the partner’s actions (Vesper et al., 2016). This provides evidence that even in the absence of direct perceptual information about a partner’s actions, coagents can and sometimes do accurately predict and integrate their partner’s actions into their own action planning and adapt their performance by using precise motor representations to anticipate their partner’s actions.
Planning individual and coordinated actions
Action prediction—of oneself and of partners—also plays an important role in action planning, as it allows for the sensory consequences of an action to be anticipated and later compared with the observed outcomes to monitor for errors. To investigate whether people form internal representations not only of their own actions but also of coagents’, Kourtis et al. (2013) recorded electroencephalograms from participants involved in a joint task. The authors found evidence of predictive motor planning preceding movement onset, showing that when people engage in interactive tasks, they represent each other’s actions before achieving coordination. In a further study, Kourtis et al. (2014) compared action planning in a coordinated joint action with planning to individually perform the same action (bimanually), indicating very similar motor activation during the planning phase of both kinds of actions. On the basis of their knowledge of the upcoming task, participants engaged in motor predictions concerning both their own and another’s contributions.
When performing joint actions, individuals seem to be able to apply action monitoring processes selectively to their own and another person’s actions, thus relying on distinct representations of individual as well as joint outcomes. For example, when expert pianists are performing a piano duet together, their neural responses to key errors are sensitive to whether the error affects only their individual melody or the joint harmony of the piece (Loehr et al., 2013). This suggests that, when involved in action coordination with others, people monitor not only their individual contributions but also their partners’ contribution and the combination of the two. It also suggests that action outcomes affecting the joint outcome are processed as more salient than those that affect only one individual’s part, as joint-action partners represent the joint action holistically rather than only their own contribution (Strachan & Török, 2020; Török et al., 2019, 2022).
Relying on internal forward models of their own and their partner’s actions, interacting agents can exploit the mismatch between the expected and the observed consequences of joint and individual actions to refine and improve these internal models, playing a central role in their ability to learn nearly optimal individual and joint-action plans (Pesquita et al., 2018).
Prediction, planning, and flexible learning
Prediction, planning, and execution mechanisms are fundamental in coordinated joint actions because they allow interaction partners to dynamically adapt to each other on-line. In cultural learning scenarios, however, such mechanisms face a unique problem, which is that one partner (the learner) does not have an existing representation of the task to allow them to form the fine-grained predictions and sophisticated motor plans that are hallmarks of expert coordination.
However, it is likely that action prediction may be recruited in a flexible manner in order to support the cultural learning of technical skills in various ways. For instance, while observing a model, a learner can form, through reiterated simulations, an internal representation of the model’s actions. From this, the learner acquires an increasingly refined model of a given action plan through the repeated observation of its movement kinematics, information crucial for when the learner will plan their own enactment of the same action. Furthermore, learners can recruit their own learning systems to iteratively test predictions about another’s observed actions, internalizing the outcome and prediction errors about the model’s behavior in a similar way to independent learning in order to develop increasingly sophisticated action representations (see Joiner et al., 2017, for a review).
Similarly, learners may invert action-planning mechanisms to reverse-engineer the internal mental states that give rise to particular behaviors. That is, rather than generating fine motor plans on the basis of shared representations as models do, learners may use observations of motor behavior to infer the higher order states that give rise to them (Baker et al., 2005, 2009; Jara-Ettinger et al., 2012).
On the model’s side, valuable information can be acquired about where and how the learner errs by anticipating the learner’s actions. These simulations can provide the model with expectations at the level of the goal of the action—what is the expected outcome of the observed action—but also at the level of the action itself—which aspects of the action are essential to achieve the action’s goal. By monitoring the learner’s goal and actions, the model can selectively detect where and when the action of the learner is going astray and implement the appropriate error-correction strategy in a timely fashion.
Research on vicarious prediction errors indicates that teachers in learning interactions represent the discrepancy between the target and realized outcome of their students’ actions in a similar way to their own errors (Apps et al., 2015). Similarly, in joint-action contexts, even when agents are not required to act on a partner’s movement, they spontaneously produce corrections to their partner’s errors that resemble the ones that emerge from self-generated actions both behaviorally (De Bruijn et al., 2012; Schuch & Tipper, 2007) and neurally (Bates et al., 2005; De Bruijn & von Rhein, 2012; Kang et al., 2010; Picton et al., 2012). Such costly monitoring processes may be a basic mechanism for collaborative, cultural learning through trial and error and the reciprocal correction of each other’s mistakes (Sacheli et al., 2021).
The ability to represent different parts of a task (e.g., one’s action and the action of a partner, and even the specific subparts of each action) allows a model to entertain task-specific and action-specific representations about what the learner can and will do and compare their current and desired performance. Such monitoring activities play a fundamental role in supporting the model’s adaptive behavior, as they allow the model to closely and precisely tailor the transfer of information to the learner’s needs, as well as to anticipate learning errors and misunderstandings (Rueschemeyer et al., 2015).
Studying task co-representation and action-prediction mechanisms in the context of cultural learning interactions raises several interesting questions. Key among them is how the necessary asymmetry in know-how between the model and the learner affects the performance, use, and expression of these mechanisms during coordinated action (Curioni et al., 2019). Previous studies have used simple tasks with physical constraints that do not require extensive motor learning and where both interactants have the same experience with the task (Rocca & Cavallo, 2020; Schmitz et al., 2017, 2018). When learning a technique, however, there is often a gap between the richer, finer grained, and more flexible action representation of the model and the sparser, coarser, more schematic action representation of the learner. Such an asymmetry places the onus of co-representation on the part of the model (see Vesper et al., 2016), as they will use their knowledge of the learner’s needs, capabilities, and constraints to define the interaction parameters of the coordination episode. For example, a teacher can design learning opportunities that will optimize the expected information yield for the learner, whereas an expert who expects learners to learn by doing can assign them necessary but achievable subtasks so that the learner can contribute to the overall action.
Some interesting avenues for future research would be to investigate models’ (e.g., teachers’) predictive and error-monitoring behavior as a function of the interaction history they share with a learner and whether models use such interaction history to adapt their behavior for future interactions. Considering that prediction and monitoring processes imply costs in terms of cognitive resources, we would expect both models and learners to take past performances into account so as to strategically tailor their efforts when engaging in cultural learning interactions with each other. For example, a model might stop monitoring a certain aspect of the learner’s behavior once they have enough evidence that the given aspect has been successfully mastered by the learner’s past trials. Conversely, a learner may at first rely heavily on iterated predictive mechanisms to construct a novel action representation, observing demonstrations and monitoring their own practices through the lens of hypothesis testing and motor exploration. But as their own skill develops, they may become less reliant on prediction and instead use more fine-grained monitoring or error-correction mechanisms that can fine tune their action plans. Characterizing how and when these mechanisms are recruited to compensate for knowledge asymmetries across sustained cultural learning interactions remains an avenue for future research.
Communicative Strategies That Support Action Coordination
Communication, ranging from rich verbal dialogues to very subtle adaptations to movements, is key to coordination. Conventionalized signals such as language or gestures are used to exchange information and achieve the alignment of mental representations necessary for a common ground between agents (Brennan et al., 2010; Clark, 1996; Garrod & Pickering, 2004). Language and gestural communication are already the focus of much interest in the study of cultural learning (e.g., see Tomasello, 2008), and so we will not examine these specifically (but see, e.g., Begus et al., 2014; Begus & Southgate, 2012; Southgate et al., 2007). Instead, we focus on how agents send more subtle, often nonconventionalized signals through their actions. Strategies such as these are important for the success of an interaction insofar as they allow for a low-cost and effective transfer of information between interacting agents, often without the need for full-blown verbal communication.
Sensorimotor communication
When interacting with one another, agents often modulate their movements in subtle yet highly informative ways in order to facilitate the coordination of their actions. An action executed primarily to fulfill an instrumental goal can be modulated in order to carry a communicative signal that allows a coagent to read the intention behind that action more easily, as well as better predict how it may unfold in space and time (Pezzulo et al., 2013, 2019). For example, McEllin et al. (2018a) found that agents playing a xylophone melody in synchrony, compared with playing a melody alone, would exaggerate the spatial profile of their movements in order to make the goal of their actions (i.e., the end location of the movement) easier to disambiguate and would slow down their movements in order to make themselves easier to synchronize with.
Communicative signals of this sort are produced flexibly insofar as they are tailored to a coagent’s perceptual access to the actions, performance history, and specific constraints imposed by their joint task (Candidi et al., 2015; McEllin et al., 2018a; Vesper & Richardson, 2014). Moreover, these signals are embedded in instrumental actions and so do not necessarily rely on previously conventionalized codes. In contrast to linguistic or gestural forms of communication, which impose significant cognitive demands with regard to semantic processing, these embedded signals require relatively little physical and cognitive effort to produce and interpret (Pezzulo et al., 2019).
Haptic information sharing
Individuals who are physically coupled can also modulate the force of their movements in order to exchange haptic signals with one another, thereby facilitating goal recognition and action prediction, and additionally constrain and stabilize each other’s movements in the face of external perturbations (Ganesh et al., 2014; Melendez-Calderon et al., 2015; Takagi et al., 2017; van der Wel et al., 2011). Haptic signaling is similar to sensorimotor communication insofar as it provides information that augments the spatial and temporal resolution of an action’s prediction without relying on a conventionalized code, thus demanding relatively little work on the recipient’s end compared with gesture and language (Pezzulo et al., 2019).
Communicating through the haptic channel may be advantageous compared with communicating through other modalities. When coordinating actions, very fine-grained information such as precise timing and specific effector configurations (including invisible cues such as tension or grip pressure) may be lost in the process of translating visual to motor representations because the visual system does not have the resolution to deal with such information. The haptic channel can provide interaction partners with a means for transmitting this very fine-grained information in a way that circumvents the correspondence problem (Brass & Heyes, 2005), allowing for the direct transmission of very specific effector configurations that are not afforded by the visual system.
Informatively modulating variability
When attempting to achieve precise coordination, agents have been shown to strategically reduce the variability of their movements in order to make them easier to predict, thus easier to align temporally. For example, a study by Vesper et al. (2016) demonstrated that when agents in a coordination task could not access each other’s movements perceptually, they significantly reduced the variability of their movement time by moving as fast as possible, thus allowing them to coordinate effectively.
More broadly, the flexibility by which we engage in joint actions is further demonstrated by how we create and use strategies to share information through varying channels to support coordination, even when opportunities to share such information are scarce. For instance, Vesper, Schmitz, & Knoblich (2017) found that in the absence of any other perceptual feedback, leaders in a synchronized joint action could learn to modulate their movement timing in such a way that they could effectively segment their actions, allowing followers to identify where the movement would end on the basis of how long the leader took to move to the target.
This flexibility is particularly important because, in many cases, people find themselves engaging in interactions in which there is no obvious means of communicating with each other, perhaps because of environmental constraints (e.g., low visibility, noise) or constraints specific to the task (e.g., furtive hunting). In such situations, the flexible capacity to contextually adopt different communication strategies and channels is important for ensuring successful coordination.
Communicative strategies and flexible learning
There is growing evidence that communicative signals embedded in instrumental actions are used during cultural learning interactions similarly to coordinated joint actions to shape the perceptual input of a learner and help them acquire the to-be-learned technique in a manner that is flexibly tailored to the specific needs of the learner. Such embedded signals may help learners parse an action sequence and highlight particularly learning-relevant components of the sequence (Brand et al., 2002; Koterba & Iverson, 2009; McEllin et al., 2018a; Tominaga et al., 2021). Importantly, these cues do not seem to be integrated into the learners’ own reconstruction of the observed actions, demonstrating that they act purely as communicative cues and that they are not processed as part of the instrumental action (Strachan et al., 2021). One study by Okazaki et al. (2019) demonstrated that when taking turns completing the same task, teachers and learners dynamically tailor their actions in order to scaffold and fulfill the specific needs of the learner. Moreover, McEllin et al. (2018a) demonstrated that although actions with the same instrumental goal can have different kinematic profiles depending on whether they are produced in a teaching context or a temporal coordination context, people do modulate some of the same cues when teaching as when coordinating. This points to the idea that signals produced when coordinating with each other can also double up as cues that support learning.
Haptic information sharing is also potentially very useful with regard to scaffolding cultural learning. It can prove particularly useful by exploiting the high temporal resolution of the motor system when one is teaching actions that require precise timing. For instance, Feygin et al. (2002) demonstrated that when learning an action sequence, learners who were given computer-based haptic feedback more accurately acquired the sequence’s timing, compared with learners given visual feedback. Conversely, models could also exploit the higher resolution of the haptic channel to provide better motor feedback to the learner. It is thus likely that, in many learning contexts, haptic information sharing provides a better learning experience than is afforded by the visual system alone.
The ability to physically constrain a learner’s movements can also be used for modulating the variability of a learner’s actions, allowing for more or less variability depending on the learner’s needs. At early stages of learning, a model can use haptic coupling in order to decrease a learner’s movement variability, thereby reducing the large number of degrees of freedom of a potential goal-oriented action to a narrower range (Bernstein, 1967; Newell, 1991). Such constraints can help a learner get started by reducing the initial possibility space of the action.
Reducing variability is a hallmark of coordinated joint action, as it can help to facilitate prediction across trials. However, higher levels of variability have also been shown to provide advantages with regard to motor learning, allowing an agent to more effectively explore the possibility space, thus allowing them to find the best way for them to execute the action (Wu et al., 2014). Moreover, when coupled with an agent who is highly variable with regard to action execution, those with less variable movements can exploit their partner’s variability in order to explore the possibility space more effectively and improve their own learning (Sabu et al., 2020). Thus, variability can be usefully modulated, with reductions making oneself easier to predict and coordinate with and increases allowing for more efficient learning and exploration. Strategic modulation of trial-to-trial variability may be a useful tool for teaching by allowing a model to modulate the learner’s movement variability systematically in order to help them both reduce and explore an action’s possibility space. Exposure to variability can also be achieved by learning through multiple models, which has been linked to better short-term retention of motor skills (Andrieux & Proteau, 2013, 2014; Rohbanfard & Proteau, 2011) and leads to higher fidelity transmission (Muthukrishna et al., 2014).
One interesting avenue for further research would be investigating how signals produced in order to facilitate coordination can lead to qualitatively different outcomes than signals produced only to teach. This could be particularly interesting in the context of transmission-chain experiments, for instance, by investigating whether or not learning through asynchronous demonstration or learning through synchronous coordination leads to qualitatively different “traditions” emerging from the same starting task. One prediction could be that, compared with actions transmitted through generations by nonsynchronous demonstration, actions learned through synchronous, coordinated movements may result in a higher degree of temporal fidelity (i.e., timing is more stable across generations). Although such signals are presumably intended to facilitate prediction of actions, as the movement is still unfolding, the fact that they are making it easier not only to anticipate the temporal dynamics of the movement but also to pragmatically communicate about relevant timing features (Strachan et al., 2021) could result in a more stable long-term representation of the underlying temporal dynamics.
Haptic teaching also holds promise for future research. One could investigate whether haptic information transfer yields advantages through generations with regard to how quickly individuals can master a particular skill. Moreover, transmission-chain experiments could demonstrate how haptic information sharing leads to different patterns of loss and retention of particular aspects of an action sequence throughout generations, when compared with visual demonstration. One could predict that very fine-grained and nuanced aspects of an action sequence (such as precise timing or bodily configurations) would be transmitted through generations more faithfully for action sequences demonstrated haptically compared with those transmitted visually.
Cultural Learning and Coordination: Further Questions
In the three previous sections, we argued that cognitive and behavioral mechanisms for action coordination, which allow partners to engage in sophisticated social interactions with each other through mutual adaptation, are also actively involved in and constitutive of episodes of cultural learning. These action coordination mechanisms support the flexibility by which we can learn the same skills in various learning contexts and through different social interactions. Moreover, we have detailed how each of the supportive mechanisms we described offer exciting avenues for future work expanding our understanding of cultural learning. They also ask questions as to how the various contexts are understood and possibly exploited by the learners and models.
This change in approach—focusing on the impact of supportive mechanisms in addition to cultural learning processes—also opens important questions beyond the underlying causes of learning flexibility. By understanding the ubiquity of flexible cultural learning as the result of the recruitment of various supportive mechanisms, we can start studying how the specific ways that learners and models coordinate can impact not only how techniques are learned on-line but also the longer-term development and expression of skilled actions. Indeed, our approach proposes broader implications for joint action, developmental and comparative psychology, social learning strategies, and the study of transgenerational cultural evolution. We summarize these questions in Table 1.
A Selected Overview From Different Research Fields of Outstanding Questions Raised by Integrating the Study of Cultural Learning With Mechanisms for Joint Action Coordination
Joint action
Throughout this article, we have proposed several research questions targeted toward specific joint-action mechanisms and how they affect and are affected by cultural learning interactions (see Table 1). Broadly speaking, however, studying the phenomenon of cultural learning offers exciting challenges for the field of joint action, which has tended to focus on in-the-moment coordination of actions among dyads of equal proficiency or knowledge of a task and where this proficiency remains stable over the course of the interaction. Coordination between models and learners during cultural learning therefore raises new questions for joint-action researchers about how individuals coordinate in less stable scenarios. In order for learners and models to coordinate during learning—either when teaching or when the learner is acquiring some skill by shadowing or helping the expert model—they must contend with two problems: How are they to coordinate when one individual (the learner) does not know the task? And how do they establish successful coordination strategies when the task-relevant knowledge and skills of the learner are changing dramatically from one trial to the next? The question of cultural learning opens a rich avenue for joint-action research to study how individuals coordinate under such adverse conditions—the shifting sands of a learning interaction—and the role of interaction history and domain-general coordination strategies in facilitating joint action when one participant is actively constructing the necessary skill on-line.
Developmental and comparative questions
Much of the flexibility and variability in cultural learning interactions is the result of individuals’ propensity to coopt supportive mechanisms from other domains, such as those used to coordinate joint actions. Accordingly, we drew on experimental literature detailing these mechanisms in adult humans specifically. An outstanding question is how our capacity for cultural learning and ability to engage in interpersonal interactions develop. What is the relation between the developmental trajectories of these learning capacities, and what impact does the emergence of specific coordination mechanisms have on infants’ and children’s cultural learning across development? How does the ability to compute the visuospatial perspective of another agent, for instance, correlate with the ability to learn more complex motor skills and engage in sophisticated coordinated actions? An analogous question may also be considered from the comparative perspective: Studying the presence and nature of coordination mechanisms in nonhuman animals alongside their capacity for social learning may inform comparative research and offer insights into the phylogenetic trajectory of these supportive mechanisms. However, there is little empirical work investigating such cooperative mechanisms in great apes (but see Voinov et al., 2020).
Social learning strategies
A key concept in cultural learning is that of selection biases, termed social learning strategies, that determine what behaviors learners copy, from whom, and when (Kendal et al., 2018; Laland, 2004). Although a taxonomy of such strategies covers a broad range of learning conditions that are sensitive to the learner’s dispositional state (e.g., copy if uncertain), the frequency of the behavior (e.g., copy the majority), or properties of the model (e.g., copy successful or prestigious individuals), specific strategies can typically be distilled to simple heuristics consisting of if-then decision structures, where the probability of learners engaging in social learning (as opposed to nonsocial learning or no learning) is determined by satisfying a set of preconditions (Kendal et al., 2018).
Considering cultural learning episodes as dynamic, coordinated interactions over a period of time proposes new social learning strategies with respect not only to engaging in but also to sustaining learning interactions. Approaching cultural learning episodes as sustained interactions allows both learners and models to choose, at any given point, to continue or abandon the learning interaction (leading the learner to fail acquiring the knowledge or having to switch to a new model) or to change the coordination parameters while maintaining the learning interaction, for instance by changing the interactional role assignments (e.g., by promoting a discreet, observing novice to a hands-on apprentice) and/or task distribution (e.g., from grinding pigments for a master painter to actually participating into the painting tasks).
Research on the factors that sustain joint action, such as commitment (Michael, 2022; Michael et al., 2016; Vignolo et al., 2021), may shed light on the strategies that learners use to select what to learn, from whom, and how, but also for how long. For instance, people who engage in coordinated joint actions are more likely to make commitments to each other and to the task at hand, with commitments and expectations thereof being generated by cues within the interaction (Bonalumi et al., 2019; Michael et al., 2016, 2020). Moreover, agents are more likely to make commitments to coagents who adapt the timing of their movements or engage in perspective taking in order to optimize coordination (McEllin et al., 2023) or to coagents who send communicative signals to share task-relevant information (McEllin & Michael, 2022). Thus, in addition to directly supporting joint actions by helping agents coordinate, these mechanisms indirectly support joint actions by fostering commitment within the interaction.
Can something similar be said about teaching? We have argued that joint-action mechanisms lend direct support to teaching by facilitating the transmission of technical aspects of knowledge. Could the very same mechanisms also lend indirect support to teaching interactions by reinforcing social aspects of the transmission of knowledge, for example, by fostering commitment between model and learner? Would a learner be more committed to a model who adapts their movements to communicate information about important aspects of the technique compared with a model who does not? How coordination mechanisms that support the transmission of knowledge also influence the choice of learning partners and the maintenance of the learning episode remains an interesting, open question that is best addressed by approaching cultural learning through the framework of joint action.
Cultural evolution
We draw a distinction between social learning processes, such as emulation, imitation, and teaching, and supportive mechanisms, aiming to characterize the latter using a selection of candidate mechanisms canonically involved in joint action coordination. Such a distinction raises several questions about the relative contributions of these mechanisms for cultural learning and cultural evolution. Supportive mechanisms such as those we describe are clearly involved to a greater or lesser degree in cultural learning interactions. However, their impact on the cultural evolution of traditions at the population level remains an open question. One possibility is that supportive mechanisms do not have downstream impacts on the evolution of technical traditions but instead merely facilitate learning interactions to happen at all. According to this view, the social learning processes involved in cultural learning (e.g., imitation, emulation, teaching) would be mainly responsible for the transmission, stabilization, and evolution of technical practices. However, considering that up to now empirical and experimental work on cultural evolution frequently either ignores or minimizes the relevance of supportive mechanisms, their importance in shaping long-term traditions and evolutionary patterns remains an open question.
We see some further open questions in cultural evolution that may particularly benefit from a consideration of supportive mechanisms, especially those of joint action and action coordination. In particular, the production of variability and its effect on cultural transmission may be better explained by supportive coordination mechanisms than by social learning processes. Given that these processes typically aim to explain the high fidelity of cultural learning (Charbonneau, 2020; Charbonneau & Bourrat, 2021), variability is typically treated as copying errors—the unwanted by-product of imperfect learning systems in stochastic environments. However, it has long been established in the cognitive science literature on learning that variability can produce robust generalization in learned behaviors (for a review, see Raviv et al., 2022), pointing to the fact that variability during learning is more than just random error but a beneficial feature for the motor system to acquire flexible skills. This becomes all the more important during social interactions surrounding cultural learning, as the structure of variability can be modulated to facilitate action prediction between partners (Sabu et al., 2020; Vesper et al., 2011), and variability can take on higher order pragmatic implications when it is produced by a model looking to communicate. Accounts that consider only social learning processes that treat variation as random copying errors (i.e., noise) cannot account for the important scaffolding role of variability in learning and interactions and cannot fully capture the nature of variation across transmission (Strachan et al., 2021). We believe that a full understanding of the dynamics affecting cultural learning and its long-term, transgenerational effects would greatly benefit from examining the influence of supportive mechanisms in addition to the influence of social learning processes.
Conclusion
Anthropological research on cultural learning documents how the transmission of technical knowledge and skills occurs in a great variety of learning contexts and involves changing social configurations in both time and space. Our capacity for cultural learning in such variable environments is a testament to our flexibility in opportunistically recruiting different cognitive and behavioral mechanisms and interaction structures to ensure the successful transmission of technical traditions.
In this article, we argued that in order to understand how humans can flexibly adapt their learning interactions in different social and ecological contexts, a focus on dedicated learning mechanisms (such as imitation and emulation) is not sufficient. Given that these mechanisms are assumed to be context general and are often defined in terms of the content that is being transmitted and the accuracy of the transmission, they do not indicate how cultural learning operates in and is flexibly adapted to different contexts. In contrast, we argue that supportive mechanisms are better candidates for explaining flexible cultural learning, that is, those cognitive and behavioral mechanisms not designed for cultural learning but that can be recruited ad hoc in learning interactions to allow the successful transmission of technical knowledge. More specifically, we have argued that a key feature of the flexibility of cultural learning is that both the models and learners recruit cognitive mechanisms of action coordination to modulate their behavior contingently on the behavior of their partner, generating a process of mutual adaptation that supports the successful transmission of technical skills in diverse and fluctuating learning environments. By focusing on the coordinative dimension of cultural learning—both on the side of learners and the models they learn from—we have provided a more interactive understanding of cultural learning and its adaptability to contextual learning factors.
To achieve a richer understanding of cultural learning, its supportive mechanisms, and their impact on the transmission of technical traditions, however, we believe that a better synergy between cultural learning studies and the field of joint action is necessary. Joint-action research mostly focuses on how we coordinate with one another in order to accomplish some shared goal and has developed a rich literature addressing the underlying mechanisms involved in action coordination and their adaptive recruitment in different task scenarios and action contexts. However, joint-action research has not studied the transmission of skill as a form of coordinated action and has instead focused on informational exchange within specific instrumental task scenarios. Inversely, studies of cultural learning generally do not focus on the supportive coordination mechanisms providing more dynamical interaction possibilities, allowing learners and models to adapt and exploit to richer learning contexts and comparatively examine how different coordination mechanisms are recruited in those different contexts. We have indicated how the study of cultural learning as a form of coordination can allow the two fields, their methods, and existing results to complement each other in various productive ways.
