Abstract

Intersubjectivity, the coordination of people’s beliefs and actions, is central to human psychology and society. However, a legacy of Cartesian dualism has fragmented research along two dimensions (psychological vs. interpersonal; structural vs. interactional). This fragmentation obscures how the psychological and interpersonal aspects of intersubjectivity mutually reinforce one another. I argue that, despite epistemological differences, these literatures converge on recurring formal correspondences: three recursive levels of perspective-taking (direct, meta, meta-meta), three-turn interaction sequences (initiation-response-feedback), and triadic self-other-object relations. Building on these convergences, I propose a minimal model of dialogical intersubjectivity that entails three-turn social interaction generating and sustaining three levels of perspective-taking. Each turn can take the prior turn as its object, thereby adding a level of perspective-taking. The exchange of turns rotates participant positions (e.g., speaker becomes listener), introducing an external vantage point on the prior contribution, thus ratcheting up intersubjectivity by one level. This model explains how implicit assumptions become explicit and how social interaction patterns give rise to psychological processes.

Keywords

intersubjectivity, perspective-taking, turn-taking, dialogism, social interaction, internalization

Introduction

Intersubjectivity is fundamental to psychology and society. It is central to social coordination, communication, and the stream of thought. It spans from strategic negotiation to embodied simulation, and from narratives to inner dialogues. Yet, because intersubjectivity is central to so many domains, research is fragmented, and basic questions about its structure, function and development remain contested.

Intersubjectivity is like the elephant in the ancient Indian parable that is encountered in complete darkness. Not knowing what it is, people explore it by touch. Each encounters a different aspect, identifying it variously as a snake (trunk), spear (tusk), pillar (leg), and brush (tail). Like a hand that can only feel one aspect of the elephant at a time, each approach to intersubjectivity has provided a valid but incomplete account.

Consider the diversity of definitions. Psychological approaches define intersubjectivity variously as embodied simulation, explicit cognitive perspective-taking, or dynamic inner dialogue (Aveling et al., 2015; Trevarthen, 2015; Vogeley, 2017). Interpersonal approaches define it structurally in terms of agreement, misunderstanding, feeling understood, and dynamically in terms of coordination, conversational repairs, and consolidating mutual understanding (Bavelas et al., 2017; Gallagher, 2020; Schegloff, 1992). Consequently, depending on the approach, intersubjectivity is characterized either as a structural matrix of cognitive representations or as a performative interaction, occurring either within or between people.

This fragmentation can be traced back to Descartes’ (1637) infamous dualism, namely, the ontological separation of mind and matter (Gillespie, 2006a). This dualism has bequeathed theories of intersubjectivity that proceed either from inside, through the mind (e.g., Husserl, 1931), or from the outside, via the social relations (e.g., Mead, 1913; Schutz, 1932). Approaches from the inside start with the self, while those from the outside start with the self-other relationship (Crossley, 1996). However, Descartes also created a second split in the literature. He sought timeless, rationalistic truths underlying the flux of experience, thus privileging structures. The alternative to this Cartesian paradigm is the Hegelian paradigm that focuses on how minds and societies develop (Marková, 1982), thus privileging dynamic processes and interactions. Thus, the debate between the Cartesians and the anti-Cartesians created a split between approaches that focus on structures (e.g., mental representations) or interactions (e.g., moment-to-moment sequences).

Figure 1 uses these two dimensions of our Cartesian legacy to conceptualize the literature on intersubjectivity. The psychological-structural approaches focus on people’s cognitive and embodied beliefs about other people’s beliefs. The interpersonal-structural approaches focus on comparing perspectives between people and groups. The interpersonal-interactional approaches focus on moment-to-moment coordination between people. The psychological-interactional approaches focus on the dynamic interplay of perspectives, or voices, within psychological experience.

Figure 1.

Overview of approaches to intersubjectivity

In the center of Figure 1 are the theories that straddle Descartes’ dualism trying to explain how social interaction gives rise to psychological intersubjectivity. For example, Trevarthen (2015), Tomasello (2019), and Gillespie and Martin (2014) have focused on how specific interactions (mirroring, imitation, emotional attunement, culture, language, and exchanging positions) scaffold psychological intersubjectivity, becoming internalized as psychological functions (e.g., inner dialogue, self-reflection). However, the mechanism connecting interactional and psychological intersubjectivity has remained unclear.

I propose that a key bridging mechanism is turn-taking. Specifically, I argue that human intersubjectivity usually entails three levels of perspective-taking (direct, meta, and meta-meta) and these are incrementally created through minimally three-turn interactions (initiation-response-feedback sequences) within triadic relations.

My aim is not to resolve the longstanding epistemological puzzles inherited from Descartes. Nor am I proposing to collapse the four quadrants or privilege one over others. My epistemological standpoint is pragmatist; theories and their associated epistemologies are just tools that can be more or less useful (Gillespie et al., 2024). Each quadrant is a different way of looking at the elephant of intersubjectivity. While they might be logically incompatible (because of incommensurable assumptions), they are not empirically incompatible (they pertain to the same observables). The psychological is not a mysterious inner substance but a real semiotic activity that can become observable in talk.

My analysis contributes to theories of intersubjectivity by identifying deep convergence across four siloed literatures on three-level, three-turn, and triadic relations. This convergence is the set-up for my main contribution: proposing turn-taking as the mechanism through which interpersonal interaction scaffolds recursive psychological perspective-taking. This mechanism can explain how implicit intersubjective assumptions become explicit and how social interaction patterns become internalized as ‘inner’ dialogue.

I define intersubjectivity as any coordination (‘inter’) of subjective states (‘subjectivity’) occurring either within or between people (Gillespie & Cornish, 2010). This definition spans all four quadrants of Figure 1. I will also use the shorthand terms psychological, interpersonal, structural, and interactional intersubjectivity to refer to the right, left, top, and bottom quadrants of Figure 1, respectively. I will use the terms implicit and explicit to refer to that which has not been verbally indexed as compared to that which has been verbally thematized (brought into language). Turn-taking will refer to the observable interactional change of speakership or turns of action (e.g., in a game). Although I focus on verbal turn-taking, in practice, it is richly supported by non-verbal cues (Mondada, 2007) and occurs in non-human species (Mondémé, 2022). Perspective-taking will refer to the psychological (i.e., intra-psychological) attempt to adopt another person’s perspective.

To theorize perspective-taking as the psychological corollary of turn-taking, I begin by reviewing the research in each quadrant. This reveals striking convergences on three-level intersubjectivity, three-turn interactions, and triadic relations. These convergences, I argue, are not coincidental; each quadrant addresses the same phenomenon, albeit from a different epistemological stance. Then I use these convergences to build a model of dialogical intersubjectivity, in which the interactional processes of turn-taking scaffold and mutually reinforce the psychological processes of perspective-taking. However, before arriving at this argument, the next four sections aim to pinpoint the convergences upon which my argument builds.

Psychological-Structural Approaches

This quadrant inherits Descartes’ focus on the mind: intersubjectivity resides in cognitive structures. This approach looks ‘within’ people, rather than ‘between’ them. The aim is to identify stable representations in the subject’s mind that pertain to other minds. This approach provides precise vocabulary for conceptualizing these mental representations, but risks studying them in isolation from social interactions. This literature spans cognitive, developmental, social and evolutionary psychology. The key insight for my argument is that people do not merely represent other minds; they also model how those other minds represent them in return.

Theory of Mind

Theory of mind refers to the cognitive capacity to empathize with and understand other minds (Gopnik & Wellman, 1992). The term ‘theory’ reflects the premise that other minds, being inaccessible to direct observation, are always inferred. The theory of mind concept became widespread in developmental psychology and has since been used in many domains (Wellman, 2018). Although measuring theory of mind is challenging (Warnell & Redcay, 2019), there is converging evidence for dual systems of embodied simulation and reflective theorizing (Keysers & Gazzola, 2007; Vogeley, 2017).

Simulation-theory of mind focuses on how people understand the mental states of others by activating equivalent neural circuits (mirror neurons; Gallese & Goldman, 1998). This leads to embodied feelings of empathy, upon which cognitive representations are constructed. Simulation is a spontaneous, non-verbal, embodied resonance between self and other. For example, empathy for pain activates similar brain regions as the direct experience of pain (Lamm et al., 2011).

Theory-theory of mind focuses on the innate and folk psychology beliefs that people have about other minds (Gopnik & Wellman, 1992). It builds upon the mirror neuron system to create more abstract and reflective mind-reading abilities (Gallese & Goldman, 1998). Like naïve scientists, children’s initially simplistic theories of other minds lead to errors that prompt more refined theories (Saxe, 2005), building up from understanding desires and beliefs to representing hidden emotions and false beliefs (Wellman, 2018).

The theory of mind literature has identified the embodied and cognitive components that underlie mental state representation. However, it has tended to ignore the system of social relations that it is not only designed to represent but also embedded in (Hughes & Devine, 2015). Additionally, it has tended to neglect higher-order perspective-taking (i.e., understanding other people’s theory of mind). For the present argument, the question is not only how these representations are structured but also which interactional patterns elicit and ratchet them toward greater complexity.

Perspective-Taking

Perspective-taking is defined as the act of trying to adopt another person’s perspective by imagining their thoughts, feelings, and experiences (Epley et al., 2004). Originally, ‘perspective’ referred to a behavioral orientation (Mead, 1925; O’Toole & Dubin, 1968). It has since been reconceptualized as a cognitive structure, measured with surveys and manipulated with primes (e.g., suggestions to think from the standpoint of others). Research has found that it is positively associated with social cognition, such as decision-making (Tuazon et al., 2019) and creativity (Hoever et al., 2012).

One advance has been discovering the importance of meta-meta-perspectives. These include, for example, felt understanding, which refers to whether one thinks another person or group values one’s beliefs (Livingstone, 2023). Research has shown that these beliefs about other people’s beliefs about our own beliefs are important for trust (Thomas et al., 2014), intergroup conflict (Lees & Cikara, 2020), polarization (Lees & Cikara, 2021), and social identity (Livingstone et al., 2019). Accordingly, it is increasingly recognized that perspective-taking must be conceptualized within a broader multi-level architecture of intersubjectivity.

Until recently, this approach had also set aside the relation between perspective-taking and social interaction. However, research has found that priming perspective-taking is much less effective than ‘perspective-getting’ (e.g., asking questions; Eyal et al., 2018). Outside of the laboratory, people do not merely imagine other people’s perspectives; they talk to them (Kalla & Broockman, 2023; Phelps et al., 2025). This reveals that perspective-taking also needs to be conceptualized in the context of social interaction. Remaining exclusively on the psychological side of Descartes’ dualism is problematic because perspective-taking is inextricably linked to interpersonal perspective-getting.

Orders of Intentionality

Research on orders of intentionality examines recursive perspective-taking. This approach starts with the insight that to understand another mind is to understand the intention of that mind toward an object (Dennett, 1983, 1989). This ‘intentional stance’ conceptualizes recursive levels: zero-order intentionality does not entail a belief, as it is mere action; first-order intentionality is a belief, but not a belief about a belief; second-order intentionality entails beliefs about beliefs; third-order intentionality entails beliefs about beliefs about beliefs, and so on.
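The recursion in these orders can be made concrete with a minimal sketch. The Python below is illustrative only and is not drawn from Dennett; the names `Belief`, `order`, and `describe` are hypothetical, and the sketch assumes that each order above zero simply wraps the previous state in one further belief.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Belief:
    """A belief held by `holder` about `content`.

    `content` is either a bare string (an action or state of affairs with
    no belief attached) or another Belief. Hypothetical illustration only.
    """
    holder: str
    content: Union["Belief", str]


def order(state: Union[Belief, str]) -> int:
    """Count how many beliefs are nested; a bare string is zero-order."""
    if isinstance(state, str):
        return 0
    return 1 + order(state.content)


def describe(state: Union[Belief, str]) -> str:
    """Spell out the nesting in plain words."""
    if isinstance(state, str):
        return state
    return f"{state.holder} believes that {describe(state.content)}"


rain = "it is raining"         # zero order: no belief involved
first = Belief("S", rain)      # first order: a belief
second = Belief("O", first)    # second order: a belief about a belief
third = Belief("S", second)    # third order: a belief about a belief about a belief

print(order(third), "-", describe(third))
```

Each additional order adds exactly one wrapper, which is why the levels can in principle continue indefinitely even if routine use does not.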

Naturally occurring conversations contain many orders of intentionality (Dunbar et al., 1997). However, findings on the typical number of orders of intentionality are conflicting. A study of undergraduates recalling stories with complex mental states found that the number of errors dramatically increased after four orders of intentionality (Kinderman et al., 1998). In contrast, jokes have been found to operate at six or even seven orders of intentionality, and were judged to be most funny at five orders of intentionality (Dunbar et al., 2016).

The literature on orders of intentionality provides rich terminology for conceptualizing recursive intersubjectivity and valuable empirical evidence on its pervasiveness. However, the number of levels routinely used remains contested (more on this debate later).

Takeaway: Recursive Levels of Intersubjectivity

Table 1 compares the psychological-structural approaches to intersubjectivity. These approaches tend to conceptualize the initial forms of intersubjectivity as embodied and implicit, while the later, more recursive levels are more explicit, cognitive, and verbal. These approaches have also increasingly converged on the importance, not only of representing other minds, but also of representing how those other minds represent other minds.

Table 1.

Psychological-Structural Approaches to Intersubjectivity

Representational level	Theory of mind	Perspective-taking	Orders of intentionality
Embodied orientation to X	Simulation-theory (implicit)		Zero order (implicit)
Beliefs about X		Perspective	First order (explicit)
Beliefs about beliefs about X	Theory-theory (explicit)	Perspective-taking	Second order
Beliefs about beliefs about beliefs about X		Meta-meta-perspectives	Third order
Beliefs about beliefs about beliefs about beliefs about X			Fourth order etc.

Unanswered questions within these psychological-structural approaches include: How many explicitly indexed levels of intersubjectivity are required for routine coordination? How should the implicit and explicit components of intersubjectivity be conceptualized? And, most importantly for my argument, how are these structures of intersubjectivity scaffolded by specific patterns of interaction? These questions are difficult to address if this approach remains siloed within the cognitive side of Descartes’ dualism.

Interpersonal-Structural Approaches

This quadrant approaches intersubjectivity from the outside: it relocates intersubjectivity into relations that can be compared across persons (agreement, accuracy, misunderstanding). These approaches originate in critiques of Descartes’ (1637) transcendental ego as solipsistic. Mead (1913) and Schutz (1932), for example, situated intersubjectivity in the space between people. The key idea for my argument is not only that Self (S) and Other (O) have different perspectives on the world (X), but also that they recognize this difference and, to some extent, understand that the other recognizes it as well.

Misunderstandings

Ichheiser (1943, 1949) proposed that when Self (S) and Other (O) meet, six representations come into contact: how each person sees themselves, how each person believes they are seen by the other, and how each person sees the other. To conceptualize misunderstandings, he introduced the axiomatic distinction between S’s expression (i.e., S’s verbal or non-verbal communication) and the impression it makes on O (i.e., O’s interpretation). This distinction became fundamental to research on attribution (Heider, 1958), impression management (Goffman, 1959), and interpersonal perception research (Kenny, 1994).

For analyzing intersubjectivity, Ichheiser’s framework had two limitations. First, the object around which the perspectives are being coordinated changes from ‘self’ to ‘other’, which means that the levels are not coordinating around the same object. Second, although Ichheiser analyzed misunderstandings, his framework did not enable him to conceptualize how misunderstandings are resolved. Resolution entails realizing that there is a misunderstanding (i.e., S believes that O’s belief about S’s belief is incorrect), which is a level above his framework.

Interpersonal Perception

Laing and colleagues (1966) distinguished three levels of intersubjectivity: direct perspectives (S’s beliefs), meta-perspectives (S’s beliefs about O’s beliefs), and meta-meta-perspectives (S’s beliefs about O’s beliefs about S’s beliefs). They argued that without meta-meta-perspectives there would be no way to resolve misunderstandings, as there would be perspective-taking (meta-perspectives) without any awareness that the perspective-taking might be inaccurate.

Using this three-level framework, Laing and colleagues (1966) developed operational definitions of agreement/disagreement (comparing S’s direct perspective with O’s direct perspective), understanding/misunderstanding (comparing S’s meta-perspective with O’s direct perspective), and realization of understanding/misunderstanding (comparing S’s meta-meta-perspective with O’s meta-perspective). They also distinguished perceived agreement (S’s direct perspective with O’s meta-perspective), feeling understood (S’s direct perspective with S’s meta-meta-perspective), and perceived understanding (S’s meta-perspective with S’s meta-meta-perspective).
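These operational definitions amount to pairwise comparisons between levels, which a short sketch can make explicit. The Python below is a hedged illustration, not an instrument from Laing and colleagues: the `Perspectives` container and `compare` function are hypothetical names, perspectives are reduced to simple labels, and only the three interpersonal comparisons plus feeling understood are implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perspectives:
    """One person's three levels of perspective on the same issue (hypothetical)."""
    direct: str      # what this person believes
    meta: str        # what this person thinks the other believes
    meta_meta: str   # what this person thinks the other thinks they believe


def compare(s: Perspectives, o: Perspectives) -> dict:
    """Comparisons from S's side, following the definitions in the text."""
    return {
        # S's direct perspective vs. O's direct perspective
        "agreement": s.direct == o.direct,
        # S's meta-perspective vs. O's direct perspective
        "understanding": s.meta == o.direct,
        # S's meta-meta-perspective vs. O's meta-perspective
        "realization": s.meta_meta == o.meta,
        # S's direct perspective vs. S's own meta-meta-perspective
        "feeling_understood": s.direct == s.meta_meta,
    }


# S supports a proposal and assumes O does too; O opposes it and assumes S does too.
s = Perspectives(direct="support", meta="support", meta_meta="support")
o = Perspectives(direct="oppose", meta="oppose", meta_meta="oppose")

result = compare(s, o)
print(result)
```

In this invented case S feels understood although there is disagreement, misunderstanding, and no realization of the misunderstanding, which is the kind of configuration that only becomes describable once the third level is included.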

Laing and colleagues’ (1966) framework improves on Ichheiser’s (1943) by keeping each level focused on the same object, explaining how misunderstandings are resolved, and providing a rich terminology for describing complex interpersonal relations. This framework has been expanded to conceptualize societal consensus (Scheff, 1967), personal relationships (Hinde, 1997), and interpersonal perception (Kenny, 1994). Research using this framework has compared the perspectives of parents and children (Sillars et al., 2005), pharmacists and clients (Assa-Eley & Kimberlin, 2005), doctors and patients (Kenny et al., 2010), care-givers and care-receivers (Moore & Gillespie, 2014), and people with autism and their parents (Heasman & Gillespie, 2017). These empirical studies have used questionnaires, but have focused only on direct and meta-perspectives, not meta-meta-perspectives.

Coorientation Framework

The coorientation model of communication, proposed by McLeod and Chaffee (1973), moves beyond person-perception toward perspectives coordinated around any object (X). This model combines Laing et al.’s (1966) interpersonal framework with Newcomb’s (1953) analysis of how S and O co-orient around a generic object X. Therefore, while the interpersonal perception approaches focus on S’s and O’s perceptions of S and O, the coorientation framework focuses on S’s and O’s perceptions of any object X.

The coorientation model has been widely used to study communication patterns between family members (Koerner & Schrodt, 2014; Sillars et al., 2005), employees (Seltzer & Mitrook, 2009; Van Riel & Fombrun, 2007), and human-chatbot relations (Jang & Lee, 2023). As with the interpersonal perception approaches, this research has tended to use questionnaires, and has also tended to neglect meta-meta-perspectives in empirical research.

Takeaway: Three Levels of Explicit Perspective-Taking in Dyadic Relations

Table 2 compares the interpersonal-structural approaches to intersubjectivity. These approaches assume two parties (S and O) and map out each perspective (p) in relation to each other (S and O) or an object (X). Conceptualizing the perspectives of S and O simultaneously is valuable because it enables comparing perspectives to reveal accuracy, agreement/disagreement, and understanding/misunderstanding.

Table 2.

Interpersonal-Structural Approaches to Intersubjectivity

	Misunderstanding		Interpersonal perception		Coorientation framework
	S	O	S	O	S	O
Direct perspectives	Sp(S)	Op(O)	Sp(S)	Op(O)	Sp(X)	Op(X)
Meta-perspectives	SpOp(S)	OpSp(O)	SpOp(S)	OpSp(O)	SpOp(X)	OpSp(X)
Self’s perspective on other	Sp(O)	Op(S)
Meta-meta-perspectives			SpOpSp(S)	OpSpOp(O)	SpOpSp(X)	OpSpOp(X)

S = Self; O = other; p = perspective; X = any object.
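The notation in Table 2 follows a regular pattern: each added level prefixes one more perspective, alternating between the two parties. A minimal sketch of that pattern, assuming strict alternation between S and O (the function name `notation` is hypothetical and not part of any of the cited frameworks):

```python
def notation(holder: str, level: int, obj: str = "X") -> str:
    """Build the Table 2 style label for a perspective.

    holder: "S" or "O", the person whose perspective it is.
    level:  1 = direct, 2 = meta, 3 = meta-meta, and so on.
    obj:    the object of the innermost perspective (e.g., "X", "S", "O").
    """
    if holder not in ("S", "O"):
        raise ValueError("holder must be 'S' or 'O'")
    if level < 1:
        raise ValueError("level must be at least 1")
    other = "O" if holder == "S" else "S"
    # Alternate holder, other, holder, ... and append 'p' (perspective) to each.
    chain = "".join((holder if i % 2 == 0 else other) + "p" for i in range(level))
    return f"{chain}({obj})"


for level in (1, 2, 3):
    print(notation("S", level), notation("O", level))
```

Passing `obj="S"` reproduces the person-focused labels of the interpersonal perception column, for example `notation("S", 2, "S")` gives `SpOp(S)`.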

A limitation of these interpersonal-structural approaches is their detachment from social interaction. What are the interactional and communicative processes that sustain this web of intersubjectivity? Despite providing a compelling argument for the importance of meta-meta-perspectives, why has the empirical research neglected them? And most importantly for us, why does the structure of intersubjectivity require three levels? These questions are difficult to address if one remains focused on timeless structures. All these questions presuppose social interaction, since that is how misunderstandings are resolved. The next section provides the missing ingredient: three-turn sequences enable understandings to be displayed, repaired, and consolidated.

Interpersonal-Interactional Approaches

This quadrant pushes furthest from Descartes. It is neither psychological nor structural. Instead, it conceptualizes intersubjectivity as something that grows within situated interaction; it is a performance rather than a thing. The key is how each turn in an interaction sequence responds to the prior turn, not unobservable mental states. This research spans from non-verbal interactions to highly linguistically mediated forms of intersubjectivity. For my argument, this quadrant contributes a crucial ingredient: turn-by-turn organization, especially the functional role of third turns in both repairing and consolidating mutual understanding.

Enactive Intersubjectivity

Enactive intersubjectivity is embodied coordination achieved through direct physical interaction, without requiring mental representations. This approach begins with a criticism of the psychological approaches and shifts the focus to observable interactions (De Jaegher et al., 2016; Gallagher, 2020). Building on Merleau-Ponty (1945) and Wittgenstein (1953), it is argued that mental representations are often unnecessary; the anger we see in the face of someone shouting at us is immediate and embodied, not based on looking up a mental representation (Fuchs & De Jaegher, 2009). Even an utterance such as ‘I think’ can be treated not as a marker of subjectivity but as a dynamic stance marker that foregrounds contestability (Kärkkäinen, 2006).

Enactive intersubjectivity is embodied, situated, and often non-linguistic (Fuchs & De Jaegher, 2009; Gallagher, 2023). In two-party coordination, each side is adjusting to the other side, creating a dynamically evolving system. This is evident in protoconversations (primary intersubjectivity; Trevarthen, 1998), where infants and care-givers exchange smiles, vocalizations, and gestures. What is being coordinated are not two representational systems, but instead action, voice, touch, gesture, and gaze. Examples of enactive intersubjectivity include the spontaneous reciprocation of a smile, or how boxers have a conversation of gestures through dodging, feinting, and repositioning.

There is strong empirical evidence for basic enactive intersubjectivity, but the approach is often criticized for failing to scale up to more complex intersubjective phenomena. Gallagher (2023), however, argues that narratives, diagrams, and conversation externalize complex (apparently private) representations, enabling practical coordination, training, and socialization into a world of perspectives. His argument is that whatever is taken as ‘complex’ or ‘private’ intersubjectivity can be decomposed into observable and practical tasks.

The contribution of enactive intersubjectivity is to conceptualize non-representational coordination. It distributes intersubjectivity into body-environment relations, and thus naturalizes it, linking it into non-human forms of coordination. The limitation, however, is that instead of explaining psychological intersubjectivity in terms of social interaction, this approach risks removing the psychological element. In this sense it is similar to conversation analysis (De Jaegher et al., 2016), which also eschews mental representations.

Conversational Repairs

Conversation analysis provides the empirical counterpart to abstract theories of intersubjectivity (Schegloff, 1992). Instead of intersubjectivity being a philosophical problem of knowing other minds, it is studied as a mundane conversational task. It is studied in interactionally endogenous terms, as it arises for participants themselves, without appeal to anything beyond what is empirically evident in the interaction (Schegloff, 2007).

Intersubjectivity is particularly evident in conversation repairs (e.g., rephrasing, clarifications), because it is in the breakdown of intersubjectivity that the mechanisms of perspective coordination become evident. The main components of a repair are who initiates, who repairs, and in which turn the repair occurs. These reveal three main repair types.

The prototypical repair is self-initiated in the third turn following a revealed misunderstanding in the second turn. Schegloff (1992, p. 1303) gives the example of Annie, a press officer in a civil defense headquarters, who asks Zebrach, the chief engineer, which roads are closed (Excerpt 1). The first turn is ambiguous (“Which one::s are closed”), as revealed by the second turn where Zebrach talks about the closed shelters. Annie initiates (“I ’on’t mean on the shelters”) and repairs (“I mean on the roads”) in the third turn. In third-turn self-initiated repairs, the second turn is a sequentially appropriate response to a misunderstanding of the first turn, revealing the trouble source to be repaired.

Excerpt 1.

Third-Turn Self-Initiated Repair (from Schegloff, 1992, p. 1303)

Turn	Speaker	Utterance	Conceptualization
1	Annie	Which one::s are closed, an’ which ones are open.	Trouble source
2	Zebrach	Most of ’em. This, this, this, this (pointing to shelters)	Demonstrates misunderstanding
3	Annie	I ’on’t mean on the shelters, I mean on the roads.	Repair

In second-turn other-initiated repairs, the second turn contains a request for repair that indexes an ambiguity in the first turn. This usually leads to a repair in the third turn. This is illustrated in Excerpt 2, where Marcia is explaining to Tony why her son is late (because his car was stolen). Marcia says that somebody “helped th’mselfs” to his car. Tony initiates a repair by seeking clarification (“Stolen”), which Marcia confirms in the third turn.

Excerpt 2.

Second-Turn Other-Initiated Repair (from Schegloff, 1992, p. 1302)

Turn	Speaker	Utterance	Conceptualization
1	Marcia	Becuz the to:p was ripped off’v iz car which iz tihsay someb’dy helped th’mselfs	Trouble source
2	Tony	Stolen.	Repair initiation
3	Marcia	Stolen. Right out in front of my house.	Repair

Finally, self-repairs can also occur in the first turn, when people repair themselves while speaking. This is illustrated in the first turn of Excerpt 2. Marcia is trying to say that the soft-top on the car was ripped off as part of a theft. However, she is unsatisfied with her first attempt, initiating a self-repair (“which iz tihsay”), and clarifying that “someb’dy helped th’mselfs” to the car.
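The three repair types described above differ in who initiates the repair and in which turn. As a rough sketch, and not a coding scheme used in conversation analysis, they can be expressed as a simple classification; the `Repair` record and `classify` function are hypothetical names, and "self" is assumed to mean the speaker of the trouble source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repair:
    """Who initiates a repair and when (hypothetical record for illustration)."""
    initiated_by: str      # "self" = speaker of the trouble source, "other" = recipient
    initiation_turn: int   # turn in which the repair is initiated
    repair_turn: int       # turn in which the repair itself is carried out


def classify(repair: Repair) -> str:
    """Map a repair onto the three types described in the text."""
    if repair.initiated_by == "self" and repair.initiation_turn == 1:
        return "first-turn self-repair"
    if repair.initiated_by == "other" and repair.initiation_turn == 2:
        return "second-turn other-initiated repair"
    if repair.initiated_by == "self" and repair.initiation_turn == 3:
        return "third-turn self-initiated repair"
    return "unclassified"


# Excerpt 1: Annie initiates and repairs in turn 3.
annie = Repair(initiated_by="self", initiation_turn=3, repair_turn=3)
# Excerpt 2: Tony requests clarification in turn 2; Marcia repairs in turn 3.
tony = Repair(initiated_by="other", initiation_turn=2, repair_turn=3)
# Excerpt 2, turn 1: Marcia rephrases herself while still speaking.
marcia = Repair(initiated_by="self", initiation_turn=1, repair_turn=1)

for r in (annie, tony, marcia):
    print(classify(r))
```

The classification depends only on the initiator and the initiation turn; the repair turn is recorded to show that, in the two cross-turn cases, the repair is completed in the third turn.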

These analyses of intersubjectivity as a practical achievement are insightful. However, like the enactive approach, these analyses resist integration with psychological approaches because the methodology explicitly excludes mental states, considering only what is endogenously observable in the transcript (de Ruiter & Albert, 2017). Yet integration would be beneficial (Albert & de Ruiter, 2018), and I will argue that first-turn self-repairs can bridge interactional and psychological intersubjectivity.

Initiation-Response-Feedback Sequences

Classroom learning is a domain in which psychology cannot be bracketed aside. Learning is, by definition, a situation-transcending psychological phenomenon that is exogenous to the interaction. Research on how learning is achieved through conversations has identified a recurring three-turn sequence: initiation-response-feedback (Sinclair & Coulthard, 1975; Waring, 2009).

Initiation-response-feedback sequences consist of an initiation by the teacher (e.g., a question), followed by a response from the student, and then the teacher’s feedback on the student’s response. In Excerpt 3, the teacher asks the student why people eat food (from Sinclair & Coulthard, 1975, p. 21). The student answers (“to keep you strong”), to which the teacher responds “yes”. Without this third turn, the student would remain uncertain whether their answer was correct.

Excerpt 3.

Learning in Three-Turn Sequences

Turn	Speaker	Utterance	Conceptualization
1	Teacher	Can you tell me why do you eat all that food? Yes	Initiation
2	Student	To keep you strong	Response
3	Teacher	To keep you strong. Yes.	Feedback

Broadening this insight, Linell and Marková (1993; Marková & Linell, 1996) argue that the third turn does not merely enable learning; it consolidates mutual understanding. In Excerpt 3, the teacher’s feedback (turn 3) does more than mark the prior turn as correct; it makes the student aware that the teacher knows the student answered correctly. Within this emerging mutual understanding the student might, for example, feel pride. However, such pride is unintelligible within a first turn (unless the student is anticipating a congratulatory third turn). The initiation-response-feedback model thus reveals how intersubjective dynamics ‘ratchet-up’, with subsequent turns enabling novel dynamics (e.g., pride or guilt, or trying to make someone feel pride or guilt).

This broader role for third turns, to consolidate mutual understanding, is widespread. Bavelas and colleagues’ (2017) analysis of students’ getting-acquainted conversations found that over half of all turns were part of three-turn calibration sequences. New information was demonstrated in the second turn and consolidated by follow-ups in the third turn. Such rapid, efficient, and often non-verbal third-turn follow-ups are routine sequences, not just for repairing or learning but for consolidating mutual understanding. Without these interactions, the levels of intersubjectivity identified in the structural quadrants would be unstable, unevidenced, and uncorrectable.

Takeaway: Three-Turn Interaction Sequences

Table 3 compares the interpersonal-interactional approaches to intersubjectivity. A key feature is the sequencing, or turns, of interaction. Enactive intersubjectivity does not distinguish turn types; instead, reciprocal action adjustments form a dynamic system in which each turn is equivalent; it gives no special status to third turns. Conversational repairs give special status to third turns for revealing and repairing problems in intersubjectivity, although these are deemed atypical. In contrast, third turns are essential in initiation-response-feedback sequences for learning, and more generally for mutual understanding.

Table 3.

Interpersonal-Interactional Approaches to Intersubjectivity

Turn	Enactive intersubjectivity	Conversational repairs	Initiation-response-feedback
1	Body-environment action	Trouble source	Initiation
2	Adjustive response	Repair-initiation or demonstration	Response
3	Adjustive response	Repair enables the conversation to continue	Feedback consolidates mutual understanding

Interpersonal-interactional approaches reveal the moment-to-moment embodied coordination of perspectives as an ongoing achievement. However, these approaches are also epistemologically fractured. Enactive intersubjectivity and conversation analysis are anti-representational, eschewing mental representations. While this narrowed focus powerfully highlights intersubjectivity as a practice, it provides an incomplete account of the psychological aspects of intersubjectivity (e.g., intra-psychological experiences of imagining other perspectives). While many view psychological concepts as antithetical to this approach, the next section will reveal striking similarities. Specifically, three-turn interaction sequences are frequently observed within a single turn. I will use this observation to argue that turn-taking scaffolds perspective-taking, and therefore first-turn self-repairs are observable instances of recursive perspective-taking.

Psychological-Interactional Approaches

This quadrant stems not from Descartes’ ideas, but from his method. Arguably, Descartes was the first phenomenologist, inspiring Husserl (1931). Descartes analyzed the workings of his own mind from the inside. And although he focused on timeless truths, his actual method of meditation entailed a dynamic interplay of perspectives (Gillespie, 2006a). Continuing in this vein, psychological-interactional approaches examine the flow of phenomenological experience as punctuated with intersubjective elements. The focus is on the semiotics of how perspectives (variously termed I-positions, signs, or voices) interact within psychological experience. The key idea that I will use for my argument is that the stream of experience comprises inner dialogues of three-part semiotic sequences that mirror the three-part sequences identified by the interpersonal-interactional approaches.

The ‘I’ and the ‘me’

James (1890) conceptualized the stream of thought as the psychological flow of experience (e.g., images, impulses, thoughts) that passes before the mind’s eye during introspection (Levine, 2018). The ‘I’ is the subject of this stream of thought (i.e., the one who senses, thinks, and acts). The ‘me’ refers to moments of self-awareness within the stream of thought, when the self has become the object of experience (Woźniak, 2018). When the ‘I’ turns upon itself to observe itself, it only finds an inert ‘me’ – a hollow memory of a prior initiation. James (1890) vividly described the self as duplex, flipping over on itself, unable to catch its own experiencing, but he did not explain its origin.

Mead (1913, 1934) proposed that the ‘me’ originates in social interaction. He argued that the ‘I’, the subject of action, is common to all organisms, but the ‘me’ is peculiar to humans and arises through perspective-taking. Just like we see the actions of others from the outside (as a ‘she’ or ‘them’), when we take the perspective of others toward ourselves, we see ourselves from the outside (as a ‘me’). Through socialization, Mead proposed, humans develop a situation-transcending structure of intersubjectivity (which he called the “generalized other”; Mead, 1934, p. 90), which is an internalization of the perspectives of significant others (e.g., family, friends, community). Crucially, the ‘I’ and the ‘me’ are not structures within the self; they are phases of a semiotic process turning over upon itself. The shift of perspective that turns the ‘I’ at time one into the ‘me’ at time two is, Mead (1934) argued, derivative of the shift of perspective occurring in social interaction between self and other (Gillespie, 2005).

Dialogical Self

The theory of the dialogical self develops the ideas of James and Mead, with inspiration from Bakhtin (1986). It conceptualizes the self as a landscape of I-positions within which phenomenological experience moves (Hermans, 2002). I-positions, like James’s ‘I’, are places from which the self thinks and speaks. These can either originate in the acting subject (i.e., impulses and attitudes of the self) or in the perspectives of others (i.e., the voices of significant others in the social world). This landscape of potentially discordant I-positions forms a “society of mind” (Hermans, 2002, p. 147).

Dialogical self research does not examine whether there is perspective-taking, or whether it is accurate; instead it examines how I-positions interact, for example, creating inner conflicts. This approach has been particularly useful in clinical contexts (Neimeyer, 2006; Stiles et al., 2004). The therapeutic encounter, it is argued, should aim to create new or strengthened I-positions that enable the client to reflect upon conflicting I-positions, potentially resolving inner conflicts (Hermans & Dimaggio, 2004).

Commonly, conflict between I-positions leads, over time or through therapy, to a third, meta I-position that reconciles the conflict (Kay et al., 2024). For example, Branco and colleagues (2008) analyzed an interview transcript with Rosane, a Catholic woman in Brazil. They identified two I-positions in conflict: Rosane was a Catholic daughter (first I-position) and a lesbian (second I-position). This tension was resolved by the emergence of a third I-position, where Rosane became committed to being a Catholic missionary working within the lesbian community.

Valsiner (2005) has theorized the three-part emergence of meanings within the dialogical self. For example, in response to conflicting meanings, such as being hungry (sign 1) and finding dirty bread (sign 2), third signs, or meta-positions, emerge to regulate the conflict. Thus sign 3 could be insisting that even dirty bread is still bread, arguing that it is not ‘so’ dirty, cleansing the bread, or conceiving of both dirt and bread as part of nature (Josephs & Valsiner, 1998, p. 6). These third signs in the semiotic sequence simultaneously address the prior tension and feed forward to promote and constrain the next sign in the stream of thought (Valsiner, 2018). A key insight about these semiotic sequences is that without a third sign, there would only be semantic tensions without any resolutions or circumventions.

Multivoicedness

Multivoicedness is related to the Dialogical Self approach but broader, as this tradition is not bound to a single theory or focused exclusively on the self. It studies talk and texts in terms of moment-to-moment perspective shifts. The idea is that a detailed analysis of people’s talk can reveal multiple, often colliding points of view that interact in real-time as they talk. This approach combines the insights of James and Mead with those of Bakhtin (1986), Rommetveit (1974), Linell (2009), and Marková (2016). It takes a micro-textual approach to studying voices within the mind as they manifest in people’s observable utterances.

Aveling and colleagues (2015) developed a systematic approach to analyzing multivoicedness in three stages designed to reveal three-part semiotic sequences. These sequences usually take the form: I-think, they-think (which causes a tension), and then a response to the tension (e.g., to suppress it, resist it, or overcome it). Consider Excerpt 4, from a focus group discussion with second-generation Turkish youths living in London (Aveling & Gillespie, 2008, p. 215). Ahmet says to his Turkish friends: “This [London] is my home”. This seems to cause a tension. Ahmet orients to the idea that his friends think he has given up on his Turkish identity (“sorry” and “don’t get me wrong”). These two voices (what Ahmet thinks and what he thinks his friends think) are in tension. The resolution is in the third part of the sequence in which Ahmet capitulates to his friends (“I’m still Turkish”, “Turkish and proud”). Ahmet’s utterance is simultaneously embedded within the group discussion and reveals his own moment-to-moment dialogical stream of thought as he adjusts to his audience.

Excerpt 4.

Multivoicedness in a focus group

Turn	Speaker	Utterance	Conceptualization
1	Ahmet	This [London] is my home. Sorry boys, but it is [laughter] I mean don’t get me wrong, I’m still Turkish. D’you know what I mean? Turkish and proud	(1) Initial idea,(2) Disruptive idea,(3) Reconciliation

The tension between the first and second voices creates semantic contact between the speaker’s initial voice and a disruptive voice (e.g., a belief attributed to the outgroup; Gillespie, 2020). When the second voice is disruptive, the third typically attempts to quell or shut it down. Tactics include avoidance (e.g., “it is totally separate”), de-legitimizing (e.g., “but they are delusional”), and limiting (e.g., “but it’s not that bad”). The key point is that without the third part of the semiotic sequence, there could be no response to psychological tensions. Accordingly, again, we see the same three-part structure, mirroring not only what has been found in the Dialogical Self but also what has been observed in the interpersonal-interactional approaches.

Takeaway: Three-Part Semiotic Sequences

Table 4 compares the psychological-interactional approaches to intersubjectivity. Initially, these were sequences of two signs (‘I’ and ‘me’), but more recently the focus has been on three-part semiotic sequences. These approaches concur that the first part of the sequence is an initial impulse or expression. The second part can be a disruptive sign, originating within the self (dialogical self) or with others (multivoicedness). Repeatedly it is found that the third part of the sequence is an attempt to deal with this tension or conflict (e.g., capitulating, circumventing, resisting).

Table 4.

Psychological-Interactional Approaches to Intersubjectivity

Voice	Stream of thought	Dialogical self	Multivoicedness
1	‘I’: Initiation	I-position	Voice of self
2	‘me’: Response to ‘I’	Conflicting I-position	Tension with voice of other
3		Reconciling meta-position	Response to the tension & semantic barriers

While these approaches assume voices in the social world connect to voices in the stream of thought, the mechanism is usually stated vaguely as ‘internalization’ rather than specified in interactional terms. How do the voices of other people become voices in the mind? What is the relation between the inner and outer voices? What types of social relations scaffold this internalization? To address these questions requires specifying the active ingredient in social interaction. My proposal is that this ingredient is turn-taking. Because turns are publicly observable and because speakers also hear themselves as audiences, turn exchange provides a mechanism by which three-part interactional sequences can become three-part semiotic sequences.

Integrative Approaches

Although much of the literature falls into the four quadrants of Figure 1, there is also a substantial literature on how the subjective side of intersubjectivity is embedded in observable social relations (De Jaegher et al., 2010; Gillespie & Martin, 2014; Lawrence & Valsiner, 1993; Marková, 2016; Schilbach et al., 2013; Valsiner & van der Veer, 2000; Zittoun et al., 2007). Many of these approaches stem from Mead (1913) and Vygotsky (1997), who both argued that cognitive functions begin between people, in social interaction, and only subsequently appear in the mind. Since their broad insights, researchers have sought to identify the specific social interaction patterns that might scaffold intersubjectivity. I review their insights before proposing turn-taking as a key ingredient.

Primary, Secondary, and Tertiary Intersubjectivity

The distinction between primary, secondary, and tertiary intersubjectivity (Bråten, 2009; Trevarthen & Aitken, 2001) conceptualizes young children’s social understanding within its interactional and cultural context. Each of these three forms of intersubjectivity is embedded in a different type of interaction: dyadic interaction, coordination around objects, and culture.

Primary intersubjectivity is dyadic attunement (Trevarthen, 1998). It arises through parent-infant face-to-face interaction or protoconversations, where nothing is communicated except connectedness itself. At this level, there is no shared object independent of the dyadic relation and thus there is no reading the mind or intention of others (i.e., no explicit theory-theory of mind). Primary intersubjectivity is built on the mirror neuron system and is comparable to the simulation-theory of mind approach (Ferrari & Gallese, 2007).

Secondary intersubjectivity is shared attention. It originates in triadic relations, with self and other coordinating around a shared object (e.g., child and adult playing with a ball), where each learns to read the intention of the other (e.g., expecting the ball). At this level, the mind of the other is explicitly represented in relation to the shared object and is thus comparable to a theory-theory of mind (Bråten, 2009).

Tertiary intersubjectivity incorporates culture (narratives, folk beliefs) that exists beyond the immediate triadic relation (Bråten, 2009). Narratives scaffold perspectives, providing a web of interacting perspectives that the audience gets drawn into and can experience vicariously. Additionally, language, it is argued, facilitates more complex intersubjectivity, enabling people to ask about people’s intentions, feelings, and perspectives.

Shared Intentionality

Tomasello (2019) has also proposed three different types of intersubjectivity, each embedded in a distinct social relation. These are broadly complementary with primary, secondary, and tertiary intersubjectivity, but put more emphasis on social coordination.

The first stage is emotion-sharing, such as protoconversations between parents and their infants. Although similar to primary intersubjectivity, Tomasello (1999) argues that this stage does not contain enough mutual understanding to be called properly intersubjective. There is dynamic attunement, and embodied co-presence, but no social coordination vis-à-vis an object.

The second stage is joint intentionality with recursive understanding within triadic relations; S and O both know that they are attending to X. Chimpanzees, Tomasello (2019) argues, can follow gaze direction and thus attend to the same object, but they do not know that they are attending to the same object. Mutual understanding that both parties are attending to the same object entails a theory-theory of mind.

The third stage is cultural collective intentionality that integrates multiple perspectives within a partially shared social world. At this level, S and O are aware of what is common and what is not; they each know that the other inhabits a recursive multi-perspectival and partially shared social world (Tomasello, 2020).

Tomasello’s (2019) contribution is to ground types of intersubjectivity in specific patterns of interpersonal interaction. He points to “dialogic interactions” (Tomasello, 2019, p. 188), the back-and-forth exchanges within triadic (S-O-X) relationships, through which attempts, failures, and requests for repair incrementally build triadic mutual understanding. This intersubjectivity is then further scaffolded, at the third level, by language, culture, and institutions.

This basic idea is also found among other scholars. Tuomela (2005) argues that to have a ‘we-intention’ implies mutual belief, not only that we know the others’ perspective, but also that they know ours, thus presupposing meta-meta-perspectives. But this structure must be actively coordinated in interaction. Individual intent is bound into shared intentionality, Gilbert (2015) argues, through reciprocal expressions of willingness that create normative commitments neither party can unilaterally rescind. So, again, the focus is on back-and-forth exchanges between S and O in relation to some X. But what is the precise interactional mechanism that bridges from these exchanges into psychological intersubjectivity? I will argue that it is turn-taking.

Position Exchange Theory

Position exchange theory (Gillespie & Martin, 2014; Martin & Gillespie, 2010) builds on Trevarthen’s and Tomasello’s models to propose a precise mechanism through which perspective-taking can arise in triadic social interaction. The idea is that we are socialized into positions that support perspectives; when we exchange positions, we effectively learn to exchange perspectives.

Initially the child interacts with objects and with other people, gradually differentiating themselves from the world, and learning action patterns vis-à-vis objects and people in the world. The child is engaging in social action, but there is no reflexive awareness of it. At this stage children can play a role, but it is not socially coordinated (i.e., they can’t regulate their play from the standpoint of others).

Reflexive intersubjectivity emerges within triadic interactions, when S and O have roles with respect to a task oriented to X. When S and O exchange roles within the task, they not only learn about the perspective of the other but also learn to integrate these perspectives. Consider the game of hide-and-seek (Gillespie, 2006b): the hider regulates their hiding actions from the standpoint of the seeker – because they themselves have been the seeker. Equally, the seeker might find a good place to hide while seeking. Exchanging positions is widespread (e.g., giving-getting, talking-listening, fleeing-chasing, buying-selling), and the idea is that this both cultivates and integrates complementary perspectives, enabling perspective-taking.

Position exchange theory also specifies how narratives (stories, films, reverie) can scaffold intersubjectivity. Narratives invariably contain multiple perspectives (i.e., different characters), and story-telling guides us through these (e.g., alternating between the perspectives of key protagonists). Narratives tend to introduce us first to one perspective, and then to a reciprocating perspective (e.g., Little Red Riding Hood and the wolf, the wolf and the three pigs, or Goldilocks and the three bears). Within narratives the audience experiences position exchange, as they are guided through both sides of a social interaction, thus scaffolding their intersubjectivity.

Takeaway: Triadic Interaction

Table 5 shows strong convergence between the integrative approaches. They all propose three basic types of intersubjectivity (embodied, coordinated intentions, multi-perspective) scaffolded by three basic types of social interaction (dyadic, triadic, cultural). The differences are mainly ones of emphasis: innate emotional attunement (Trevarthen, 2015), social coordination (Tomasello, 2019), and exchanging social positions within institutionalized interactions (Gillespie & Martin, 2014). However, although there is broad agreement on the key types of interaction, the exact interactional mechanisms remain under-specified. What is the ‘active ingredient’?

Table 5.

Integrative Approaches to Intersubjectivity

Social interaction	Stages of intersubjectivity	Shared intentionality	Position exchange theory
Dyadic (S-O)	Primary intersubjectivity. Innate imitation, non-verbal synchrony, protoconversations, rhythms, affective attunement.	Emotion-sharing. Face-to-face routines, mirroring.	Self-other differentiation. Repetitive alternating routines.
Triadic (S-O-X)	Secondary intersubjectivity. Shared focus on X, gaze following, pointing.	Joint intentionality. Shared goals, task coordination, roles within tasks, coordination of intentions.	Perspective-taking. Exchanging positions within games and social routines.
Cultural (S-O-X-C)	Tertiary intersubjectivity. Culture, symbols, narratives, shared fictional worlds.	Collective intentionality. Language, norms, rules, ethics, institutionalized roles.	Complex perspective-integration. Moving positions within narratives.

S = Self; O = Other; X = any object; C = culture.

I build on these approaches, and especially position exchange theory, to argue that turn-taking may be the active ingredient. It is present in all forms of social interaction. Turn-taking spans from non-verbal emotional adjustments to complex conversations and negotiations. As narratives progress, they give ‘turns’ to key characters. This focus on turn-taking develops the position exchange hypothesis. Turn-taking is the most fundamental and pervasive exchange of positions: initiators becoming responders, speakers becoming listeners.

Three-Level Intersubjectivity

What is the minimum number of levels a model of intersubjectivity needs to consider? Comparing the psychological-structural (Table 1) and the interpersonal-structural (Table 2) approaches reveals agreement on recursive levels of intersubjectivity, but disagreement about the number of levels. The interpersonal perception literature conceptualizes three levels, but often only studies two levels. Meanwhile the orders of intentionality literature has studied four (Kinderman et al., 1998), five, and even six (Dennett, 1989; Dunbar et al., 2016) levels. These inconsistencies, I suggest, stem from three methodological issues.

The first source is how implicit and explicit perspective-taking is treated. Kinderman et al. (1998) focused on explicit perspective-taking (i.e., talk about mental states). By contrast, Dunbar and colleagues (2016) counted implied perspectives in jokes. They maintained that before a joke is told, there are already three orders of intentionality (the speaker intends for the listener to understand that the speaker intends to tell a joke). Any mental states within the joke are then added to these “minimum obligatory three mindstates” (Dunbar et al., 2016, p. 133). If these assumed mindstates are removed, the number of explicit orders of intentionality in jokes drops to four, comparable to Kinderman et al.’s (1998) findings. This resolvable confusion reveals that we need to better conceptualize the implicit and explicit aspects of intersubjectivity.

A second source of discrepancy is the conflation of intersubjective depth with intersubjective breadth. Depth refers to the number of recursive perspective-taking levels within a dyadic exchange (e.g., S’s belief about O’s belief about S’s belief). Breadth refers to the number of agents whose perspectives are tracked simultaneously (e.g., tracking A’s, B’s, and C’s separate beliefs). Research on interpersonal perception measures depth within dyads, capping out at meta-meta-perspectives (Assa-Eley & Kimberlin, 2005; Heasman & Gillespie, 2017; Koerner & Schrodt, 2014). Research on orders of intentionality has tended to count the mindstates of all agents, combining intersubjective depth and breadth (Dunbar et al., 2016). A six-order story with three agents may have only three levels of depth. Distinguishing intersubjective depth and breadth explains why intentionality research reports higher orders of intentionality.
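The depth/breadth distinction can be made concrete with a small sketch. The counting rule below is an assumption made for illustration, not the coding scheme used by Dunbar et al. (2016) or by the interpersonal perception literature: a story is represented as a list of hypothetical attribution chains, where each chain names the agents from the outermost to the innermost mindstate. The total order counts every mindstate, depth is the longest single chain, and breadth is the number of distinct agents.

```python
from typing import List

# Hypothetical representation: each chain lists agents from outermost to
# innermost mindstate, e.g. ["A", "B", "A"] reads
# "A believes that B believes that A believes ...".
Chain = List[str]


def total_order(chains: List[Chain]) -> int:
    """Count every mindstate in the story, across all chains (assumed rule)."""
    return sum(len(chain) for chain in chains)


def depth(chains: List[Chain]) -> int:
    """Longest recursive chain, i.e. the deepest nesting of perspectives."""
    return max((len(chain) for chain in chains), default=0)


def breadth(chains: List[Chain]) -> int:
    """Number of distinct agents whose perspectives are tracked."""
    return len({agent for chain in chains for agent in chain})


# An illustrative six-order story with three agents and two dyadic chains.
story = [["A", "B", "A"], ["C", "B", "C"]]
print(total_order(story), depth(story), breadth(story))  # 6 3 3
```

Under these assumptions, the story contains six mindstates spread over three agents, yet no single chain is deeper than a meta-meta-perspective.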

Finally, why have interpersonal-structural approaches neglected the third level of intersubjectivity despite conceptualizing it as critical for resolving misunderstandings? I suggest it is because they use questionnaires. Questionnaire items about meta-meta-perspectives are confusing to answer and produce messy results. The issue is that asking an explicit question about any level of intersubjectivity entails responding at one level higher (Gillespie & Cornish, 2010). In contrast, orders of intentionality have been observed in jokes (Dunbar et al., 2016) and in recalled story elements (Kinderman et al., 1998). Observational methods have an advantage because they don’t require participants to self-report on their perspective-taking; it is done by the researchers.

A self-report question at level N requires the respondent to report on level N from level N+1. The question ‘Do you believe X?’ requires awareness of one’s believing (level 2 operation on level 1 content). The question ‘Do you think she believes X?’ requires awareness of one’s meta-belief (level 3 operation on level 2 content). The question ‘Do you think she believes that you believe X?’ requires level 4 operation on level 3 content, which pushes respondents toward their intersubjective ceiling. Self-report methods are therefore unwittingly capped at studying one fewer intersubjective level than the method appears to target, explaining the systematic neglect of meta-meta-perspectives despite their theoretical importance.
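The arithmetic of this argument can be laid out in a minimal sketch. The function names and the ceiling parameter are hypothetical, introduced only for illustration; the ceiling value of three used in the example is an assumption and not a measured quantity.

```python
# Example self-report questions, keyed by the level of content they target.
QUESTIONS = {
    1: "Do you believe X?",
    2: "Do you think she believes X?",
    3: "Do you think she believes that you believe X?",
}


def operation_level(content_level: int) -> int:
    """Reporting on level-N content requires operating at level N + 1."""
    if content_level < 1:
        raise ValueError("content_level must be at least 1")
    return content_level + 1


def highest_reportable_level(ceiling: int) -> int:
    """Highest content level a respondent can self-report on, given a
    hypothetical ceiling on the level at which they can operate."""
    return max(ceiling - 1, 0)


for level, question in QUESTIONS.items():
    print(f"level {level} content needs a level {operation_level(level)} "
          f"operation: {question}")

# Assuming, for illustration only, a ceiling of three levels.
print(highest_reportable_level(3))  # 2
```

With an assumed ceiling of three, self-report comfortably reaches only level 2 content, which is the pattern of neglect described above.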

Disentangling these three methodological issues reveals a deep convergence underlying superficial differences. Using different methods and theoretical traditions, these approaches converge on explicitly indexed (not implicit or self-reported) recursive (intersubjective depth, not breadth) perspective-taking having three levels. Why do social relations stabilize on three levels of perspective-taking? How are these anchored in social interaction?

Three-Turn Intersubjectivity

What is the minimum length of interaction sequences that a model of intersubjectivity needs to consider? Comparing the interpersonal-interactional (Table 3) and the psychological-interactional (Table 4) approaches reveals the importance of three-part interaction sequences. Although much literature on turn-taking and conversation analysis has focused on two-turn sequences, three-turn sequences (in interaction) and three-part semiotic sequences (in psychology) are required for complex intersubjective phenomena.

Turn-taking occurs in many non-human species (Mondémé, 2022), and in all human cultures (Nguyen et al., 2022). It is evident in non-verbal interactions, games, queues, and public debates. Turn-taking can be managed by umpires, traffic lights, and moderators. However, the primordial form is informal dialogue, when participants coordinate turns of talk (Sacks et al., 1974). Sometimes speakers select the next speaker (e.g., by asking a question), but more often turn transitions are managed implicitly via cues such as transition-relevance points, gaze, and gesture (Skantze, 2021). Successful turn-taking entails remaining on topic, responding to the prior turn, and not speaking over anyone else. Yet, despite this complexity, speakership is usually transferred smoothly within milliseconds (Templeton et al., 2022).

Turn-taking research has neglected third turns. Usually, it is studied as the transition between turns 1 and 2; any third turn is just another transition to be analyzed in relation to the prior turn. Similarly, enactivist, behaviorist, and cognitive approaches tend to focus on the first two turns (i.e., responses to actions). Conversation analysis, despite studying repairs in the third turn, takes two-turn adjacency pairs as the basic unit of analysis (e.g., question-answer, greeting-greeting, request-response; Schegloff, 2007). It conceptualizes third-turn repairs as deviations.

In contrast, I argue that third turns are widespread, frequent, and fundamental to intersubjectivity. Third-turn repairs pervade face-to-face and online dialogue (Dingemanse et al., 2015; Goddard & Gillespie, 2025). Moreover, three-turn calibration sequences, in which the third turn demonstrates, elaborates, and consolidates understanding, occur many times a minute (Bavelas et al., 2017). Without these consolidations of mutual understanding, resolving misunderstandings and even conducting strategic brinkmanship would be impossible (Laing et al., 1966; Schelling, 1966).

The third turn unlocks distinctive intersubjective phenomena. The first and second turn enable emotional resonance, empathy, basic simulation, perspective-taking and second order intentionality. But the third turn enables repairs, consolidation of learning, and feelings of being understood (or misunderstood). Most importantly, the third turn consolidates mutual understanding. Without a third turn, O would have beliefs about S’s perspective, but could not discover or comprehend a misunderstanding. No matter how inaccurate or fantastical O’s understanding of S’s perspective, S would be unable to correct it and, thus, complex social coordination would collapse. Without three-turn interaction we would be unable to mutually agree on the rules through which we coordinate, and thus institutional life, as we know it, would be impossible.

Comparably, three-turn semiotic sequences unlock distinctive psychological phenomena. If thoughts only responded to the prior thought there could be cognitive tension, but no resolution. Research on the dialogical self and multivoicedness shows how one voice prompts a second, creating a tension that is resolved by a third (Table 4). Without the third voice there is only the tension between the first two voices. The tension between ‘I am hungry’ and ‘the bread is dirty’ is resolved by a third sign (e.g., ‘I’ll eat something else’). But this third sign only has meaning due to its position as a third sign; it can’t resolve the tension if acting as the first or second sign (Josephs & Valsiner, 1998). Thus it is the third sign in the sequence that enables resolving psychological tensions.

More speculatively, three-part sign sequences may be central to logical inference. Peirce (1955), who provided one of the most thorough analyses of semiotics and logic, emphasized ‘thirdness’. In his studies of logic and signs he argued for three-part sequences, with the third part being crucial for mediation, meaning, and inference. Consider the classic syllogism: All men are mortal, Socrates is a man, Socrates is mortal. There is no logical inference if the third sign is missing. The first sign states a rule. The second sign, in this case, is an observation. Inference arises with the third sign, which integrates the first (rule) and second (case) to yield a conclusion (e.g., therefore Socrates will die). Thus, even in formal streams of thought (i.e., logical inference), we can detect three-part semiotic sequences.

The minimum length of interaction sequences for intersubjectivity, I argue, is three. Three-turn interpersonal interaction is necessary for mutual understanding, realizing misunderstandings, and resolving misunderstandings; three-part semiotic sequences are necessary for resolving psychological tensions and making logical inferences. Is it just coincidence that the minimum number of levels and turns is the same? Are the three levels derivative of the three turns?

Dialogical Intersubjectivity: From Turn-Taking to Perspective-Taking

The self-other-object (S-O-X) triadic relation is a core unit of analysis across cultural, developmental, and social psychology (Zittoun et al., 2007). However, these triadic models rarely specify the type of relationships within the triad. What do the relations (lines in the triangle) denote? Is mere action sufficient? What occurs within triadic relations that might produce three-level intersubjectivity?

Triadic relations, I argue, only become genuinely triadic (i.e., more than the sum of the parts) when they are functionally unified through three-turn interaction sequences that build three-level intersubjectivity. Three-turn interaction sequences not only involve all components (S, O, X), they also entail a looping back (the third turn responds to the second turn, which responded to the first turn). This cumulatively and recursively builds the levels of intersubjectivity.

Figure 2 illustrates how three-turn interaction within triadic relations can scaffold three levels of intersubjectivity. Starting with the undifferentiated triangle (A), S has an action orientation toward X (B). O then reacts to S’s first turn (C). And finally, S reacts to O’s reaction to S (D). Because each turn responds to the prior turn, and to some extent incorporates it, each turn introduces another level of intersubjectivity: direct perspective (B), meta-perspective (C), and meta-meta-perspective (D). In the figure, only S in the first turn (B) is responding to X; the subsequent turns are responding to the previous responses (not X). The ratcheting of intersubjective levels occurs because the perspective of the prior turn becomes the object of the next turn.

Figure 2.

Three-turn interaction producing three-level intersubjectivity in a triadic relation

The key idea is that each turn of communication has the potential to introduce a higher level of intersubjectivity. Ichheiser’s (1949) expression-impression distinction helps conceptualize this mechanism. Each communicative expression, or turn of conversation, creates an impression on the audience that is one order of intentionality higher. Whatever S expresses, the impression created for O concerns the perspective of S. Thus, what begins as a direct perspective for S arrives as a meta-perspective for O.

This ratcheting of intersubjective levels via turn-taking is evident in Excerpt 5 (Linell, 2009, pp. 195–196). In turn 1, the judge asks the defendant about their plea. In turn 2, the defendant admits guilt, creating an impression for the judge that is one order of intentionality higher (the judge believes that the defendant admits guilt; JpDp(admit)). The judge expresses this understanding as a question (“Admits?”), which in turn creates an impression one order of intentionality higher for the defendant (the defendant believes that the judge believes that the defendant admits guilt; DpJpDp(admit)). When the defendant confirms this mutual understanding, the judge knows that the defendant knows that the judge knows that the defendant is pleading guilty (JpDpJpDp(admit)). Without this ratcheting there could be a misunderstanding (e.g., the judge mishearing turn 2 as “Yes, I deny it”).

Excerpt 5.

Ratcheting Levels of Intersubjectivity

Turn	Speaker	Utterance	Expression	Impression
1	Judge	Okay, does John Sigurdsson admit or deny all these deeds?
2	Defendant	Yes, I admit it	Dp(admit)	JpDp(admit)
3	Judge	Admits?	JpDp(admit)	DpJpDp(admit)
4	Defendant	Yes	DpJpDp(admit)	JpDpJpDp(admit)

J = judge; D = defendant; p = perspective; (admit) = the object around which the perspective-taking occurs.
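The bookkeeping in Excerpt 5 can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original analysis. The function names `ratchet` and `level` are hypothetical, and the sketch assumes that every turn takes the prior turn as its object and that participants are labelled with single capital letters, so that counting the letter 'p' counts perspectives. Under those assumptions, each impression wraps the expression in one further perspective, and the next speaker expresses the impression they have just formed.

```python
from typing import List, Tuple


def ratchet(speaker: str, listener: str, obj: str, n_turns: int) -> List[Tuple[str, str, str]]:
    """Return (speaker, expression, impression) for each successive turn.

    Assumes (hypothetically) that every turn takes the prior turn as its object.
    """
    rows = []
    expression = f"{speaker}p({obj})"  # direct perspective of the first speaker
    for _ in range(n_turns):
        # The listener's impression adds one perspective on top of the expression.
        impression = f"{listener}p{expression}"
        rows.append((speaker, expression, impression))
        # Positions rotate: the listener speaks next and expresses that impression.
        speaker, listener = listener, speaker
        expression = impression
    return rows


def level(perspective: str) -> int:
    """Count nested perspectives; valid only for single capital-letter labels."""
    return perspective.split("(")[0].count("p")


# Turns 2 to 4 of Excerpt 5: the defendant (D) speaks first, the judge (J) listens.
rows = ratchet("D", "J", "admit", 3)
for who, expression, impression in rows:
    print(who, expression, impression, level(impression))
```

Run on turns 2 to 4, the sketch reproduces the Expression and Impression columns of Excerpt 5. Real interaction is less tidy: where a turn addresses X rather than the prior turn, no level is added.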

The S-O-X triangle (like the three levels and three turns) is a simplification. It is a distilled set of ingredients to aid focused theorizing. In reality, S-O-X triadic relations always occur in a context (see Table 5; Zittoun et al., 2007). This context does more than merely ‘surround’ interaction; it can structure who speaks, who interrupts, whether third turns are licensed, and how feedback is received. For example, misunderstandings often arise because social norms (politeness, communicative routines) or power structures (status, role-authority) inhibit third-turn consolidations. In the case of position exchange, the context is even more important, because it creates the entire experiences of S and O, which are re-shaped each time they exchange social positions or roles.

The key point is that each turn can take the prior turn, not X, as its object, thereby foregrounding the perspective in the prior turn (not the object of that perspective).

Each turn not only responds to the prior turn, but subsumes it, bringing another perspective on the prior perspective. The perspective of the prior turn becomes the object of the next turn, namely, a meta-perspective – which itself can become the object of the next turn (i.e., a meta-meta-perspective).

Making the Implicit Explicit

Ratcheting intersubjectivity through turn-taking explains how intersubjectivity moves from being implicit to explicit. Intersubjectivity begins with implicit embodied attunement (Trevarthen, 2015), but it culminates in highly nuanced narratives that dramatize the interplay of multiple perspectives (e.g., Shakespeare; Bakhtin, 1986) and highly reflexive and strategic perspective-taking (e.g., brinkmanship; Schelling, 1966). How does implicit intersubjectivity develop into explicit intersubjectivity?

Impressions always deviate from expressions: what is said diverges from what is heard. Sometimes this deviation is an error (e.g., O misunderstands S), but sometimes it is a valid insight (e.g., O recognizes S’s implicit assumption). In these latter cases, the expression creates ‘surplus’ meaning (Gillespie, 2003), namely, meanings that S did not intend to give (i.e., their implicit assumption).

Consider the following example. While eating a cake, S exclaims “Yum!”. O responds: “You really like cakes!” To which S replies: “You think I eat too much cake!” In this example, both the second and third turns interpret meanings in the prior turns that were not explicitly given. O ‘takes’ the meaning that S likes cake (S did not explicitly say it), and S ‘takes’ the meaning that O thinks S eats too much cake (which again O did not explicitly say). Of course, O and S may have been hinting at these meanings. But, for our purpose, it is enough to point out that each instance of turn-taking has the potential to make explicit meanings that were implicit in the prior turn.

When a subsequent turn takes the prior turn as its object, it has the potential to draw out of it meanings that were implicit. The range of meanings and assumptions that can be called out in subsequent turns is infinite, and many will be invalid. Thus, turn-taking can both ratchet intersubjective levels and externalize implicit meanings because each turn introduces externality on the prior turn. This occurs when the subsequent turns are not referring to the object of discourse itself, but to the prior turns as turns (i.e., the object of talk shifts from X to SpX or SpOpX).

Reinterpreting Internalization

The proposed model of dialogical intersubjectivity reconceptualizes internalization. Since Vygotsky’s (1997) argument that psychological function is derivative of social interaction, there have been many attempts to explain the mechanism of internalization. But there have also been critiques, arguing that the concept of internalization is an unhelpful byproduct of Descartes’ mistaken dualism (Gallagher, 2020; Stetsenko, 2017). Crossing from observable interactions to the mental substance of psychological perspectives, it is argued, is philosophically impure. Setting aside grand philosophical debates, dialogical intersubjectivity narrows the focus to a tractable question: how do three-turn interactions become three-part semiotic sequences?

Self-initiated self-repairs in the first turn, I argue, are a bridge between the psychological and interpersonal aspects of intersubjectivity. Despite being rarely studied, these are the most common type of repair (Purver et al., 2018). Let us return to Marcia’s self-repair in the first turn of Excerpt 2. What is interesting about such self-repairs is that they have the same sequencing as three-turn repairs (Ginzburg et al., 2007). “Becuz the to:p was ripped off’v iz car” is the trouble source (akin to a first turn), “which iz tihsay” is the repair initiation (akin to a second turn that asks for clarification), and “someb’dy helped th’mselfs” is the repair (akin to a third turn). Thus, refracted within this single turn is a three-turn sequence.

From a dialogical standpoint, Marcia’s self-repair can be reinterpreted as self-dialogue. Marcia hears her own utterance in the same way that it is heard by Tony. Marcia is her own audience. This enables Marcia to turn-take within herself. Marcia is simultaneously the subject (‘I’) of her utterance and an observer of her utterance (‘me’). She hears her own utterance as ambiguous. This self-reflection leaves a trace (“which iz tihsay”) that leads to a clarification (“someb’dy helped th’mselfs”).

Internalization, from this dialogical standpoint, does not require any philosophically dubious dualism. All it requires is that people can hear themselves speak (or think), and that they can respond to themselves, in the same way they might respond to anyone else. This is what Mead (1913) called the peculiar significance of the vocal gesture. People observe themselves and respond to themselves as they would to another. Responding to oneself talking is turn-taking within a single turn.

Turn-taking within a single turn can be analyzed either from the non-mentalistic standpoint of conversation analysis, or from the more psychological standpoint of multivoicedness. Focusing on what is empirical, there is no mystical dualism; the text being analyzed is the same. Hearing oneself speak and then responding to oneself as an other (either publicly or privately) is a basis for three-part sign sequences and thus ‘inner’ dialogues. What is important is that a response or perspective becomes an object that can itself be responded to.

This reinterpretation responds to the enactivist objection that internalization replicates Descartes’ problematic dualism (Gallagher, 2020; Stetsenko, 2017). By focusing on self-repair as the empirical phenomenon, this model does not require commitment to internal mental states as ontologically distinct. Intersubjectivity can be approached either from the inside (e.g., phenomenology) or from the outside (e.g., observing others). Likewise, textual excerpts in which speakers respond to their own utterances can be studied either as inner dialogue or as outward-facing self-repair. The difference is one of analytic stance, not ontology. Accordingly, the paradigmatic differences between the quadrants in Figure 1 reveal less about the ontological status of things in the world, and more about our approach to them. Looking beyond these debates about analytic stance, I argue, reveals a striking isomorphism between three-turn sequences and three-level perspective coordination that is observable from both analytic stances.

Conclusion

I have attempted to re-assemble the elephant of intersubjectivity by systematically reviewing the approaches mapped in Figure 1. Each quadrant captures a distinctive and valid aspect of intersubjectivity. I have shown how, despite each having a distinctive epistemological stance, which has led to incompatible assumptions and methods, there is underlying convergence. Beneath this diversity of approaches there is convergence on three levels, three turns, and triadic relations. Identifying this convergence on ‘threes’ is itself a contribution, revealing common ground across traditions. Moreover, I have argued that it is not a coincidence; this convergence points to an underlying integrated architecture for intersubjectivity, which I term dialogical intersubjectivity.

The proposal that perspective-taking arises out of turn-taking is a contribution to the literatures at the intersection of the quadrants, namely, theories that attempt to explain how the psychological and interactional aspects of intersubjectivity are related. The driving mechanism, I propose, is turn-taking, where each turn that takes the prior turn as its object ratchets up a level of intersubjectivity. Turn-taking is also a bridging mechanism that spans Descartes’ dualism. It is observable both ‘between’ individuals (e.g., conversation analysis) and ‘within’ an individual utterance (e.g., multivoicedness and self-repairs).

Dialogical intersubjectivity expands the basic unit of analysis beyond isolated levels, turns or triangles (Zittoun et al., 2007). Although many social interactions can be achieved with two-turn adjacency pairs (question and answer, greetings, commands), these only work by assuming intersubjectivity. Once human reflexive intersubjectivity breaks down, the minimal unit of analysis for conceptualizing its reconstruction is three levels of perspective-taking (enabling feeling misunderstood), three-turn interactions (enabling repairs), and triadic relations (embedded in institutional and cultural contexts). This expanded minimal unit of analysis enables conceptualizing all parts of the elephant of intersubjectivity simultaneously.

Dialogical intersubjectivity can be studied face-to-face, online, or in video. The data can be single cases, big data, or experiments. The key is not in the data or the method, but the integrative consideration of levels, turns, and triadic relations simultaneously. This integrated model opens new questions. When a prior turn is taken as the object of a subsequent turn, does this observably lead to indexing higher levels of intersubjectivity? Does ratcheting perspectives through turn-taking make implicit assumptions explicit? Do three-turn interactions develop in children alongside three-level intersubjectivity? Do specific types of turn-taking relations (e.g., cooperative, conflictual, hierarchical) foster comparable three-part semiotic sequences? When situational constraints limit interaction to one-turn or two-turn interactions, does it reduce mutual understanding? Conversely, does facilitating three-turn interactions enable feeling understood? If turn-taking can influence meta-meta perspectives, can it be used as an intervention for intergroup conflict, distrust, and polarization?

In applied contexts, dialogical intersubjectivity provides insight into how communication technologies can be designed to foster robust intersubjectivity. For example, chatbots that rely on two-turn interaction may fail to create robust intersubjectivity (Corti & Gillespie, 2016). Similarly, organizational communication channels that permit only unidirectional announcements, or meeting formats that allocate speaking time without back-channel opportunities, may structurally impair intersubjective alignment. These predictions can be examined by comparing mutual understanding, social-emotional engagement, and misunderstandings across two-turn versus three-turn communication scenarios and interventions.

Intersubjective coordination is central to our private and collective lives. However, it does not occur accidentally. Understanding the interactional infrastructure that can scaffold mutual understanding is a crucial task. To date we have tended to build our communication and interactional infrastructure on one-turn (sender-receiver) and two-turn (adjacency pairs) models. I propose that we can only address the challenges of intersubjective coordination with an expanded minimal unit of analysis, namely, three levels of representation constructed through three-turn interaction embedded in triadic social relations.

Footnotes

ORCID iD

Alex Gillespie

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Swiss National Science Foundation, 51NF40-205605.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Author Biography

Alex Gillespie is a Professor of Psychological and Behavioural Science at the London School of Economics, a Visiting Professor at the Oslo New University, and an Editor of Journal for the Theory of Social Behaviour. His research examines communication problems, especially speaking-up, defensiveness, misunderstandings, distrust, and problems of listening to and learning from challenging feedback. He recently co-authored the book ‘Pragmatism & Methodology’ published by Cambridge University Press, and freely available for download.

References

Albert & de Ruiter, J. P. (2018). Repair: The interface between interaction and cognition. Topics in Cognitive Science, 10(2), 279–313. https://doi.org/10.1111/tops.12339

Assa-Eley & Kimberlin, C. L. (2005). Using interpersonal perception to characterize pharmacists’ and patients’ perceptions of the benefits of pharmaceutical care. Health Communication, 17(1), 41–56. https://doi.org/10.1207/s15327027hc1701_3

Aveling, E.-L., & Gillespie (2008). Negotiating multiplicity: Adaptive asymmetries within second-generation Turks “society of mind”. Journal of Constructivist Psychology, 21(3), 200–222. https://doi.org/10.1080/10720530802070635

Aveling, E.-L., Gillespie, & Cornish (2015). A qualitative method for analysing multivoicedness. Qualitative Research, 15(6), 670–687. https://doi.org/10.1177/1468794114557991

Bakhtin (1986). Speech genres & other late essays. University of Texas Press.

Bavelas, Gerwing, & Healing (2017). Doing mutual understanding. Calibrating with micro-sequences in face-to-face dialogue. Journal of Pragmatics, 121, 91–112. https://doi.org/10.1016/j.pragma.2017.09.006

Branco, A. U., Branco, A. L., & Madureira, A. F. (2008). Self-development and the emergence of new I-positions: Emotions and self-dynamics. Studia Psychologica, 6(8), 23–39.

Bråten (2009). The intersubjective mirror in infant learning and evolution of speech. John Benjamins Publishing.

Corti & Gillespie (2016). Co-constructing intersubjectivity with artificial conversational agents: People are more likely to initiate repairs of misunderstandings with agents represented as human. Computers in Human Behavior, 58, 431–442. https://doi.org/10.1016/j.chb.2015.12.039

Crossley (1996). Intersubjectivity: The fabric of social becoming. Sage Publications Ltd.

De Jaegher, Di Paolo, & Gallagher (2010). Can social interaction constitute social cognition? Trends in Cognitive Sciences, 14(10), 441–447. https://doi.org/10.1016/j.tics.2010.06.009

De Jaegher, Peräkylä, & Stevanovic (2016). The co-creation of meaningful action: Bridging enaction and interactional sociology. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1693), 20150378. https://doi.org/10.1098/rstb.2015.0378

Dennett, D. C. (1983). Intentional systems in cognitive ethology: The “Panglossian paradigm” defended. Behavioral and Brain Sciences, 6(3), 343–390. https://doi.org/10.1017/S0140525X00016393

Dennett, D. C. (1989). The intentional stance. MIT Press.

de Ruiter, J. P., & Albert (2017). An appeal for a methodological fusion of conversation analysis and experimental psychology. Research on Language and Social Interaction, 50(1), 90–107. https://doi.org/10.1080/08351813.2017.1262050

Descartes (1637). Discourse on the method for rightly conducting one’s reason and for seeking truth in the sciences. In Cress, D. A. (Ed.), Discourse on method and meditations on first philosophy (pp. 1–45). Hackett Publishing Company.

Dingemanse, Roberts, S. G., Baranova, Blythe, Drew, Floyd, Gisladottir, R. S., Kendrick, K. H., Levinson, S. C., Manrique, Rossi, & Enfield, N. J. (2015). Universal principles in the repair of communication problems. PLoS One, 10(9), e0136100. https://doi.org/10.1371/journal.pone.0136100

Dunbar, Launay, & Curry (2016). The complexity of jokes is limited by cognitive constraints on mentalizing. Human Nature, 27(2), 130–140. https://doi.org/10.1007/s12110-015-9251-6

Dunbar, Marriott, & Duncan, N. D. (1997). Human conversational behavior. Human Nature, 8(3), 231–246. https://doi.org/10.1007/BF02912493

Epley, Keysar, Van Boven, & Gilovich (2004). Perspective taking as egocentric anchoring and adjustment. Journal of Personality and Social Psychology, 87(3), 327–339. https://doi.org/10.1037/0022-3514.87.3.327

Eyal, Steffel, & Epley (2018). Perspective mistaking: Accurately understanding the mind of another requires getting perspective, not taking perspective. Journal of Personality and Social Psychology, 114(4), 547–571. https://doi.org/10.1037/pspa0000115

Ferrari, P. F., & Gallese (2007). Mirror neurons and intersubjectivity. In Braten (Ed.), Advances in Consciousness Research (pp. 73–88). John Benjamins Publishing Company. https://doi.org/10.1075/aicr.68.08fer

Fuchs & De Jaegher (2009). Enactive intersubjectivity: Participatory sense-making and mutual incorporation. Phenomenology and the Cognitive Sciences, 8(4), 465–486. https://doi.org/10.1007/s11097-009-9136-4

Gallagher (2020). Action and interaction. Oxford University Press.

Gallagher (2023). Embodied and enactive approaches to cognition. Elements in Philosophy of Mind. https://doi.org/10.1017/9781009209793

Gallese & Goldman (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12), 493–501. https://doi.org/10.1016/s1364-6613(98)01262-5

Gilbert (2015). Joint commitment: What it is and why it matters. Phenomenology and Mind, (9), 18–26.

Gillespie (2003). Supplementarity and surplus: Moving between the dimensions of otherness. Culture & Psychology, 9(3), 209–220. https://doi.org/10.1177/1354067X030093003

Gillespie (2005). G.H. Mead: Theorist of the social act. Journal for the Theory of Social Behaviour, 35(1), 19–39. https://doi.org/10.1111/j.0021-8308.2005.00262.x

Gillespie (2006). Descartes’ demon: A dialogical analysis of Meditations on first philosophy. Theory & Psychology, 16(6), 761–781.

Gillespie (2006). Games and the development of perspective taking. Human Development, 49(2), 87–92. https://doi.org/10.1159/000091334

Gillespie (2020). Semantic contact and semantic barriers: Reactionary responses to disruptive ideas. Current Opinion in Psychology, 35, 21–25. https://doi.org/10.1016/j.copsyc.2020.02.010

Gillespie & Cornish (2010). Intersubjectivity: Towards a dialogical analysis. Journal for the Theory of Social Behaviour, 40(1), 19–46. https://doi.org/10.1111/j.1468-5914.2009.00419.x

Gillespie, Glăveanu, & de Saint Laurent (2024). Pragmatism and methodology: Doing research that matters with mixed methods. Cambridge University Press. https://doi.org/10.1017/9781009031066

Gillespie & Martin (2014). Position exchange theory: A socio-material basis for discursive and psychological positioning. New Ideas in Psychology, 32, 73–79. https://doi.org/10.1016/j.newideapsych.2013.05.001

Ginzburg, Fernández, & Schlangen (2007). Unifying self-and other-repair. In Artstein & Viue (Eds.), Decalog 2007: Proceedings of the 11th Workshop on the Semantics and Pragmatics of Dialogue, 57–64. Citeseer.

Goddard & Gillespie (2025). Conversational repairs on Reddit: Widely initiated but often uncompleted. PLoS One, 20(1), e0316618. https://doi.org/10.1371/journal.pone.0316618

Goffman (1959). The presentation of self in everyday life. Penguin.

Gopnik & Wellman, H. M. (1992). Why the child’s theory of mind really is a theory. Mind & Language, 7(1), 145–171. https://doi.org/10.1111/j.1468-0017.1992.tb00202.x

Heasman & Gillespie (2017). Perspective-taking is two-sided: Misunderstandings between people with Asperger’s syndrome and their family members. Autism, 22(6), 740–750. https://doi.org/10.1177/1362361317708287

Heider (1958). The psychology of interpersonal relations. Lawrence Erlbaum Associates, Publishers.

Hermans, H. J. M. (2002). The dialogical self as a society of mind. Theory & Psychology, 12(2), 147–160. https://doi.org/10.1177/0959354302122001

Hermans, H. J. M., & Dimaggio (2004). The dialogical self in psychotherapy: An introduction. Routledge.

Hinde, R. A. (1997). Relationships: A dialectical perspective. Psychology Press.

Hoever, I. J., van Knippenberg, van Ginkel, W. P., & Barkema, H. G. (2012). Fostering team creativity: Perspective taking as key to unlocking diversity’s potential. Journal of Applied Psychology, 97(5), 982–996. https://doi.org/10.1037/a0029159

Hughes & Devine, R. T. (2015). A social perspective on theory of mind. In Lamb, M. E., & Lerner, R. M. (Eds.), Handbook of Child Psychology and Developmental Science: Socioemotional processes (7th ed., pp. 564–609). John Wiley & Sons, Inc. https://doi.org/10.1002/9781118963418.childpsy314

Husserl (1931). Cartesian meditations: An introduction to phenomenology. Kluwer.

Ichheiser (1943). Structure and dynamics of interpersonal relations. American Sociological Review, 8(3), 302–305. https://doi.org/10.2307/2085084

Ichheiser (1949). Misunderstandings in human relations: A study in false social perception. American Journal of Sociology, 55(suppl), 1–72.

James (1890). Principles of psychology. Harvard University Press.

Jang & Lee (2023). Introducing the Co-oriented Scansis (CoS) model: A case of chatbot, Lee-Luda. Public Relations Review, 49(4), 102360. https://doi.org/10.1016/j.pubrev.2023.102360

Josephs, I. E., & Valsiner (1998). How does autodialogue work? Miracles of meaning maintenance and circumvention strategies. Social Psychology Quarterly, 61(1), 68–83. https://doi.org/10.2307/2787058

Kalla, J. L., & Broockman, D. E. (2023). Which narrative strategies durably reduce prejudice? Evidence from field and survey experiments supporting the efficacy of perspective-getting. American Journal of Political Science, 67(1), 185–204. https://doi.org/10.1111/ajps.12657

Kärkkäinen (2006). Stance taking in conversation: From subjectivity to intersubjectivity. Text & Talk, 26(6), 699–731. https://doi.org/10.1515/TEXT.2006.029

Kay, Gillespie, & Cooper (2024). From conflict and suppression to reflection: Longitudinal analysis of multivoicedness in clients experiencing depression. Journal of Constructivist Psychology, 37(3), 239–258. https://doi.org/10.1080/10720537.2023.2175752

Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. Guilford Publications.

Kenny, D. A., Veldhuijzen, van der Weijden, LeBlanc, Lockyer, Légaré, & Campbell (2010). Interpersonal perception in the context of doctor-patient relationships: A dyadic analysis of doctor-patient communication. Social Science & Medicine, 70(5), 763–768. https://doi.org/10.1016/j.socscimed.2009.10.065

Keysers & Gazzola (2007). Integrating simulation and theory of mind: From self to social cognition. Trends in Cognitive Sciences, 11(5), 194–196. https://doi.org/10.1016/j.tics.2007.02.002

Kinderman, Dunbar, & Bentall, R. P. (1998). Theory-of-mind deficits and causal attributions. British Journal of Psychology, 89(2), 191–204. https://doi.org/10.1111/j.2044-8295.1998.tb02680.x

Koerner & Schrodt (2014). An introduction to the special issue on family communication patterns theory. Journal of Family Communication, 14(1), 1–15. https://doi.org/10.1080/15267431.2013.857328

Laing, R. D., Phillipson, & Lee, A. R. (1966). Interpersonal perception: A theory and method of research. Tavistock.

Lamm, Decety, & Singer (2011). Meta-analytic evidence for common and distinct neural networks associated with directly experienced pain and empathy for pain. NeuroImage, 54(3), 2492–2502. https://doi.org/10.1016/j.neuroimage.2010.10.014

Lawrence, J. A., & Valsiner (1993). Conceptual roots of internalization: From transmission to transformation. Human Development, 36(3), 150–167. https://doi.org/10.1159/000277333

Lees & Cikara (2020). Inaccurate group meta-perceptions drive negative out-group attributions in competitive contexts. Nature Human Behaviour, 4(3), 279–286. https://doi.org/10.1038/s41562-019-0766-4

Lees & Cikara (2021). Understanding and combating misperceived polarization. Philosophical Transactions of the Royal Society B: Biological Sciences, 376(1822), 20200143. https://doi.org/10.1098/rstb.2020.0143

Levine (2018). James and phenomenology. In Klein (Ed.), The Oxford handbook of William James (pp. 103–120). Oxford University Press.

Linell (2009). Rethinking language, mind, and world dialogically: Interactional and contextual theories of human sense-making. Information Age Publishing.

Linell & Marková (1993). Acts in discourse: From monological speech acts to dialogical inter-acts. Journal for the Theory of Social Behaviour, 23(2), 173–195. https://doi.org/10.1111/j.1468-5914.1993.tb00236.x

Livingstone, A. G. (2023). Felt understanding in intergroup relations. Current Opinion in Psychology, 51, 101587. https://doi.org/10.1016/j.copsyc.2023.101587

Livingstone, A. G., Fernández, L. R., & Rothers (2019). “They just don’t understand us”: The role of felt understanding in intergroup relations. Journal of Personality and Social Psychology, 119(3), 633–656. https://doi.org/10.1037/pspi0000221

Marková (1982). Paradigms, thought and language. Wiley.

Marková (2016). The dialogical mind. Cambridge University Press.

Marková & Linell (1996). Coding elementary contributions to dialogue: Individual acts versus dialogical interactions. Journal for the Theory of Social Behaviour, 26(4), 353–373. https://doi.org/10.1111/j.1468-5914.1996.tb00297.x

Martin & Gillespie (2010). A neo-meadian approach to human agency: Relating the social and the psychological in the ontogenesis of perspective-coordinating persons. Integrative Psychological and Behavioral Science, 44(3), 252–272. https://doi.org/10.1007/s12124-010-9126-7

McLeod, J. M., & Chaffee, S. H. (1973). Interpersonal approaches to communication research. American Behavioral Scientist, 16(4), 469–499. https://doi.org/10.1177/000276427301600402

Mead, G. H. (1913). The social self. The Journal of Philosophy, Psychology, and Scientific Methods, 10(14), 374–380. https://doi.org/10.2307/2012910

Mead, G. H. (1925). The genesis of self and social control. International Journal of Ethics, 35(3), 251–277. https://doi.org/10.1086/207491

Mead, G. H. (1934). Mind, self & society from the standpoint of a social behaviorist. University of Chicago Press.

Merleau-Ponty (1945). Phenomenology of perception. Routledge.

Mondada (2007). Multimodal resources for turn-taking: Pointing and the emergence of possible next speakers. Discourse Studies, 9(2), 194–225. https://doi.org/10.1177/1461445607075346

Mondémé (2022). Why study turn-taking sequences in interspecies interactions? Journal for the Theory of Social Behaviour, 52(1), 67–85. https://doi.org/10.1111/jtsb.12295

Moore & Gillespie (2014). The caregiving bind: Concealing the demands of informal care can undermine the caregiving identity. Social Science & Medicine, 116, 102–109. https://doi.org/10.1016/j.socscimed.2014.06.038

Neimeyer, R. A. (2006). Narrating the dialogical self: Toward an expanded toolbox for the counselling psychologist. Counselling Psychology Quarterly, 19(01), 105–120. https://doi.org/10.1080/09515070600655205

Newcomb, T. M. (1953). An approach to the study of communicative acts. Psychological Review, 60(6), 393–404. https://doi.org/10.1037/h0063098

Nguyen, Versyp, Cox, & Fusaroli (2022). A systematic review and Bayesian meta-analysis of the development of turn taking in adult–child vocal interactions. Child Development, 93(4), 1181–1200. https://doi.org/10.1111/cdev.13754

O’Toole & Dubin (1968). Baby feeding and body sway: An experiment in George Herbert Mead’s “taking the role of the other”. Journal of Personality and Social Psychology, 10(1), 59–65. https://doi.org/10.1037/h0026387

Peirce, C. S. (1955). Philosophical writings of Peirce. Dover Publications.

Phelps, Komnæs, & Gillespie (2025). Perspective-getting, taking and integrating: Cultivating procedural justice turn-by-turn to manage heated conflicts. Organization Studies, Online ahead of print. https://doi.org/10.1177/01708406251400477

Purver, Hough, & Howes (2018). Computational models of miscommunication phenomena. Topics in Cognitive Science, 10(2), 425–451. https://doi.org/10.1111/tops.12324

90.

Rommetveit

(1974). On message structure: A framework for the study of language and communication. John Wiley & Sons.

91. Sacks, Schegloff, E. A., & Jefferson (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696–735. https://doi.org/10.2307/412243

92. Saxe (2005). Against simulation: The argument from error. Trends in Cognitive Sciences, 9(4), 174–179. https://doi.org/10.1016/j.tics.2005.01.012

93. Scheff, T. J. (1967). Toward a sociological model of consensus. American Sociological Review, 32(1), 32–46. https://doi.org/10.2307/2091716

94. Schegloff, E. A. (1992). Repair after next turn: The last structurally provided defense of intersubjectivity in conversation. The American Journal of Sociology, 97(5), 1295–1345. https://doi.org/10.1086/229903

95. Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation analysis. Cambridge University Press.

96. Schelling, T. C. (1966). Arms and influence. Yale University Press.

97. Schilbach, Timmermans, Reddy, Costall, Bente, Schlicht, & Vogeley (2013). Toward a second-person neuroscience. Behavioral and Brain Sciences, 36(4), 393–414. https://doi.org/10.1017/S0140525X12000660

98. Schutz (1932). The phenomenology of the social world. Heinemann Educational Books Ltd.

99. Seltzer & Mitrook (2009). Two sides to every story: Using coorientation to measure direct and meta-perspectives of both parties in organization-public relationships. Public Relations Journal, 3(2), 1–24.

100. Sillars, Koerner, & Fitzpatrick, M. A. (2005). Communication and understanding in parent-adolescent relationships. Human Communication Research, 31(1), 102–128. https://doi.org/10.1111/j.1468-2958.2005.tb00866.x

101. Sinclair, J. M., & Coulthard (1975). Towards an analysis of discourse: The English used by teachers and pupils. Oxford University Press.

102. Skantze (2021). Turn-taking in conversational systems and human-robot interaction: A review. Computer Speech & Language, 67, 101178. https://doi.org/10.1016/j.csl.2020.101178

103. Stetsenko (2017). The transformative mind: Expanding Vygotsky’s approach to development and education. Cambridge University Press.

104. Stiles, W. B., Osatuke, Glick, M. J., & Mackay, H. C. (2004). Encounters between internal voices generate emotions. In H. J. M. Hermans & Dimaggio (Eds.), The dialogical self in psychotherapy (pp. 91–107). Brunner-Routledge.

105. Templeton, E. M., Chang, L. J., Reynolds, E. A., Cone LeBeaumont, M. D., & Wheatley (2022). Fast response times signal social connection in conversation. Proceedings of the National Academy of Sciences, 119(4), e2116915119. https://doi.org/10.1073/pnas.2116915119

106. Thomas, K. A., DeScioli, Haque, O. S., & Pinker (2014). The psychology of coordination and common knowledge. Journal of Personality and Social Psychology, 107(4), 657–676. https://doi.org/10.1037/a0037037

107. Tomasello (1999). The cultural origins of human cognition. Harvard University Press.

108. Tomasello (2019). Becoming human: A theory of ontogeny. Belknap Press.

109. Tomasello (2020). The role of roles in uniquely human cognition and sociality. Journal for the Theory of Social Behaviour, 50(1), 2–19. https://doi.org/10.1111/jtsb.12223

110. Trevarthen (1998). The concept and foundations of infant intersubjectivity. In Bråten (Ed.), Intersubjective communication and emotion in early ontogeny (pp. 15–46). Cambridge University Press.

111. Trevarthen (2015). Infant semiosis: The psycho-biology of action and shared experience from birth. Cognitive Development, 36, 130–141. https://doi.org/10.1016/j.cogdev.2015.09.008

112. Trevarthen & Aitken, K. J. (2001). Infant intersubjectivity: Research, theory, and clinical applications. The Journal of Child Psychology and Psychiatry and Allied Disciplines, 42(1), 3–48.

113. Tuazon, G. F., Wolfgramm, & Whyte, K. P. (2019). Can you drink money? Integrating organizational perspective-taking and organizational resilience in a multi-level systems framework for sustainability leadership. Journal of Business Ethics, 168(3), 1–22. https://doi.org/10.1007/s10551-019-04219-3

114. Tuomela (2005). We-Intentions revisited. Philosophical Studies, 125(3), 327–369. https://doi.org/10.1007/s11098-005-7781-1

115. Valsiner (2005). Scaffolding within the structure of dialogical self: Hierarchical dynamics of semiotic mediation. New Ideas in Psychology, 23(3), 197–206. https://doi.org/10.1016/j.newideapsych.2006.06.001

116. Valsiner (2018). The promoter sign: Developmental transformation within the structure of dialogical self. In Beyond the mind: Cultural dynamics of the Psyche (pp. 123–146). Information Age Publishing.

117. Valsiner & Van der Veer (2000). The social mind: Construction of the idea. Cambridge University Press.

118. Van Riel, C. B., & Fombrun, C. J. (2007). Essentials of corporate communication: Implementing practices for effective reputation management. Routledge.

119. Vogeley (2017). Two social brains: Neural mechanisms of intersubjectivity. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1727), 20160245. https://doi.org/10.1098/rstb.2016.0245

120. Vygotsky, L. S. (1997). The collected works of L. S. Vygotsky (Volume 4, edited by R. W. Rieber). Plenum Press.

121. Waring, H. Z. (2009). Moving out of IRF (Initiation-Response-Feedback): A single case analysis. Language Learning, 59(4), 796–824. https://doi.org/10.1111/j.1467-9922.2009.00526.x

122. Warnell, K. R., & Redcay (2019). Minimal coherence among varied theory of mind measures in childhood and adulthood. Cognition, 191, 103997. https://doi.org/10.1016/j.cognition.2019.06.009

123. Wellman, H. M. (2018). Theory of mind: The state of the art. European Journal of Developmental Psychology, 15(6), 728–755. https://doi.org/10.1080/17405629.2018.1435413

124. Wittgenstein (1953). Philosophical investigations. Blackwell.

125. Woźniak (2018). “I” and “Me”: The self in the context of consciousness. Frontiers in Psychology, 9, 1656. https://doi.org/10.3389/fpsyg.2018.01656

126. Zittoun, Gillespie, Cornish, & Psaltis (2007). The metaphor of the triangle in theories of human development. Human Development, 50(4), 208–229. https://doi.org/10.1159/000103361
