Abstract
Time is regarded as the immanent dimension for the social experience. This phenomenologically informed perspective of time is built into the ethnomethodological programme jointly proposed by Garfinkel and Sacks as they set out to uncover social orders through examining the temporal sequence in practical activity. However, Garfinkel and Sacks took different paths from this initial proposal in their separate development of Ethnomethodological Studies of Work and Conversation Analysis. Focusing on different forms of data, the two programmes adopted different approaches to time and action in constructing the time structures in their sociological description of activity. However, the difference has seldom been discussed, and there has been still less attempt to explore a possible synthesis of the two programmes from there. This article attempts to address this gap by proposing a perspective of multi-layered temporality in social interaction. The analysis examines three extracts from a university communication workshop for students and explicates different modes of how simultaneous sequences can constitute participants’ action in situ: (1) simultaneous sequences by different actors; (2) simultaneous sequences by the same actor; (3) simultaneous sequences within a participatory framework. Contending that social actors have the phenomenological potential to perceive simultaneous sequences in different time frames, we conclude that the ‘situational time’ in EM and ‘conversational time’ in CA can be commensurable. Interweaving different layers of temporality into an ethnomethodological description, practitioners can better reconstruct a ‘reasonable total picture’ of social activity to manifest its complex, seen-but-unnoticed endogenous social order. Beyond ethnomethodology, the multi-layered perspective of time provides the basis for a holistic approach to time, allowing enquiry into broader social time through studying social life in vivo.
Introduction: The immanent dimension of social time in ethnomethodology studies
In Seeing Sociologically (1948/2006), Garfinkel articulated the ‘method’ 1 of conceiving social actions sociologically from the members’ point of view within their own lived situations. It is a ‘method’ derived from Aron Gurwitsch and Alfred Schütz’s phenomenological insight about how an observer makes claims about others’ experience from a third-person perspective. At the heart of it is their argument that humans are given conditions to conceive experience beyond the point of time–space in which they exist, i.e. a fat moment, while the human body and soul coexist at one spatio-temporal coordinate at each and every passing moment within a matrix of cosmic 2 spatio-temporal dimension. Adopting the ‘method’ in his own research, Garfinkel began his studies of Ethnomethodology (EM), explicating the ‘lay methods’, or practical reasoning, underlying social members’ own construction of classical sociological variables, e.g. suicide and gender. Later, Garfinkel and Sacks (1970/1986), together and separately, shifted the focus of ethnomethodology from the construct of social meaning to the endogenous structure of social activity. As their respective programmes developed, under the headings of Studies of Work (Garfinkel, 1986) and Conversation Analysis (CA; Sacks et al., 1974), ‘procedure’ and ‘sequence’ were used to capture their new methods of explicating social orderliness through describing how social activity is achieved by social members over time. Although time 3 remained an immanent dimension for the social experience of meaning or activity within these programmatic foci, it remains unembedded from the cosmic spatio-temporal dimension to different extents in their sociological description, leaving a gap between subsequent EM/CA studies and Garfinkel’s initial vision of the ‘method’.
While this may be somewhat surprising given the importance of time throughout EM’s focus on human action, and its inescapability in forms of sequential analysis, the study of time in EM/CA has suffered the same fate as the study of time in sociology. Despite being prominent in Durkheim’s early work and central to Marx’s theory of dialectic history, the sociological study of time has remained a marginal pursuit, with notable exceptions (Gurvitch, 1964; Sorokin & Merton, 1937; Zerubavel, 1981). Within sociology the study of time has tended to coalesce around questions of how time is constructed and rationalised as a collective culture or the medium of culture in a society, and how different social images of time (i.e. clock time/social time, cyclical/linear time) intersect and embody themselves in the rhythm or tempo of collective life (Adam, 1990; Hassard, 1990). While the study of time has ebbed and flowed around these theoretical concerns, new technologies and forms of digital social life have now presented new opportunities for the study of time, but also added further complexity to it (Uprichard, 2012). In digitalised society, members routinely leave digital traces of interaction in cyberspace, both actively and passively, producing multimodal ‘big data’ (e.g. textual, visual and audio-visual) that capture the temporal stream of (digital) life in real-time (R. J. Smith, 2014), providing opportunities for sociologists to examine social life and its temporality empirically in more concrete terms. At the same time, the abundance of data collectable over a relatively short period of time provides sociologists with the possibility of examining social acts as embedded in layers of social-historical time (Lewis & Weigert, 1981; Uprichard, 2012).
In a similar way, for EM studies new technologies provide abundant data and offer ways to examine the lived detail of social action and time in unprecedented fidelity but also pose challenges of how to incorporate and address different levels of time relevant to the interaction analytically. In this article we return to the early focus on time in Garfinkel’s EM and explore ways in which layers of relevant time can be analytically embedded within studies of social action as its indigenous structure.
The study of time in EM/CA studies
In order to consider time in EM/CA studies, a distinction has first to be made between studies about time and the methodological discussion of time. Several studies have topicalised time, analysing how social members construct and/or refer to time in social activities (Black et al., 1983; Button, 1990; Goodwin, 2002; Raymond & White, 2017). Black et al. (1983) compared the time structure of face-to-face interaction with that of electronic messaging, which was still novel at the time; Button (1990) and Raymond and White (2017) demonstrated that time formulations/references are used in talk not only to index time, but also as an integral part of producing intelligible actions and relations between actions. Comparatively, Goodwin’s (2002) paper encompasses more aspects of time in interaction, including the communicative affordance of language to construct a projective time structure and to embody time in storytelling. Although these studies also offered methodological insight into how time can be analysed, they tended not to build upon each other (only Raymond and White referenced Button, 1990) and so did not contribute to a cumulative body of work on how to analyse time in interaction.
Compared to the number of EM/CA studies on time, there have been even fewer methodological papers that place time at the centre of their discussion and address it as the dimension of social activities. Besides Garfinkel’s writing, we can identify two methodological papers dedicated to time in ethnomethodology: ‘The Social Actor: Social Action in Real Time’ (Sharrock & Button, 1991) and ‘Garfinkel’s Conception of Time’ (Rawls, 2005). Sharrock and Button traced the conception of time back to Schütz, reasserting EM’s phenomenological ground of describing social actions as social actors’ achievement in real-time. Rawls drew on Garfinkel’s PhD proposal (published as Seeing Sociologically; Garfinkel, 2006) to unpack his formulation of time in social actions. The fact that both papers heavily relied on early texts from before Ethnomethodology was established for their discussion of time shows the gap that Garfinkel left by his abstention from theoretical discussion about time in social interaction in his early published work. 4
In the last decade, multimodal conversation analysis and interactional linguistics have shown an increasing methodological interest in time. Although these emerging fields build upon CA, the pioneers in these fields recognised the need to push the methodological boundary of conventional CA to deal with the multimodality of social interaction captured by video data (Deppermann & Günthner, 2015; Deppermann & Streeck, 2018; Mondada, 2018, 2021a). In particular, they observe that the turn-by-turn sequentiality in CA is limited in examining multimodality, because the multimodal resources that actors produce simultaneously are not necessarily synchronous with talk and are often ‘distributed in time’ (Mondada, 2021a, p. 398). In a 2021 paper presented at the Discourse and Rhetoric Group (DARG), Mondada (2021b) reconsidered sequentiality by tracing how CA and EM conceptualised time and temporality, arguing that temporality is the foundation of sequentiality. In the discussion, Mondada argued that EM and CA’s respective conceptions of time (i.e. EM’s inner time in Garfinkel, 1967; and CA’s ‘why that now?’ in Schegloff & Sacks, 1973) are both phenomenological conceptions of time, or what might be described as kairos in contrast to chronos. While Mondada raises the notions of time in both traditions, advancing this discussion further requires a more detailed examination and comparison of these conceptions of time and, from that, of the possibility of methodological and analytic synthesis for multimodal studies.
In this article we take up that challenge. Through examining how Garfinkel and Sacks respectively conceptualised time, our discussion aims to provide a greater understanding of the structural difference between their forms of analysis and their emergent programmes of work. In the first part of the article, we review the historical context of how Garfinkel and Sacks departed from their joint proposal for studying the formal structure of social actions and explicated them in terms of two distinct time structures. The second part further unpacks the time structures in Garfinkel and Sacks’ respective modes of ethnomethodological description of activity. From here, we argue that the different time structures are rooted in different ‘unit acts’ (Garfinkel, 2019, pp. 185–187), or ‘units of action’ (B. S. Reed & Raymond, 2013), applied to reconstruct social activity in the respective programmes. The third part takes up Mondada’s point that the moments in which a single unit of action is perceived are more kairos than chronos, but we argue that the relationship between them is worth further examination in terms of how multimodal resources distributed along clock-ticking time can be delineated and perceived as a sequence of actions. Through the analysis of three data extracts from a training workshop, the final part of the article contends that sequentiality in social interaction is not necessarily a single chain of actions. The analysis demonstrates that simultaneous sequences by different actors, simultaneous sequences by the same actor, or simultaneous sequences within a participation framework could be constitutive in the meaning of actions in vivo.
Through this we then suggest a form of analysis in which multimodality in interaction can be described as a layered texture, in which different layers of sequences are mutually constitutive to form a unified picture, potentially bridging Garfinkel’s ethnographic description of work and Sacks’ description of work in vivo.
The historical context of the two modes of ethnomethodological studies of practical activities 5
One of Garfinkel’s original contributions to sociology was explicating the sequentiality of social actions (Korbut, 2014). In essence, sequentiality is an aspect of the endogenous time structure of social activity, governing what to do now, what has been done before, and what next. The action sequence that Garfinkel came up with for his ethnomethodological programme was largely shaped by the technological context in the 1950s when Garfinkel started his academic career. At that time tape-recording technology was not yet easily accessible, and field observation and interview were still the default technology to record the actual production of social life. So, Garfinkel had to use those methods to begin with, compromising his early ambition to explicate the organisation of members’ experience of actions at their vivid present (Garfinkel, 1948/2006). Because the field notes or interview records produced are inevitably written accounts mediated by human minds, they would be un-embedded, or further removed, from the texture of how social life appeared in the historical time-space. That is, the temporality or the spatiality of how the texture appeared would be imprecise or even inaccurate, and the imprecision was inherited by the impressionistic sequentiality in the analyst’s (i.e. Garfinkel’s) description of social activity (e.g. the documentary method of fact-finding in social sciences; Garfinkel, 1967).
As technology progressed, Garfinkel’s idea of sequentiality took a different path in the hands of Harvey Sacks. Sacks was one of the first advocates of embracing audio-recording devices as sociological research aids. Sacks, for example, formed his initial analysis of conversational conduct in his lectures at UCLA with audio recordings from phone calls to a suicide hotline (e.g. Sacks et al., 1974) and a youth therapy session (e.g. Sacks, 1995). From that time, transcripts of spoken interaction became a routine form of data for ethnomethodologists and would figure prominently in the development of Conversation Analysis. The technology and practice of audio recording allowed the sounds produced in conversation to be recorded mechanically without relying on naked ears and memory. Playing back the recordings, or ‘media records’ as they were called, the historical appearances of members’ spoken conduct could be reproduced in their original texture and tempo.
Despite sharing the same phenomenological footing for the basis of their major paper ‘On Formal Structures of Practical Actions’ (Garfinkel & Sacks, 1970/1986), Garfinkel and Sacks went on to develop the implications of their joint proposal into two distinct ethnomethodological programmes – Ethnomethodological Studies of Work and Conversation Analysis. Both programmes follow the proposal in the ‘Formal Structures’ paper to explicate social norms behind the achievement of social activities by analysing the sequentiality in them. However, the courses of actions that were then constructed in the analysis of the two programmes of work differ fundamentally as each became shaped by their distinct interests and the analytical technology applied. As a result, how the studies in the two programmes represent the dimension of time in social activities also differed. On the one hand, field ethnographers of Ethnomethodological Studies of Work observed a selected formal situation and described actions that they noted as procedurally significant to the business in the situation. As a result, the descriptions do not include every act and speech in the situation. On the other hand, using audio recordings as data, Conversation Analysis transcribed every bit of speech recoverable from the tapes and examined the transcription exhaustively. However, because audio recordings can only recover speech, Conversation Analysis was shaped by almost exclusive attention to talk, prioritising verbal occurrence over other ‘silent’ acts that are produced simultaneously with the conversation but not captured in the recording.
To analyse and subsequently publish the recorded materials, Jefferson, together with Sacks and Schegloff, developed the convention for transcribing the flow of talk reproduced from tapes into line-by-line transcripts. In this system, now known as the Jefferson Transcription System (Jefferson, 2004) and the standard for transcribing talk, the flow of ‘cosmic time’ (Garfinkel, 1948/2006, p. 182; Schütz, 1945) is roughly represented by the lengths of lines in the transcripts plus the lapse of silence recorded in between. Sacks, Schegloff and Jefferson (1974) applied the Jefferson System in their landmark study, ‘A Simplest Systematics for the Organization of Turn-taking for Conversation’, to reconstruct courses of interlocutors’ actions in their extracts of conversation. The analysis treated the lines in transcripts as the fundamental unit that brackets the interlocutors’ vivid present relevant to their judgement about the appropriate next moves. In the courses of actions reconstructed, the flow of cosmic time is hardly significant unless talk overlaps and the co-positioning of the overlapped talk in the cosmic time becomes significant. Lines in a line-by-line transcript are then like vessels containing the speech as a static stock, and readers are left to imagine how the speech flowed into the vessels.
While the two programmes developed their respective processes of recording and representing the temporal experience of social actions, the descriptions produced in both programmes lose some parts or aspects of the totality of time in interaction. In Garfinkel’s case, the ethnographic records were always mediated by the recorder’s durée (inner time: Garfinkel, 1948/2006, p. 116), losing the time unnoticed or deemed insignificant by the observer. In Sacks et al.’s case, transcripts of sounds produced along the clock-ticking time flow down line by line, rendering locutions as the only scale of time in interaction and representing other sensorial occurrences as annotations to that scale.
Losing some aspects of time, both Garfinkel and Sacks described their observation in an abridged time structure differing from the activity’s native time, which we illustrate schematically in Figures 1 and 2. 6 In the figures, the coloured parts represent the descriptive components that constitute a social activity in fieri, and the black arrow represents the native cosmic time that the activity is supposed to recouple with. In Garfinkel’s activity description (illustrated in Figure 1), an a priori conception of a situation (the pink underlying canvas) is used to give an underlying relevancy to the durée noticed by the observer (the orange strips), broken up by situational reasoning in between (the red dots). The mutually constitutive parts and whole together form a procedure-like ‘texture of relevances’ (Garfinkel, 1967, p. 166).

Figure 1. Garfinkel’s course of situated actions.

Figure 2. Sacks et al.’s course of situated actions.
In contrast, Sacks et al.’s description prioritises the utterance-by-utterance sequentiality over the utterances’ relevancy to any specific project in the conversation. In their description (illustrated in Figure 2), neighbouring utterances in the transcript by two different speakers are treated as turns (orange and blue strips) and they are connected by reasoning on an utterance-by-utterance basis (the red dots). These turns are treated similarly to a ‘speech act’ (Austin, cited in Turner, 1970) with an intra-turn projectivity, and the reasonings are reified as ‘rules of turn-taking’ and ‘adjacency pairs’ governing when to take turns and what is the appropriate next turn respectively. Together they propel a conversation forward in a game-like system (Fitzgerald, 2019; Sacks [Harvey] Papers, n.d., accessed 2017) regardless of the underlying practicality of the conversation.
In the two forms of activity description, ‘time’ flows differently. Time in Garfinkel’s descriptions progresses in a formal ‘situation time’, and time in Sacks et al.’s descriptions runs along an exhaustive ‘conversation time’. In both cases, speech and acts become un-embedded from their native temporality, or chronos (the measurable periodicity of time). While Garfinkel and Sacks’ respective programmes both described social activities as sequences of meaningful moments (kairos), the sequences are decoupled from the native cosmic time (chronos) in which the multimodal resources are produced and perceived by interactants in their entirety, making up their kairos. As a result, both Garfinkel and Sacks’ programmes must compromise Garfinkel’s early formulation of social actions – a noema-noesis pair 7 – so that they can develop an operational method to break down the ‘situation time’ and ‘conversation time’ respectively into parts and then recouple them into a procedural sequence, forming an activity.
Chronos vs kairos – What is time in ethnomethodological studies of practical activities?
The practical choices to adopt different time structures have implications for the phenomenological structures in their description. To understand the implications, the two Greek terms for ‘time’, chronos and kairos, need to be explored a little more. The former – chronos – embraces the idea of a uniform (or at least a standard) time of the cosmic system and expresses time in terms of measurement. The idea of time expressed by clock time or the calendar falls into this category. The latter – kairos – literally means the time of opportunity or the ‘right time’, denoting the formal character of time. While quantitative time can be used in expressing kairos, the description where the term is applicable is not about measuring duration in terms of a standardised cardinal system, but about ordering significant moments along an ordinal scale (i.e. what now? what comes next?) (J. E. Smith, 1969).
While Mondada (2015, p. 268) rightly points out that ‘[i]nteraction time is kairos more than chronos’, this should not mean that the idea of chronos is not important for social interaction. Humans in modern society use clocks, calendars and other devices to time their actions and to coordinate time. Nevertheless, humans immanently exist at a point in an ever-flowing time dimension. They can only perceive a fat moment of ‘now’ at any point of the present to determine a next course of action. As Garfinkel (1952) also pointed out in his doctoral thesis:
Not only are there the events that have happened, are happening, or will happen as the actor experiences them and locates them in their positions of antecedence and consequence, but they are located for him 8 in the specific positions of Now, Before, or After that a scheme of time permits him to fix – a scheme of time, that he regards as valid not only for himself but for others as well. He is thus able to compare, and indeed it is
In this passage, Garfinkel highlights that humans’ experience is conditioned upon the accountability of a scheme of time beyond the point of time they momentarily inhabit. Action is one such accountable experience. While an action is always a flow of occurrence ongoingly produced by an actor and lost in time, others can perceive the flow retrospectively as their immediate phenomenal field and as a unity of ‘doing something’. In other words, actors are capable of telling what is happening by the trace of experience they habitually acquired up to the present point of time. Garfinkel (1948/2006) referred to this intersecting consciousness of ‘the existence at the present moment’ and of ‘living through a flow of time’ as the experience of a vivid present (p. 182). The vivid present is the host for the total experience of social actions in vivo, i.e. a cognitive field over time, and is also the qualitative unit of any course of actions, i.e. a temporal unit of kairos.
Nevertheless, this conception of time is merely a phenomenological proposition from the first-person point of view. Ethnomethodological studies of social activity are an observational discipline designed to observe, record and describe social phenomena from a third-person perspective. When analysts apply the philosophical proposition to construct description in their studies, kairos is to be recoupled with its native chronos, and the two mutually constitute each other. In other words, in the analysis of a social activity, an analyst observes the publicly available appearance of the activity and, drawing on the analyst’s own cultural knowledge about the activity, locates flows in chronos, or frames of time, that correspond to a sequence of kairos, or ‘fat moments’ (Garfinkel, 2019, p. 114), salient to the participants’ practicality in the activity.
However, in multimodal interaction, the question of ‘what is now’ and ‘what to do next’ may not be answerable by a single sequence of kairos (Mondada, 2018). Multiple sequences may be relevant in the lived experience of the ‘now-ness’ for the participants, which collects what temporally unfolds in chronos into simultaneous and mutually constitutive frames of contexture. The remainder of this article sets out to illustrate this point by analysing three extracts from a communication training workshop. The workshop took place at a university and was organised by the authors, who used multiple video and audio technologies to record the workshop and provided assistance to the professional trainer (Au-Yeung, 2021). The analysis will highlight and examine three formal possibilities of parallel but relevant sequences of kairos in the same flow of chronos in real-time interaction, namely (1) simultaneous sequences by different actors; (2) simultaneous sequences by the same actor; (3) simultaneous sequences within a participatory framework.
Analysis 1: Simultaneous sequences by different actors
For ease of reference, the participating author will be denoted by his dual overt functions in the event: Researcher/organiser. In Extract 1, 9 the Researcher/organiser (hereafter R/O) (marked by white arrows with black outlines) walks across the room from Camera 3 to Camera 1. Before the start of this Extract, the R/O was at the position marked as Rr in the layout diagram in Figure 3. At 02:24, the R/O starts to appear in the frame of Camera 3, walking in front of it. That part of his movement track is shown by the broken lines marked by the time stamp 02:24 in Figure 3. Then, he keeps moving along the wall at the front of the room, walks in front of the projector’s screen and then walks behind the table at the front of the room. This part of the track is marked with the time stamp 02:28. Although the corner near Door 1 is not visible in the video clip, he takes a turn at that corner and then reappears in the angle recorded by Camera 3, walking along the wall on the left side. This part of the track is marked with 02:32. He eventually reaches Camera 1, shown at the bottom left of the layout diagram, at 02:36. Then, he stops walking and starts looking at the LCD screen of the camera. In the course of his movement, the trainer and the trainees sustain their configuration at the centre of the room and talk is ongoingly produced by the trainer.

Extract 1. Researcher/organiser moving inside the room.

Figure 3. An illustration of the Researcher/organiser’s spatial movement in the room in Extract 1.
The R/O’s path of movement is accountable for moving from his starting position near Camera 3 to Camera 1. In terms of time, the chronos between 02:24 and 02:36 can be coupled with the R/O’s kairos of heading from Camera 3 to Camera 1. But he does not move randomly. He moves with an accountable pattern of moving at the peripheral area of the classroom along the classroom’s wall, recognisably avoiding the central area of the classroom. This pattern can be explicated through constructing adequate reasoning by comparing the choice with hypothetical alternatives considering only the physicality of the space. The alternatives are marked in green broken lines in Figure 4.

Figure 4. An illustration of the Researcher/organiser’s spatial movement in the room with alternatives in Extract 1.
According to Figure 4, if the R/O had taken a path proximal to Path (II) or (III), he would have passed through the area marked in yellow. Hence, he would have moved increasingly close to the trainer and trainees interacting in this area. Two cases could happen if he walked past the trainer. Either he would walk in front of her, cutting across the reciprocal gaze between the trainer and the trainees, or he would walk close behind the trainer, running the risk of bumping into her if she stepped back at the same time. Also, in either case, he would show more of his front toward the trainees, showing an orientation toward the trainer–trainee configuration, and possibly disrupt their attention. If the R/O had taken a path proximal to Path (I), not walking along the walls, he would still be recognisably avoiding the central area. But considering the visual field of the trainees radiating from their configuration, the path would not maximise his distance from the ‘hot zone’ of their focal area, which is also the central area where the trainer is standing.
In both reasonings, the trainer–trainee interaction going on at the centre of the classroom is relevant and constitutive to the accountability of the R/O’s movement in the shared space. However, in terms of kairos, the R/O’s action is not salient in the sequence of verbal interaction between trainer and trainees. In other words, the R/O’s action is technically situated in a sequence of kairos in parallel to the trainer–trainees sequence. Although they collect different contextures in the space, they occur in the same flow of chronos, and the generic projectivity of the trainer and trainees’ sequence constitutes the meaning of the R/O’s actions. In short, co-present actors can have their respective kairos in parallel along a common chronos while constituting each other’s accountability.
Analysis 2: Simultaneous sequences by the same actor
In Analysis 1 the trainer and trainees’ interaction continues without disruption in the classroom despite the R/O’s action of moving in the classroom, thereby treating their simultaneous speech and acts as one generic sequence of kairos. However, as Extract 2 will show, the trainer is also able to play with the tempo of how she produces speech and acts to project diverted kairos, or a ‘liminal zone’ of meaning for her practical end.

Extract 2. The trainer delivering a learning point in an extended monologue.
Extract 2 is a monologue by the trainer occurring near the end of her delivery of the PowerPoint slide projected on the screen at the front of the classroom. 10 Between (1) and (2) in the Extract, a fourth point appeared in the lower cluster of points on the slide. Figure 5 shows the slide along with the English translation added for the analysis. This monologue was situated within layers of context. First, the generic educational interaction observed in Analysis 1 was still ongoingly accountable. Second, the monologue was situated in a learning activity, with the structure outlined on the slide. The activity was planned as an experiential roleplay exercise, glossed by the title ‘練習[Practice]’, about the transactional analysis model in psychology – PAC (standing for Parent-Adult-Child categorisation of ego states). Before actually roleplaying with the ‘案例[Case]’ listed, she asked two trainees, Andy and Stella (pseudonyms are used for all the participants), to simulate some prototypical transactions between the ego states. Extract 2 was situated at the end of the extended monologue, which followed the simulation as she went through the lower cluster of points under ‘探討[Discussion]’.

Figure 5. The PowerPoint slide at (2) in Extract 2 with English translation (red arrows added to index the new point).
As the Extract starts, the trainer transits from the last point to this point by producing the transition marker ‘還有呢[Also]’ and clicks her pointer to make the fourth point under ‘[Discussion]’ appear on the slide. The point reads ‘C-C最放鬆[C-C most relaxing]’. Starting the delivery, she almost reads out the point literally, producing ‘C對C是最放鬆[C to C is the most relaxing]’. Then she produces ‘剛才我們看到這個例子了[Just now we have seen this example le]’ while she moves near one of the roleplayers in the simulation, Stella, and puts her left hand onto Stella’s shoulder. Then, she reiterates the point, adding personal pronouns ‘他[s/he]’ and ‘你[you]’. When she produces ‘他[s/he]’, the trainer stretches her right hand toward the other roleplayer, Andy; when she utters ‘你[you]’ she pats Stella’s shoulder with her left hand. The multimodal expressions so far can be heard as invoking the simulation, and the reiteration of the point can be heard as a reflection about it. As a result, the delivery of the final discussion point now has a generic structure of from-general-to-specific and portends a potential ending of it. At this time, the trainer’s left hand remains on Stella’s shoulder.
Our specific interest in this Extract is concerned with what came after this delivery – a 0.85 s pause between (3) and (4), followed by the utterance ‘ah ok’. As we observed above, upon completing the reiteration of the point on the slide, a ‘learning’ about Stella and Andy’s simulation is hearable and portends an end to the point. In light of this structural position, an elicitation-response-feedback (ERF; Heap, 1985) sequence became relevant to solicit questions about the learning account. Then, the ensuing 0.85 s pause can be heard as the trainer’s method of making-the-floor-accessible (Mondada, 2013) 11 for questions. But at the same time, her hand gesture does not return to its ‘home position’ (Sacks & Schegloff, 2002). She keeps her left hand on Stella’s shoulder, sustaining the projectivity of her delivery.
Arguably, the trainer’s action during the 0.85 s silence projects two potential sequences of kairos. If the trainees raise a question about the learning point, then the silence could be reflexively heard as making-the-floor-accessible, forming the first two parts of an ERF sequence. If no trainees ask a question, she can invoke the projectivity sustained by her hand gesture to resume her speakership, and the silence can be heard as her silence rather than the students’. Such a here-and-now moment can be analytically referred to as a ‘liminal zone’ of projectivity: the actor invokes sequences at different levels, which produces ambivalence in her action at that moment and projects an indefinite action at the third-part position. In this way, the liminal zone can be seen as the trainer’s method to mitigate a strong preference for a question (i.e. a moral organisation in which the absence of a question in the next turn is dis-preferred and consequential) while producing an opportunity for one.
As the trainer removes her hand from Stella and simultaneously produces ‘ah ok’, the liminal zone is ended. Producing new verbal and gestural expressions at the same time, she ends both the projectivities of ‘making-the-floor-available’ and ‘holding-the-floor’. But it is not as simple as one of the potential sequences being realised. The dual projectivities still remain constitutive in making sense of ‘ah ok’. First, producing ‘ah’ and unfreezing her posture, the trainer terminated the projectivity sustained during the silence, resuming the floor to talk. Second, producing ‘ah’ and then the transition marker ‘ok’, and simultaneously ceasing to index Stella, the trainer signals a shift away from content delivery and projects an end of the current point. Third, the making-the-floor-available silence and the ‘ah’ together made accountable an absence-of-question-and-acknowledgement pair that glossed the learning point as ‘understood’, i.e. a ‘passing device’ (Garfinkel, 1967, p. 167) that reified the ‘learning’ interactively. By achieving ‘learning’, she also projected the ending of the ongoing instruction and made the transition to the next action possible. Following ‘ok’, the trainer then instructed the trainees to go into groups and prepare for the roleplay to be examined in the next and final analysis.
Although this analysis shows that a sense of two simultaneous sequences of kairos can be made out of the trainer’s speech and act in diverted tempo, it does not mean that she was producing two actions at one time. The ambivalence in the liminal zone is not an ambiguity of what she did, but rather an etiquette of her candidly accountable action of making-the-floor-available. In the next analysis, we will further unpack the nature of simultaneous sequences of kairos. As the above analysis has hinted, these simultaneous sequences are nested and projected into the present in different temporal frames. In the case just examined, the trainer invoked her ongoing speakership and the more generic sequence of delivering a learning point to produce a form of soliciting questions designed for the learners’ level (i.e. university students). In the next analysis, we will demonstrate how these nested sequences of kairos can be projected interactively by analysing how the trainees commenced roleplaying in the next learning activity.
Analysis 3: Simultaneous sequences within a participatory framework
Extract 3 shows a brief exchange between two trainees, May and Andy, which can be seen as they start roleplaying. It was situated after the trainer appointed May and Andy’s group to commence the roleplay following the preparatory discussion. Extract 3a shows respectively the Extract and the key for the participants to be named in the following description: the trainer (Tr), May and Andy. Extract 3a transcribes the talk-in-interaction line by line and is signposted by a snapshot from the video during the Extract. It starts with a 4.3 s silence on line 0 while the other trainees are reconfiguring to orient their attention to May and Andy. Then, Andy counts 1 to 3 on line 1, and begins tapping on the desk with both hands and fingers at line 2, mimicking typing on a keyboard. As soon as the mimicking becomes recognisable, May asks a question ‘(Andy), 這麼晚你在幹嘛呀? [(Andy), this late what are you doing ah?]’ in a rising tone. Hence, the line-by-line transcript breaks down the time in the Extract into a sequence of four parts: (1) the silence, (2) the counting by Andy, (3) the mimicking of typing by Andy, and (4) May’s question overlapping the later part of Andy’s ‘typing’. This sequence can be coupled with the conversational order of doing an ‘opening’ of a talk.

The Extract transcribed line by line.
Though it is perspicuous that Andy and May have begun the roleplay exercise by this point in time, the following analysis is interested in the precise timing in the Extract at which an analyst can say that the start of roleplaying has been achieved. From this the analysis will then explicate how simultaneous sequences of kairos within the ongoing ‘participatory framework’ (Goffman, 1979; Goodwin & Goodwin, 2005) can constitute the understanding of the timing. To do so, the linear ‘conversation time’ represented by the line-by-line transcript is not sufficient. While the transcript represents the interaction in a way as if only the interlocutor producing sounds on each line was producing action in that segmented time, all participants, including the interlocutors, are acting simultaneously to sustain and elucidate the ‘perspicuous setting’ (Garfinkel, 2002, p. 225) that predicated and constituted the talk.
To overcome this, Extract 3b shows a timeline transcript of the same time frame, which transcribes all speech and acts horizontally along clock time, showing how speech and acts in the wider participatory framework were distributed in the time frame. In Extract 3b, the time of line 0 in Extract 3a now includes embodied acts (although Extract 3b omits the first two seconds because of the constraint of space). Above the textual transcription, the snapshots with blue outlines show the images recorded from an observer’s perspective. They are overlaid in two frames to illustrate the participants’ movement, with yellow arrows added to show the movement direction. The snapshots with green outlines in the row underneath were images recorded from the trainer’s perspective captured by a head-mounted camera. Speech and acts are organised into arrays by their actors. Some actors’ arrays, such as the trainer’s, were further broken down by their modes of action (e.g. ‘steps’ and ‘gaze’). Typographic symbols (‘→’, ‘>’ and ‘-’) are used to represent different projectivities in the non-verbal acts. Annotated snapshots are added corresponding to the time scale. Vertical broken lines numbered 1, 2, 3 correspond to the beginning of lines 1 to 3 in Extract 3a, showing co-timing between simultaneously produced speech and acts. The line numbered 0.5 corresponds to a timing between lines 0 and 1 in Extract 3a. To assist readers’ understanding of the following description, Figure 6 shows a bird’s-eye view of the room again together with the key for the participants named in the Extract.

The Extract transcribed in a timeline with annotated video snapshots.

A bird’s-eye view of the room with the key for the participants in the description.
Incorporating the embodied features of the interaction, levels of kairos of now-ness are accountable, predicating ‘nested’ membership categorisation devices (Housley & Fitzgerald, 2015; Sacks, 1995) for the participants. First and foremost, the generic kairos of an educational setting, i.e. a class, observed in Analyses 1 and 2 remains self-explicating. This setting predicates a quasi-teacher-and-student device between the trainer and the trainees, which can constitute the sense of how they acted during the time of this Extract, and to which further devices can be predicated, in this case ‘roleplaying’.
Before the time of the Extract, the trainer instructed the group at Table 1 to start the roleplay, and the reconfiguration in the Extract can be seen as the trainees following the trainer’s instruction as predicated by their quasi-student category. At the beginning of the Extract, as shown by the first overlaid snapshots, Angie and Kate (cf. Figure 6 for their positions) are sliding away from Table 1 on their chairs. At the same time, only May and Andy are staying near the table. While they all sustain some embodied patterns as students, i.e. sitting on their seats, they reconfigure to categorially map (Evans & Fitzgerald, 2017) a new categorisation device onto themselves on top of their student category. By creating recognisable distance from May and Andy at the table while maintaining their orientation to the duo, the withdrawn trainees together with those in the other group encircle the duo at a distance, embodying a ‘stage’ for the soon-to-be roleplayers to roleplay. As a result, the members of the quasi-student category recategorise themselves into roleplayer and audience, a categorisation device relevant to roleplaying.
This achievement can be seen as a fulfilment of the trainer’s instruction, constituting how the trainer then acted. At first, she is standing behind Andy during most of the 4.3 s silence. As the achievement became unambiguous, she steps away from Table 1. The stepping back is see-able as her relinquishing the new focal space and ceding her embodied speakership, or again the floor. 12 But the relinquishment is not done in one go but in two stages. The stepping back before Andy begins counting is the first stage, which leaves the floor ‘open’ for the roleplayers to start the roleplay. The second stage is done after Andy’s ‘typing’ becomes hearable when she moves a step to her left toward Table 2. The two-stage relinquishment can be seen as another form of ‘liminal zone’, not of projectivity but of membership categorisation. After the first relinquishment, the trainer positions herself between the stage of the upcoming roleplaying at Table 1, and the audience formation which circumscribes the stage from a distance. Arguably, the relinquishment in the first stage is similar to the method of making-the-floor-available discussed in Analysis 2, allowing the trainer to re-assume the floor if the roleplayers had not taken up the speakership as instructed and to let it go unambiguously when they start as instructed.
Still, the trainer’s action of making-the-floor-available does not start the roleplaying. It is similar to the curtain-up in staged plays, by which the audience should know their right to speak is temporarily surrendered to the ‘stage’ from that moment onward. Although it predicates that the acting will start ‘very soon’, the acting does not necessarily start immediately after it. Similarly, in the Extract the acting does not start from the trainer’s relinquishment, but from the roleplayer’s first roleplaying act. And because the interaction is now embedded in the new kairos achieved by the categorial mapping, the participants would see the roleplayers’ acts with a footing (Goffman, 1979) of acting. So, Andy’s counting ‘one-two-three’ can be heard as counting down for the acting and his tapping on the table can be seen as acting out ‘typing’. At this point, the start of roleplaying can be seen as adequately accomplished, and this is also when the trainer fully relinquishes her control of the floor, letting the audience trainees watch the roleplaying with full attention.
In summary, there are at least three accountable levels of kairos that constituted the sense that Andy and May’s roleplaying had started as soon as Andy finished his counting and began typing. The class constituted the achievement of starting the roleplay as a non-sequential now-ness. It predicated the quasi-teacher-and-student device that enabled and constituted the lower-order sequential preference between the trainer’s instruction and the trainees’ subsequent actions to get ready and start roleplaying. At a more granular level of kairos, the trainer’s instruction, the trainees’ reconfiguration, the trainer’s subsequent relinquishment, and Andy himself counting ‘one-two-three’ all portended and framed a kairos of Spielwelt (game-world; Brincher & Moutinho, 2021) so that any of Andy’s subsequent acts that were adequately simulative could be taken for granted as roleplaying. Hence, as Andy’s ‘typing’ became accountable, the roleplaying could be seen as perspicuously started and that membership device as becoming operational (Butler & Fitzgerald, 2010; Sacks, 1995). That is to say, although these actions unfolded along a single timeline, they were not necessarily a single sequence analytically. The subsequent actions did not immediately fulfil the projectivity of the precedent actions. Instead, each action pre-empted the preceding time frame for roleplaying with increasing granularity, ending with Andy marking the start precisely. From this perspective, the start was predicated and achieved within nested, mutually constitutive, and hierarchically organised sequences of kairos rather than a single sequence.
Concluding remarks
This article began by highlighting how the study of time has been a central yet under-researched topic of enquiry in sociology, including ethnomethodology. While for the sociology of time new technologies now make it possible to study time as real-time lived action (Uprichard, 2012), for Garfinkel’s ethnomethodology, where time has always been inherent to any analysis of lived social action, it was the limitations of available technology in the 1970s that proved an obstacle in preserving the immanent nature of time in social activity. As neither the naked eye nor audio recordings could preserve the full texture of what participants experience in their immediate time-space, both Garfinkel and Sacks in their respective programmes relied upon abridged time structures to represent the temporal organisation in social activities and social interaction. However, new recording technologies have opened up the possibility of re-engaging with time in social action, for both the sociology of time and ethnomethodological approaches to examine further the granular nested quality of actions and time within a lived temporal flow (Mushin & Doehler, 2021; Yagi, 2021).
For this study we began by drawing upon the concepts of chronos and kairos to distinguish the two phenomenological aspects of time in practical activities. In particular we demonstrated that it is analytically possible to show that, while interactants’ acts and speech are temporally produced along the clock-ticking chronos, interactants produce and perceive these multimodalities as sequence(s) of kairos, or ‘fat moments’. From this our analysis highlighted the possibility of showing how a member’s actions can be oriented to producing a sequence parallel to and relevant to a sequence simultaneously produced by another group of actors (see Analysis 1), multiple sequences to produce ambivalence (see Analysis 2), or nested sequences collaboratively with other actors within a participatory framework to pre-empt the start of a new mode of interaction (see Analysis 3).
Finally, we want to draw attention to the form of analysis adopted in presenting the data and pursuing the analysis. The analyses transcribed the multimodality in video data into multi-layer timelines to be able to represent and explore more intricately coordinated, multi-layered temporal structures in social activities. For example, in the roleplay extracts, the timelines show how the unfolding talk-in-interaction is intertwined with silent-in-background projectivity in broader temporal frames, producing the gestalt-contexture of ‘what is now’ perceivable for the participants. Moreover, by arguing that the meaning of actions can be constituted by multiple sequences of kairos produced simultaneously, the analyses deliberately move away from the ‘next-turn proof procedure’ in contemporary conversation analysis. Instead, the analysis adopted Sacks’ early constitutive logic of producing sociological description, of which the strength does not reside in any particular link in the description but in the ‘reasonable total picture’ (Sacks, 1995) interwoven by those links. This logic allows bringing in an ethnographic level of description (Jimenez & Smith, 2021) of context into the analysis of the contingency in interaction as long as the ‘context’ is analytically accountable through some multimodal resources and stably constituted practical sense of some actions for the participants. These inter-sequence constitutive links together with the intra-sequence links (e.g. adjacency pairs) form the totality of the description of social activity in a situation ‘beyond reasonable doubt’. The return to constitutive logic for ethnomethodological description, combined with new analytical techniques, is designed to contribute to bridging the methodological difference between Ethnomethodological Studies of Work and Conversation Analysis and between their current practices and Garfinkel’s original vision.
For sociology in general, the multi-layered intra-constitutive temporal structure offers a bridge between the live interactional time recoverable from temporal data and the broader social times that are fuzzily delineated and only ethnographically available to analysts – allowing a data-driven holistic approach to time.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
