Abstract
This article explores the complexity of team interactions, emphasizing their multimodal layering of verbal, paraverbal, and nonverbal cues. It highlights the potential of AI, particularly social signal processing, for understanding and enhancing team dynamics. Future research should embrace genuine interdisciplinary collaboration that combines expertise from social and computer science to address the messiness of real team interactions.
Introduction
Whether groups collaborate in face-to-face, hybrid, or fully virtual settings, and with or without the help of AI tools—this basic tenet still holds true and will continue to keep us busy for at least another decade of groups and teams research: “for members to achieve the collaboration and interdependence that make them a group rather than co-present individuals, they must interact” (Bonito & Sanders, 2011, p. 343). To me, taking this notion seriously means that we need to study actual group interactions, not subjective individual experiences or emergent states or group “processes” that are then captured using static self-report surveys.
So, the question of what the next decade of teams research will look like made me think of AI, sure, but also about finding new ways to tackle seriously messy social interaction data in teams and other interesting interpersonal constellations. Bear with me as I invite you to picture my kids’ bedroom for an illustration (I promise this will make sense in a bit). When my 5-year-old looks for a toy or an essential piece of Lego, this tends to have an explosive effect, both physically and emotionally. He dismantles every available Lego structure, empties all the boxes, and mixes his toys in an apparently random, but rapid manner into a wild soup of shapes, colors, and materials. The toy is typically not found until an adult (usually me) helps him dissect the chaos. This situation is further complicated by my 8-year-old offering to “help”. By the time the little Lego villain/plastic part of Optimus Prime’s leg/monster truck is found, the paraverbal signals in the room have escalated significantly and I am secretly counting the days until my next escape to the INGRoup conference.
Multimodal Team Interactions
Digging through team interaction data is decidedly more fun for me than digging through toy soup. But there are a few common elements. Like my kids’ bedroom after several hours of intense play, team processes are fascinating, dynamic, complex, frequently unpredictable, and seriously messy phenomena—especially when we study real teams in the wild (see also Klonek, 2026). Like the moment of actually locating the missing piece of Lego, identifying a micro-level behavioral mechanism that explains successful team collaboration can feel like a serious triumph. And like first combing through the top layer of apparently random mess and starting to group similar toys (monster trucks on one pile, pirate gear on another pile), quantitative team interaction research also often starts with identifying broader categories of team behavior (e.g., problem-solving versus relational communication), then narrowing these down to more specific types of behaviors within each category. Indeed, this is what my colleagues and I have been doing for much of the past decade, trying to understand the interaction behaviors and patterns underlying successful teamwork (e.g., in this journal: Allen & Lehmann-Willenbrock, 2024; Kauffeld & Lehmann-Willenbrock, 2012; van der Meer et al., 2022).
Notably, all of these SGR examples focus on one modality of team interaction behavior: speech, analyzed as functions of various verbal statements. Focusing on only one modality, however, is like only looking at the vehicles in my kids’ bedroom, while ignoring all the other fun items and thus probably never finding the missing but essential piece of Lego.
Team interactions are beautifully messy multimodal puzzles. Especially when groups collaborate face-to-face, but also in virtual settings, their interaction is a multilayered composition of verbal statements (e.g., “Hey, good idea!”), paraverbal cues (e.g., the voice pitch that accompanies “Hey, good idea!”), and nonverbal cues (e.g., the accompanying facial expressions and gestures). Analyzing all of these cues simultaneously may make your head hurt; however, this reflects the true complexity of real interactions in groups and teams, and therefore, we need to account for it in our research.
Leveraging AI to Understand Multimodal Group Interactions
As one way to address this complexity and implement a “high-resolution” approach to team interaction (Klonek et al., 2019), social signal processing holds great promise. Social signal processing is a subdomain of computer science that uses sensing methodology (e.g., cameras, microphones, individual movement trackers) and machine learning to model, analyze, and synthesize so-called social signals in human as well as human-machine interactions (Vinciarelli, 2017). The core idea is to automatically extract behavioral cues from the sensor data (e.g., automatically trace individual movement) and then train machine learning models to predict meaningful behaviors from those cues. This work process essentially still requires human annotators in order to establish a “ground truth” for a machine learning model—especially when the model is tasked with predicting dynamic group phenomena, compared to the relatively simpler task of automatically detecting individual members’ behavioral conduct (e.g., individual members’ overall dominance in a group interaction; Bai et al., 2019).
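To make this work process concrete, here is a minimal, purely illustrative sketch in Python of the three steps just described: extracting behavioral cues from sensor data, collecting human-annotated ground truth, and training a machine learning model to predict the annotations from the cues. All features, labels, and data below are synthetic stand-ins and do not reflect the actual features or models used in any of the cited studies:

```python
# Illustrative sketch of the social signal processing workflow:
# (1) behavioral cues extracted from sensor streams, (2) human annotators
# supply "ground truth" labels, (3) a model learns to predict those labels.
# All data here are synthetic; the feature names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Step 1: pretend these cues were extracted from cameras, microphones, and
# movement trackers (e.g., speaking time, mean voice pitch, body movement).
n_segments = 200
cues = rng.normal(size=(n_segments, 3))  # columns: speaking_time, pitch, movement

# Step 2: human-annotated ground truth per interaction segment
# (e.g., 1 = "dominant behavior observed", 0 = not) -- simulated here as
# depending mostly on speaking time and movement, plus annotator noise.
labels = (cues[:, 0] + 0.5 * cues[:, 2]
          + rng.normal(scale=0.5, size=n_segments) > 0).astype(int)

# Step 3: train a model to predict the human annotations from the cues,
# holding out some segments to check how well the predictions generalize.
X_train, X_test, y_train, y_test = train_test_split(cues, labels, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

In an actual pipeline, step 1 would rely on dedicated feature-extraction tooling rather than random numbers, and step 2 on trained human coders; the sketch only mirrors the logical structure of the workflow, including why the human annotation step remains indispensable for establishing ground truth.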
For example, in two recent interdisciplinary collaborations, we applied social signal processing to detect dynamic cohesion in team meetings (Lehmann-Willenbrock & Hung, 2024) and to model convergence and divergence in group affect (Prabhu et al., 2025). What I hope to illustrate with these two examples is how interdisciplinary efforts that leverage social signal processing can provide much more fine-grained, complex, multimodal empirical analyses and theory testing than previously possible. There is a wealth of untapped potential for interdisciplinary collaborations leveraging social signal processing (for an overview, see Kozlowski et al., forthcoming). Once this research area becomes more populated by interdisciplinary efforts, “killer apps” (Buengeler et al., 2017) for understanding—and eventually enhancing—group processes become more likely. Of note, if we want to collaboratively build the basis for such killer apps, we need to be willing to invest serious time and energy into true interdisciplinary efforts. For example, inserting off-the-shelf AI methodology into group research projects, such as using large language models to facilitate analyses of group and team interactions (for an overview, see Kush et al., 2025), will only get us so far and cannot address the multimodal nature of group and team interactions. Interdisciplinary research projects that really push the frontier at the intersection of group process research and computer science need to be mutually beneficial and move away from producer-consumer types of collaboration (for more detailed discussions, see Allen et al., 2017; Lehmann-Willenbrock & Hung, 2024).
Around the Corner (or a Little Further): AI to Enhance Multimodal Group Interactions
The not-so-new, but continuously relevant quest to study interactions as core mechanisms of collaboration in groups and teams (e.g., Bonito & Sanders, 2011; Keyton, 2017) also has implications for the potential of AI to eventually function as a group member. Human-AI synergy is a neat but frequently not achieved idea (e.g., Vaccaro et al., 2024), and certainly no small feat when you consider the complexity of group interactions. AI should eventually be able to insert itself seamlessly into group interactions and to understand as well as synthesize complex group interaction behavior. Synthesis in this context means that an AI would be able to understand and respond to multimodal signals by group members just like a human would. Though the advance of intelligent virtual agents in virtual reality settings promises new opportunities for multimodal agentic AI, we are decidedly so not there yet (and maybe that’s a good thing). But if we want to get there, the community of groups and teams researchers needs to frequently and happily mingle with the social signal processing crowd (a partnership we called “geeks and groupies” a while ago; Lehmann-Willenbrock et al., 2017). From my own experience, I can report that this can be a ton of fun, while challenging a lot of the often implicit assumptions of our own disciplines (in my case, a tendency to obsess about constructs; debatable templates for a “research contribution”; etc.) and frequently owning up to feeling dazed and confused (e.g., when terms such as “coding” and “modeling” can mean very different things; or when discussing feature extraction and combination possibilities).
As a final thought and point of caution, AI may also distract us even more than we often already are, especially in remote group collaborations where multitasking is a serious challenge (Cao et al., 2021). Moreover, increasingly habitual AI usage binds group member attention that is then not available to focus on the group interaction. This potentially also challenges the group work skills of the generations that follow. I think we can avoid this by remembering that AI should serve humans, not the other way around; by teaching our students how to balance AI usage; and by cherishing real-life, messy group interactions, even when they come in the shape of crazy toy soups (remind me later).
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
