Abstract
Teams are central to human accomplishment. Over the past half-century, psychologists have identified the Big-Five cross-culturally valid personality variables: Neuroticism, Extraversion, Openness, Conscientiousness, and Agreeableness. The first four have shown consistent relationships with team performance. Agreeableness (being harmonious, altruistic, humble, and cooperative), however, has demonstrated a non-significant and highly variable relationship with team performance. We resolve this inconsistency through computational modelling. An agent-based model (ABM) is used to predict the effects of personality traits on teamwork, and a genetic algorithm is then used to explore the limits of the ABM in order to discover which traits correlate with best and worst performing teams for a problem with different levels of uncertainty (noise). New dependencies revealed by the exploration are corroborated by analyzing previously unseen data from one of the largest datasets on team performance to date comprising 3698 individuals in 593 teams working on more than 5000 group tasks with and without uncertainty, collected over a 10-year period. Our finding is that the dependency between team performance and Agreeableness is moderated by task uncertainty. Combining evolutionary computation with ABMs in this way provides a new methodology for the scientific investigation of teamwork, making new predictions, and improving our understanding of human behaviors. Our results confirm the potential usefulness of computer modelling for developing theory, as well as shedding light on the future of teams as work environments are becoming increasingly fluid and uncertain.
Keywords
Using a novel interdisciplinary computational modelling approach based on agent-based modelling and genetic algorithms that is grounded in personality psychology, we identify that higher average team agreeableness correlates with better performance for tasks with uncertainty. Agreeableness is a personality trait that corresponds to compliance, trust, tendermindedness, and social harmony. Psychologists have found inconsistent results for this trait in teamwork—some teams benefit from it while others do not. We resolve these conflicting findings and show a new role of agreeableness through the use—for the first time in this field—of an agent-based model to simulate how individuals with different personality traits perform as a team. We confirmed the prediction with data gathered over more than a decade comprising 3698 individuals in 593 teams on tasks with and without uncertainty.Significance Statement
Introduction
Human behavior was thought by many to be largely unpredictable because of factors such as randomness and individual differences (Cziko, 1989), despite decades of research by psychologists. These ideas now seem dated. With enough data, it is possible to be remarkably accurate making predictions about us, some of which form a fundamental part of our modern world. Using data, we can predict the likes or dislikes of individuals for specific products or online content—a fact exploited by recommender systems worldwide (Resnick and Varian, 1997). Modeling human behavior is also a commonplace in architecture design (e.g., when analyzing access routes to fire exits (2011)) and economics (e.g., when studying trust networks or styles of auction (Smith, 1991)).
However, some human behaviors remain inherently opaque. Gather individuals together as a team and ask them to complete a task—especially if uncertainty is involved—and the likely success of one team compared to another equally qualified team becomes far less clear. Why is it that one team might break down into intense debate or heated arguments that stall progress, while another might gel as a team and excel in finding good solutions rapidly? Any method that might provide some clues to the likely success or failure of teams could have significant impact and help organizations allocate appropriate individuals to form more effective teams.
Task uncertainty adds to the challenge of understanding teams. It is defined as a lack of predictability associated with inputs, processes, and outputs of the broader system within which the work is performed (Cordery et al., 2010). Task uncertainty may thus be differentiated from task complexity (Wood, 1986), in that it is a property of the immediate task environment, rather than the task itself. In optimization, this is equivalent to the distinction between problem noise and difficulty. At the extreme, task uncertainty may include tasks with deep uncertainty (Marchau et al., 2019)—tasks with outcomes that may have multiple plausible futures and where it may be hard to determine key factors within models such as interaction between variables or desirability of outcomes; we focus on tasks where such factors can be defined.
To date, most approaches to the problem of predicting the performance of human teams have been largely empirical (Muthukrishna and Henrich, 2019). Statistical studies of data produced by individuals as they attempt different tasks are used to derive statistical models or psychological theories. Some attempts to use computer modelling have been made, but their predictions are largely unverified with data (Ahrndt et al., 2015; Lim and Bentley, 2018, 2019b; Salvit and Sklar, 2012). Neither approach explains teamwork adequately—the former suffers from a lack of data on teams and may not offer insights into why certain combinations of individuals may work better than others, while the latter suffers from a lack of rigorous corroboration.
For the first time, this work combines and enhances both approaches by constructing an agent-based model to explicitly represent the assumptions made by the well-established five-factor model of personality and using a genetic algorithm in order to make predictions of the relative success of different teams when tackling problems of differing uncertainty. The quality of these predictions is then assessed through in-depth analysis of one of the largest team performance datasets comprising 3698 individuals in 593 teams collected over a 10-year period.
This research aims to show that evolutionary computation can explore the extremes of an agent-based model to result in (A1) valid predictions as confirmed by the literature and (A2) novel predictions that can be corroborated by real-world data and used to help explain real phenomena.
In the course of this work, we make the following contributions to the field of evolutionary computation: (i) A novel particle swarm optimization (PSO) algorithm that uses different strategies per particle (agent) to form an agent-based model (ABM) of human behavior. This is the first work in the field that rigorously maps personality traits to agent behavior. (ii) The validation of predictions arising from the use of a genetic algorithm (GA) with the ABM is performed by comparing with the literature (on known trends), and with a previously unseen dataset (to corroborate new findings). Such validation is rarely performed in the literature. (iii) The application of evolutionary computation (GAs, PSO, and ABMs) on a new domain of psychology. (iv) A demonstration of a scientific methodology based on evolutionary computation: using GAs to search the limits of an ABM to suggest hypotheses, which can then be proven by the analysis of real-world data (Bryson et al., 2007).
While these contributions highlight an epistemological approach to explore relationships, this work also establishes clear ontological relationships between agreeableness, task uncertainty, and team performance, making a significant contribution to the field of team psychology. Agreeableness (being harmonious, altruistic, humble, and cooperative) has to date demonstrated a non-significant and highly variable relationship with team performance: some studies report agreeableness to improve team performance (Barrick et al., 1998; Bell, 2007; Neuman et al., 1999; Neuman and Wright, 1999; Peeters et al., 2006), others find no correlation between agreeableness and team performance (Barry and Stewart, 1997; Bell, 2007; Day and Carroll, 2004; Mohammed and Angell, 2003; Peeters et al., 2006). This work resolves this inconsistency: the combination of GA and ABM predicts that team member agreeableness improves team performance in tasks with uncertainty but not in tasks without uncertainty, and we corroborate the prediction with a new analysis of real-word data on teamwork.
The rest of the paper is organized as follows. Section 2 describes background literature, Section 3 describes the method, Section 4 the results, Section 5 the analysis, Section 6 limitations and future work, and Section 7 concludes.
Background
Statistical and probabilistic models of human behavior
The increasing prevalence of data has led many researchers to focus on data-driven approaches. Gonzalez et al. (2008) studied the trajectory of mobile phone users and found that human trajectories show a high degree of temporal and spatial regularity. Almeida and Azkune (2018) created a deep learning architecture based on long short-term memory networks that offers a probabilistic model to identify anomalous user behaviors and help detect risks related to mild cognitive impairment and frailty. Sagl et al. (2012) used user-generated data in mobile networks and voluntarily published information on social media platforms to understand how collective social activity shapes urban systems. Fabi et al. (2013) modelled user behavior in the context of real energy use and applied to a case study. The methodology, based on a medium/long-term monitoring, is aimed at shifting toward a probabilistic approach for modelling the human behavior related to the control of indoor environment.
Despite these works on individual behaviors, data on team performance are still almost non-existent, as it is difficult to gather in any meaningful scale or with any useful degree of accuracy.
Evolving agent-based models
Agent-based models (ABMs) are increasingly used to investigate the processes, mechanisms, and behaviors of complex social systems due to their ability to capture the nonlinear dynamics of social interactions. The interaction between rules and agents can often lead to emergent behavior useful in modelling complex systems such as teams, and ABMs are used in a wide range of research areas including sociology, biology, epidemiology, and ecology (Bentley, 2009; Cuthbert et al., 2017; Farmer and Foley, 2009). Evolutionary computation techniques have been used extensively with ABMs, as they exhibit similar properties to biological systems (e.g., genotype/phenotype mapping, adaptation, and evolution), and they can handle large search spaces efficiently (Morrall, 2003; Whigham and Fogel, 2006). Existing models of nature-inspired evolutionary ecosystems include Echo (Holland, 1992), Swarm (Minar et al., 1996), and SugarScape (Epstein and Axtell, 1996).
There has also been increasing research that uses evolutionary ABMs to explore and understand human interactions (Bentley and Lim, 2022). Nikolic and Dijkema (2010) developed an evolutionary modeling process for creating evolving, complex ABMs to understand the evolution of large-scale sociotechnical systems such as infrastructures for energy, water, and transport. Lim et al. (2015) evolved app developer strategies in the app store to understand the relationship between stability and fitness. Other examples include financial markets (Palmer et al., 1994; Takadama et al., 2001), human resource and organizational culture (Chen et al., 2002), foreign exchange markets (Izumi and Ueda, 2001), labor markets (Tassier and Menczer, 2001), and electricity markets (Bunn and Oliveira, 2001; Nicolaisen et al., 2001).
Agent-based models of personality in collaboration
Agent-based modelling has a long history of success in many related fields from economics and cooperative behaviors, to social conflict, civil violence, and revolution. However, its use remains very limited in studies of human psychology and their effects on teamwork.
The personality of individuals in a team provides a key role in understanding whether the team will be successful or not (Barrick et al., 1998; Barry and Stewart, 1997; Bell, 2007; Mohammed and Angell, 2003; Moynihan and Peterson, 2001; Neuman and Wright, 1999; Neuman et al., 1999; Peeters et al., 2006; Peterson et al., 2003). Human personality is classically divided into five basic, universal traits known collectively as the Big-Five, namely, (i) neuroticism—a tendency to experience negative emotions such as anxiety and anger, (ii) extraversion—an inclination to be gregarious, assertive, and active, (iii) openness to experience—the propensity to be intellectually curious and prefer variety, (iv) agreeableness—a predisposition toward working collaboratively and building social harmony, and (v) conscientiousness—the propensity for orderliness and self-discipline (Costa and MacCrae, 1992; Goldberg, 1990).
Recently, researchers in agent-based modelling have begun using personality to determine agent behaviors. Early models have interpretations that loosely match psychology literature, are context dependent, and largely unverified (Ahrndt et al., 2015; Salvit and Sklar, 2012). More recently, Lim and Bentley (2018) created a more general, problem-independent computational model aiming to improve our understanding of the effects of personality on teamwork. Initial work used Jung’s type theory as the personality model to determine agent behavior (Lim and Bentley, 2018) and examined the effects of team diversity in individual backgrounds (Lim and Bentley, 2019b), the effects of dynamic problems (Lim and Bentley, 2019a), and the effects of reward sensitivity (Guo et al., 2020).
In this work, we created a new model using the Big-Five trait theory as the personality model, following the principles established in previous works (Guo et al., 2020; Lim and Bentley, 2018, 2019a, 2019b). Big-Five is widely used in modern psychology research (Costa and MacCrae, 1992) and regarded as having better functional evidence that the different dimensions of human personality may be associated with specific structural neuroanatomic correlates (Sampaio et al., 2014).
Method
In order to investigate the two hypotheses, we developed an agent-based model (ABM) combined with a genetic algorithm to investigate the effects of personality on team performance, for tasks with different levels of noise. Our ABM is inspired by particle swarm optimization (PSO) (Kennedy et al., 2001), which simulates social behavior represented by the movement of individual organisms in a swarm (Arrow et al., 2000). In our model, the swarm represents the team, and each agent represents an individual in the team searching for a good solution to their team task, each with their own perceptions, memory, actions, and personalities.
Agent-based model
We developed an ABM of human collaboration to simulate the differing behaviors of people according to their Big-Five personality. Our ABM has the following key abstractions. (i) Team goal: Inspired by previous works (Lim and Bentley, 2018, 2019b), we model the shared goal of each agent as the group task to optimize a simple two-dimensional parabola function with domain comprising the possible positions of the agent and range corresponding to the solution quality. (ii) Agent behavior: Agent behavior is determined by its personality. We model the current mental state of each agent by giving it a position in a two-dimensional solution space (denoting the solution its mind has found so far), a velocity vector (denoting the direction and speed of its thought process), and acceleration vectors (representing the force of ideas and influences that modify the direction and speed of thought, determined by its Big-Five personality traits). (iii) Information exchange: We model information sharing between agents as they work as a team to solve the problem. The exact type of information perceived by each agent and its use is determined by its Big-Five personality traits, for example, extraverted individuals interact with more members of their team than introverted individuals, individuals high on openness explore more diverse solutions than less open individuals, and agreeable individuals take other team members’ opinions into account when making decisions. (iv) Task uncertainty: To model tasks without uncertainty, agents perceive the quality of solutions perfectly; to model tasks with uncertainty, agents perceive the quality of solutions with random error, that is, the degree of noise in the problem is increased.
All abstractions were derived from psychology literature and the effects of individual traits validated (described later). No attempt was made to calibrate the model in order to “design in a desired result” for the experiments, and indeed the complexity of the model prohibits such parameter tuning. The dataset used for corroboration of final results was not used for model design or validation.
The algorithm of the model is illustrated in Figure 1 and detailed in the following subsections. Algorithm of the model.
Initialize
The model is initialized with a problem space
Update
For each timestep
Velocity
Each trait is modelled as a scale between 0.0 to 1.0, where the likelihood of the behavior corresponding to the trait is proportional to the agent’s score in the trait (Fleeson and Gallagher, 2009). For example, if agent ALGORITHM 1: Agent i behavior at timestep t If With probability Otherwise, velocity Else if Else: Set of acceleration vectors If Else If If Finally, acceleration End Model equations and implementation. Model constants and values.
Evaluate
Agent
Finally, the agent’s personal best
Settings and function
The model was initialized with an objective function
The function was normalized such that Visualization of task used by the agent-based model to represent a simple team task, with solution found in the center of the solution space; 
Model validation
In order to validate the accuracy of predictions made by the model and ensure its behaviors are representative of reality, a series of preliminary validations were performed, following principles of good modelling practice (Bentley, 2009). These involved running the simulation with the settings in Table 2.
Figure 3 shows the results of the validation, which demonstrated that the model trends match published trends. We describe the published trends and corresponding model trends as follows: 1. Team performance deteriorates as uncertainty increases (Cordery et al., 2010). In Figure 3, as 2. Variability in performance increases as uncertainty increases (Cordery et al., 2010). In Figure 3, as 3. Neuroticism decreases team performance, especially when levels are very high (Peeters et al., 2006). Figure 3 Row 1 shows that when 4. Extraversion improves team performance (Barrick et al., 1998; Bell, 2007; Peeters et al., 2006), but high levels of extraversion in simple tasks decrease team performance (Moynihan and Peterson, 2001; Peeters et al., 2006). Figure 3 Row 2 shows that when 5. Openness to experience increases team performance in challenging tasks (Bell, 2007; Peterson et al., 2003) but less so in simple tasks (Peeters et al., 2006). Figure 3 Row 3 shows that when 6. Literature findings for agreeableness have been inconsistent, with some finding positive relationships between agreeableness and team performance (Barrick et al., 1998; Bell, 2007; Neuman et al., 1999; Neuman and Wright, 1999; Peeters et al., 2006), others finding no relationships between agreeableness and team performance (Barry and Stewart, 1997; Bell, 2007; Day and Carroll, 2004; Mohammed and Angell, 2003; Peeters et al., 2006). Figure 3 Row 4 shows that when 7. Conscientiousness increases team performance and also decreases variability (Barrick et al., 1998; Bell, 2007; Neuman and Wright, 1999; Peeters et al., 2006). Figure 3 Row 5 shows that in all 3 noise levels, higher conscientiousness results in higher performance and smaller standard deviation. Model validation results plotting team performance averaged across 100 runs against teams with mean levels of each trait from 0.0 to 1.0 (with the other traits randomized) for

Using a genetic algorithm to explore the limits of the ABM
Having established that the ABM performs as expected by the literature when examining each personality trait independently, we now revisit the aims of this work, which are to show that evolutionary computation can explore the extremes of an agent-based model to result in (A1) valid predictions as confirmed by the literature and (A2) novel predictions that can be corroborated by real-world data and used to help explain real phenomena. The validation in the previous section shows that each personality trait, when considered independently, appears to correlate with behavioral trends reported in the literature. However, in reality, every member of a team has specific levels of each trait that may interact and alter the way each member interacts in their team. Understanding this complex set of interactions requires a different approach.
The agent-based model is based on PSO, with every agent attempting to find the optimal solution according to its own personality. It follows that for a given problem type, there will be extreme team combinations: specific combinations of agents will perform better on average, other combinations will perform worse. We therefore need a new experimental tool to use in order to examine the limits of the ABM model. When forced to behave as effectively as it is capable of—or as ineffectively—what combinations of agent traits are best? We use a genetic algorithm as our experimental tool for this purpose.
The genetic algorithm (GA) was used to evolve an optimal (and least optimal) combination of agent personalities within the ABM for solving noisy and non-noisy problems (i.e., problems with and without uncertainty, respectively) and in this way predict which teams in the ABM are likely to work more effectively (and less effectively) for different problems. GAs are known to be successful in optimizing noisy and discontinuous problems; the problem of finding optimal team members for the ABM is difficult as any small modification made to a personality trait may result in significant changes to interactions in a team, and because the model is probabilistic, results may vary across runs. For a team of
0.2, 0.3, 0.5, 0.8, 0.4, 0.1, 0.4, 0.6, 0.4, 0.7, 0.8, 0.2, 0.4, 0.7, 0.6, 0.3, 0.3, 0.5, 0.3, 0.2
corresponds to the phenotype
Agent 1 = {N = 0.2, E = 0.3, O = 0.5, A = 0.8, C = 0.4}
Agent 2 = {N = 0.1, E = 0.4, O = 0.6, A = 0.4, C = 0.7}
Agent 3 = {N = 0.8, E = 0.2, O = 0.4, A = 0.7, C = 0.6}
Agent 4 = {N = 0.3, E = 0.3, O = 0.5, A = 0.3, C = 0.2}
To find best ABM teams, a standard canonical GA is used with a population size of
To find worst ABM teams, instead of taking
We ran the GA 60 times to find 60 best ABM teams for
We define the general population to be a typical group composed by randomly selecting individuals from normal human population, which removes extremes such as every individual in the model exhibiting a certain trait all of the time or none of the time. Settings to simulate the general population are as follows
The experiments were run on Amazon EC2 Compute Optimized c5.large instances (2 vCPU 4 GiB memory), with Ubuntu Server 16.04 LTS (HVM), SSD Volume Type (C5 instances are designed for running advanced compute-intensive workloads including scientific modelling, distributed analytics, and machine learning inference). The code was written in Python and available to be downloaded and run. 2 There was a total of 240 evolutionary runs, with a total computation time of 318 h. Each run took an average of 1.33 h (SD = 0.29). The fastest run took 49.83 min, and the slowest run took 1.74 h.
We compared the averages between best and worst ABM teams and used t-test to see if the differences between the means are significant.
Team performance data
We use a previously unseen real-world dataset of team performance to assess the validity of predictions made by the model. As with all such real-world data, the signal-to-noise ratio is low and the number of features is high, meaning that it is infeasible to apply statistical methods checking for correlations and dependencies without first having some idea of what to look for. The predictions of the ABM provide such cues.
We collected one of the largest team performance data to date. Participants comprised London Business School MBA (Masters of Business Administration) students across 10 academic years, from 2007 to 2016. A total of 3698 participants were involved in the study (male = 2623 [71%], female = 1075 [29%]), with an average age of 28.5 (SD = 2.5). Participants were assigned to 593 groups, and performance data consist of average group project scores for multiple courses. Group performance was assessed using group task scores for courses that exist over all 10 years, and assignments remained consistent. A summary of the courses and their group tasks can be found in Supplemental Materials Table 1.
The data were separated into tasks with and without uncertainty (i.e., problems with and without noise) using the criteria outlined by Laughlin and Ellis (Laughlin and Ellis, 1986) where tasks with uncertainty correspond to courses where there is faculty judgment in the assessment of correctness and those with multiple paradigms such as tasks involving the establishment of a new business strategy (i.e., Management, Strategy, Operations, Entrepreneurship, and Marketing) and tasks without uncertainty correspond to courses where there is a shared conceptual system for evaluation of quality (e.g., quantitative problem sets to work through), such as tasks involving mathematics and predefined rules (i.e., Accounting, Economics, and Finance).
Average z-scores were used to determine group performances. For each course, we calculated the z-score per stream per year (as each stream can have different instructors and marked out of different totals), so that they are comparable across years and streams. Each group completed all group projects in their original assigned groups. The group performance is the average z-scores for tasks with uncertainty (from the courses Strategy, Entrepreneurship, Managing Organizational Behavior, Operations Management, and Marketing) and tasks without uncertainty (from the courses Management Accounting, Managerial Economics, Financial Accounting, and Finance). We selected the top 60 teams (∼10% of total teams) over the 10 years as the best teams and bottom 60 teams (∼10% of total teams) over the 10 years as the worst teams. We do so for tasks with and without uncertainty.
Group personality is calculated as the average of individual personalities following (Woolley et al., 2010) and described as follows. Individual personality was assessed using the NEO-PI-R personality inventory (Costa and MacCrae, 1992) (one of the most widely used instruments for measuring Big-Five in psychology research and practice). This test measures the five primary domains of adult personality: Extraversion, Agreeableness, Conscientiousness, Openness to Experience, and Neuroticism. A total of 240 items were answered on a 5-point Likert-scale ranging from strongly disagree to strongly agree. The mean for each scale was calculated by summing up the six facets that compose each five factor for each individual. Scores on each domain were averaged for the group as a whole. 3
Tasks without uncertainty: Descriptive statistics and correlations among study variables.
Note. N = 593 teams; * significant at p < .05, two-tailed; ** significant at p < .01, two-tailed.
aIn some teams, a student did not do one of the courses, so the minimum group size, which is the average group size for a specific team, is not a whole number.
Tasks with uncertainty: Descriptive statistics and correlations among study variables.
Note. N = 593 teams; * significant at p < .05, two-tailed; ** significant at p < .01, two-tailed.
aIn some teams, a student did not do one of the courses, so the minimum group size, which is the average group size for a specific team, is not a whole number.
Results
A1. Evolutionary computation can explore the extremes of an agent-based model to result in valid predictions as confirmed by the literature
Figure 4 shows the average fitness of teams per generation. It shows that the GA was able to explore the extremes of the ABM successfully. The ABM is behaving as an optimizer (each team tries to find the optimal solution by moving and communicating according to their unique combination of traits); when the GA is used to find the ABM team with the best traits, only small improvements to the best team performance can be achieved, as performance was already good on average. When the GA is used to find the worst performing ABM teams (making the optimization worse), it fares much better, achieving large decreases in performance. The evolved traits for best and worst teams comprise our predictions. Average fitness of best and worst teams across 60 runs over generations. Plots start at first generation.
The scatterplots in Figure 5 show the relationship between team performance and the Big-Five traits of each individual team member. As expected, the ABM performs worse for noisy problems, resulting in a clear separation visible in performance on the y-axis. Also visible is the convergence (left or right on the x-axis) caused by the GA optimizing traits toward specific values. Scatter plots showing the relationship between team performance and the Big-Five traits of each team member for the 60 optimized teams. For clarity, we use different y-axis scales for Best Teams (top row) and Worst Teams (bottom row).
Figure 6 shows all traits together to enable visual comparison; and the t-test results comparing traits and noise levels can be found in Supplemental Materials Table 2. The trends are consistent with the literature. For example, for tasks with uncertainty, best teams have higher extraversion than worst teams because more interaction between team members can make team members more aware of different views, and best teams also have higher openness to experience than worst teams because the tendency to explore more different solutions will produce better results in uncertain environments. In tasks without uncertainty, best teams have lower extraversion than best teams in tasks with uncertainty because interaction is less important. Worst teams have very low conscientiousness because being consistent is a good way to perform well when the task has no uncertainty. Average team Big-Five generated by model. Grey regions: mean ± SD. (t-test results show p = .000 comparing all traits and noise levels, see Supplemental Materials Table 2).
In psychology literature, extraversion, openness, and conscientiousness consistently positively predict team performance (Barrick et al., 1998; Neuman and Wright, 1999) (a finding mirrored in the ABM). Neuroticism consistently negatively predicts team performance (Barrick et al., 1998; Neuman and Wright, 1999), also mirrored in the model. Thus, the predictions made by the ABM are validated by the literature.
A2. Evolutionary computation can explore the extremes of an agent-based model to result in novel predictions that can be corroborated by real-world data and used to help explain real phenomena
While many of the predictions made by the model are well-known in psychology, one result is novel: the combined effects of noise and agreeableness. Intriguingly, findings from psychology literature for agreeableness have been inconsistent, with field studies generally predicting positive relationships (Barrick et al., 1998; Bell, 2007; Neuman et al., 1999; Neuman and Wright, 1999; Peeters et al., 2006) (except for two field studies, one involving profitability of pizza franchisees (Hofmann and Jones, 2005) and another involving drilling crews (Van Vianen and De Dreu, 2001)), and lab studies generally producing no relationships (Barry and Stewart, 1997; Bell, 2007; Day and Carroll, 2004; Mohammed and Angell, 2003; Peeters et al., 2006) (except for a lab study where student groups worked on research projects (Van Vianen and De Dreu, 2001), and a written assignment for a business project (Bradley et al., 2013)). These inconsistencies imply that there are moderators affecting agreeableness and team performance, but it was unclear what they were.
Our result demonstrates for the first time that task uncertainty is a primary contingency in the relationship between agreeableness and team performance.
In more detail, the model predicted that agreeableness positively predicts group performance for tasks with uncertainty (Figures 5 and 6). Specifically, the model predicted (i) significant higher agreeableness for best teams than worst teams for tasks with uncertainty [
In order to establish whether this prediction has merit, the previously unseen team performance dataset was analyzed in detail. The results of this analysis confirm the same effects as predicted by the model. For tasks with uncertainty, best teams have higher agreeableness than worst teams [
Results of OLS regression analysis.
Note. N = 593 teams. Table entries represent unstandardized parameter estimates with standard errors in parentheses.
* significant at p < .05, two-tailed; ** significant at p < .01, two-tailed.
Results of binary logistic regression analysis.
aBinary variable of team performance where worst performing team is coded as 0 and best performing team is coded as 1, adj. R2 = .176.
bBinary variable of team performance where worst performing team is coded as 0 and best performing team is coded as 1, adj. R2 = .192.
The predictions made by the model assumed a general population, but the dataset comprised a very specific sample of individuals—London Business School MBA students—who have a range of traits that differ in important ways. For example, the individuals in the dataset were on average more conscientious and showed less variability in their other traits than the general population. In order to check if the model still produced the same predictions when calibrated to match the sample data, the system was run again, with the following settings
There were a total of 240 simulations, with a total computation time of 584 h. Each simulation took an average of 2.43 h (SD = 0.34). The fastest simulation took 1.9 h, and the slowest simulation took 2.97 h (almost double the time required previously as it was necessary to increase
The predictions made by this more constrained model were consistent with the original model. Similar differences in Figure 7 can be observed, caused by the smaller ranges of traits in the sample population. The prediction for agreeableness remained the same (see Figure 8). Average team Big-Five generated by model for sample population. Grey regions: mean ± SD (t-test results show p = .000 comparing all traits and noise levels, except for conscientiousness in worst teams, see Supplemental Materials Table 4). Average group agreeableness for model general population, model sample population, and actual groups, best and worst teams, tasks and without uncertainty. Error bars: mean ± SE.

These results provide substantial and novel evidence that task uncertainty moderates when agreeableness is predictive of team performance, demonstrating the context dependence of the effects of agreeableness. The results also provide an explanation for the inconsistency and non-replicability of past research findings—reconciling the findings that most studies that reported positive effects of agreeableness were conducted in the noisy environment of the field, while most studies that reported no correlations were conducted in lab environments without uncertainty. Even the anomalies to those general rules fit. For example, the field sites where agreeableness did not predict performance were those with fairly straightforward answers where a correct answer is relatively easy to verify once it is put forward (e.g., the profitability of pizza franchisees (Hofmann and Jones, 2005)), and laboratory studies where agreeableness did predict performance were ones using judgment-based tasks with a strong degree of uncertainty in measuring performance (e.g., student groups work on research projects (Van Vianen and De Dreu, 2001)). Finally, our findings from this study are consistent with Deming (2017), which suggest that the labor market increasingly rewards social skills.
Analysis
While the results of the model show new findings that are corroborated by analysis of real-world data, one key advantage of using ABMs is the fact that their behavior can be analyzed in more detail in order to understand why the results are as reported. In this section, we examine the effects of the GA on the ABM and how the different components of the ABM then cause the results.
Figure 9 shows the change of average Big-Five traits in each team as evolution progresses, clearly showing convergence toward different values for each trait for each experiment. For example, for best performing teams tackling noisy problems, neuroticism is generally lower with the other traits higher. But this is not the same for problems with no noise. The worst teams tackling problems with less noise generally have low openness and low conscientiousness and higher values for the other traits. Looking at the more constrained model, the sample population (Figure 10) displays the same trend as the general population (Figure 9) except for conscientiousness, which is relatively high regardless of best or worst teams, with or without uncertainty. This is because the individuals modelled on the dataset had a higher-than-average level of conscientiousness (replicating the fact that the MBA students in the dataset were more conscientious than the average population). Average team Big-Five traits over generations (model general population). Plots start at first generation. Average team Big-Five traits over generations (model sample population). Plots start at first generation.

Figure 11 shows representative runs of each agent’s path as it navigates the solution space to find the optimal solution in its team of four. Noise affects the path of agents, making them slower to converge. Best teams reliably converge on the optimal solution (0,0). Worst teams converge on an incorrect solution, or struggle to converge at all in the allocated time. Representative runs showing each agent’s path as they navigate the solution space to find the optimal solution. Black circle indicates their position at t = 0, and black dot indicates their position at t = 100. The maximum is located at the center of each image.
In the example runs, it is possible to see the effects of each personality: the best teams have generally lower average N values (neuroticism) than the worst teams. Higher average N values correspond to impulsiveness and volatility, simulated by accelerating at a random speed toward a random direction—shown in the chaotic and jagged paths for the worst compared to the smoother, cleaner paths for the best teams. As can be seen in Figure 9, the need for lower average N values was reduced with higher noise (since neuroticism adds a type of noise, for noisy problems you effectively need less neuroticism to achieve the same behavior).
The best teams tend to have higher average O values (openness) than the worst teams, corresponding to greater intellectual curiosity, simulated by having a larger area of exploration. This makes such agents more likely to find better solutions and less likely to be trapped in local optima (which might arise from the added noise), as can be seen from the better convergence to the optima for the best teams and the very poor convergence to a non-optimal for the worst teams with noise (see Figure 11).
Again, in general the better teams have higher average C values (conscientiousness) corresponding to lower spontaneity, more reliability, and less procrastination, which is simulated by less random disruption to the acceleration vector and less likely to stop moving. In Figure 11, the paths of the agents illustrate the difference, with only the worst teams showing the random, staccato movement toward random directions that low values for this trait cause.
In the model, extraversion (sociability and assertiveness) is simulated by agents either being influenced more by their neighbors’ positions (especially the extroverted ones) for high average C values, or by being influenced more by their own experience for low average C values. Our results (Figures 11 and 9) show that the best teams tend to have higher average extraversion for noisy problems, which is likely to be because of the improved sharing of information, which helps such teams discover better solutions. Conversely, the worst teams tend to have lower average extraversion for noisy problems for the same reason—poor communication helps make them even worse. Interestingly, the best team had low extraversion for teams with no noise—good communication is a distraction when the path to the solution is already clear. Likewise, the worst teams had high extraversion for teams with no noise. Here, good communication helps spread misinformation caused by chaotic behavior better, and makes them worse overall.
Finally, looking further into the novel contribution to the field of psychology from this work, agreeableness (social harmony and willing to compromise) is simulated by having its direction strongly influenced by its neighbors heading (more for more assertive extroverted agents) for higher average A values (agreeableness). The best teams have low average A values for problems without noise, and high average A values for problems with noise. In this problem, the strong influence of neighbors is only helpful when your neighbors have found the solution, so for zero noise it is simply a distraction which slows convergence, but for noisy problems it becomes helpful to encourage all agents to converge to the solution. Conversely, the worst teams have high average A values for problems without noise and low for problems with noise for the same reason—ignoring your neighbors when tackling a noisy problem is a good way to converge on a bad solution, as can be seen in Figure 11.
Figures 9 and 10 show that there is more complexity here still to be unraveled as there are specific values for each trait that the GA converges to—at no point does it find that a trait should be eliminated or made 100% dominant in a team. The complex interplay of the five personality traits must be carefully balanced in order to achieve a team capable of producing a desirable result. From the perspective of psychology, this points at the complexity of putting teams together with individuals having appropriate complementary traits. From the perspective of Evolutionary Computation, it shows the benefits of using a GA to discover such nuances—hand-designing an ABM would never reveal these findings.
Limitations and future work
Although our dataset is one of the largest teamwork datasets, extensive in terms of duration (10 years) and number of teams (N = 593), it is still one dataset and from one organization. As such, there must be some caution in generalizing the results. Future work could help to establish the collection of other major teamwork datasets over many years to enable this and other models to be evaluated with multiple datasets.
Our work has shown the effects of personality on teamwork through modelling and analysis of real-world data. Personality traits may be affected by context or environment, limiting the applicability of our findings. Nevertheless, research suggests that there is a strong genetic influence on all traits (for Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness, broad genetic influence was estimated as 41%, 53%, 61%, 41%, and 44%, respectively), with most variance in traits caused by differing environmental influences (Jang et al., 1996); here, we assume shared environments as our team members work closely together to achieve their shared goal.
We use a GA to optimize the ABM because of its ability to handle noisy and discontinuous problems. Despite the suitability of GAs for this task, other optimizers could also be used; however, it should be noted that the purpose of this work is not to find the optimal optimizer for this ABM, it is to use the GA as a method of investigating the ABM.
In this paper, a simple objective function is used because we wanted to limit the number of variables being tested and make the investigation tractable, that is, we study problem noise rather than problem difficulty in this work. There is much scope to investigate different types of group tasks and map them to different objective functions. This model is just a promising beginning: there is also an opportunity to use it to answer a variety of questions that are relevant, for example, fix some agents’ personalities and find optimal compositions for the rest (i.e., supplementing an existing team with new team members for specific problems).
The model is specific for interaction of personalities for teamwork. Future work could look at advancing agent-based modelling so that the model can become more generic and applicable into other types of interaction without needing to rebuild the model. Also, there are many aspects that can affect teamwork, in addition to personalities of individual members. How best to add new elements, while still verifying the original components remain valid? The broader picture for the work is to build a simulated laboratory where we can answer questions and test hypothesis related to human behavior.
Conclusion
Our study demonstrates the usefulness of combining evolutionary computation with ABMs for predicting the effects of personality on teamwork. By using a GA to examine the extremes of the ABM, we were able to discover the dependencies of this complex system. For noisy problems, extraversion, openness, and conscientiousness consistently positively predict team performance, while neuroticism consistently negatively predicts performance. The most surprising and novel results are related to the role of agreeableness. For tasks with noise, the best teams have higher agreeableness than worst teams, and worst teams in non-noisy tasks have lower agreeableness than worst teams in noisy tasks. These results were corroborated using a dataset that was not used for model design or validation, comprising data from one of the largest datasets on team performance published to date, comprising 3698 individuals in 593 teams collected over a 10-year period on more than 5000 group tasks with and without uncertainty. For the first time, task uncertainty (modelled as noise in this work) has been identified as a moderator of the dependency between team performance and agreeableness. This is significant because uncertainty is pervasive around the world. Technology change appears exponential, job security is low, political systems are unstable, and environmental change is accelerating. Our work implies that agreeableness may be important to facilitate teamwork and organizational performance in this new world.
Finally, this work has made use of evolutionary computation to explore the limits of the ABM for the purposes of scientific investigation. The approach provides a new methodology for the scientific investigation of teamwork, making new predictions, improving our understanding of human behaviors, and even improving team performance for organizations.
Supplemental Material
Supplemental Material - Kill chaos with kindness: Agreeableness improves team performance under uncertainty
Supplemental Material for Kill chaos with kindness: Agreeableness improves team performance under uncertainty by Soo Ling Lim, Randall S Peterson, Peter J Bentley, Xiaoran Hu, and Joellyn Prouty Mclaren in Collective Intelligence.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data Availability Statement
Supplemental Material
Supplemental material for this article is available online.
Notes
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
