Abstract
In recent years, personality detection — the use of computational methods to automatically determine an individual’s personality from various data sources — has seen widespread adoption across a variety of fields. This paper argues that despite their widespread use, conventional personality detection methods are limited in their ability to grasp human personality. Specifically, three limitations of conventional personality detection methods are discussed: (1) their limited ability to grasp the complexity of human personality due to their reliance on predetermined categories, known as pre-structured methods; (2) their inability to grasp the impact of social and cultural context on human personality, and (3) their disregard of the performative nature of human personality in online environments. Drawing on insights from anthropology and social psychology, three solutions to these limitations are proposed: (1) embracing naturalistic inquiry to capture the complexity of human personality, (2) considering the contextual influences on personality expression through multimodal methods and ethnographic research; and (3) accounting for the systematic biases present in personality in online environments in how we present our results and draw conclusions. Integrating these solutions would allow researchers to develop a more comprehensive and accurate understanding of human personality in a wide variety of fields.
Keywords
Introduction
The study of personality has recently received much scholarly attention in the growing field of personality detection (see Figure 1), which aims to computationally determine individuals’ personality traits from a variety of sources (for an overview, see Fang et al., 2022; Phan & Rauthmann, 2021; Štajner & Yenikent, 2020). In particular, the rapid developments in text mining and natural language processing that have allowed technologies like ChatGPT to enter the mainstream have also led to a growing body of literature on text-based personality detection, a subset of personality detection that uses text data, such as social media posts and profiles (Aung & Myint, 2019; Howlader et al., 2018; Ong et al., 2017), essays (Kazameini et al., 2020; Mohammad & Kiritchenko, 2021), and interview transcripts (Bounab et al., 2024). As personality detection methods improve, they hold increasing potential for supplementing, or even supplanting, traditional questionnaires. It should be emphasized, however, that since personality detection methods are evaluated against personality trait self-reports, the validity of these methods is limited by that of self-reports, and biases present in self-reports may therefore also be present in personality detection methods.
Figure 1. Google Scholar results for the phrase ‘“personality detection” OR “Personality computing”’, by year (2014–2025). The figure shows an increasing trend, from 44 results in 2014 to 554 results in 2025.
Personality detection has found applications in a wide variety of fields, including marketing and product recommendations (Chen et al., 2017; Jaimes Moreno et al., 2019; Roshchina et al., 2011; Tkalcic & Chen, 2015) and job candidate screening (Liem et al., 2018), with potential applications in many other fields, including health care counseling and forensics (Mehta et al., 2020) and voice assistants and AI agents (Huang et al., 2026; Kazameini et al., 2020). The variety and significance of these applications raise the question: to what extent are commonly used personality detection methods able to truly capture human personality? What are their limitations, and how can we address these? In this paper, three limitations of conventional personality detection approaches are discussed: (1) their limited ability to grasp the complexity of human personality due to their reliance on predetermined categories, known as pre-structured methods; (2) their inability to grasp the impact of social and cultural context on human personality; and (3) their disregard of the performative nature of human personality in online environments. Drawing on anthropological understandings of ‘identity’ and ‘performance’ as well as on social-psychological theory on personality, this paper addresses these limitations by arguing for (1) capturing the complexity of human experience by embracing naturalistic inquiry, that is, the study of social life as it presents itself to the members of a society under ordinary, everyday circumstances; (2) considering the contextual influences on personality expression through multimodal methods and the integration of ethnographic research alongside prevailing qualitative and quantitative methods; and (3) accounting for the systematic biases present in personality in online environments in how we present our results and draw conclusions.
Although the notion of ‘identity’ is distinct from that of ‘personality’, understanding how anthropologists have conceptualized and studied identity can be of value to the development of personality detection methods, since scholars in both anthropology and personality detection are interested in self-concept and self-expression. In particular, scholars of personality in social psychology have long recognized the role of external factors in structuring a person’s personality, emphasizing the role played by cultural context (Adamopoulos & Kashima, 1999), interpersonal relationships (Veroff, 1983), and different (and potentially conflicting) identities (McAdams et al., 2021). Similarly, scholars of identity in cultural anthropology have developed a long tradition of understanding persons’ identities holistically, considering the cultural context in which individuals live (Finke & Sökefeld, 2018) and their social relations (Ahmed, 2000) while also acknowledging the reality of contradictory identities (Sökefeld, 1999). Anthropologists study identity primarily through ethnography (from ethnos [people] and graphein [to write]), a research methodology that makes use of qualitative (and, occasionally, quantitative) methods and is characterized by a prolonged engagement with a community or a field site, the aim to grasp lived social worlds, and a commitment to a cross-cultural, ‘emic’ perspective. As a result, ethnography enables one to gain a deep understanding of another’s perspective, identify ‘unknown unknowns’ and cultural blind spots, and “‘get inside’ the way each group of people sees the world” (Hammersley, 1985, p. 152). Cultural practices, in this perspective, produce identity rather than solely shaping it (Hall, 2007). Finally, it is important to note that ‘personality’ does not equate to ‘personality traits’ or ‘the Big Five’, although these are frequently conflated (Rauthmann, 2024).
In this paper, we focus on the trait understanding of personality that is at the foundation of personality detection methods. While the Big Five is by far the most widely used trait model, the arguments in this paper are independent of any given trait model and apply to trait-based personality detection in general.
This paper is structured as follows. First, the recent emergence of personality detection and the trait-psychological models that form the foundation of the discipline are discussed. Second, the range of currently employed personality detection methods is outlined. Third, we discuss how insights from anthropological approaches to identification can help us understand personality naturalistically, contextually, and performatively, and how the corresponding limitations can be addressed in the context of personality detection.
Finally, a note on terminology. In this paper, the term ‘personality detection’ is used rather than terms such as ‘personality computing’, since the focus is on the determining of personality traits from various data sources, and not, for instance, on the generation of appropriate text output for a chatbot given particular personality traits (e.g., Qian et al., 2017). In line with previous research, the ‘OCEAN’ or ‘five-factor’ model of personality is referred to as the ‘Big 5’.
Human Personality and Trait Psychology
The study of human personality has a long history and is characterized by a wide variety of theoretical perspectives. One approach that has seen especially widespread application in both academic and non-academic fields is trait theory, which aims to capture one’s personality in terms of a limited number of personality traits. These traits are understood to differ across individuals, to be relatively stable over time, and to influence behavior (Kalimeri et al., 2013). They are generally considered to exist as an entity that, while often measured through self-report, can be externally verified (McAdams et al., 2021). A variety of empirical models of personality traits exists; most popular are the five-factor model of personality, known also as the ‘Big 5’ or ‘OCEAN’ model (openness to experience, conscientiousness, extraversion, agreeableness, neuroticism), and the Myers-Briggs Type Indicator (MBTI). Although the latter has been criticized for lacking empirical and scientific rigor (Nowack, 1996; Pittenger, 1993), in one review of the literature, 14 out of 60 personality detection papers were found to use MBTI (Fang et al., 2022), suggesting that it remains popular despite its shortcomings.
In using the Big Five (and, to a lesser degree, MBTI), scholars of personality psychology differ on various fundamental questions, such as the importance of biological versus environmental factors (Specht et al., 2014) and the relative stability of personality traits (Asendorpf & Aken, 2003; Caspi et al., 2005; Roberts & DelVecchio, 2000; Roberts et al., 2006). In spite of this theoretical fragmentation, trait theory has seen widespread application in fields beyond personality psychology, and in both academic and nonacademic settings (Lloyd, 2012; Moyle & Hackston, 2018). In academic settings, trait-psychological models are used primarily for correlational research that investigates the relationship between personality traits and outcomes such as academic achievement (e.g., Komarraju et al., 2011; Wang et al., 2023) or mental health (Bucher et al., 2019). In nonacademic settings, on the other hand, trait theory is used primarily for recruitment and employee assessment purposes (Christiansen & Tett, 2013), though it has also found use in fields such as marketing and product recommendations (e.g., Chen et al., 2017; Jaimes Moreno et al., 2019; Roshchina et al., 2011; Tkalcic & Chen, 2015).
Traditionally, the assessment of an individual’s personality traits has been accomplished using standardized questionnaires, such as the NEO–PI–3 (McCrae et al., 2005) and the Ten-Item Personality Inventory (Gosling et al., 2003). Over the years, these have become so commonplace that they have entered the collective consciousness as ‘personality tests’. What all of these tests have in common is that they are time-consuming (and thus expensive) to administer (Yang et al., 2021), hindering their applicability. Additionally, traditional personality assessments are often subject to social desirability bias, where respondents portray themselves in a socially favorable light (Bäckström & Björklund, 2013; Sandal et al., 2005; although see Pelt, 2019 for a critical perspective). With “[t]he social sciences [having] entered the age of data science” (Schwartz, 2013, p. 1), however, has come the development of personality detection, also known as ‘personality computing’. Harnessing the potential of computational natural language processing, personality detection methods allow researchers to automatically and computationally extract psychological traits from pre-existing (and often publicly accessible) data, such as social media posts and profiles (Aung & Myint, 2019; Howlader et al., 2018; Ong et al., 2017), essays (Kazameini et al., 2020; Mohammad & Kiritchenko, 2021), and interview transcripts (Bounab et al., 2024). With personality detection methods improving over time (e.g., Ren et al., 2021; cf. Feizi-Derakhshi et al., 2022), this has, to some extent, obviated the need for personality questionnaires.
The Landscape of Personality Detection Methods
The field of personality detection is characterized by a wide variety of methods, the development of which remains a source of scholarly attention (for a recent overview, see Perera & Costa, 2023). Personality detection originates from ‘affective language processing’, a subfield of computational linguistics that focuses on the computational analysis of subjective features of text. Early work in personality detection focused on classifying author personalities from creative texts, such as blog posts (Oberlander & Nowson, 2006) and essays (Mairesse et al., 2007); what is arguably the first work in the field managed to correlate linguistic style with author personality in diary entries, writing assignments, and journal abstracts (Pennebaker & King, 1999). Subsequently, scholars developed a variety of supervised machine-learning methods, many of which are still in use in one form or another (Fang et al., 2022). Supervised machine learning algorithms (or ‘supervised learning’) constitute a subset of machine learning that involves the ‘training’ of computer algorithms on human-annotated data (‘labeled data’). During the training process, the algorithm learns to recognize particular (textual) patterns that map onto particular personality traits. As an example, an algorithm may be trained on social media posts that are labeled as particularly exemplifying extraversion or neuroticism. The aim of training is for the algorithm to detect these patterns in social media posts it has not previously seen. The major downside of supervised methods, however, is their dependence on labeled training data, which can be time-consuming and expensive to obtain. Additionally, these labels can originate from multiple sources (self-diagnosis, self-measurement, instrument-based assessment, and annotation by others), each with distinct methodological implications, strengths, and weaknesses.
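The train-then-predict logic of supervised learning can be illustrated with a minimal sketch. It is not the method of any study cited here: it assumes a simple bag-of-words Naive Bayes classifier, and the posts, the labels `high_extraversion` and `low_extraversion`, and all function names are invented for illustration only.

```python
import math
from collections import Counter, defaultdict

# Invented toy posts with invented labels (illustration only). In real work
# the labels would come from self-reports or annotators.
TRAIN = [
    ("had a great party with friends tonight", "high_extraversion"),
    ("love meeting new people at the party", "high_extraversion"),
    ("stayed home alone reading a quiet book", "low_extraversion"),
    ("enjoying a quiet evening alone at home", "low_extraversion"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label; these counts are the 'learned patterns'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
# Classify a post that was not part of the training data.
prediction = predict(model, "quiet night alone with a book")
```

The sketch also makes the dependence on labeled data concrete: without the labels attached to each training post, `train` has nothing to count against.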
In contrast to supervised methods, Celli and Poesio (2014) have pioneered the use of unsupervised methods in personality detection. These aim to find naturally existing clusters or patterns in data and thus have the advantage that they do not require data that has been labeled by humans ahead of time, eliminating manual annotation effort. Concretely, their model incorporates features such as the use of punctuation and numbers to detect patterns that correspond to personality traits. Downsides of unsupervised approaches, however, include that they may be more difficult to interpret and evaluate, that they are sensitive to noise (Watson, 2023), and that they require more data.
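To illustrate what finding clusters without labels looks like in practice, the following minimal sketch groups texts by two surface features, punctuation rate and digit rate. It is not a reimplementation of Celli and Poesio’s (2014) model: the choice of k-means clustering, the example texts, and all function names are assumptions made for illustration.

```python
import string

# Invented example texts (illustration only); note that no labels are given.
TEXTS = [
    "Wow!!! Best day ever!!! Can't wait!!!",
    "Meeting moved to 14:30 on 2024-05-12 in room 204",
    "Yes!!! So happy!!! Amazing!!!",
    "Invoice 8841 due 2024-06-01 total 1200",
]


def features(text):
    """Return (punctuation rate, digit rate) per character of the text."""
    n = max(len(text), 1)
    punct = sum(ch in string.punctuation for ch in text) / n
    digits = sum(ch.isdigit() for ch in text) / n
    return (punct, digits)


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of feature vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k=2, iterations=10):
    """Plain k-means with a deterministic start (the first k points)."""
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = centroid(members)
    return assignment, centroids


points = [features(t) for t in TEXTS]
labels, centroids = kmeans(points)
```

The resulting cluster indices carry no meaning by themselves. Deciding whether a cluster corresponds to anything like a personality trait remains an interpretive step, which is one reason such methods are harder to evaluate.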
A further distinction can be made with respect to the modality of data used. Rather than focusing on a single form (or ‘mode’) of data, such as text, multimodal approaches integrate several forms of data, such as audio and video (Kindiroglu et al., 2017; Pianesi et al., 2008; Sidorov et al., 2014), audio, video, and text (Alam & Riccardi, 2014; Güçlütürk et al., 2016; Milić & Mladen, 2023), or a variety of smartphone data, such as anonymized call and SMS logs and Bluetooth and app usage (Chittaranjan et al., 2011). The major advantage of multimodal approaches is that they achieve higher accuracy than methods based solely on text.
It is clear, then, that over the past two decades, personality detection has been a source of major scholarly attention and technological development. The following section will explore the limitations of personality detection, and how insights from anthropology and social psychology could further enhance the field’s ability to capture the complexity of human personality.
Three Solutions for Advancing Personality Detection
As outlined previously, personality detection methods based on trait-psychological models, and particularly the Big 5, have achieved a significant degree of success in determining personality from written texts, such as social media posts, essays, and interview transcripts. Traditional personality questionnaires, even when brief and inexpensive (e.g., the TIPI; Gosling et al., 2003), must be administered individually, and brief instruments are less reliable than longer ones. By contrast, computational personality detection methods can be applied at scale.
The utility of personality detection methods has led to their rapid adoption, urging us to critically consider their conceptual foundations and methods. Specifically, this section draws on anthropological inquiry into ‘identity’ and scholarship on ‘personality’ from the field of social psychology to discuss three limitations of the widely used personality detection methods outlined in the previous section: (1) their limited ability to grasp the complexity of human personality due to their reliance on predetermined categories, known as pre-structured methods; (2) their inability to grasp the impact of social and cultural context on human personality, and (3) their disregard of the performative nature of human personality in online environments.
Although the notion of ‘identity’ is distinct from that of ‘personality’, scholars in personality psychology and social psychology have recognized the interconnectedness between the two. Understood as a process of social interaction, rather than as something one ‘possesses’ (Buckingham, 2008), the notion of ‘identity’ in personality psychology “[shifts] attention to the outside social and cultural world” (McAdams et al., 2021, p. 3), and allows us to more fully contextualize the person and their personality (McAdams et al., 2021). Social psychologists such as Mark Snyder (Deaux & Snyder, 2018; Snyder, 2006) have previously argued for a closer integration of the disciplines of social psychology and personality psychology, claiming that in order to understand persons, we need to understand them as social beings, as people’s personalities are shaped by the social situations they find themselves in. Additionally, the link between personality traits, self-perception, and impressions conveyed to others has been investigated by McAbee and Connelly (2016). Their Trait-Reputation-Identity model suggests that personality is not only a matter of stable traits but also of how individuals perceive themselves and how they are perceived by others. This further highlights the potential value of combining the study of identity and personality traits to capture the complexities of human behavior.
Similarly, anthropologists have long pointed to the importance of social context in shaping human identity, emphasizing the ways in which identity is relational, contextual, and performative. For instance, van Meijl (2008) emphasizes how “in increasingly multicultural contexts identity obtains its meaning primarily from the identity of the other with whom self [sic] is contrasted” (p. 173). Similarly, in an attempt to disentangle the multiple meanings of identity, Brubaker and Cooper (2000) divided identity into a number of specific terms, including ‘identification’, ‘self-understanding’ and ‘groupness’. These contributions from both anthropology and psychology highlight the interconnectedness of personal identity, personality, and social context, and the importance of understanding individuals as social beings. Understanding personality may thus benefit from a broader approach that includes not only trait measurement but also a more nuanced appreciation of the distinct ways individuals behave socially.
Incorporating these perspectives into personality detection research may open novel directions for future research and provide a more comprehensive and nuanced understanding of people’s personalities. This raises the question: what insights might a reorientation towards identity offer for personality detection methods? In what follows, three potential directions for personality detection that draw on these perspectives are discussed: personality detection without pre-structuring, contextual personality detection, and performance-sensitive personality detection. While the first draws primarily on anthropological research, the second and third draw on both anthropological and social-psychological scholarship (Figure 2).
Figure 2. Proposed personality detection solutions
Aim to Understand Personality Naturalistically
A first important facet of the anthropological study of identity, and of anthropological methods in general, is its commitment to what Joost Beuving and Geert de Vries call ‘naturalistic inquiry’: “[the] study [of] social life as it presents itself to the members of a society under ordinary, everyday circumstances” (2015, p. 37). In contrast to positivist research designs, naturalistic designs aim to be unobtrusive and reactive, and are, in a sense, ‘researcher-led’: the researcher observes social phenomena as they present themselves, “[responding] to whatever pieces of information the research situation presents to her” (Beuving & de Vries, 2015, p. 38), rather than imposing predetermined observational categories. This allows researchers to observe social phenomena in their natural context, providing more insight into the complex nature of human social behavior. Positivist research designs, on the other hand, typically employ a high degree of what Verschuren (2001) calls ‘pre-structuring’: the systematic recording of observations into categories that are fixed before observation begins, using, e.g., closed questions with pre-coded answers and observational scoring categories.
Both of these approaches have their advantages. The aim of positivist, quantitative research is to generalize, and in order to generalize, one needs a basis for systematic comparison, making pre-structuring necessary. Naturalistic inquiry, by contrast, is poorly suited to making generalizable claims, but it is better able to capture the complexity of social life. In the context of personality, pre-structured research reduces an individual’s personality to a pre-defined conceptual model, such as the Big 5. The advantage of such an approach is that it can systematically compare differences between individuals in a reliable and generalizable manner. The downside is a diminished ability to capture personality in a manner that corresponds to an individual’s lived experience: because pre-structured methods cannot capture unforeseen phenomena, they may overlook the nuances and intricacies of individual personalities. This highlights the trade-off between systematic comparability and capturing the complexity of human experience. Consciously weighing these approaches is crucial for advancing the field of personality detection and deepening our understanding of human behavior.
What distinguishes personality detection from traditional personality assessment methods is its reliance on naturalistic data, rather than on pre-structured data. Unlike questionnaire items with predetermined options, social media posts are spontaneously generated by individuals without researcher intervention. Current approaches to personality detection aim to reduce these naturalistic data to pre-structured trait scores (such as the Big 5), trading complexity and nuance for systematic comparability, by, for instance, predicting Big 5 personality traits from Facebook statuses (Liu et al., 2016). This approach is effective when the goal is to assess a specific trait model like the Big Five, but the danger is one of over-reliance where other conceptualizations of personality might be more revealing. Personality detection that prioritizes understanding personality without pre-structuring would be able to capture more of the dynamic and multifaceted nature of human experience, thus bridging the gap between systematic comparability and the fluid reality of human identity.
It is important to note that the more commonly used Big Five is a lexical model, meaning its taxonomy was developed from the words people use to describe others in a given language. This lexical foundation partly explains why text-based methods for personality detection are particularly well-suited to the Big Five framework: the taxonomy itself was extracted from language, and the words individuals use carry the very signals that define its dimensions (Goldberg, 1981). However, this also highlights a limitation: when applying these methods to other languages or cultural contexts, personality structures may differ, complicating the universality of Big Five–based approaches. Our critique, therefore, is not of text-based detection per se, but of over-reliance on pre-structured trait models where other conceptualizations of personality might be more appropriate or revealing.
How Can We Achieve a More Naturalistic Approach to Personality Detection?
One approach would be the use of machine-learning methods not aimed at reducing naturalistic data to a predetermined trait model. Instead, these methods could identify patterns within the data itself. For instance, large language models (LLMs) allow researchers to understand personality more naturalistically by showing how and why certain personality traits emerge from text data. Recent studies confirm the feasibility of using LLMs for personality detection (Ji et al., 2023; Murphy, 2024), and as Hu et al. (2024) show, LLMs are capable of generating label descriptions that provide a rationale for their classification decisions, bridging the gap between naturalistic inquiry and systematic comparison. Unlike traditional methods that solely classify, LLMs can thus analyze a piece of text (e.g., a social media post) and not only infer a personality trait but also indicate which words, tones, or themes led to this interpretation. For example, an LLM might suggest that a text indicates extraversion due to frequent use of social and positive language and provide specific representations (Hu et al., 2024), thus offering a reasoned output that goes beyond raw trait classification. LLMs thus provide a balanced approach that blends the organic understanding of personality with the rigor of systematic comparison. It is important to note, however, that LLMs are no ‘fire-and-forget’ magic bullet, as they are subject to a variety of biases in their training, development, and evaluation (Ranjan et al., 2024). For instance, LLMs often exhibit a disproportionate focus on Western cultural narratives, which could lead to biased interpretations of personality traits, such as classifying direct self-presentation as extraversion while overlooking cultural contexts where modesty or indirectness is the norm (Ranjan et al., 2024).
As is the case with any research method, it is important for personality researchers using LLMs to be aware of their potential biases, mitigating them where this is possible and reporting on them where it is not.
Aim to Understand Personality Contextually
A second distinguishing element of anthropological approaches to understanding identity, and a second potential avenue for personality detection research, lies in anthropologists’ understanding of identity as contextual. Anthropologists generally prefer the term identification over identity, highlighting how identities are positional, fragmented, not disparate, and always ‘in progress’ (du Gay, 1996), and how identity does not signify some eternally stable self (Brubaker & Cooper, 2000). Here, identification is a process that always happens in relation to other social beings, and is therefore heavily context-dependent. Personality detection using trait-psychological models of personality starts from the inverse: it emphasizes stability, treating traits as enduring tendencies across time and contexts (e.g., Bleidorn et al., 2021; Cobb-Clark & Schurer, 2012; Stein et al., 1986). Much like anthropologists, however, scholars in personality and social psychology have long emphasized the context-dependent nature of personality, showing the ways personality changes after unemployment (Boyce et al., 2015), between different historical, cultural, developmental, organizational, and interpersonal contexts (Veroff, 1983), and across different everyday situations (Fleeson, 2001). The importance of social relationships in contextualizing personality has also been emphasized (Back et al., 2023). Indeed, the recognition of the significance of context dates back as early as 1936 with Kurt Lewin’s proposition that behavior is a function of both the individual and their environment (Lewin, 1936). This may also hold online, as the digital contexts of different online communities determine which behaviors are considered ‘deviant’ (Fichman & Sanfilippo, 2015).
In trait-theoretical terms, human personality can be conceptualized as a distribution around a mean trait value, rather than as a set of fixed points (Fleeson & Noftle, 2008), with the distribution of personality traits varying across different social contexts.
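The idea of a trait as a distribution rather than a fixed point can be made concrete with a small sketch. It assumes momentary extraversion ratings on a 1–7 scale, each tagged with the context in which it was recorded. The ratings, the context names, and the function `summarize` are invented for illustration and do not come from any cited study.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Invented momentary extraversion ratings (1-7 scale) tagged by context.
STATES = [
    ("work", 3), ("work", 4), ("work", 3),
    ("friends", 6), ("friends", 5), ("friends", 7),
    ("alone", 2), ("alone", 3), ("alone", 2),
]


def summarize(states):
    """Return (mean, standard deviation) overall and for each context."""
    by_context = defaultdict(list)
    for context, rating in states:
        by_context[context].append(rating)
    overall = [rating for _, rating in states]
    # The overall mean plays the role of the person's trait level; the
    # spread around it is what a single fixed trait score leaves out.
    summary = {"overall": (mean(overall), pstdev(overall))}
    for context, ratings in by_context.items():
        summary[context] = (mean(ratings), pstdev(ratings))
    return summary


summary = summarize(STATES)
```

In this toy example the per-context means differ from the overall mean, and the overall spread is wider than the spread within any single context, which is the pattern the distributional view is meant to capture.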
In fact, the very notion of ‘personality’ as understood in trait-psychological models may be culturally contingent. As De Raad (1998) shows, the translation of trait terms such as ‘agreeableness’ to other languages may not be at all straightforward, and while the Big 5 has proven to be useful in WEIRD (Western, educated, industrialized, rich, and democratic) populations, its validity outside this context is by no means certain (Gurven et al., 2013; Laajaj et al., 2019). This is partly due to its lexical origins in English, which may not translate to the personality structures of other cultures. This implies that the cross-cultural application of trait-psychological frameworks is not unproblematic. For text-based approaches to personality detection, this problem is compounded by the issue of ‘low-resource’ languages — those for which little training data is available. With less training data available, the quality of machine-learning models suffers, diminishing these models’ ability to accurately and comprehensively capture human personality. An approach to personality detection that takes into account the contextual and temporal dynamics that shape human personality may enhance our ability to adapt personality assessments to diverse contexts, thus providing more nuanced insights into the dynamics of human personality and fostering a more culturally inclusive approach to personality detection.
How Can We Begin to Understand Context in Personality Detection?
This, of course, depends very much on what we mean by the word ‘context’. If we aim to capture the dynamics present in small-scale behavioral and social contexts, it may be fruitful to use multimodal methods, which, as illustrated earlier, consider various forms of data. For instance, Pianesi et al. (2008) incorporated acoustic features into their model, which allowed them to capture the various verbal characteristics of different person-to-person interactions. Additionally, Kalimeri et al. (2013) used sociometric badges to capture body movements, speech features, interpersonal proximity, line of sight, and face-to-face interactions, allowing them to capture what they term ‘multimodal social context’, and Stachl et al. (2020) showed that a wide range of smartphone sensor and usage data can be used to detect personality traits. Finally, Fang et al. (2022) argue for the inclusion of demographic information, such as age, education, and gender, as additional features in text-based personality detection models.
These works show us that it is feasible to measure, to a degree, social context: that the sights, sounds, movements, and interactions that make up our everyday lives can be captured and analyzed as modalities of data. These contextual modalities allow us to capture social context and are thus crucial for understanding how personality is shaped by the social contexts in which individuals find themselves on a daily basis. The incorporation of these socio-contextual factors into a text-based personality model would allow us to develop a greater understanding of social contextual factors, and to — drawing on Fleeson and Noftle (2008) — broaden our understanding of human personality as distributions around a stable mean, towards understanding how particular social contexts determine this distribution of personality traits. Like any methodological choice, this entails trade-offs. Not every personality detection model needs to incorporate sensor or acoustic data; methodological design always depends on the research question. Nevertheless, ignoring context altogether is itself a methodological decision, and one that risks overstating universality. When context cannot, or need not, be measured, researchers should be explicit about that limitation rather than leaving context unacknowledged.
Just as personality is shaped by immediate social situations, it is also shaped by broader cultural contexts. As noted above, the Big Five has largely been validated in WEIRD populations, and we should be cautious about assuming its cross-cultural universality. Psychology has long recognized this limitation (Arnett, 2008), and although progress is being made toward greater inclusivity, our models of personality remain disproportionately shaped by WEIRD perspectives. Thalmayer and colleagues’ cross-cultural work (Thalmayer et al., 2020; Thalmayer & Saucier, 2014), for instance, demonstrates that expanding beyond the Big Five to the Big Six yields more robust results across nations. Yet their approach remains lexical, relying on dictionary-based descriptors of traits. Ethnographic methods can provide a complementary contribution: rather than starting from existing lexical categories, they allow us to ask more fundamental questions about what “personality” means in a given cultural context. This generates new conceptualizations of personality that may not be visible through lexical approaches alone, and which can later inform the construction of new trait models. In this sense, ethnography functions not as a replacement for psychometric approaches, but as a model-generating method that broadens the theoretical foundations of personality science.
Beyond multimodal approaches that capture social context, the integration of ethnographic methods into the development of personality models may provide a way to understand personality traits within broader sociocultural contexts. Ethnographic methods, which “[seek] to holistically understand and express the lived experiences of actors in a sociocultural context” (Paff, 2022, p. 8), excel at capturing the nuances and complexities of human minds. Rather than simply adding more sensor data, ethnographic methods provide a way to understand the social and cultural contexts in which personality is situated, thus providing a fundamentally different means of contextualizing personality and personality detection. While ethnographic and computational methods may seem to be at odds with one another, their incorporation is not entirely new, and, as Nelson (2021) has pointed out, both methods share a similar inductive logic of data gathering, data analysis, and theory development. Integrating ethnographic theory development with personality research would provide a way to understand personality traits within broader sociocultural contexts and thus enrich our understanding of how personality manifests in different settings.
It is important to note at this juncture that the use of qualitative research methods in personality psychology is not without precedent. Scholars in personality psychology have made extensive use of methods such as narrative identity assessment (Raggatt, 2006), case studies (Runyan, 1997), interviews (Querstret & Robinson, 2013; Weidmann et al., 2024) and diary methods (Nezlek, 2012). These methods are used to contextualize and explain quantitative research findings (Seaver, 2015), and their value as such has been firmly established. The incorporation of ethnographic research into the model development step, however, serves a different purpose: it enables a holistic understanding of how the very notion of personality manifests differently in different sociocultural contexts. Ethnography is well positioned to achieve this aim because it goes beyond the use of qualitative methods alone in its commitment to cross-cultural, ‘emic’ perspectives that empower the researcher to identify cultural blind spots. Rather than merely serving to interpret model outputs, ethnographic research informs the conceptualization of personality itself and makes personality detection models more cross-culturally and contextually sensitive.
Aim to Account for the Performativity of Online Personality
A third and final insight from the anthropological study of identity concerns the relationship between a person’s personality and the data that are collected about the person. Social scientists have long emphasized the crucial role of ‘performativity’ in social life. Introduced by gender scholar Judith Butler (1990), performativity refers to the ways in which both verbal and nonverbal communication serve to define one’s identity, and has its roots in the works of literary theorist Kenneth Burke (1945) and, especially, of sociologist Erving Goffman (1959). Goffman referred to ‘frontstage’ and ‘backstage’ behavior to analyze how behavioral norms are internalized and ‘performed’ when others are watching, implying that our behaviors are not a direct reflection of our internal workings, but rather, that our behaviors are mediated by what we presume others’ expectations to be (Goffman, 1959).
The notion of performativity is especially relevant in the context of personality detection methods that derive personality traits from the text individuals post on social media. Psychologists have frequently highlighted the nature of social media feeds as a ‘highlight reel’ of private life, referring to the fact that individuals tend to showcase only or primarily the positive aspects of their personal lives (Faelens et al., 2021; Steers et al., 2014). It is important to take this into account when interpreting social media data, which cannot be unproblematically understood as a neutral reflection of one’s internal personality.
A further challenge with regard to the interpretation of social media data is that researchers frequently suggest a direct relationship between the text one produces and the underlying personality of the individual where there may not be one. For instance, Štajner and Yenikent (2020) claim that Facebook “[likely] contains more personal statements and is thus more suited for automatic text-based personality detection.” However, the concept of a ‘personal statement’ is open to interpretation. Similarly, Howlader et al. (2018) claim that “social media personalities of users mirror their true personalities” (p. 340), and Ong et al. (2017) aim to “[extract] the personality trait of the user” (p. 65) from social media data.
It remains an open question to what extent such a direct relationship between text and individual personality can be reliably inferred, and while social media data can offer valuable insights into individuals’ personalities, it is wise to take caution in drawing conclusions on users’ personalities from these data. Although an older study claims that Facebook profiles reflect the actual personality of their users (Back et al., 2010), more recent research suggests that personality traits differ significantly between on- and offline contexts, as well as from social medium to social medium. Taber and Whittaker (2018) find individuals’ personality traits on Facebook to be less neurotic, open to experience, and agreeable than offline personality, while personality traits on Snapchat are more extraverted as compared to offline personality. Additionally, personality traits on ‘Finsta’ accounts (secondary Instagram accounts with fewer followers where users post more candid or unfiltered content) were more socially undesirable, less conscientious, and less agreeable, possibly due to differing audience perceptions (Taber & Whittaker, 2020). Research has also suggested that these differences are gendered: women’s perceived higher agreeableness and extraversion are more pronounced on social media than offline, while women’s perceived higher neuroticism relative to men is less pronounced on social media than offline (Bunker et al., 2021).
How Can We Account for the Performativity of Personality in Online Environments?
While we are starting to get an idea of the systematic differences between on- and offline personality, much remains unclear. More research is needed to determine the precise nature of these systematic biases, which would allow us to develop methods that correct for them and thus measure personality from social media data more accurately. What is clear, however, is that the performative nature of personality expression online is essential to consider when interpreting social media data, which cannot be unproblematically read as a direct reflection of one’s internal personality. Taken together, these findings suggest that we would do well to recognize the performative nature of online interactions and the curated nature of social media profiles, so as to better understand how users strategically express their identities and how these expressions may differ from their offline personalities. Rather than viewing social media data as a direct reflection of one’s internal personality, we should approach it as a way to “analyze how users performatively and strategically express their identities” (Xi et al., 2022, p. 1437). A more holistic, anthropological perspective can show us that conceptualizing online representations of personality as merely a matter of measurement error or ‘noise’ fails to account for the co-constitutive relationship between inner and outer worlds, and that the notion of a ‘true’ personality is more complex than that. Going back to Goffman (1959) and Butler (1990), treating online expression as a ‘distortion’ of true personality is a simplification that fails to take into account how performance (‘frontstage’ behavior, in Goffman’s terms) shapes and is shaped by identity and personality.
Discussion
Rethinking Personality Detection
Personality detection methods have recently come to see widespread use in a variety of fields, and their advancements have begun to challenge traditional approaches by offering scalable and automated alternatives for personality assessment. This paper has outlined three limitations of conventional personality detection methods in grasping human personality: their limited ability to grasp the complexity of human personality due to their reliance on pre-structured methods; their inability to grasp the impact of social and cultural context on human personality; and their disregard of the performative nature of human personality in online environments. Drawing on insights from anthropology and social psychology, three solutions to these limitations have been proposed. Each of these also raises particular methodological challenges that provide fertile ground for future research. While addressing these challenges will be no easy feat, personality detection methods are well-suited to address these issues, as they are not limited to a single modality of data and can thus take into account naturalistic, multimodal, and ethnographic methods.
Naturalistic Inquiry
Firstly, while positivist research designs aim for systematic comparison and generalization through pre-structuring, they may overlook the complexity of individual personalities. Embracing naturalistic inquiry and developing models that go beyond pre-structuring would allow researchers to better capture the complexity of human personality. We suggest using large language models (LLMs) for this purpose, as they can suggest trait scores while also demonstrating how this determination was made. This approach bridges the gap between pre-structured and naturalistic inquiry, allowing for both systematic comparison and a richer understanding of personality. However, like any research approach, this method comes with important considerations. LLMs are subject to various biases in their training, development, and evaluation (Ranjan et al., 2024), and researchers should be aware of these potential biases, mitigating them where this is possible and reporting on them where it is not. To give a concrete example, a researcher could use LLMs to process personal diaries in order to reveal how personality is expressed in the use of optimistic language, references to group activities, or expressions of caution or wariness, offering interpretive insights beyond fixed categories. Moreover, one could feed audio interviews to an LLM to analyze how tone, pace, and emphasis correlate with personality traits. This provides insights grounded in naturalistic data while still allowing the model to connect them to trait dimensions. How best to interpret such unstructured data and ensure the reliability of model outputs remains an important methodological question, but recent work is beginning to make progress in this area (e.g. Zhang et al., 2024).
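One modest way to approach the reliability question in the diary example is to ask the model for verbatim supporting quotes alongside each trait score, and then check that those quotes actually occur in the source text. The sketch below is a hedged illustration, not an established procedure: the prompt wording, the JSON response format, and the function names `build_prompt` and `ungrounded_quotes` are all assumptions introduced here, no particular LLM or API is assumed, and the model reply is an invented stand-in.

```python
import json

TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")


def build_prompt(diary_text: str) -> str:
    """Build a prompt asking for a score and verbatim evidence per trait.

    The wording and the requested JSON shape are hypothetical.
    """
    return (
        "Read the diary entry below. For each of these traits ("
        + ", ".join(TRAITS)
        + '), return JSON of the form {"<trait>": {"score": <1-5>, '
        '"evidence": ["<verbatim quote>", ...]}}. '
        "Quote only text that appears in the entry.\n\n"
        f"Diary entry:\n{diary_text}"
    )


def ungrounded_quotes(response_json: str, diary_text: str) -> dict:
    """Return, per trait, the evidence quotes that do NOT occur in the text."""
    parsed = json.loads(response_json)
    problems = {}
    for trait, result in parsed.items():
        missing = [q for q in result.get("evidence", []) if q not in diary_text]
        if missing:
            problems[trait] = missing
    return problems


diary = "Went hiking with the whole team today. I double-checked the route twice."

# Invented stand-in for a model reply; a real reply would come from
# whichever model the researcher uses.
mock_reply = json.dumps({
    "extraversion": {"score": 4, "evidence": ["hiking with the whole team"]},
    "conscientiousness": {"score": 4,
                          "evidence": ["I planned everything weeks ahead"]},
})

print(ungrounded_quotes(mock_reply, diary))
```

A check of this kind only confirms that the cited evidence exists in the text; whether a quote genuinely supports the suggested score remains an interpretive judgment for the researcher.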
Social and Cultural Context
Secondly, researchers should consider the impact of social and cultural context on personality expression, as has been emphasized by scholars in personality psychology. This can be achieved by incorporating multimodal and ethnographic methods that take into account different kinds of contextual cues. For instance, a researcher could collect video recordings of team meetings to analyze how speech patterns, gestures, and interpersonal proximity correlate with trait expressions such as extraversion or conscientiousness. Alternatively, wearable sensors or smartphone data could be used to track social interactions, physical activity, or patterns of movement across daily contexts, revealing how personality varies in different social settings. Ethnographic methods can complement these approaches by providing a deeper understanding of cultural and social meaning. For example, a researcher might conduct participant observation in a workplace or community setting, taking detailed field notes on social norms, communication styles, and local definitions of personality-related behaviors. These observations can then inform how multimodal data are interpreted, highlighting which behaviors reflect enduring personality traits versus context-specific social performance. Another scenario involves cross-cultural research: when aiming to apply a personality detection model in a new cultural context, a researcher could conduct ethnographic interviews to identify locally relevant personality descriptors before training or adapting models. This could prevent misclassification arising from cultural differences in trait expression or language. Choosing among these methods requires careful consideration. Multimodal data collection can be resource-intensive and raises privacy concerns; ethnographic methods are time-consuming but provide detailed contextual understanding. Even if not all contextual modalities can be captured, explicitly acknowledging which aspects of context are included strengthens the study’s interpretative validity.
Though qualitative methods have seen use in personality psychology, they should, as noted above, not be conflated with ethnography. The use of qualitative methods in personality psychology has primarily served to provide contextual insight to quantitative findings, while the benefit of incorporating ethnographic methods at the inception of the research process would be the generation of conceptually stronger models. As such, ethnographic methods can inform the development of personality detection models that are more culturally and socially sensitive, helping to prevent the uncritical transfer of models from one cultural or social setting to another.
Performativity in Online Environments
Thirdly, acknowledging and addressing the systematic biases inherent in personality expression in online environments is crucial for accurately presenting results and drawing conclusions. Research has shown differences in personality expression between online and offline contexts, with some social media platforms amplifying extraverted traits. By accounting for the ‘highlight reel’ effect and considering how it influences personality expression, personality detection methods can better capture an individual’s personality across various contexts, ensuring more robust and reliable results.
In practice, several strategies can be employed to account for performativity in online personality detection. For instance, one might consider multiple platforms, as personality expression can vary widely between platforms, and comparing behavior across different social media sites, or even between primary and secondary accounts, can help distinguish consistent traits from context-driven performance. Similarly, incorporating offline or external reference points, such as self-reports, peer ratings, or behavioral observations, could provide a benchmark against which to evaluate online behavior. Even limited qualitative insight, such as interviews or field observations, could show how users perceive their own online behavior and provide context that purely computational methods cannot. The key takeaway is that online behavior should not be treated as a straightforward reflection of personality. Approaching social media data with an awareness of its performative nature would allow researchers to interpret these data more accurately and produce more nuanced and credible assessments of personality.
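The first two strategies, comparing across platforms and against an offline reference point, can be sketched in a few lines. Everything in the example below is hypothetical: the platform names, the 1-5 scale, the trait scores, and the function names are invented for illustration and do not correspond to any dataset or tool discussed above.

```python
# Hypothetical trait scores (1-5 scale) for one person; all numbers invented.
offline_self_report = {
    "extraversion": 3.0, "agreeableness": 4.0, "neuroticism": 3.5,
}
platform_scores = {
    "platform_a": {"extraversion": 3.2, "agreeableness": 3.6, "neuroticism": 2.8},
    "platform_b": {"extraversion": 4.1, "agreeableness": 3.9, "neuroticism": 3.0},
}


def deviations_from_benchmark(benchmark, by_platform):
    """Per platform and trait: platform-derived score minus offline benchmark."""
    return {
        platform: {
            trait: round(scores[trait] - benchmark[trait], 2)
            for trait in benchmark
        }
        for platform, scores in by_platform.items()
    }


def cross_platform_range(by_platform):
    """Per trait: spread (max minus min) of the scores across platforms."""
    traits = next(iter(by_platform.values())).keys()
    return {
        trait: round(
            max(scores[trait] for scores in by_platform.values())
            - min(scores[trait] for scores in by_platform.values()),
            2,
        )
        for trait in traits
    }


deviations = deviations_from_benchmark(offline_self_report, platform_scores)
ranges = cross_platform_range(platform_scores)
print(deviations)
print(ranges)
```

In keeping with the argument above, a large deviation or range is better read as a prompt for interpretation, for instance about the audience or norms of a given platform, than as measurement error to be subtracted away.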
Conclusion
Taken together, these approaches offer potential, but it is important to emphasize that these solutions are not without their challenges. Embracing naturalistic inquiry may lead to difficulties in interpreting unstructured data, while incorporating multimodal and ethnographic methods may require researchers to navigate a wide range of methodological choices. Additionally, it is crucial to recognize that our approach focuses on capturing the fluidity and contextuality of personality. This means that the approach presented here may not lend itself to the same level of systematic comparability that structured assessments can provide. Where systematic comparability is the goal, utilizing structured methods is not a limitation but rather a necessary aspect of the assessment process, and traditional assessments are more fitting. We acknowledge that pre-structured models like the lexically-derived Big Five are highly effective for text-based detection when the goal is systematic comparability, but we argue that they may not be sufficient for capturing the full complexity of human personality in all contexts. As such, researchers should carefully consider the specific research questions and contexts they are working within when choosing their methods.
This paper has presented several insights that can help us to develop a more comprehensive, nuanced, and inclusive understanding of human personality for personality detection. Specifically, this paper has shown how naturalistic inquiry can capture more of the dynamic and multifaceted nature of human personality; how multimodal methods excel at providing a more contextual understanding of personality; how ethnographic methods enable researchers to develop more cross-culturally sensitive models; and how performativity is co-constitutive of personality. Although addressing the limitations of current personality detection methods is a challenging task, it is essential for developing a more comprehensive understanding of human personality, which is especially crucial considering the growth of social media and the increasing accessibility of machine learning technology. It is imperative that we continue to refine our methods and deepen our understanding of human personality, ultimately enriching not only the field of personality detection but our general comprehension of human nature as well.
Footnotes
Author Note
The handling editor for this paper was Dr. Atsushi Oshio.
Acknowledgements
The authors of this paper made use of AI for grammar and spelling correction. We thank the peer reviewers for their valuable and constructive feedback throughout the process.
Author Contributions
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Faculty of Social and Behavioural Sciences, Utrecht University.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Accessibility Statement
Not applicable.
