Abstract
Identities are fundamental to our understanding of social and political behavior, but are challenging to measure and are rarely observed in real-world settings. We introduce a method for measuring the identity-relevant aspects of brief self-descriptions regularly used online (e.g., on social media). Our approach combines the benefits of word embeddings for finding related identity terms with the ability of clustering algorithms to aggregate terms into discrete categories. To illustrate our approach, we apply it to daily observations of bios from millions of US Twitter/X users. We present three applications of our approach with substantive findings. First, we track users’ social and political identities over time and find, among other things, that direct expressions of political affiliations are rare. Second, we map the identities that are most characteristic of each US state. Third, we show that users’ political identities are highly predictable based on non-political identity markers. With the growing availability of user self-descriptions on social media platforms and elsewhere, our approach enables researchers to map and analyze expressions of identity at scale.
