Sage Journals

Abstract

Identities are fundamental to our understanding of social and political behavior, but are challenging to measure and are rarely observed in real-world settings. We introduce a method for measuring the identity-relevant aspects of brief self-descriptions regularly used online (e.g., on social media). Our approach combines the benefits of word embeddings for finding related identity terms with the ability of clustering algorithms to aggregate terms into discrete categories. To illustrate our approach, we apply it to daily observations of bios from millions of US Twitter/X users. We present three applications of our approach with substantive findings. First, we track users’ social and political identities over time and find, among other things, that direct expressions of political affiliations are rare. Second, we map the identities that are most characteristic of each US state. Third, we show that users’ political identities are highly predictable based on non-political identity markers. With the growing availability of user self-descriptions on social media platforms and elsewhere, our approach enables researchers to map and analyze expressions of identity at scale.
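The abstract describes the pipeline only at a high level. To make its two ingredients concrete, here is a minimal sketch under loudly stated assumptions:

- Hand-made toy vectors stand in for embeddings trained on real bios.
- A cosine-similarity threshold expands a seed term into related terms.
- Average-linkage agglomerative clustering groups terms into discrete categories.
- Each bio is labeled with every category it mentions.

The vectors, the thresholds, the choice of clustering algorithm, and the function names `expand`, `cluster`, and `categorize` are hypothetical illustrations, not the authors' implementation.

```python
import math
import re
from itertools import combinations

# Hypothetical toy vectors standing in for embeddings trained on bios.
# Dimensions are hand-picked (family, sports, right, left) for illustration only.
EMBEDDINGS = {
    "mom": [1.0, 0.0, 0.0, 0.0],
    "mother": [0.95, 0.1, 0.0, 0.0],
    "dad": [0.9, 0.2, 0.0, 0.0],
    "football": [0.0, 1.0, 0.0, 0.0],
    "baseball": [0.1, 0.95, 0.0, 0.0],
    "conservative": [0.0, 0.0, 1.0, 0.0],
    "patriot": [0.0, 0.05, 0.95, 0.05],
    "liberal": [0.0, 0.0, 0.0, 1.0],
    "progressive": [0.0, 0.0, 0.05, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand(seed, emb, threshold):
    """Return terms whose cosine similarity to `seed` is at least `threshold`."""
    return sorted(t for t in emb if t != seed and cosine(emb[seed], emb[t]) >= threshold)


def cluster(terms, emb, threshold):
    """Average-linkage agglomerative clustering on cosine similarity.

    Repeatedly merges the two most similar clusters until no pair of
    clusters has an average pairwise similarity of at least `threshold`.
    """
    clusters = [[t] for t in sorted(terms)]
    while len(clusters) > 1:
        best_sim, best_i, best_j = -2.0, -1, -1
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [cosine(emb[a], emb[b]) for a in clusters[i] for b in clusters[j]]
            sim = sum(pairs) / len(pairs)
            if sim > best_sim:
                best_sim, best_i, best_j = sim, i, j
        if best_sim < threshold:
            break
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]  # best_j > best_i, so best_i's index is unchanged
    return [sorted(c) for c in clusters]


def categorize(bio, clusters):
    """Label a bio with every cluster that shares at least one token with it."""
    tokens = set(re.findall(r"[a-z]+", bio.lower()))
    return sorted("/".join(c) for c in clusters if tokens & set(c))


if __name__ == "__main__":
    print(expand("mom", EMBEDDINGS, 0.9))
    groups = cluster(list(EMBEDDINGS), EMBEDDINGS, 0.5)
    print(groups)
    print(categorize("Proud mom, baseball fan, patriot.", groups))
```

With these toy vectors, the expansion step links "mom" to "dad" and "mother". The clustering then recovers four groups: family, sports, and two political groups. A real application would train embeddings (for example with fastText or word2vec) on a large bio corpus and would likely need a more scalable clustering method.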

Keywords

identity; text-as-data; word embeddings; unsupervised learning; social media

Measuring Social and Political Identities in Social Media Self-Descriptions