A Simulation-Based Slope Metric for Anchor List Reliability in Word Embedding Spaces

Abstract

Inducing semantic relations in word vector spaces and analyzing how other words or entire documents discursively engage these relations is a popular form of cultural analysis. The authors propose a reliability metric that is easily interpretable and agnostic to the type of relation. The metric, which the authors call the anchor reliability coefficient (relco), is found by creating an artificial document-term matrix of simulated documents that sequentially shift more of their tokens from relation-relevant anchor terms to nonanchor terms and then regressing the documents’ similarity to an induced relation on the anchor inclusion score of the documents. The authors validate the metric at the word level with both expert- and crowdsourced dictionaries and at the document level with expert-annotated social media posts. The authors also provide some heuristic baselines for assessing reliability effect sizes and null hypothesis testing.
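The simulation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy embeddings, the centroid-based relation, cosine similarity, the share-of-tokens inclusion score, and all variable names are hypothetical choices, because the abstract does not specify them. The sketch builds an artificial document-term matrix whose rows move tokens step by step from anchor terms to nonanchor terms, then regresses each simulated document's similarity to the relation on its anchor inclusion score and reads off the slope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding space (hypothetical data): 200 terms, 50 dimensions.
n_terms, n_dims = 200, 50
embeddings = rng.normal(size=(n_terms, n_dims))

# Plant a shared direction in the first 10 terms so they behave like anchors.
direction = rng.normal(size=n_dims)
direction /= np.linalg.norm(direction)
anchor_idx = np.arange(10)
nonanchor_idx = np.arange(10, n_terms)
embeddings[anchor_idx] += 5.0 * direction

# Induced relation: assumed here to be the centroid of the anchor vectors.
relation = embeddings[anchor_idx].mean(axis=0)

# Artificial document-term matrix: each successive document swaps more of
# its tokens from anchor terms to nonanchor terms (0, 5, ..., 100 tokens).
tokens_per_doc, n_docs = 100, 21
dtm = np.zeros((n_docs, n_terms))
for row in range(n_docs):
    n_other = row * tokens_per_doc // (n_docs - 1)
    n_anchor = tokens_per_doc - n_other
    if n_anchor:
        drawn = rng.choice(anchor_idx, size=n_anchor)
        dtm[row] += np.bincount(drawn, minlength=n_terms)
    if n_other:
        drawn = rng.choice(nonanchor_idx, size=n_other)
        dtm[row] += np.bincount(drawn, minlength=n_terms)

# Anchor inclusion score: assumed to be the share of tokens that are anchors.
inclusion = dtm[:, anchor_idx].sum(axis=1) / tokens_per_doc

# Document vectors as count-weighted averages of word vectors, then the
# cosine similarity of each document to the induced relation.
doc_vecs = dtm @ embeddings / tokens_per_doc
sims = doc_vecs @ relation / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(relation)
)

# Regress similarity on inclusion; the slope plays the role of a
# relco-like coefficient in this sketch.
slope, intercept = np.polyfit(inclusion, sims, deg=1)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In this toy setup, a steeper positive slope would indicate that documents lose similarity to the relation as anchor terms are removed, which is the intuition behind reading the slope as a reliability signal for the anchor list.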

Keywords

word embeddings, reliability, simulation, semantic relations


