Can Human Reading Validate a Topic Model?

Abstract

Validation is at the heart of methodological discussions about topic modeling. The authors argue that validation based on human reading hinges on distinctive words and readers’ labeling of a topic, and that it overlooks the probability that semantically similar models produce conflicting results in further analysis, such as regressions or other methods. This runs counter to the presumption that topic modeling can reveal features of documents that have some measurable association with social aspects outside the text. The authors develop a procedure for identifying similar topics to verify that semantically similar solutions yield similar results in further analysis. The authors argue that future validations of topic modeling must consider such procedures.
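
The abstract does not spell out how similar topics are identified across solutions. As a rough illustration only, and not the authors' procedure, one common way to compare two topic-model solutions is to take the cosine similarity between their topic-word distributions and pair up topics that exceed a cutoff. The function name `match_similar_topics`, the 0.8 threshold, and the toy matrices below are all assumptions made for this sketch.

```python
import numpy as np

def match_similar_topics(topics_a, topics_b, threshold=0.8):
    """Pair each topic in solution A with its most similar topic in solution B.

    topics_a, topics_b: arrays of shape (n_topics, vocab_size); each row is a
    topic-word probability distribution over the same vocabulary.
    threshold: hypothetical cosine-similarity cutoff for calling topics "similar".
    Returns a list of (index_a, index_b, cosine_similarity) tuples.
    """
    a = np.asarray(topics_a, dtype=float)
    b = np.asarray(topics_b, dtype=float)
    # Normalize rows to unit length so the dot product equals cosine similarity.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a_unit @ b_unit.T
    # For every topic in A, find the closest topic in B.
    best = sim.argmax(axis=1)
    return [
        (i, int(j), float(sim[i, j]))
        for i, j in enumerate(best)
        if sim[i, j] >= threshold
    ]

# Toy example: two solutions with the same two topics in a different order.
run_a = np.array([[0.70, 0.20, 0.10, 0.00],
                  [0.00, 0.10, 0.20, 0.70]])
run_b = np.array([[0.05, 0.05, 0.30, 0.60],
                  [0.60, 0.30, 0.05, 0.05]])
matches = match_similar_topics(run_a, run_b)
```

Once topics are paired this way, the same downstream analysis (for example, a regression on topic proportions) could be run on each solution and the estimates for paired topics compared.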

Keywords

topic modeling, replication, unsupervised learning, validation, measurement
