Abstract
Inducing semantic relations in word vector spaces and analyzing how other words or entire documents discursively engage these relations is a popular form of cultural analysis. The authors propose a reliability metric that is easily interpretable and agnostic to the type of relation. The metric, which the authors call the anchor reliability coefficient (relco), is found by creating an artificial document-term matrix of simulated documents that sequentially shift more of their tokens from relation-relevant anchor terms to nonanchor terms and then regressing the documents’ similarity to an induced relation on the anchor inclusion score of the documents. The authors validate the metric at the word level with both expert- and crowdsourced dictionaries and at the document level with expert-annotated social media posts. The authors also provide some heuristic baselines for assessing reliability effect sizes and null hypothesis testing.
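The simulation-and-regression procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The toy embeddings, the function name `simulate_anchor_regression`, the round-robin token allocation, the use of mean word vectors as document vectors, and the use of cosine similarity are all assumptions made for the example. The abstract does not say which regression quantity defines relco, so the sketch simply reports the slope and R² of the fit.

```python
import numpy as np

def simulate_anchor_regression(anchor_vecs, nonanchor_vecs, relation, n_tokens=20):
    """Build simulated documents that shift tokens from anchor to nonanchor
    terms, then regress similarity-to-relation on anchor inclusion.
    Hypothetical helper for illustration only."""
    n_anchor, n_non = len(anchor_vecs), len(nonanchor_vecs)
    vocab = np.vstack([anchor_vecs, nonanchor_vecs])

    # Artificial document-term matrix: row k has k nonanchor tokens
    # and (n_tokens - k) anchor tokens, spread round-robin over terms.
    dtm = np.zeros((n_tokens + 1, n_anchor + n_non))
    for k in range(n_tokens + 1):
        for t in range(n_tokens - k):
            dtm[k, t % n_anchor] += 1
        for t in range(k):
            dtm[k, n_anchor + t % n_non] += 1

    # Anchor inclusion score: share of a document's tokens that are anchors.
    inclusion = dtm[:, :n_anchor].sum(axis=1) / n_tokens

    # Document vectors as mean word vectors (an assumption), then cosine
    # similarity of each document to the induced relation.
    doc_vecs = dtm @ vocab / n_tokens
    sims = (doc_vecs @ relation) / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(relation)
    )

    # Simple linear regression of similarity on inclusion.
    slope, intercept = np.polyfit(inclusion, sims, 1)
    r2 = np.corrcoef(inclusion, sims)[0, 1] ** 2
    return dtm, inclusion, sims, slope, r2

# Toy data (entirely synthetic): two anchor poles around +/- a latent
# direction, plus unrelated nonanchor terms.
rng = np.random.default_rng(0)
dim = 50
direction = rng.normal(size=dim)
pos_anchors = direction + 0.3 * rng.normal(size=(5, dim))
neg_anchors = -direction + 0.3 * rng.normal(size=(5, dim))
nonanchors = rng.normal(size=(30, dim))

# Induce the relation as the difference between the pole centroids.
relation = pos_anchors.mean(axis=0) - neg_anchors.mean(axis=0)

dtm, inclusion, sims, slope, r2 = simulate_anchor_regression(
    pos_anchors, nonanchors, relation
)
print(f"slope={slope:.3f}, R^2={r2:.3f}")
```

In this sketch, a steep positive slope with a high R² would suggest that similarity to the induced relation tracks how much of a document consists of anchor terms. A flat or noisy fit would suggest the anchors do not reliably define the relation.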
