Abstract
In this article, we address the reliability of quantitative data on past multilingualism obtained as recall data. More specifically, we investigate whether interviewees’ assessments of the language repertoires of their late relatives (indirect data) yield results that are quantitatively similar to those obtained from people of the same age range themselves (direct data). The empirical data come from an ongoing field study of traditional multilingualism in Daghestan (Russia). We trained machine learning models to see whether they could detect differences between indirect and direct data. We conclude that our indirect quantitative data on L2s other than Russian are essentially similar to the direct data, while there may be a small but systematic underestimation when interviewees report others’ knowledge of Russian.
