Abstract
Fine-grained data on religious communities are often considered sensitive in South Asia and consequently remain inaccessible. Yet without such data, statistical research on communal relations and group-based inequality remains superficial, hampering the development of appropriate policy measures to prevent further social exclusion on the basis of religion. The open-source algorithm introduced in this article provides a workaround by probabilistically exploiting the communal connotations of names; it transforms name lists—which are readily available—into a new source of demographic data. The algorithm proves highly accurate in identifying Muslim population shares in Uttar Pradesh, India’s most populous state, but could be employed more widely across South Asia. It potentially enables more detailed analyses in economics, development studies, and political science as well as better sampling procedures in sociology and anthropology. This article describes the algorithm, evaluates its accuracy, reflects on ethical implications, and introduces a sample data set; the software itself is available in an online supplement to this article.
