What’s in a Name? Probabilistic Inference of Religious Community from South Asian Names

Abstract

Fine-grained data on religious communities are often considered sensitive in South Asia and consequently remain inaccessible. Yet without such data, statistical research on communal relations and group-based inequality remains superficial, hampering the development of appropriate policy measures to prevent further social exclusion on the basis of religion. The open-source algorithm introduced in this article provides a workaround by probabilistically exploiting the communal connotations of names; it transforms name lists—which are readily available—into a new source of demographic data. The algorithm proves highly accurate in identifying Muslim population shares in Uttar Pradesh, India’s most populous state, but could be employed more widely across South Asia. It potentially enables more detailed analyses in economics, development studies, and political science as well as better sampling procedures in sociology and anthropology. This article describes the algorithm, evaluates its accuracy, reflects on ethical implications, and introduces a sample data set; the software itself is available in an online supplement to this article.
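The core idea in the abstract, turning a name list into a probabilistic estimate of a community's population share, can be illustrated with a minimal sketch. This is not the published algorithm: the lookup table `TOKEN_PROB`, its placeholder tokens and probabilities, the fallback `PRIOR`, and the rule of averaging token probabilities are all assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical reference table: P(community A | name token).
# Tokens and values are invented placeholders, not real data.
TOKEN_PROB = {"alpha": 0.95, "beta": 0.10, "gamma": 0.50}

# Assumed fallback probability for tokens missing from the table.
PRIOR = 0.2


def name_probability(name, token_prob=TOKEN_PROB, prior=PRIOR):
    """Return an illustrative P(community A) for one full name.

    Simplification: the name's probability is the plain average of
    its tokens' probabilities; unknown tokens fall back to the prior.
    """
    tokens = name.lower().split()
    if not tokens:
        return prior
    return mean(token_prob.get(t, prior) for t in tokens)


def expected_share(names):
    """Estimate the population share as the mean per-name probability."""
    probs = [name_probability(n) for n in names]
    return sum(probs) / len(probs) if probs else 0.0


if __name__ == "__main__":
    sample = ["alpha beta", "gamma", "beta delta"]
    print(round(expected_share(sample), 3))
```

Summing probabilities rather than assigning each name a hard label keeps the uncertainty of ambiguous names in the aggregate estimate, which is the sense in which such an approach is probabilistic.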

Keywords

names, religion, South Asia, linguistics, big data


