Abstract

This research developed a machine learning classifier that reliably automates the coding process, using the National Taxonomy of Exempt Entities as the schema, and applied it to remap the U.S. nonprofit sector. I achieved 90% overall accuracy in classifying nonprofits into nine broad categories and 88% in classifying them into 25 major groups. The intercoder reliabilities between the algorithms and human coders, measured by kappa statistics, fall in the “almost perfect” range of .80 to 1.00. The results suggest that a state-of-the-art machine learning algorithm can approximate human coders and substantially improve researchers’ productivity. I also reassigned multiple category codes to more than 439,000 nonprofits and discovered a considerable number of organizational activities that were previously ignored. The classifier is an essential methodological prerequisite for large-N and Big Data analyses, and the remapped U.S. nonprofit sector can serve as an important instrument for asking or reexamining fundamental questions of nonprofit studies. The working directory, with all data sets, source code, and historical versions, is available on GitHub (https://github.com/ma-ji/npo_classifier).
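
As a rough illustration of the two evaluation measures named in the abstract, the sketch below computes overall accuracy and Cohen's kappa for one human coder and one algorithm. It is not taken from the article or its repository: the function names, the two-category labels, and the ten toy records are all hypothetical, and the article's own computation may differ (for example, in how multiple codes per organization are handled).

```python
from collections import Counter


def accuracy(human, machine):
    """Share of items on which the two coders assign the same label."""
    if len(human) != len(machine) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def cohen_kappa(human, machine):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    p_observed = accuracy(human, machine)
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    # Chance agreement: probability that both coders pick the same label
    # if each drew labels independently from their own label distribution.
    p_expected = sum(
        human_counts[label] * machine_counts[label] for label in human_counts
    ) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy labels, made up for illustration (not the article's data).
human_codes = ["A", "A", "B", "B", "A", "B", "A", "B", "A", "B"]
machine_codes = ["A", "A", "B", "B", "A", "B", "A", "B", "B", "B"]

acc = accuracy(human_codes, machine_codes)
kappa = cohen_kappa(human_codes, machine_codes)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

In this toy case the coders disagree on one record out of ten, so accuracy is 0.90; chance agreement is 0.50, which gives a kappa of 0.80, the lower edge of the range quoted in the abstract. Kappa is lower than accuracy because it discounts the agreement two coders would reach by chance alone.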

Keywords

National Taxonomy of Exempt Entities, nonprofit organization, neural network, BERT, machine learning, computational social science

Automated Coding Using Machine Learning and Remapping the U.S. Nonprofit Sector: A Guide and Benchmark
