Abstract
The research field of nonprofits and philanthropy has grown exponentially. To what extent do nonprofit scholars share a common language? Answering this question is crucial to assessing the field’s intellectual cohesiveness. We studied how coauthor networks, scholarly reputation, and the prevalence of female authors influence consensus formation. We found that the degree of consensus for all major research topics in the field has increased over time: for every 10% growth in the volume of literature, shared language increased by 1.4%. A cohesive research community on nonprofits and philanthropy has been forming since the early 2000s. Female scholars are fewer in number and less cited than male scholars; their presence did not exceed 40% for most topics. The citation counts of scholars and the small-world property of networks are positively associated with consensus, suggesting that star researchers and knowledge brokers bridging different intellectual communities are key to sharing research interests and language.
Introduction
When Katz (1999) asked, “Where did the serious study of philanthropy come from, anyway?” Hall (1999) answered, “The work of many hands.” As an emerging and interdisciplinary research field (Ma & Konrath, 2018; D. H. Smith, 2013; Walk & Andersson, 2020), nonprofit and philanthropic studies has been attracting talents and topics from various disciplines; it also has faced a core challenge since its emergence: Aside from the exponential growth of the volume of publications, has this field produced a cohesive and unique body of knowledge that can distinguish itself from other research areas (Young, 1999, pp. 19–21)? With this article, we study a basic question that is fundamental to the core challenge: How shared language and research topics in our field have developed over time, and what drives that development.
Even without achieving substantive agreement, scholars who share language and research topics tend to align their thinking and form research paradigms, a critical condition for disciplinary development (Kuhn, 1970). For example, nonprofit scholars disagree on how to define “the third sector,” yet they share research interests and use similar language; this is the type of consensus studied in this project.
Has the degree of consensus increased over time? How does the development of consensus differ between research topics, and how can the development of consensus be explained? By applying innovative computational methods in studying the sociology of knowledge (Edelmann et al., 2020, p. 24.8), we provide a first explanatory analysis of our research field’s knowledge growth from three aspects: coauthor networks, scholarly reputation, and gender differences.
Studying Nonprofits and Philanthropy: The Quest to Define a New Field
Before delving into the technical details of studying consensus, we must review how scholars from various disciplines have collectively shaped this field, and how the field has expanded. The research field of nonprofits and philanthropy has deep historical roots. In the field’s earliest stages, dating back to the 1890s, most of the scholarship was published as doctoral dissertations and theses by scholars in different disciplines. No collective awareness of “nonprofit and philanthropic studies” existed until the 1970s (Hall, 1999). From the 1970s to the 2000s, the field began to emerge, with intellectual growth in terms of volume. During this period, scholars focused on building both consensus around a few questions and institutional forces that were fundamental to the field. From the 2010s to the present, the field expanded and consolidated, with intellectual growth in terms of quality and diversity, and an increasing presence of female scholars. Across both time periods, this interdisciplinary research field strove to define itself as a distinct research field with a unique and cohesive body of knowledge.
Early Foundations and Boom (1970s–2000s)
Inventing the term “nonprofit sector” and justifying the sector’s existence was the most fundamental development in the 1970s (Bushouse, 2017; Hall, 2006). One of the field’s earliest milestones was the assembly of the Commission on Private Philanthropy and Public Needs (also known as the Filer Commission) in the early 1970s, convened by John D. Rockefeller III. The executive board of the commission consisted of 26 distinguished members (four of them female) from various nonprofit or religious organizations, and they led the publication of six volumes of research papers, providing a comprehensive review of research available at that time (Commission on Private Philanthropy and Public Needs, 1975). The commission introduced the notion of “nonprofit sector” and argued that the sector was an essential part of a healthy society. Rockefeller also provided the initial funding for establishing the field’s first university-based research center in 1977, the Program on Non-Profit Organizations at Yale University. That program not only facilitated many influential studies and prepared the intellectual basis for the field, but also served as a model for other research centers founded in subsequent decades (Hall, 2006, pp. 54–55; Soskis, 2020, pp. 61–63).
Another institutional advance underpinning the field was the establishment of academic associations and journals serving as home base for nonprofit scholars. The first milestone was the founding of the Association of Voluntary Action Scholars (AVAS) in the early 1970s. The association also established its own journal around the same time, the Journal of Voluntary Action Research (JVAR). Around 1990, both the association and journal were renamed: the AVAS was renamed the Association for Research on Nonprofit Organizations and Voluntary Action (ARNOVA), and the JVAR was renamed Nonprofit and Voluntary Sector Quarterly (NVSQ), marking the broadening of the research interests from a focus on voluntary action to a wider range of nonprofit topics (Smith, 2003, pp. 462–463). Brudney and Durden (1993) analyzed all the articles published by the journal in its first 20 years (i.e., 1972–1991) and found that (a) the field’s disciplinary origins were remarkably diverse, with first authors hailing from 35 disciplines and a heavy representation from sociology; (b) even in its early years, the journal’s articles used robust empirical analysis methods; and (c) 35% of the authors were women, and the contributions from female authors were expected to continue increasing.
While Brudney and Durden forecasted a promising future for the field’s intellectual growth, the dissemination of nonprofit and philanthropic research in the early 1990s lagged well behind the level of academic interest because of limited publication opportunities. An analysis of all empirical research on charitable giving found that most of the research on giving appeared in psychology journals until the 1980s, with economics and sociology taking precedence thereafter (Bekkers & Dursun, 2013). The perceived shortage of publication opportunities sparked the creation of two new journals and a scholarly association: Nonprofit Management and Leadership (NML) and Voluntas were both established in 1990, and the International Society for Third-Sector Research (ISTR) was founded in 1992. With ISTR and its affiliated journal, Voluntas, the research field enlarged its geographic scope from primarily North America to global (Anheier & Knapp, 1990, p. 1). Meanwhile, NML aspired to bridge the gap between theory and practice, emphasizing the utility of scholarly work to practitioners and policymakers (Young & Billis, 1990, p. 2). Since their creation, the three journals have gained recognition as the field’s top-tier outlets (Brudney & Herman, 2004). In the field of nonprofit marketing, two specialized journals were also established in the 1990s: the Journal of Nonprofit & Public Sector Marketing (Self, 1993) and the International Journal of Nonprofit and Voluntary Sector Marketing (Saxton, 1996).
Globally, the number of journals and academic associations with an emphasis on studying nonprofits and philanthropy increased rapidly between the 1980s and 2000s. D. H. Smith (2013) estimated that there were over 100 journals and over 40 research associations worldwide by the late 2000s.
With the proliferation of journal outlets and academic associations, the volume of literature in nonprofit and philanthropic studies grew exponentially from the early 1990s (Ma & Konrath, 2018, p. 1145). Nonprofit scholars started to publish original articles in these outlets and gradually formed consensus around topics that are core to this field, for example, on the origins of the nonprofit sector (e.g., Salamon, 1987; Salamon & Anheier, 1998), on volunteering (e.g., Cnaan et al., 1996; D. H. Smith, 1994), and on cross-sector collaboration (e.g., Austin, 2000). Despite disagreement on the answers to specific questions, nonprofit scholars began to share research interests and construct a common language.
An Emerging Research Field and Intellectual Cohesion (2010s–Present)
The field’s knowledge production continued its exponential growth and expansion. Notably, the field’s research topics and methods became increasingly diverse, and the activity and success of women increased further in this period. Students of nonprofits and philanthropy began to recognize themselves as “nonprofit scholars,” forming a consensus on what the research field is about.
Numerous journals dedicated to this field have launched since 2010, providing more publication opportunities to scholars studying different topics and attracting talent from various disciplines. To improve the applicability of scholarly research and mitigate the theory-practice divide in the research field (Bushouse & Sowa, 2012), Dennis Young, the founding editor of NML, established the Nonprofit Policy Forum in 2010 to serve as an interface between public policy and the nonprofit sector (Young, 2010, 2021, pp. 13–15). Other specialized journals, including the Journal of Nonprofit Education and Leadership, Voluntary Sector Review, and The Foundation Review, were established around the same time (Behrens, 2009; Dolch, 2010; Halfpenny et al., 2010). The Journal of Public and Nonprofit Affairs published its inaugural issue in 2015 (Eger et al., 2015). These new journals, together with those established earlier and those primarily serving other fields, formed a variety of journal groups ranging from core to periphery, and sustained the field’s intellectual growth in the 2010s (Walk & Andersson, 2020, p. 87).
Aside from its expansion in size, the field’s intellectual cohesiveness is another important indicator of maturity. A few empirical studies published since 2010 explored the field’s major research themes. For example, Bekkers and Wiepking (2011) categorized into eight mechanisms the determinants of philanthropic behavior studied empirically in over 500 publications from 1955 to 2008. Shier and Handy (2014) coded 3,790 nonprofit and philanthropic studies dissertations and theses written between 1986 and 2010 and found five major themes: resources, organizational effectiveness and performance, organizational development, intra-organizational context, and interaction and collaboration. The themes suggested that the field’s scholarship had begun to cluster around core questions and develop intellectual cohesion. Minkowitz et al. (2020) described the use of data and theories and the nationality of authors of 972 articles published in the three core journals, NVSQ, Voluntas, and NML. The analysis showed a sustained dominance of U.S.-based research relying on quantitative analyses, with a relative dearth of theoretical integration.
Since the late 2010s, as more data about the sector became available to nonprofit scholars,¹ researchers also initiated numerous data projects to track the field’s intellectual growth systematically. One of the earliest projects was the Philanthropic Studies Index (PSI) at Indiana University’s Lilly Family School of Philanthropy. In the early 1990s, the PSI started collecting bibliographic records of books and journal articles in nonprofit and philanthropic studies. By the mid-2010s, when PSI stopped updating regularly, the database included nearly 20,000 records classified by professional librarians using Library of Congress Subject Headings. In a later effort, Brass et al. (2018) constructed the NGO Knowledge Collective database including 3,336 journal articles that relate to NGOs and development. They described the profiles of authors, methodologies, and research themes. Ma and Konrath (2018) created an even larger database of 12,016 publications in 19 journals published between 1925 and 2015, periodizing the development of the field, describing research activities and scholarly networks, and analyzing major themes. Revamping the data set used in Ma and Konrath (2018) and merging it with PSI, Ma et al. (2021) constructed the Knowledge Infrastructure of Nonprofit and Philanthropic Studies (KINPS). The KINPS currently includes over 70,000 bibliographical records and is the largest database of nonprofit and philanthropic studies literature to date.
Taken together, these studies provide strong empirical evidence of the growth in both volume and quality of the field’s intellectual base. The field has generated clusters of themes, and scholars share theories and primary data sources and refer to a common set of classic works in the field. The improved availability and quality of research databases provide scholars with novel opportunities to study the field’s knowledge production. We can reasonably expect that more articles of this sort will be published in the years to come. Collective efforts will be, as they already have been, crucial to further advances in this emerging research field.
Studying Consensus: Knowledge Cohesion and Disciplinary Development
The historical review presents some encouraging evidence but also reveals a clear gap: The cohesion of nonprofit scholarship deserves more attention than its volume, and we need to understand whether and how the field’s intellectual growth has brought together scholars from different specializations and disciplines. Have nonprofit scholars formed consensus on what the research field is about? We need to know to what extent researchers are sharing theories, topics, and language that are core to studying nonprofits and philanthropy. Inquiries of this sort are particularly germane to new interdisciplinary fields of research; empirical analysis shows that scholars in disciplines with low levels of consensus assess the vitality of their fields more pessimistically (Hargens & Kelly-Wilson, 1994, p. 1191).
To examine shared research topics and language, earlier endeavors in nonprofit studies relied primarily on manually coding and counting the topics of research articles and theses (e.g., Brudney & Durden, 1993; Bushouse & Sowa, 2012; Shier & Handy, 2014). Although the topics identified by different studies vary, a few of them consistently appear on different lists (e.g., volunteering, human resources, and interorganizational relations and collaboration). By applying advanced computational methods and utilizing bibliometrics, recent studies have mapped the connections between and the evolution of different research topics in this field (e.g., Jung et al., 2022; Kang et al., 2022; Ma & Konrath, 2018). Scholars have also mapped the relations between nonprofit studies and other fields of research, recommending more knowledge integration between different research fields (LePere-Schloop & Nesbit, 2022). In this article, we analyze the degree to which nonprofit researchers have consensus in the form of shared language.
Although consensus of language and research interest is a precondition for disciplinary development (Cole, 1983, p. 134; Lodahl & Gordon, 1972, p. 60), it does not equate to agreement. Disciplinary consensus can be classified into two primary groups: discursive and substantive (J. H. Evans, 2007). The discursive consensus is “an agreed upon language to describe the phenomena” (J. H. Evans, 2007, p. 2). Scholars may interpret a phenomenon differently under this consensus, but they all agree that such a phenomenon is a core interest of the field. Scholars achieve substantive consensus when certain interpretations are widely accepted by the academic community.
This study focuses on discursive consensus (i.e., shared language and topics) for two reasons. First, it is the basis for achieving substantive consensus. Scholars must first share a common language in debate even when they hold different opinions. For example, nonprofit scholars who hold different positions in discussing the origins of the nonprofit sector must share a core vocabulary, such as “origin,” “market,” and “government.” Second, empirical methods limit the analytical possibilities. Studying substantive consensus requires computer algorithms to process the meanings of paragraphs and articles, but even the most sophisticated algorithms can as yet hardly distinguish whether two texts are disputing a claim or agreeing with each other.
Finally, we need to be cautious about the intrinsic value of building consensus. Kuhn’s notion of normal science warns us that scholars in a field with mature paradigms are more likely to be puzzle-solvers (Kuhn, 1970), producing and reproducing knowledge that is large in volume but less innovative (Chu & Evans, 2021). For nonprofit studies, scholars have already discussed the “danger of success” that results from the institutionalization of the field (O’Neill, 2007, 174S): As the field continues its growth, researchers may share mature paradigms, theories, and languages and become puzzle-solvers. However, such a situation is not necessarily desirable even though scholars may achieve higher consensus. A recent study found that this field’s journals display a lower level of diversity in terms of research themes and theories than do dissertations, suggesting a decreasing level of research diversity (Schubert et al., 2022). Furthermore, the “critical nonprofit scholarship” reminds us that the shared mainstream theories can be very limited in terms of inclusiveness and diversity (Coule et al., 2022). How to maintain the balance between consensus and diversity will be a persistent challenge to this interdisciplinary field.
Drivers of Scientific Consensus
Despite the importance of consensus for disciplinary development, the drivers of consensus have been only sporadically studied by social scientists. Although social scientists have been exploring the correlates of consensus for half a century, the existing literature still leaves room for improvement. First, many of the assumptions in the literature are empirically untested, which leaves room for controversy. Second, the majority of empirical studies do not aim at establishing a direct relation between consensus and its correlates. Third, the measures are methodologically weak in many previous empirical studies (Bruggeman et al., 2012; Cole, 1983, p. 129; Cole et al., 1988). Last but not least, our understanding about the drivers of consensus is fragmented because no study to date has investigated multiple drivers in the same analysis.
We theorize the drivers of consensus primarily from a relational perspective and base our hypotheses on the literature on disciplinary and thematic difference, collaboration networks, scholarly reputation, and gender differences. The following paragraphs briefly review relevant theoretical backgrounds, and Supplemental Appendix A details the specific hypotheses we test.
Disciplinary and Thematic Differences
Academic activities vary by discipline and research topic. Therefore, consensus cannot be discussed without considering disciplinary and thematic differences. One of the oldest and most classic propositions regarding disciplinary differences is probably the Hierarchy of the Sciences (Cole, 1983). It assumes that sciences can be arranged hierarchically, with natural sciences (“hard sciences,” e.g., physics and mathematics) at the top, and social sciences (“soft sciences,” e.g., sociology and political science) at the bottom.² The “hard sciences” share more consensus and achieve agreements faster than the “soft sciences” (Collins, 1994). Even within a specific discipline, the “hardness” of different research themes can vary (L. D. Smith et al., 2000, p. 79). Although the Hierarchy of the Sciences is still empirically controversial, it is a widely accepted theoretical and philosophical assumption, and scholars have not yet ceased testing this assumption with novel methodological advances (e.g., Fanelli, 2019; Fanelli & Glänzel, 2013; Peng et al., 2020; Simonton, 2015).
Collaboration Networks
Previous studies on knowledge production have shown that consensus is related to the structure of collaboration networks between researchers. When researchers collaborate, they speak with one voice in coauthored publications. Furthermore, they are likely to use similar theories, data, and methods in their other work, building upon a body of collective knowledge. We first test the relationship between the size of collaboration and the level of consensus because consensus is the basis for collaboration (Fanelli & Glänzel, 2013, p. 4).
Two other features of collaboration networks reflect how individuals and sub-communities are connected. At the individual level, a decisive feature of social networks is the small-world property: Researchers are not randomly or equally connected; they form densely connected sub-communities. Moreover, there are effective “shortcut-nodes” reducing the distance between two individuals otherwise far apart (Watts, 1999, p. 241). Some nonprofit scholars may collaborate with each other more often than with others, forming a “small world”; meanwhile, one or more of those scholars may also collaborate with others who are not part of that small-world network, thereby bridging different intellectual communities.
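To make the small-world idea concrete, the sketch below computes two quantities commonly used to describe it: the average clustering coefficient (how densely a scholar's coauthors collaborate with each other) and the average shortest path length (how far apart two scholars are). The toy coauthor graph, the scholar labels, and the function names are hypothetical. The measure actually used in this study is defined in the supplemental appendices, so this is only an illustration of the concept.

```python
from collections import deque
from itertools import combinations


def local_clustering(graph, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total = 0
    pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical coauthor network: two tightly knit triangles (A-B-C and D-E-F)
# joined by a single "shortcut" collaboration between scholars C and D.
coauthors = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}

print(round(average_clustering(coauthors), 3))
print(round(average_path_length(coauthors), 3))
```

In this toy network, clustering stays high because each triangle is fully connected, while the single C-D tie keeps every scholar within three steps of every other; C and D play the bridging role described above.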
At the whole-network level, the connection pattern between sub-communities is another relevant structural feature: “nodes are joined together in tightly knit groups, between which there are only looser connections” (Girvan & Newman, 2002, p. 7821). In the literature on scientific consensus, such a characteristic is called structural salience (Shwed & Bearman, 2010). A network with a high level of structural salience will have more isolated network clusters, impeding the sharing of information and knowledge at the whole-network level. Supplemental Appendix A.1 details the hypothesis for each of these features.
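One common way to quantify how strongly a network splits into tightly knit groups is Newman-Girvan modularity, computed for a given assignment of nodes to communities. The sketch below is a minimal illustration under that assumption; the edge list, community labels, and function name are hypothetical, and the source does not state that this is the exact formula behind its structural salience measure.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity Q of an undirected, unweighted network.

    Q = sum over communities c of (L_c / m) - (d_c / (2m))^2, where m is the
    number of edges, L_c the number of edges inside c, and d_c the summed
    degree of the nodes in c.
    """
    m = len(edges)
    internal = {}
    degree = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree
    )


# Hypothetical network: two triangles joined by one bridging tie (C-D).
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("D", "E"), ("D", "F"), ("E", "F"),
    ("C", "D"),
]
two_groups = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
one_group = {node: 1 for node in two_groups}

print(round(modularity(edges, two_groups), 4))
print(round(modularity(edges, one_group), 4))
```

A higher value for the two-group partition than for the single-group baseline indicates that ties concentrate inside the groups, which is the pattern the quoted definition describes.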
Scholarly Reputation
Scholarly reputation is a key factor in building collaboration. Because star researchers disproportionately attract more attention than average, they can perform leadership roles in intellectual communities (Mulkay et al., 1975). They are the gatekeepers in a particular research field and have the authority to determine what work should be kept and what discarded, directing the formation of consensus (Cole, 1983, p. 138). They can also produce tight intellectual communities in which individuals hold similar beliefs (J. L. Martin, 2002). The relationship between star researchers and consensus can go both ways: On one hand, star researchers can efficiently increase consensus if they agree with each other; on the other hand, if star researchers disagree, they may pull academic communities toward different directions, making achieving consensus even harder.
Depending on how “starness” is defined, we measured (a) the average citations of scholars, testing whether consensus formation is a collective effort; and (b) centralization of scholarly citations, testing if consensus formation is directed by elite groups. Supplemental Appendix A.2 has the details.
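The contrast between the two reputation measures can be illustrated with a short sketch. It assumes, purely for illustration, that centralization is summarized by a Gini coefficient of citation counts; the source defers the actual definition to Supplemental Appendix A.2, and the citation counts and function names below are hypothetical.

```python
def mean_citations(citations):
    """Average citation count per scholar."""
    return sum(citations) / len(citations)


def gini(citations):
    """Gini coefficient: 0 = citations spread evenly, near 1 = concentrated.

    Used here as an assumed stand-in for 'centralization of citations'.
    """
    values = sorted(citations)
    n = len(values)
    total = sum(values)
    if total == 0:
        return 0.0
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Two hypothetical topics with the same average reputation (5 citations each)
# but very different concentration.
collective_topic = [5, 5, 5, 5]   # everyone is cited equally
elite_topic = [0, 0, 0, 20]       # one star scholar holds all citations

for name, counts in [("collective", collective_topic), ("elite", elite_topic)]:
    print(name, mean_citations(counts), round(gini(counts), 2))
```

The two topics are indistinguishable on the average measure but differ sharply on the concentration measure, which is why the two are tested separately.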
Gender Differences
Gender is an important factor in building relationships. Because women perform better in social sensitivity (Bear & Woolley, 2011), female scholars are found to be superior in improving the quality of teamwork and in building consensus (J. R. Martin, 2018; Woolley et al., 2010). Furthermore, it is vital to examine gender inequality in our research field’s knowledge production. Female scholars have been reported to be at a disadvantage in male-dominated academic systems, with respect to productivity (Xie & Shauman, 1998), promotion and tenure (Durodoye et al., 2020; Weisshaar, 2017), self-citation (King et al., 2017), and research impact (Thelwall, 2018). Although the field of nonprofit and philanthropic studies has greater female presence than other fields of research (Brudney & Durden, 1993; Schubert et al., 2022), prestigious authorship positions remain more likely to be occupied by males (M. D. Evans, 2022). We have some scattered and anecdotal evidence showing that gender inequality in our field has been diminishing. However, we still need a more timely and comprehensive assessment. Supplemental Appendix A.3 details the hypotheses.
Method
Data Compilation
We compiled our data sets based primarily on the KINPS (Ma et al., 2021). The KINPS is the largest English bibliographic database of its kind to date in nonprofit and philanthropic studies; it includes detailed information on 111,783 works published between the early 1920s and 2018 worldwide, including peer-reviewed journal articles, books, book chapters, and so on. The records are derived primarily from Scopus (https://www.scopus.com/; Baas et al., 2020), Philanthropic Studies Index (Lilly Family School of Philanthropy, 2020), and Google Scholar (https://scholar.google.com/). Technical details of KINPS are described in Ma et al. (2021). Bearing in mind that no database or data source is perfect (Martín-Martín et al., 2020; Tennant, 2020; van Eck & Waltman, 2019; Visser et al., 2021), we omit a summary of the details of KINPS to save space for elaborating the data compilation and validation in this project.
Table 1 summarizes the five data components used in this study and their sources. The abstracts of publications are from Scopus and Google Scholar. The authorship records are from Scopus and PSI. The citation relationships are from Scopus. The topics of articles are decided according to expert knowledge, automation and replicability, and instrumental validity (Supplemental Appendix B.1). Author’s gender is predicted using a probabilistic approach, manually validated, and statistically tested (Supplemental Appendices B.2 and E.2). We also employed multiple strategies to improve data quality (Supplemental Appendix B.3).
Table 1. Data Components and Sources
Measures Overview
Table 2 gives an overview of the major constructs, measures, and corresponding hypotheses. Full details of these hypotheses and measures are in Supplemental Appendices A and C. The subsections below briefly review how we measured the dependent variable, consensus of shared language, and research topics. Measures of consensus used in other studies and methodological details are discussed in Supplemental Appendix C.1.
Table 2. Overview of Theoretical Constructs, Operationalization, and Hypothesis Testing.
Note. Full details of these hypotheses and measures are in Supplemental Appendices A and C. LR = linear regression.
Measuring Semantic Similarity Using Natural Language Processing (NLP)
The NLP algorithms first need to convert texts to series of numbers (i.e., vectors). There are primarily two approaches to the vectorization process: the bag-of-words approach, which does not consider words’ meanings and orders, and the semantic embedding approach, which considers words’ semantic meanings and contexts (Jurafsky & Martin, 2022). The conventional linguistic measures reviewed in Supplemental Appendix C.1 (i.e., Herfindahl index and Shannon entropy of keywords) take the bag-of-words approach. Although better than indirect measures, they still suffer from two fundamental defects. First, they cannot consider synonyms and morphologically identical words. Scholars may use different keywords, or different spelling of keywords, to describe or label the same object of research. Second, they cannot distinguish homonyms by context. The same word can have different meanings in different contexts (e.g., “trust helps build social capital” and “funded by philanthropic trust”). Considering these linguistic caveats is methodologically essential.
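The limitation of keyword-based measures can be seen in a small sketch of the two conventional indices named above. The keyword lists and function names are hypothetical, and the exact variants used in the literature are reviewed in Supplemental Appendix C.1; here the Herfindahl index is the sum of squared keyword shares and Shannon entropy uses the natural logarithm.

```python
import math
from collections import Counter


def keyword_shares(keywords):
    """Relative frequency of each distinct keyword string."""
    counts = Counter(keywords)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def herfindahl(keywords):
    """Concentration of keyword use: 1.0 when everyone uses the same keyword."""
    return sum(share ** 2 for share in keyword_shares(keywords))


def shannon_entropy(keywords):
    """Dispersion of keyword use: 0.0 when everyone uses the same keyword."""
    return sum(-share * math.log(share) for share in keyword_shares(keywords))


# Four hypothetical articles on the same research object. In the second list
# the authors label it with different but synonymous keywords.
same_label = ["volunteering"] * 4
split_label = ["volunteering", "volunteering", "volunteerism", "voluntary action"]

for name, keywords in [("same label", same_label), ("split label", split_label)]:
    print(name, round(herfindahl(keywords), 3), round(shannon_entropy(keywords), 3))
```

Because both indices compare keyword strings literally, the second list registers as less concentrated even though all four articles study the same thing, which is the synonym problem described above.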
The semantic embedding methods are the newest advances in NLP. They represent words in high-dimensional semantic spaces pretrained from large corpora (e.g., the entire corpus of Wikipedia). Take Figure 1 for example: The semantic distance between texts in subgraph A is smaller than that in subgraph B (Kusner et al., 2015; Sato et al., 2022). Words’ positions in the space can be fixed (e.g., “bank” in “Bank of America” and “river bank” has the same embedding vector) or contextual (e.g., “bank” in the previous example has different embedding vectors). These methods are not purely data-driven; linguistic theories underpin their validity (Jurafsky & Martin, 2022; Kozlowski et al., 2019, p. 931). Although social scientists have only recently started to apply the semantic embedding methods, these methods have shown strong novelty and validity in empirical studies (e.g., Kozlowski et al., 2019; Stoltz & Taylor, 2019; Taylor & Stoltz, 2020) and methodological tests (Rodriguez & Spirling, 2022).

Figure 1. Measuring the Semantic Similarity Between Texts
Measuring Consensus in This Study: Contextual Semantic Embedding
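Our study employs the latest advance in semantic embedding (i.e., contextual semantic embedding; Devlin et al., 2019) to resolve the deficits of existing consensus measures. We measure the semantic similarity between two articles’ abstracts following two steps: (a) represent the abstracts as vectors using the contextual semantic embedding method, and (b) calculate the cosine similarity between the two vectors (Jurafsky & Martin, 2022). Our dependent variable, research consensus (Consensus), is operationalized by averaging the similarities of all abstract-pairs in a given time period. Mathematically, we can define

$$\text{Consensus}_t = \frac{2}{n_t (n_t - 1)} \sum_{i < j} \cos(\mathbf{v}_i, \mathbf{v}_j), \qquad \cos(\mathbf{v}_i, \mathbf{v}_j) = \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert},$$

where $n_t$ is the number of abstracts in time period $t$ and $\mathbf{v}_i$ is the embedding vector of abstract $i$.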
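Step (b) and the averaging over abstract pairs can be sketched as follows. The embedding step itself is not shown: the short vectors below are hypothetical placeholders standing in for the output of a contextual embedding model, and the function names are illustrative rather than taken from the study's code.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def consensus(vectors):
    """Average cosine similarity over all unordered pairs of abstracts."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two abstracts to form a pair")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Placeholder embeddings for three abstracts from one topic and time period.
# Real contextual embeddings have hundreds of dimensions.
toy_embeddings = [
    [1.0, 0.0],  # abstract 1
    [1.0, 0.0],  # abstract 2, same direction as abstract 1
    [0.0, 1.0],  # abstract 3, orthogonal to the other two
]

print(round(consensus(toy_embeddings), 4))
```

With three abstracts there are three pairs; one pair is identical in direction and two are orthogonal, so the average is one third. Note that the number of pairs grows quadratically with the number of abstracts, so a full-corpus computation would normally be vectorized.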
Estimation Strategy
We first examine the correlation between the explanatory variables because collinearity can reduce statistical power and increase the possibility of Type II error (i.e., failing to reject the null hypothesis when it is false). Supplemental Figure D1 shows that the correlation coefficients between the explanatory variables are moderate.
For the regression analysis, we first test all models at the topic level (Models 0–7). Model 0 includes only controls and topic-fixed effects to estimate the baseline variance explained without independent variables (i.e., collaboration, reputation, and female presence). Models 1 to 3 estimate the direct relationship between the independent variables and consensus. To consider the confounding effects and avoid gender essentialism (Mavin & Yusupova, 2021; Nelson, 2014, p. 221), Models 4 to 6 allow us to test how the estimations of Models 1 to 3 change when adding another independent variable. Model 7 includes all explanatory variables.
Time is an important correlate of many of the explanatory variables and the dependent variable. If enough time lapses, researchers may achieve consensus regardless of scholarly efforts and network structure. An author’s reputation usually increases year by year, and social networks naturally form cliques as time passes. Therefore, the growth of research consensus may be a mere result of the passage of time. Model 8 adds time-fixed effects to Model 7 to take time into account. The motivation and causal graph underlying each of the models are detailed in Supplemental Appendix D.
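The role of topic-fixed effects can be illustrated with a minimal sketch of the within (demeaning) estimator for a single explanatory variable. The data, variable names, and functions are hypothetical, and the sketch omits the controls and the time-fixed effects of Model 8; it shows only why removing topic-level means can change an estimate.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of a simple pooled OLS regression of y on x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def within_slope(rows):
    """Slope after subtracting each topic's own mean from x and y."""
    groups = defaultdict(list)
    for topic, x, y in rows:
        groups[topic].append((x, y))
    dx, dy = [], []
    for observations in groups.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        dx.extend(x - mean_x for x, _ in observations)
        dy.extend(y - mean_y for _, y in observations)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Hypothetical (topic, explanatory variable, consensus) observations.
# Within each topic, consensus rises one-for-one with x, but the two topics
# sit at very different baseline levels.
rows = [
    ("topic_a", 1, 1), ("topic_a", 2, 2), ("topic_a", 3, 3),
    ("topic_b", 11, 5), ("topic_b", 12, 6), ("topic_b", 13, 7),
]

pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
print(round(pooled, 3), round(within_slope(rows), 3))
```

The pooled slope mixes between-topic differences into the estimate, whereas the within slope recovers the relationship that holds inside each topic. The same logic motivates adding time-fixed effects when a shared time trend could drive both the explanatory variables and consensus.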
Results
Consensus in Nonprofit and Philanthropic Studies
Consensus by Topic
As Figure 2 shows, the consensus levels of the top ten nonprofit and philanthropic studies topics have been generally increasing over time. It also shows both mean-level change and rank-order stability. The intraclass correlation coefficient

Figure 2. Consensus Levels of the Top 10 Nonprofit and Philanthropic Studies Topics by Time
Consensus Formation and Scholarship Accumulation
Does the increase in the volume of scholarship necessarily lead to cohesive knowledge? We can run a regression between the time unit and consensus at topic level to gain some insights.
The regression (within group
Consensus and Collaboration Size
Figure 3 plots all 35 topics by consensus and the number of coauthors per article. The graphed topics can be grouped into micro-, meso-, and macro-levels, which echo the Hierarchy of the Sciences. Micro-level topics primarily focus on the mechanisms of psychology 4 and management 5 and have the highest levels of consensus, reflecting that they are from “harder” sciences (Simonton, 2015). In the middle of the ranking are topics at the meso-level, where scholars focus on the inter-organizational and field analysis of nonprofits and NGOs. 6 At the bottom of the ranking are topics at the macro-level, where political aspects matter and scholars from the humanities are active. 7 Topics at this level are generally categorized as the “soft sciences” (Fanelli & Glänzel, 2013).

Figure 3. Consensus Level and Collaboration Size by Topic in Nonprofit and Philanthropic Studies
An ordinary least squares (OLS) model fitted to the data in Figure 3 shows that the relationship between consensus and the number of coauthors per article is substantially positive (
Collaboration Networks: Are We a Community?
The research field of nonprofit and philanthropy attracts scholars from many other disciplines. Has the field become their primary scholarly home? Are there really “nonprofit scholars”? As we explained in the “Collaboration networks” section and detailed in Supplemental Appendix A.1, a connected scholarly community is an important condition for increasing consensus. Are we such a community yet? Figure 4 provides some affirmative evidence.

Figure 4. Average Connection of Authors by Year and Author Type
Most, if not all, social science disciplines have a list of core publication sources. Scholars who identify as researchers in an academic field often attempt to publish in the field’s core journals. If there is a community of nonprofit and philanthropic studies, we can expect that scholars in the community are more likely to publish in the field’s core journals and coauthor with those who also publish in these journals.
Figure 4 shows scholars’ connectedness in coauthor networks by author type. Scholars who publish articles in the 25 core journals of nonprofit and philanthropy are colored green (core authors; Ma & Konrath, 2018; D. H. Smith, 2013; Walk & Andersson, 2020), and those who cite these core articles but only publish elsewhere are colored blue (citing authors). The figure shows that the number of authors has been increasing, and the increase is stronger for core authors from 2001 onward, indicating that the field has organized itself more strongly since then.
The continuously increasing number of coauthors indicates that collaboration in studying nonprofits and philanthropy has become more popular over time. For example, each core author had 0.5 coauthors on average in 1971; this number increased to about 2.2 by 2013. This development is not unique to the field of research on nonprofits and philanthropy; teamwork has become popular across all disciplines (Wuchty et al., 2007).
Core authors became more connected than citing authors over time, suggesting that a sense of scholarly community is forming. In 1971, there was no substantial difference between the connectedness of core authors and that of citing authors. Until the mid-1990s, citing authors were more connected than core authors, but the difference started to decline from 1977 onward. Since the early 2000s, core authors have had an advantage, which continues to increase and suggests that a cohesive scholarly community studying nonprofits and philanthropy is emerging. 8
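The connectedness measure plotted in Figure 4 can be approximated by the average number of distinct coauthors per author in each year. The sketch below assumes that ties are counted within a publication year (the text does not state whether ties are annual or cumulative) and uses a hypothetical record format with `year` and `authors` fields; it does not distinguish core from citing authors.

```python
from collections import defaultdict


def mean_coauthors_by_year(articles):
    """Average number of distinct coauthors per author, per publication year."""
    # year -> author -> set of that author's coauthors in that year
    coauthors = defaultdict(lambda: defaultdict(set))
    for art in articles:
        authors = set(art["authors"])
        for a in authors:
            # Solo authors get an empty set, so they count with 0 coauthors.
            coauthors[art["year"]][a] |= authors - {a}
    return {
        year: sum(len(c) for c in by_author.values()) / len(by_author)
        for year, by_author in sorted(coauthors.items())
    }


# Toy bibliographic records (illustrative only).
toy_articles = [
    {"year": 1971, "authors": ["A"]},
    {"year": 1971, "authors": ["B", "C"]},
    {"year": 2013, "authors": ["A", "B", "C"]},
    {"year": 2013, "authors": ["C", "D"]},
]
result = mean_coauthors_by_year(toy_articles)
```

Running the same function separately on the records of core authors and of citing authors would give two yearly series that can be compared, as in the figure.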
Gender Differences and Gendered Discourse
From a Kuhnian perspective, the field’s development can be periodized into preparadigm, paradigm building, and normal science (Ma & Konrath, 2018, p. 1145). In addition, according to the review section, we can identify 2010 as an important watershed during the normal science period—from 2010 onward, the field’s development became more diverse yet institutionalized.
Gender Differences in Citation and Scholarly Population
Figure 5 shows the numbers of citations (top; self-citations are removed) and authors (bottom) by gender and year. Gender differences started to emerge at the end of the paradigm-building period (i.e., the end of the 1980s) and were significant throughout the entire normal science period (i.e., 1990s to date): female scholars have been consistently fewer in number and less cited than male scholars, and the disparities have become more stable and evident since 2010. These findings support our hypothesis of gender inequality (i.e., Hypothesis A.3.1).

Figure 5. The Number of Citations and Authors by Gender and Year
Gender Differences in Collaboration
Figure 6 presents the patterns of coauthorship by gender. Three patterns stand out. (a) Female scholars have slightly more coauthors than male scholars (top panel). (b) Female scholars have more female coauthors than male scholars do throughout the normal science period (middle panel): for female scholars, the proportion of female coauthors increased slightly, to over 40%, whereas for male scholars it increased from 20% to close to 40%, indicating that male scholars collaborate with female scholars more often than they did a few decades ago (although they still collaborate more often with male coauthors). (c) The percentage of solo-authored publications for both genders has been decreasing since 1990 (bottom panel). These results suggest that scholarly teamwork has been increasing for both genders.

Figure 6. Gender Interactions in Coauthor Networks
Gendered Discourse and Research Agenda
Figure 7 further breaks down the proportion of female scholars by research topics. The descriptive and preliminary results may serve as a stimulus for future studies, and a few trends are worth highlighting:
The largest 16 themes account for nearly 90% of the entire nonprofit and philanthropy literature, and only two of those topics (i.e., “voluntarism” and “volunteers”) ever had over 50% female scholars.
Although the prevalence of female authors has been increasing overall, there is a ceiling of about 40% for the largest research topics.
For most of the smaller topics (i.e., from Topic 17 to 35), the prevalence of females is higher than 50%, but these topics account for only 10% of the entire body of scholarship on nonprofits and philanthropy.
These observations suggest a gendered discourse and research agenda in our field, and we will address their implications in the discussion section.

Figure 7. Female Presence by Research Topic
Contributing to Consensus: Small-World Network and Scholarly Reputation
Figure 8 shows selected results of the standardized regression models introduced in the estimation strategy section. The small-world property of networks and average scholarly reputation correlate significantly with consensus according to the full model (i.e., including topic- and time-fixed effects). The full results of all regression models are presented in Supplemental Appendix Table D1. Below we discuss the implications of the regression results in detail.

Figure 8. Selected Estimations of Primary Explanatory Variables
Network Structure
Topics with higher consensus are more likely to have larger teams of coauthors, supporting Hypothesis A.1.1. However, the significance of the positive association disappears after including time-fixed effects, probably because both consensus and team size increase over time, and the positive association results simply from the passage of time. The coefficients of structural salience are not significant in any of the models, suggesting that the presence of scholarly camps neither increases nor decreases research consensus. The small-world property has a positive association with consensus, and this association is significant in all models.
Scholarly Reputation
The centralization of reputation is not significant in any of the models, suggesting no substantial association between star researchers and consensus levels. However, the average reputation of scholars studying a research topic is significant in all models, suggesting a positive association between overall scholarly reputation and research consensus.
Gender
Female presence has a substantial positive association with consensus in all models without time-fixed effects. However, the significance disappears once the model includes time-fixed effects. This suggests that female presence and consensus both increase over time, and we find no evidence of an association between them beyond this shared trend.
Interpretation of Estimation Coefficients
The full model (i.e., Model 8) can explain half of the variance in consensus (
Checking Robustness
We checked the robustness of our estimations with respect to statistical sensitivity and data quality and found them to be robust on both counts. Statistical sensitivity was checked using post-estimation tests and winsorization of extreme values. Data quality, specifically the robustness of the gender prediction, was checked using an alternative imputation method. See Supplemental Appendix E.
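Winsorization limits the influence of extreme values by clamping them to chosen quantiles rather than dropping them. The sketch below is a generic illustration; the 5% and 95% cutoffs and the simple index-based quantile rule are assumptions, not the settings reported in Supplemental Appendix E.

```python
def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values below/above the given empirical quantiles.

    Quantiles are taken by flooring the fractional index into the
    sorted data; the cutoffs are illustrative defaults.
    """
    ordered = sorted(values)
    n = len(ordered)
    low_cut = ordered[int(lower * (n - 1))]
    high_cut = ordered[int(upper * (n - 1))]
    return [min(max(v, low_cut), high_cut) for v in values]


# Nineteen ordinary observations and one extreme value (illustrative only).
raw = list(range(1, 20)) + [1000]
clipped = winsorize(raw)
```

In this toy series the single outlier is pulled back to the largest ordinary value while all other observations are unchanged, so a regression re-estimated on `clipped` shows whether a result hinges on a few extreme cases.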
Discussion
What drives scholars to share research interests and language, thereby forming a common academic agenda and discourse in this emerging field of research? We found that the levels of consensus for all major topics have increased as the volume of literature has grown over time: For each research topic, a 10% growth in the volume of literature is associated with a 1.4% increase in shared language and research interests. The citation count of scholars and the small-world property of networks are positively associated with consensus, suggesting that star researchers and knowledge brokers bridging different intellectual communities are key to sharing research interests and language. The following sections discuss the implications for field development, faculty promotion, and gender equity. We conclude this article with a discussion of limitations and future studies.
Improving Consensus: Build the Research Field as a “Small World”
We offer two explanations for interpreting the relationship between small-world network and research consensus. First, building scientific consensus requires more rapid information diffusion and communication, which is an important feature of small-world networks. Second, scholars in research fields with greater consensus may be easier to collaborate with, therefore leading to the formation of a small-world community. The two explanations do not conflict but indicate different causal directions.
Building our research field like a “small world” may be an effective method for increasing the consensus in nonprofit and philanthropic studies. To achieve this goal, it is especially important to facilitate collaboration among scholars across institutions because communication across institutional lines is harder. It is also important to cultivate boundary spanners because they can significantly reduce the distance between scholars in different intellectual camps.
Academic associations and research centers are important driving forces for developing our research field (Abramson & McCarthy, 2012, p. 429; Prentice & Brudney, 2018, p. 54; Rooney & Burlingame, 2020). They are also important for building a small world because research centers and associations are primary actors “in the process of collectively defining and redefining the institutional logic of a professional field” (Greenwood et al., 2002, p. 76). A survey in 2013 found about 55 research associations focused on nonprofit and philanthropic studies around the world, but most of them are national rather than multinational (D. H. Smith, 2013, pp. 641–643). Most of these associations facilitate academic conversations domestically. However, there should be more initiatives focusing on exchange among members of associations in different countries so that this research field can become truly global (Wiepking, 2021). As online and hybrid meetings have become popular in the post-pandemic world, consensus may form faster or slower (de Leon & McQuillin, 2018)—an interesting and timely proposition for future studies.
In addition, we should also seek opportunities to minimize the distance dividing scholars active in our field from those in other disciplinary associations. Good channels for doing so are sections or groups organized in other associations, such as the long-standing nonprofit sections in the Network of Schools of Public Policy, Affairs, and Administration (NASPAA) and the Civil Society, Policy, and Power group in the American Political Science Association.
We must emphasize that a “small world” is not a “clustered world.” In a clustered network, individuals within a clique are densely connected, but they are loosely or rarely connected with those outside of that clique. Such a network feature is measured as “structural salience” in this study. Although we found no evidence showing that salience decreases research consensus, it may foster a contentious academic community instead of a collaborative one.
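The contrast between a clustered world and a small world can be made concrete with two standard network statistics: the average clustering coefficient and the average shortest-path length. The sketch below is a toy illustration, not the measures used in this study. It builds a hypothetical ring of four tightly knit cliques joined by single ties, then adds two cross-clique "broker" ties; all names (`ring_of_cliques`, `add_edge`, and so on) are assumptions.

```python
from collections import deque
from itertools import combinations


def add_edge(adj, a, b):
    """Add an undirected tie between nodes a and b."""
    adj[a].add(b)
    adj[b].add(a)


def ring_of_cliques(n_cliques=4, size=4):
    """Fully connected cliques, each tied to the next clique by one edge."""
    adj = {(c, i): set() for c in range(n_cliques) for i in range(size)}
    for c in range(n_cliques):
        for i, j in combinations(range(size), 2):
            add_edge(adj, (c, i), (c, j))
        add_edge(adj, (c, size - 1), ((c + 1) % n_cliques, 0))
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes with < 2 neighbors)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


clustered = ring_of_cliques()
bridged = ring_of_cliques()
add_edge(bridged, (0, 1), (2, 1))  # hypothetical broker tie across cliques
add_edge(bridged, (1, 2), (3, 2))  # second hypothetical broker tie

c_before, l_before = average_clustering(clustered), average_path_length(clustered)
c_after, l_after = average_clustering(bridged), average_path_length(bridged)
```

In this toy network, the two broker ties shorten the average distance between scholars while local clustering stays well above zero, which is the combination that characterizes a small world.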
Scholarly Productivity and Citation as Promotion Standards
Scholarly productivity and citation are also associated with levels of research consensus. The average number of coauthors within different topics varies from about 1.5 to 2.9 (Figure 3). As a result, the productivity of scholars studying different topics may also differ considerably: Intellectual collaboration can increase individual productivity, even when collaborative output is discounted by the number of authors (Ductor, 2015, p. 398).
The average scholarly citation count can also vary by level of consensus: The counts of high-consensus disciplines (e.g., “Biology&Biochemistry”) are higher than those of low-consensus disciplines (e.g., “Social Sciences”; Iglesias & Pecharromán, 2007, p. 310). This principle applies as well to the different topics within nonprofit and philanthropic studies: The average citation count is positively associated with the consensus level.
We offer two explanations for this association. On one hand, a Kuhnian reason is that scholars become more productive when research consensus improves, leading to more citations on average (J. H. Evans, 2007). On the other hand, a higher average citation count for a topic in nonprofit and philanthropic studies may indicate the presence of more scholars proficient on that topic, and the increased research consensus grows from those scholars’ collective effort.
Both scholarly productivity and the average number of citations vary between research topics. Given the interdisciplinary nature of the field, it is important to consider nonprofit scholars’ research themes while evaluating their performance so that they can be compared with kindred groups (if they are to be compared). A survey of 273 member institutions of the NASPAA reveals that the quantity of academic publication is overrated compared with quality in evaluating a faculty member’s promotion and tenure, rendering scholarly productivity the most important tenure standard (Coggburn & Neely, 2015, p. 206). Another quantitative indicator often used in considering promotion and tenure is the h-index, an improved measure based on citation counts (Hirsch, 2005).
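For reference, the h-index (Hirsch, 2005) is the largest number h such that a scholar has at least h publications with at least h citations each. The sketch below computes it for two hypothetical citation records; the function name and the numbers are illustrative only.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Two hypothetical scholars with the same number of papers.
scholar_high_citation_topic = [10, 8, 5, 4, 3]
scholar_low_citation_topic = [25, 8, 5, 3, 3]
h_high = h_index(scholar_high_citation_topic)
h_low = h_index(scholar_low_citation_topic)
```

Because the index is built entirely from citation counts, it inherits any systematic differences in citation volume between research topics, which is why comparisons across topics call for caution.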
The field of nonprofit and philanthropic studies needs to be extra cautious when using these quantitative measures, not only because of concerns about their quality, but also because productivity and citation measures are a function of research topic. Recognizing the perverse incentives of the impact factor and other citation metrics, many institutions around the globe have signed the San Francisco Declaration on Research Assessment (https://sfdora.org/), which requires signatories not to use such metrics for tenure and promotion decisions.
Gendered Research Discourse and Academic Culture in Nonprofit Studies
The gender gaps in our research field are evident. While the number of male scholars has surpassed the number of female scholars only since the early 2000s, female scholars have been consistently less cited than males since the late 1990s. Of the largest 16 research topics that account for nearly 90% of the entire nonprofit and philanthropic studies literature, only two ever had more than 50% female scholars. Along with the institutionalization of this field since 2010, these disparities stabilized instead of being alleviated.
Despite these disadvantages, the presence of female scholars and research consensus are positively associated. For a given research topic, a larger percentage of female scholars is substantially associated with a higher level of research consensus. But we also observe that this association becomes insignificant after including time-fixed effects. This probably indicates that the academic productivity of female scholars increased over the same period during which consensus also increased, perhaps as a result of the same historical conditions and trends.
Instead of treating gender as a cause of gender inequality, scholars have started to examine the external environment and discourse in which gender differences are embedded. For example, women’s disadvantaged position can be attributed to masculine organizational culture and “micro-political practices” that are competition- and output-driven rather than valuing collaboration and relationship-building (Teelken et al., 2021, p. 842). As a result, the standards for evaluating faculty members favor male scholars (O’Connor et al., 2020) and are the “leading culprit” for gender inequality in the promotion and tenure process (Weisshaar, 2017, p. 554).
We observe numerous factors contributing to a male-dominated research discourse and a gendered academic culture, which indicates that our research field needs to be extremely cautious. Although the field has become more diverse since 2010, the majority of the largest research topics are still dominated by male scholars, and female scholars have been consistently less cited. There is also considerable evidence of gender inequality in the practices of the nonprofit sector, showing that men hold a disproportionate share of upper-level management positions (M. D. Evans & Knepper, 2022; Gibelman, 2000). Ironically, despite these facts, women can strengthen the nonprofit sector in practice, according to Themudo (2009), and increase research consensus in academia, according to our analysis.
Limitations and Future Studies
Computational social science methods have been trending in the past decade, but they are still relatively new compared with conventional research methods (e.g., surveys and experiments) and not widely applied in the research field of nonprofit and philanthropic studies (Ma et al., 2021). Although a few studies in this research field adopted NLP methods and suggested good validity (e.g., Brandtner, 2021; Ma, 2022; Paxton et al., 2020; Wasif, 2021), the newest advances in NLP are still in an early phase of development (Rodriguez & Spirling, 2022). Therefore, computational social scientists should exercise caution and always validate the measures and test the robustness of estimation results. We call for future applications of the methods we have used here for the analysis of consensus in other fields of research.
The theorization of this article may have causal implications, but we want to reemphasize that our findings are associational and may not be causal. Future studies can better understand the causal mechanisms by adding the author level to the analysis, so that the heterogeneity of scholars can be considered, and by using more sophisticated strategies for causal inference. For example, personal traits such as self-esteem, extroversion, and emotional stability are likely to be correlated with collaboration and consensus formation, but these factors cannot be considered at an aggregated level. A critical prerequisite for this improvement is the construction of a dependent variable that can measure the degree of consensus of each scholar while also accounting for thematic differences. To allow for causal inference, experimental studies, particularly survey experiments, will be especially helpful. Moreover, qualitative studies can help unpack the causal mechanisms behind positive findings, confirming or amending the theorized findings of quantitative studies.
As acknowledged earlier, this study only considers “discursive consensus”—“an agreed upon language to describe the phenomena” in a research field (J. H. Evans, 2007, p. 2). Future projects could take two distinct approaches: (a) Studying “substantive consensus”—the acceptance of certain interpretations and theories by the majority of an academic community. Since even state-of-the-art computational methods cannot distinguish between contentious and concurring use of language, manually coding a selected number of topics is most promising for studying substantive consensus. (b) Examining the diversity of this interdisciplinary field and its relation to other disciplines (e.g., LePere-Schloop & Nesbit, 2022; Schubert et al., 2022). Either of these approaches can facilitate our understanding of the field’s intellectual growth and its institutionalization.
Last but not least, our data set is composed only of English bibliographic records, but most of the world’s people are not from Western, educated, industrialized, rich, and democratic (WEIRD) societies (Henrich et al., 2010). As Ma and Konrath (2018) and Wiepking (2021) have pointed out, the study of nonprofits and philanthropy has long suffered from skewed geographical representation. A few projects have analyzed the non-English literature of nonprofit and philanthropic studies and compiled promising data sets for future analysis (e.g., Zhang & Guo, 2021). Furthermore, the newest advances in NLP have made the analysis of multilingual corpora possible (Devlin et al., 2019). Because of improvements in both data and methodology, we can be optimistic that multilingual studies of nonprofit and philanthropic studies literature are in the making.
Supplemental Material
Supplemental material, sj-pdf-1-nvs-10.1177_08997640221146948 for Consensus Formation in Nonprofit and Philanthropic Studies: Networks, Reputation, and Gender by Ji Ma and René Bekkers in Nonprofit and Voluntary Sector Quarterly
Footnotes
Acknowledgements
We thank Angela M. Eikenberry, ChiaKo Hung, Christopher R. Prentice, Curtis Child, David Hammack, Dwight F. Burlingame, Francisco Santamarina, Frederick Lane, Lindsey M. McDougle, Marlene Walk, Melissa V. Abad, Pamala Wiepking, Patrick M. Rooney, Peter C. Weber, Peter Schubert, Ram Cnaan, and Rikki Abzug for their kind and constructive comments and suggestions. We also thank the attendees of the 2021 NACC Summer Research and Administration Summit, the Philanthropy Research Seminar at VU Amsterdam, and the 2021 ARNOVA Annual Conference. We thank Meiying Xu and Yan Wang for their research assistance and Kate Hartford and Jing Liu for editing and proofreading. We thank Susan Phillips, Joanne Carman, and Jaclyn Piatak for their editorial support and three anonymous reviewers for their insightful and constructive comments.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The project is partly funded or supported by (a) the Academic Development Funds from the RGK Center, (b) a 2021-22 PRI Award from the LBJ School, (c) library resources through the IU Lilly Family School of Philanthropy, and (d) cloud computing resources through the Texas Advanced Computing Center at UT Austin (Keahey et al., 2020).
Supplemental Material
Supplemental material for this article is available online.
