Abstract
Research in the global field of artificial intelligence is increasingly hybrid in orientation. Researchers are beholden to the requirements of multiple intersecting spheres, such as scholarly, public, and commercial, each with their own language and logic. Relatedly, collaboration across disciplinary, sectoral, and national borders is increasingly expected, or required. Using a dataset of 93,482 artificial intelligence publications, this article operationalises scholarly, public, and commercial spheres through citations, news mentions, and patent mentions, respectively. High-performing publications (99th percentile) for each metric were separated into eight categories of influence. These comprised four blended categories of influence (news, patents and citations; news and patents; news and citations; patents and citations) and three single categories of influence (citations; news; patents), in addition to the ‘Other’ category of non-high performing publications. The article develops and applies two components of a new hybridity lens: evaluative hybridity and generative hybridity. Using multinomial logistic regression, selected aspects of knowledge production – research context, focus, artefacts, and collaborative configurations – were examined. The results elucidate key characteristics of knowledge production in the artificial intelligence field and demonstrate the utility of the proposed lens.
Introduction
Artificial intelligence (AI) research has captured the attention of national research funders (Abadi et al., 2020), the private sector (Bughin et al., 2017), the news media (Chuan et al., 2019), social scientists (Frank et al., 2019), and public policy experts and decision-makers (Jobin et al., 2019). In this context, the AI research field has come to be of scholarly and national interest. Governments have adopted national strategies to support local AI research, often with a focus on private–public partnerships (Saran et al., 2018). Meanwhile, critical scholars have highlighted the ways AI technologies are changing scientific practices (Echterhölter et al., 2021), and questioned the role of corporate funding in shaping the AI research field (Ahmed and Wahed, 2020; Hagendorff and Meding, 2021), and, more broadly, have been alert to the changing political economy of knowledge production in the context of intertwining university–industry relationships (Crespo and Dridi, 2007). As Frank et al. (2019: 1) observe, ‘AI research is increasingly focused on engineering applications – perhaps due to the increasingly central role of the technology industry’.
In this study, we conceptualise the AI research field as sitting at the intersection of multiple spheres, each with their own priorities and logics. Building on existing concerns regarding university–industry relationships, and the public policy import of AI research (Schaich Borg, 2021; Fournier-Tombs, 2021), we focus on three spheres of interest: scholarly, public, and commercial. Divergence in the logics of these spheres occurs at two scales. First, at the scale of individual researchers, how individuals are situated in relation to these spheres will vary from researcher to researcher. For example, those working in the academic sector are buffeted by these spheres differently to those in the commercial sector. Second, at the scale of research institutions, how these three spheres are embodied and reflected will vary from institution to institution. Most pertinently, the criteria by which research institutions value and evaluate research outputs are influenced by the ways in which institutions are situated within these spheres. For example, in traditional academic institutions, legitimacy is gained by producing work that is valued by other academics, primarily measured through citation-based metrics (e.g. the number of citations a publication accrues).
The AI research field extends well beyond the scholarly world, with AI researchers situated within and outside academia engaging with the logics of multiple spheres. AI researchers are increasingly oriented to users and developers of technology, to commercial contexts and industrial labs, to policymakers and the public, and a range of mediators that fall in between. Published AI research must be flexible enough to inhabit these multiple intersecting communities without relinquishing academic credibility (Star and Griesemer, 1989). Accordingly, when quantifying performance, it becomes necessary for AI research institutions to look beyond the scholarly sphere. Consequently, metrics designed to capture commercialisation, investment, intellectual property, and product-use gain importance in organisational settings. The divergent logics of different spheres produce organisational hybridity, or ‘the mixing of core organisational elements that would not conventionally go together’ (Battilana et al., 2017: 129). Within these hybrid forms, there are a range of potential principles around what constitutes valuable and worthwhile research, as well as a range of potential principles of evaluation (Stark, 2011).
We analyse the relationship between the properties of research publications produced in the AI field, and their performance across three categories of influence, corresponding to the three spheres identified above. We draw on the Scopus AI research dataset (Siebert et al., 2018), 1 which at the time of extraction comprised 726,158 scholarly outputs.
We adopt Elsevier’s broad definition of the AI field, which includes research on high-level AI techniques, the application of these techniques in various domains, and social science research on their use and influence (Siebert et al., 2018). This approach reflects our interest in studying the influence of AI research beyond the scholarly sphere, which occurs at both the high-level, and in application domains (Cockburn et al., 2018; Gargiulo et al., 2022). Finally, we note that defining AI itself remains a challenge for policymakers and researchers (Samoili et al., 2021), and so favour a broad definition of the AI field, rather than one grounded in a particular conceptual definition of AI technology itself.
Hybridity
There is a well-established literature on hybridity. One fruitful conceptualisation takes hybrid sites as those that integrate cultural patterns of values, beliefs and practices that arise from multiple field- or societal-level logics. Thornton and Ocasio (1999: 804) describe sets of ‘assumptions and values, usually implicit, about how to interpret organisational reality, what constitutes appropriate behaviour and how to succeed’. For example, social enterprises combine the logic of the market with the logic of the third sector (Mikołajczak, 2020), while state-owned enterprises combine bureaucratic and market logics (Thornton et al., 2012).
Although far less developed, there is also a specific literature that considers hybridity in research settings. In the broadest sense, this literature focuses on sweeping changes to the nature of science. This work highlights a shift from established disciplinary categories to increasingly nebulous, ill-defined areas of study. Proponents of ‘Mode 2’ or ‘post-academic science’ argue that the 1990s saw the emergence of a new form of knowledge production that is increasingly applied, trans-disciplinary, and transient (Enders and De Weert, 2004). They observe a broader, more heterogeneous set of researchers working across disciplinary boundaries on specific local problems with a greater focus on social accountability and reflexivity. They also highlight the involvement of a more diverse set of organisations and institutions that display great breadth in funding patterns with a diverse range of requirements and expectations.
This broad shift has seen a necessary integration of distinct logics or institutional cultures. For example, the ‘entrepreneurial science’ model holds that the convergence of scientific and economic arenas has led researchers to harbour a desire to market their research (Albert, 2003). Similarly, the ‘academic capitalism’ model highlights increasing competition for external funding and increasing market activities (Ylijoki, 2003). Other researchers have examined settings where two distinct logics or institutional cultures are combined. Adler, Elmquist and Norrgren (2009) explore how research managers negotiate competing logics of independence, sustainability and freedom on the one hand, and the logic of integration, relevance, and predictability on the other. Mirowski (2018) examines how the logics of the media and public space are combined by open science proponents to make research more accessible, open, and reproducible.
At the scale of the researcher, to work productively in fields that are subject to the logics of multiple spheres, researchers engage in practices of hybrid knowledge production. Such practices require researchers to extend themselves beyond the production of scholarly publications, to include the translation of scholarly work into content digestible in commercial and public sectors, and the use of scholarly work in efforts designed to shape public policy, obtain commercial impact, and, ultimately, to gain the resources and legitimacy needed to continue scholarly work (Williams, 2022). For example, a researcher in the AI field may disseminate their scholarly outputs in conference proceedings, blogs and social media content, and direct engagement with commercial firms. In doing so, credibility is sought across multiple spheres, with different strategies enrolled for each sphere, responding to the different logics and symbolic resources dominant in each. Successful hybrid knowledge production thus involves balancing the production of a range of symbolic resources and necessitates competence within the academic system and beyond it.
At the scale of the research organisation, hybridity emerges as a response to the reliance on both material and symbolic resources from external sources. These sources can include the overarching institution within which a research unit operates, as well as other sources in their network. Accordingly, research organisations strive to be judged as acceptable and legitimate by external constituents, including funders, target audiences and the public, as well as the internal academic community. In spanning multiple social spheres, hybrid knowledge producers can face competing or incommensurate external expectations (Battilana et al., 2017). This can lead to difficulties creating legitimacy due to the challenges of satisfying multiple groups simultaneously. Yet, when hybrid knowledge producers gain legitimacy across multiple groups of constituents, they can potentially generate a greater and more diverse resource base than those that represent a single type of logic. However, tracking and measuring the influence of hybrid research on multiple constituencies is not straightforward. For example, as noted above, private sector actors are incentivised to understand how their technology investments perform across commercial markets, whereas academic actors are oriented to influence as indicated by scholarly metrics. These aspects of performance, and others, come together in hybrid contexts. Accordingly, research organisations often take stock of economic or statistical indicators of income or intellectual property or have researchers generate broad narratives about how their research has led to tangible societal or commercial impact (Williams, 2020).
Bibliometric studies of the AI field
This study builds on, and extends, existing bibliometric analyses of impactful research publications in the scholarly sphere. Bibliometric studies aim to quantitatively understand the influence of scholarly outputs (Roemer and Borchardt, 2015), and form part of established research evaluation practice (Williams, 2020). Typically, these studies focus on ‘performance analysis’, which uses citation and authorship data (Lawani, 1981) to examine the contributions of research constituents (e.g. authors, institutions, countries, and journals) to a given field.
Niu et al. (2016) explore co-authorship patterns in the AI research field through bibliometric analysis of publications from 1990 to 2014. They identify high performing authors and articles by reference to academic citations, and find that the AI research field is structured by clusters of researchers in the United States, Europe, and East Asia, with cooperation between researchers tending to happen intra-nationally, rather than internationally. Hagendorff and Meding (2021) also explore co-authorship patterns in the AI research field. Although their focus is on industry, academic, and academic-corporate collaboration, their analysis defines high performance only by reference to academic citations. They find that papers produced by industry-affiliated authors have higher citation rates than those produced by academic-affiliated authors, and that the topics covered in academic papers tend to lag two years behind the topics covered in industry papers. Finally, Gargiulo et al. (2022) explore co-authorship patterns across the AI research field through a novel analysis of authors’ disciplines (as indicated by the discipline of the venues they publish in), finding that as the AI field moves between phases of contraction and growth the field shifts between its ‘native’ disciplines in computer science, mathematics, and statistics to more interdisciplinary applied disciplines.
Bibliometric analyses also explore the relationship between academic and commercial institutions. Klinger et al. (2020) approach the AI research field from the perspective of directed technical change research, conducting an analysis of pre-print papers on the arXiv platform to discern thematic composition of research outputs. Drawing on author affiliation data, they find that researchers in the private sector are more narrowly focused on deep learning techniques than those in the academic and public sectors. Ahmed and Wahed (2020) consider the extent to which large firms dominate AI research by studying the institutional affiliations of papers published in leading computer science academic venues. They find that 2012 was an inflection point in the AI research field, with the rise of deep learning techniques driving an increase in compute-intensive research, which concentrates research activities in resource-rich elite universities and large technology firms. Through analysis of papers presented at the three major machine learning conferences from 2015 to 2019, Hagendorff and Meding (2021) find an increasing trend in papers presented by authors affiliated with both academia and industry, representing the intertwining of academic and corporate research settings. Similarly, Frank et al. (2019) analyse citation patterns and referencing relationships in computer science subfields that are relevant to AI. They find a consolidation, over time, of influential AI research in industry ‘hubs’ and identify a trend of decreasing reference rates from fields outside of AI to AI research, which they suggest is indicative of the broader academic community struggling to keep up with the pace of AI research.
Existing bibliometric analyses thus give some insight into the increasingly hybrid nature of knowledge production in AI research. To date, there has not, however, been adequate analysis of the extent of hybridity in the field. This article aims to unpack the features of hybrid knowledge production inherent to AI research.
Altmetric studies of the AI field
Given hybrid researchers are increasingly beholden to new expectations, roles, and responsibilities outside the academic system (Bastow et al., 2014), it is necessary to capture the reception of research from a more diverse set of constituents than is offered by bibliometric analyses alone. This is especially pronounced in AI research, where researchers simultaneously straddle the performance requirements of academically credible, commercially viable and widely interpretable work.
Patent data has been used by researchers interested in the relationship between academic publications and innovation. These studies highlight the significant commercialisation of AI research (Baruffaldi et al., 2020; Li and Lei, 2021), and the close intertwining of open science and proprietary technology in AI development (Kazuyuki, 2020). Patent analysis is used to identify patterns in a particular region, examine novelty and quality, and highlight technological needs and gaps (Abbas et al., 2014). A study of
Another avenue that generates fruitful insights on the AI field is media studies. These studies take discussion in news outlets as a key indicator of public interest and influence. Brennen et al. (2022) show that almost
Studies have also expanded to incorporate alternative metrics (‘altmetrics’) that attempt to reflect the influence of scholarly outputs beyond traditional academic venues (Bornmann, 2014). Altmetrics have emerged to measure the discussion of scholarly outputs on social media and in news media, and to make the most of the transition to online working environments by researchers, which enables the tracking of a new range of interactions with scholarly outputs (e.g. downloads and digital readership) (Haustein et al., 2014). Relatively few studies of the AI research field attempt to contextualise or extend insights garnered from bibliometric data sources with altmetrics, although some acknowledge the potential utility of this (Klinger et al., 2020). Zhang and Dafoe (2020) find that altmetric attention scores for AI publications increased dramatically between 2011 and 2017, indicating greater public attention to the field. Zhao et al. (2018) use altmetric and bibliometric data to compare the performance of publications authored solely by university affiliated researchers, industry affiliated researchers, and collaborations between the two. They find that industry publications receive more attention online than university authored papers. Our study builds on this work, by expanding the comparison to include academic, corporate, government, and medical sectors, and by expanding the breadth of altmetrics data considered to include in-patent citations as well as media mentions. Although altmetrics cannot claim the same clear meaning or credibility as bibliometrics, we suggest that careful selection of indicators within a theoretical framework permits useful insights into hybrid knowledge production.
In this study, we integrate patent and news citation data into our bibliometric analysis. Doing so enables us to consider the relationship between potential signifiers of value across the three spheres of interest (scholarly, public, and commercial), and to directly address questions of hybridity in our analysis.
A new lens on hybridity in AI research
Taking the multiple forms of research that are produced in the AI field, and the multiple criteria of evaluation that can be applied, we propose a new hybridity lens comprising two components: evaluative hybridity and generative hybridity.
The two components of the hybridity lens.
The evaluative component considers the relative performance of publications across the three spheres of interest, operationalised through academic citations (scholarly), news mentions (public), and patent mentions (commercial).
Lastly, the generative component considers the multiformity of the knowledge production features underlying high performing publications, including the collaborative configurations from which they emerge.
The aim in applying a hybridity lens is to identify the underlying characteristics of research outputs that are highly influential in the scholarly, public, and commercial spheres, and in combinations of these. We take these measures as potential signifiers of value that can be used together to understand influence in research. In this view, citations represent a measure of value (i.e. academic credibility) from the scholarly sphere, news mentions represent a measure of value from the public/media sphere (i.e. visibility amongst those outside academia), and patent mentions represent a measure of value from the economic sphere (i.e. commercial value outside academia) (Williams, 2022). We note that each of these measures is only an imperfect proxy for value. Using this lens, the remainder of this article addresses the question: what are the knowledge production features of publications that are high performing across theoretically defined categories of influence?
Methods and analysis
Data extraction and enrichment
Two primary data sources were relied upon: Scopus and Altmetric. We used Scopus to extract standard bibliometric metadata – including descriptors, classifications, and abstracts – for Elsevier’s large dataset of publications in the AI field, and to explore their academic reception (citations). 3 We enriched the Scopus dataset with data from Altmetric, an online platform that gathers mentions of outputs (via links and references) from non-academic sources (e.g. mainstream news media, patents, policy documents, social media). We limited our use of Altmetric data to Field of Research (FoR), for identifying the disciplinary classification of publications, and news and patent mentions, for exploring reception of publications in the public and commercial spheres. 4
At the time of data extraction, the Scopus dataset of AI publications contained 726,158 scholarly outputs, including published articles and pre-prints. 5 Of these, 31,134 did not contain metadata on the affiliation of authors and were thus omitted from our sample. The sample was reduced to 296,351 outputs published between 2014 and 2019 (inclusive) to allow time for citations to accrue for more recent publications, and to mitigate against poor coverage of news and patent mentions before 2014. Overall, 28,374 duplicate records (i.e. records with the same DOI) were found in the dataset. 6 After de-duplication (by retaining the record with the greatest number of citations), 267,977 scholarly outputs remained in our Scopus sample, which was then matched to the Altmetric database. In total, Altmetric returned matches for 94,686 scholarly outputs, of which 1,183 duplicate records were identified and removed. In addition, Altmetric erroneously returned
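The de-duplication step described above – retaining, for each DOI, the record with the greatest citation count – can be sketched in pandas. The column names (`doi`, `citations`) and example records are illustrative assumptions, not the actual schema of the Scopus export.

```python
import pandas as pd

# Illustrative records with a duplicated DOI; real data would come from the
# Scopus export described in the text.
records = pd.DataFrame({
    "doi": ["10.1/a", "10.1/a", "10.1/b", "10.1/c"],
    "citations": [12, 30, 5, 0],
})

# For each DOI, keep the record with the greatest number of citations.
deduped = (records.sort_values("citations", ascending=False)
                  .drop_duplicates(subset="doi", keep="first")
                  .sort_index())
```

Sorting by citations before `drop_duplicates(keep="first")` guarantees the highest-cited record survives for each DOI; the final `sort_index()` simply restores the original row order.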
In instances where a scholarly output had multiple authors, metadata associated with all authors was extracted from Scopus. For author-specific metadata (e.g. institution, region, and sector), we used a simple script to generate numerical totals for each data point (e.g. number of authors and number of institutions). Altmetric’s news data is derived from a manually curated list of news sources. Altmetric captures mentions of scholarly outputs through news posts that contain direct links to the output, and through automated text mining. 7 Altmetric’s patent citation data is derived from the IFICLAIMS patent database, and the Dimensions service links patent filings to scholarly outputs that are cited in them. 8 Altmetric’s FoR data is also provided by the Dimensions service and utilises the Australian 2008 FoR schema.
Within the final sample, the distribution of academic citations, patent citations, and news mentions reflected high academic engagement with scholarly outputs, but only sporadic news media or patent engagement. 88% of outputs received at least one academic citation. Meanwhile, only 10% received at least one patent citation, and only 4% received at least one news mention.
Identification of influential scholarly outputs
Within our sample, influential scholarly outputs were identified by reference to their relative performance on three metrics: academic citations, news mentions, and patent mentions. The 99th percentile for each metric was used as the threshold for high performance, and publications were assigned to the eight categories of influence accordingly.
Categories of influence selected for investigation.
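The categorisation logic can be sketched as follows: each publication is compared against the 99th-percentile threshold for each metric, and the combination of thresholds it clears determines its category. This is a minimal sketch under the assumption that thresholds are computed over the full sample; the category labels follow those used in the results (HighAll, HighNP, etc.).

```python
import numpy as np

def categorise(citations, news, patents, thresholds):
    """Assign one publication to a category of influence.

    thresholds: dict of 99th-percentile cut-offs for each metric.
    """
    c = citations >= thresholds["citations"]
    n = news >= thresholds["news"]
    p = patents >= thresholds["patents"]
    if c and n and p:
        return "HighAll"   # blended: citations, news, and patents
    if n and p:
        return "HighNP"    # blended: news and patents
    if c and n:
        return "HighCN"    # blended: citations and news
    if c and p:
        return "HighCP"    # blended: citations and patents
    if c:
        return "HighC"     # single: citations
    if n:
        return "HighN"     # single: news
    if p:
        return "HighP"     # single: patents
    return "Other"

# Thresholds would be computed from the full sample; illustrative counts here.
counts = {"citations": np.array([0, 1, 3, 250]),
          "news": np.array([0, 0, 0, 40]),
          "patents": np.array([0, 0, 1, 12])}
thresholds = {k: np.percentile(v, 99) for k, v in counts.items()}
```

Because the blended categories are checked before the single ones, each publication lands in exactly one of the eight mutually exclusive categories.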
Data analysis
We draw on various data analysis techniques to explore evaluative hybridity in relation to three aspects of knowledge production (research context, focus, and artefacts). The primary unit of each analysis is the individual research publication. For each aspect of knowledge production, we compare publications in the categories of influence to publications that have not been highly influential to date, which we group together under the category ‘Other’.
To assess the research context, we analysed the sector of the institution that authors are affiliated with, the Computer Science ranking of authors’ institutions, and the region of the authors. Authors were allocated a sector based on publication metadata. Rankings were taken from the global Shanghai rankings 9 and divided into five buckets: institutions ranked
To assess research focus, we analysed the FoR for publications. FoR is a hierarchical academic discipline classification system, in which the discipline of a publication is represented by a numeric code. 10 The first two digits determine the highest level of categorisation in the system, which consists of 22 broad divisions.
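Under this schema, recovering a publication’s top-level division reduces to reading the leading two digits of its FoR code. A minimal sketch, using divisions from the 2008 FoR schema that appear in this study:

```python
# Top-level FoR divisions referred to in this study (ANZSRC 2008 schema).
DIVISIONS = {
    "08": "Information and Computing Sciences",
    "09": "Engineering",
    "11": "Medical and Health Sciences",
}

def for_division(for_code: str) -> str:
    """Map a FoR code to its top-level division via its first two digits."""
    return DIVISIONS.get(for_code[:2], "Other division")

# e.g. the group code '0801' falls under division 08,
# Information and Computing Sciences.
```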
To assess research artefacts, we analysed the bibliometric metadata of publications. We analysed the source type (book, book series, conference proceedings, journal, and trade publication), open access status (yes or no), and the quartile ranking of the publication venue (using SCImago Journal Rank provided by Scopus).
Finally, to consider generative hybridity, we identified several indicators of different collaborative configurations – number of co-authors, sectors, institutions, countries, funders, and FoR – and extracted values from publication metadata. As publications are our unit of analysis, where an individual publication had multiple co-authors with the same sector, institution, or country, the shared sector was counted only once (i.e. a publication with three co-authors, two from the academic sector and one from the corporate sector was recorded as having two sectors). This allowed us to compare the hybrid configurations underlying publications in the categories of influence to those underlying the ‘Other’ category.
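The per-publication counts described above can be sketched by reducing each author-level attribute to its set of unique values, so that a shared sector, institution, or country is counted only once. The metadata field names here are illustrative assumptions:

```python
# A publication carries a list of per-author attributes (field names assumed).
publication = {
    "authors": [
        {"sector": "academic", "country": "US", "institution": "A"},
        {"sector": "academic", "country": "UK", "institution": "B"},
        {"sector": "corporate", "country": "US", "institution": "C"},
    ]
}

def count_unique(pub, field):
    """Count distinct values of an author-level field; shared values count once."""
    return len({author[field] for author in pub["authors"]})

# Two academic authors plus one corporate author -> two sectors, as in the
# worked example in the text.
n_sectors = count_unique(publication, "sector")
n_countries = count_unique(publication, "country")
n_institutions = count_unique(publication, "institution")
```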
We used multinomial logistic regression (MLR) to confirm the findings, as shown in Table 3.
Results of the multinomial logistic regression (reference: ‘Other’ category).
Results
The evaluative hybridity component
The evaluative component of the hybridity lens considers the performance of publications across three different spheres of interest (scholarly, public, and commercial), which in this study are operationalised in terms of blended categories (i.e. high performance on all, high performance on two of the three spheres) and single categories (i.e. high performance on only one sphere). The results below detail each knowledge production aspect in turn: research context, research focus, and research artefacts.
Research context
The research context that emerges for the typical publication is unsurprising. Non-highly influential publications (i.e. the ‘Other’ category) are largely first-authored by a researcher with academic affiliation (88%), whose institution is ranked outside the top 100 Computer Science institutions (62.5%), and who is based either in Asia (39%) or Europe (34%). We found minimal differences between the research context of first authors and that of all authors. As such, we report only the results of first author analysis (although details of co-author analysis can be found below in Figure 2).
The research context, however, is more complicated when considering the categories of influence. Across all the categories of influence, the proportion of publications with first authors working in the academic sector declines, whilst the proportion of publications with the first authors working in the corporate sector increases. Indeed, whilst the first authors working in the corporate sector account for only
The institutions of authors producing publications in the categories of influence also skew towards the top ranks of the Shanghai Computer Science rankings. Whilst only
Whilst the overall AI field is dominated by authors working at institutions in Europe, Asia, and North America (only
As shown in Table 3, the MLR analysis confirms the above results. Compared to academic first authors, corporate first authors are significantly more likely to be in all categories of influence except HighCN and HighN; government and medical first authors are both more likely to be in HighNP. The results show a nuanced picture of the importance of sector across the categories of influence, with only eight statistically significant independent variables, suggesting that sector matters most for influence via patents and citations. Publications produced from institutions outside the top
Research focus
The research focus for the typical AI publication is also unsurprising. Results are reported in Table 6 in Supplemental Material. Most publications are categorised under Information and Computing Science in the FoR (54% of ‘Other’ publications), followed by Engineering (9%) and Medical and Health Sciences (8%).
When considering the relationship between the categories of influence and FoR, as shown in Figure 1, a slightly different picture emerges. The Information and Computing Sciences FoR is the dominant FoR across all categories of influence except for HighN and HighCP. In HighCP, HighC, and HighN the Medical and Health Sciences FoR was equally or more dominant. The only other FoRs to have greater than a

The relationship between categories of influence and disciplines given by field of research (FoR) code.

Proportion of sector, region, and Shanghai Computer Science Ranking based on affiliations of all authors.
Research artefacts
Unsurprisingly, most AI publications are journal articles (
In terms of open access,
In terms of the quartile journal rank of publication venues, there is a roughly equal distribution of publications in the ‘Other’ category across Q1, Q2, and Q4 journals. In the categories of influence, however, the distribution of publications skews strongly towards Q1 journals. This trend is particularly strong for the categories of influence associated with the public sphere (in HighNP
The MLR analysis confirms these results, as shown in Table 3. Compared to journal articles, books, book series, conference proceedings, and trade publications are less likely to be in most categories of influence, with the exceptions of conference proceedings, which are more likely to be in HighCP and HighC, and trade publications, which are more likely to be in HighP. Compared to Q1 journals, publications from Q2, Q3, and Q4 venues are less likely to be in all categories of influence. Indeed, as expected, the pronounced number of statistically significant results here suggests that influential publications may generally share the same research artefact characteristics (i.e. most influential publications are articles in Q1 journals). Detailed statistics are reported in Table 7 in Supplemental Material.
The generative hybridity component
The generative component of the hybridity lens considers the multiformity of selected features of knowledge production for high performing publications across three different spheres of interest (scholarly, public, and commercial), operationalised as above in terms of blended categories and single categories. Whilst the generative component can be considered on the level of research context, focus, and artefacts (as we do with the evaluative component), for conciseness, we present our preliminary findings across these three aspects together.
Our analysis of generative hybridity reveals a nuanced picture of multiformity of knowledge production features in the AI field. 12
Results are reported in Table 4. In terms of the mean number of authors contributing to publications, those in the categories of influence have more authors than those in the ‘Other’ category. This was most pronounced in the high news categories – HighCN (
Application of the generative component of the hybridity lens: collaborative configurations underlying categories of influence.
The mean number of sectors represented in publications is relatively low given the maximum potential number of sectors for a publication is
In this section, rather than consider the number of regions represented in each publication, we consider the number of different countries. The mean number of countries represented in ‘Other’ publications is
Interestingly, the mean number of FoRs displays a different pattern, whereby only high news categories – HighAll (
The MLR analysis presented in Table 3 confirms these results. In particular, the MLR finds that increasing the number of authors results in greater performance in HighAll, HighCN, and HighP. Interestingly, increasing the number of institutions results in poorer performance in the scholarly and commercial spheres (HighAll, HighCP, HighC, and HighP). Increasing the number of sectors results in greater performance in all categories of influence except for HighNP. Increasing the number of countries results in greater performance in the scholarly and public spheres (HighCN, HighC, and HighN), while an increase in the number of funders is associated with high performance in all categories of influence except for HighCN. Increasing the number of FoRs results in worse performance in the HighP category.
Discussion
In this section, we discuss the results of our analysis at two levels: first, we consider what our findings reveal about the characteristics of knowledge production in the AI field; second, we consider our nascent hybridity lens in light of our findings.
Knowledge production in the AI field
Whilst the field of AI research is large and diverse, our results indicate that influence is centralised in the knowledge production practices of a small number of elite Global North institutions. Figure 2 illustrates this concentration across sectors, regions, and institutional ranking for all authors. The wider the influence of a publication (i.e. the more spheres in which it is successful), on average, the less we see academic affiliation among its authors and the more we see North American and corporate institutions represented.
We also see a pattern across categories of influence, where influential papers tend to come from small teams of researchers working in elite universities or large technology firms. This is possibly due to the presence of well-known academic ‘celebrities’ who garner interest across spheres, or to the greater access to resources these types of organisations can provide to facilitate the translation of research into public and commercial influence.
This characteristic – the influence of North American, corporate, and elite institutions – is likely to correspond to the dominance of deep learning that has been observed by others (Luitse and Denkena, 2021). State-of-the-art deep learning research is resource intensive: very large datasets, significant human resources for data preparation, and significant computing capacity are all required (Strubell et al., 2019). These resources are, increasingly, beyond the reach of most academic institutions, and thus accessible exclusively to researchers inside a very small number of corporate institutions, or researchers working in collaboration with those institutions (Jurowetzki et al., 2021; Luitse and Denkena, 2021). By way of example, in natural language processing, a significant subfield of AI in which deep learning techniques have become dominant, all of the top 10 models on the widely used GLUE benchmark (Wang et al., 2019) are associated with corporate institutions, with the majority based in the United States.
Developing a hybridity lens
The application of the new hybridity lens demonstrates two components of hybridity: evaluative and generative. The first relates to the hybridity that is observed when attention is given to indicators beyond a single sphere. The second relates to the hybrid configurations of research sites, whereby increasing the multiformity of actors and perspectives involved in knowledge production seems to generate a boost in performance.
In terms of evaluative hybridity, the difference between the four blended categories of influence and the three single categories of influence is illuminating. For example, considering the single metrics of citations, news mentions or patent mentions alone under-emphasises the contribution of specific sectors, whereas blended metrics show their capacity for simultaneous influence within and beyond the academy. Likewise, considering regional affiliation using single metrics obscures the specific patterns observable through blended categories. This suggests that particular social contexts and structures provide better support for multiple types of influence. Similarly, evaluating the Information and Computing Sciences FoR based on single metrics hides the field’s capacity to influence multiple spheres at once. This suggests that what we choose to measure and how we understand the dimensions of value matters a great deal to what is deemed to be successful research. As the expectations for research have changed over time, researchers and research organisations increasingly seek to influence a range of audiences. This necessitates not only working to create the conditions for research that is valued in different contexts, but also holding multiple evaluative principles in play simultaneously to be able to assess whether these goals have been achieved (Williams, 2022). Acknowledgement and operationalisation of evaluative hybridity is essential to gaining insights into the type of collaborative configurations that generate a higher degree of influence.
In terms of generative hybridity, which we define as the multiformity of selected features of knowledge production, our results show that publications that perform well in the scholarly, commercial and public spheres – and combinations thereof – tend to be more collaborative or networked than the typical AI publication. The analysis shows that the typical AI publication is a paywalled journal article or book series written by four authors from two different institutions. Usually, these are funded by one funder, if any, and are produced by a single sector within a single country. Strikingly, when examining the high performing categories of influence as a whole, the average publication is a journal article written by eight authors from three institutions, involving a higher number of sectors, countries, and funders compared to the typical publication. This suggests the importance of generative hybridity to research ‘success’.
There are some specific features that seem to be key in creating the conditions for influential publications. Here, the difference between high performing publications according to the sphere of interest is telling. For example, the elevated number of authors, sectors, and funders for high news categories suggests that publications have greater influence in the public sphere when there are more actors involved, perhaps due to increased capacity for media engagement provided by greater access to resources. Similarly, the higher mean number of authors, institutions, sectors, and fields of research in high patent categories suggests higher influence in the commercial sphere when a greater number of inter-sector and interdisciplinary actors are involved. This is reinforced by the increased presence of corporate sector actors in high patent categories, which suggests that industry–academic relationships boost performance. In addition to opening up greater resources for commercialisation, it may be that these types of collaborations improve the translation of technical AI knowledge into applied domain areas. The pattern also occurs in the high citation categories, where high performing publications have a higher number of authors, sectors, countries, and funders. This echoes the results of Hagendorff et al. (2021), who found that high-performing publications in terms of citations tend to be the product of academic–corporate collaborations. However, our MLR analysis shows that even once the type of sector has been accounted for, the multiformity of actors remains significant. Thus, academic–industry collaboration is one part of a broader picture of hybridity, but bringing together multiple actors through different collaboration configurations also matters in its own right.
Limitations
The central idea of our analysis is that articles in the 99th percentile for each metric can be treated as influential in the corresponding sphere …
Our analysis is also dependent on the quality of the bibliometric and altmetric data we inherit from Scopus and Altmetric. Whilst these are both considered to be best-in-class data sources, they do have known issues. Most relevantly, Altmetric is likely to underestimate the visibility given to some outputs, given that not all disciplines use DOIs to the same extent. In addition, coverage of news mentions is biased towards English-language news outlets, and Scopus’s author affiliation data is drawn from authors’ ORCID accounts, which are in turn maintained by authors themselves, and so can at times be out of date or incomplete.
The relationship between some of the context factors and the categories of influence is not independent. In particular, the Shanghai Computer Science rankings and journal quartile rankings use citation metrics in their calculations. It is unclear to what extent this explains our finding of a correlation between institutional and journal ranking and publication performance across the categories of influence.
Conclusion
Recent literature highlights the multiple logics that new knowledge is increasingly required to meet. Yet, this literature lacks nuance in two areas. The first relates to the challenge of interpreting influence across different spheres simultaneously. This article addresses this through the evaluative component of the hybridity lens, which considers performance across three spheres through operationalisation of eight blended and single categories of influence. The second relates to fragmentation in the knowledge production factors that are considered, whereby research tends to be understood in terms of its context, topic or output type rather than a holistic picture of these features. This study addresses this through the evaluative and generative components of the hybridity lens by considering a suite of elements from each knowledge production aspect, as well as using selected features to examine different collaborative configurations at play.
The article thus contributes a new theoretical and empirical understanding of hybridity as a measure of influence as well as a feature of knowledge production. It offers a new conceptualisation of two elements of hybridity, termed generative and evaluative, respectively. It provides a theoretically driven use of bibliometric and altmetric analyses that makes metrics interpretable through the lens of broader social spheres. It also opens the possibility of integrating a range of additional indicators into a more holistic framework that can illuminate the varied dimensions of research value.
In developing this understanding of hybridity, the article confirms important characteristics of influential research within the AI field. From the evaluative perspective, by comparing blended measures of influence to single measures, the article foregrounds the benefits of integrating multiple evaluative principles to draw more robust conclusions about research value. It also highlights the importance of broader social contexts and structures in supporting authors to produce research that is simultaneously influential in the scholarly, public, and commercial spheres. From the generative perspective, influential AI research is more likely to be produced in contexts where hybridity is maximised, whether it be through inter-sector, inter-funder or international collaborations.
Future research could examine the relationship between multiformity of actors and topical content, and could draw directly on abstract text and statistical topic modelling, rather than relying on FoR categories alone. It could also examine the relationship between sector and open access, noting that open access may be correlated with authors’ access to resources. Finally, future research could also attempt to validate the research presented here through empirical, comparative studies of knowledge production sites with different hybridity compositions.
Supplemental Material
Supplemental material for ‘Investigating hybridity in artificial intelligence research’ by Kate Williams, Glen Berman and Sandra Michalska, published in Big Data & Society.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Economic and Social Research Council grant ES/V004123/1, awarded to Kate Williams and Jonathan Grant.
