Abstract
Research in the global field of artificial intelligence is increasingly hybrid in orientation. Researchers are beholden to the requirements of multiple intersecting spheres, such as scholarly, public, and commercial, each with their own language and logic. Relatedly, collaboration across disciplinary, sectoral, and national borders is increasingly expected, or required. Using a dataset of 93,482 artificial intelligence publications, this article operationalises scholarly, public, and commercial spheres through citations, news mentions, and patent mentions, respectively. High-performing publications (99th percentile) for each metric were separated into eight categories of influence. These comprised four blended categories of influence (news, patents and citations; news and patents; news and citations; patents and citations) and three single categories of influence (citations; news; patents), in addition to the ‘Other’ category of non-high performing publications. The article develops and applies two components of a new hybridity lens: evaluative hybridity and generative hybridity. Using multinomial logistic regression, selected aspects of knowledge production – research context, focus, artefacts, and collaborative configurations – were examined. The results elucidate key characteristics of knowledge production in the artificial intelligence field and demonstrate the utility of the proposed lens.
Introduction
Artificial intelligence (AI) research has captured the attention of national research funders (Abadi et al., 2020), the private sector (Bughin et al., 2017), the news media (Chuan et al., 2019), social scientists (Frank et al., 2019), and public policy experts and decision-makers (Jobin et al., 2019). In this context, the AI research field has come to be of scholarly and national interest. Governments have adopted national strategies to support local AI research, often with a focus on private–public partnerships (Saran et al., 2018). Meanwhile, critical scholars have highlighted the ways AI technologies are changing scientific practices (Echterhölter et al., 2021), and questioned the role of corporate funding in shaping the AI research field (Ahmed and Wahed, 2020; Hagendorff and Meding, 2021), and, more broadly, have been alert to the changing political economy of knowledge production in the context of intertwining university–industry relationships (Crespo and Dridi, 2007). As Frank et al. (2019: 1) observe, ‘AI research is increasingly focused on engineering applications – perhaps due to the increasingly central role of the technology industry’.
In this study, we conceptualise the AI research field as sitting at the intersection of multiple spheres, each with their own priorities and logics. Building on existing concerns regarding university–industry relationships, and the public policy import of AI research (Schaich Borg, 2021; Fournier-Tombs, 2021), we focus on three spheres of interest: scholarly, public, and commercial. Divergence in the logics of these spheres occurs at two scales. First, at the scale of individual researchers, how individuals are situated in relation to these spheres will vary from researcher to researcher. For example, those working in the academic sector are buffeted by these spheres differently to those in the commercial sector. Second, at the scale of research institutions, how these three spheres are embodied and reflected will vary from institution to institution. Most pertinently, the criteria by which research institutions value and evaluate research outputs are influenced by the ways in which institutions are situated within these spheres. For example, in traditional academic institutions, legitimacy is gained by producing work that is valued by other academics, primarily measured through citation-based metrics (e.g. the number of citations a publication accrues).
The AI research field extends well beyond the scholarly world, with AI researchers situated within and outside academia engaging with the logics of multiple spheres. AI researchers are increasingly oriented to users and developers of technology, to commercial contexts and industrial labs, to policymakers and the public, and a range of mediators that fall in between. Published AI research must be flexible enough to inhabit these multiple intersecting communities without relinquishing academic credibility (Star and Griesemer, 1989). Accordingly, when quantifying performance, it becomes necessary for AI research institutions to look beyond the scholarly sphere. Consequently, metrics designed to capture commercialisation, investment, intellectual property, and product-use gain importance in organisational settings. The divergent logics of different spheres produce organisational hybridity, or ‘the mixing of core organisational elements that would not conventionally go together’ (Battilana et al., 2017: 129). Within these hybrid forms, there are a range of potential principles around what constitutes valuable and worthwhile research, as well as a range of potential principles of evaluation (Stark, 2011).
We analyse the relationship between the properties of research publications produced in the AI field, and their performance across three categories of influence, corresponding to the three spheres identified above. We draw on the Scopus AI research dataset (Siebert et al., 2018), 1 which at the time of extraction comprised 726,158 scholarly outputs.
We adopt Elsevier’s broad definition of the AI field, which includes research on high-level AI techniques, the application of these techniques in various domains, and social science research on their use and influence (Siebert et al., 2018). This approach reflects our interest in studying the influence of AI research beyond the scholarly sphere, which occurs at both the high-level, and in application domains (Cockburn et al., 2018; Gargiulo et al., 2022). Finally, we note that defining AI itself remains a challenge for policymakers and researchers (Samoili et al., 2021), and so favour a broad definition of the AI field, rather than one grounded in a particular conceptual definition of AI technology itself.
Hybridity
There is a well-established literature on hybridity. One fruitful conceptualisation takes hybrid sites as those that integrate cultural patterns of values, beliefs and practices that arise from multiple field- or societal-level logics. Thornton and Ocasio (1999: 804) describe sets of ‘assumptions and values, usually implicit, about how to interpret organisational reality, what constitutes appropriate behaviour and how to succeed’. For example, social enterprises combine the logic of the market with the logic of the third sector (Mikołajczak, 2020), while state-owned enterprises combine bureaucratic and market logics (Thornton et al., 2012).
Although far less developed, there is also a specific literature that considers hybridity in research settings. In the broadest sense, this literature focuses on sweeping changes to the nature of science. This work highlights a shift from established disciplinary categories to increasingly nebulous, ill-defined areas of study. Proponents of ‘Mode 2’ or ‘post-academic science’ argue that the 1990s saw the emergence of a new form of knowledge production that is increasingly applied, trans-disciplinary, and transient (Enders and De Weert, 2004). They observe a broader, more heterogeneous set of researchers working across disciplinary boundaries on specific local problems with a greater focus on social accountability and reflexivity. They also highlight the involvement of a more diverse set of organisations and institutions that display great breadth in funding patterns with a diverse range of requirements and expectations.
This broad shift has seen a necessary integration of distinct logics or institutional cultures. For example, the ‘entrepreneurial science’ model holds that the convergence of scientific and economic arenas has led researchers to harbour a desire to market their research (Albert, 2003). Similarly, the ‘academic capitalism’ model highlights increasing competition for external funding and increasing market activities (Ylijoki, 2003). Other researchers have examined settings where two distinct logics or institutional cultures are combined. Adler, Elmquist and Norrgren (2009) explore how research managers negotiate competing logics of independence, sustainability and freedom on the one hand, and the logic of integration, relevance, and predictability on the other. Mirowski (2018) examines how the logics of the media and public space are combined by open science proponents to make research more accessible, open, and reproducible.
At the scale of the researcher, to work productively in fields that are subject to the logics of multiple spheres, researchers engage in practices of hybrid knowledge production. Such practices require researchers to extend themselves beyond the production of scholarly publications, to include the translation of scholarly work into content digestible in commercial and public sectors, and the use of scholarly work in efforts designed to shape public policy, obtain commercial impact, and, ultimately, to gain the resources and legitimacy needed to continue scholarly work (Williams, 2022). For example, a researcher in the AI field may disseminate their scholarly outputs in conference proceedings, blogs and social media content, and direct engagement with commercial firms. In doing so, credibility is sought across multiple spheres, with different strategies enrolled for each sphere, responding to the different logics and symbolic resources dominant in each. Successful hybrid knowledge production thus involves balancing the production of a range of symbolic resources and necessitates competence within the academic system and beyond it.
At the scale of the research organisation, hybridity emerges as a response to the reliance on both material and symbolic resources from external sources. These sources can include the overarching institution within which a research unit operates, as well as other sources in their network. Accordingly, research organisations strive to be judged as acceptable and legitimate by external constituents, including funders, target audiences and the public, as well as the internal academic community. In spanning multiple social spheres, hybrid knowledge producers can face competing or incommensurate external expectations (Battilana et al., 2017). This can lead to difficulties creating legitimacy due to the challenges of satisfying multiple groups simultaneously. Yet, when hybrid knowledge producers gain legitimacy across multiple groups of constituents, they can potentially generate a greater and more diverse resource base than those that represent a single type of logic. However, tracking and measuring the influence of hybrid research on multiple constituencies is not straightforward. For example, as noted above, private sector actors are incentivised to understand how their technology investments perform across commercial markets, whereas academic actors are oriented to influence as indicated by scholarly metrics. These aspects of performance, and others, come together in hybrid contexts. Accordingly, research organisations often take stock of economic or statistical indicators of income or intellectual property or have researchers generate broad narratives about how their research has led to tangible societal or commercial impact (Williams, 2020).
Bibliometric studies of the AI field
This study builds on, and extends, existing bibliometric analyses of impactful research publications in the scholarly sphere. Bibliometric studies aim to quantitatively understand the influence of scholarly outputs (Roemer and Borchardt, 2015), and form part of established research evaluation practice (Williams, 2020). Typically, these studies focus on ‘performance analysis’, which uses citation and authorship data (Lawani, 1981) to examine the contributions of research constituents (e.g. authors, institutions, countries, and journals) to a given field.
Niu et al. (2016) explore co-authorship patterns in the AI research field through bibliometric analysis of publications from 1990 to 2014. They identify high performing authors and articles by reference to academic citations, and find that the AI research field is structured by clusters of researchers in the United States, Europe, and East Asia, with cooperation between researchers tending to happen intra-nationally, rather than internationally. Hagendorff and Meding (2021) also explore co-authorship patterns in the AI research field. Although their focus is on industry, academic, and academic-corporate collaboration, their analysis defines high performance only by reference to academic citations. They find that papers produced by industry-affiliated authors have higher citation rates than those produced by academic-affiliated authors, and that the topics covered in academic papers tend to lag two years behind the topics covered in industry papers. Finally, Gargiulo et al. (2022) explore co-authorship patterns across the AI research field through a novel analysis of authors’ disciplines (as indicated by the discipline of the venues they publish in), finding that as the AI field moves between phases of contraction and growth the field shifts between its ‘native’ disciplines in computer science, mathematics, and statistics to more interdisciplinary applied disciplines.
Bibliometric analyses also explore the relationship between academic and commercial institutions. Klinger et al. (2020) approach the AI research field from the perspective of directed technical change research, conducting an analysis of pre-print papers on the arXiv platform to discern thematic composition of research outputs. Drawing on author affiliation data, they find that researchers in the private sector are more narrowly focused on deep learning techniques than those in the academic and public sectors. Ahmed and Wahed (2020) consider the extent to which large firms dominate AI research by studying the institutional affiliations of papers published in leading computer science academic venues. They find that 2012 was an inflection point in the AI research field, with the rise of deep learning techniques driving an increase in compute-intensive research, which concentrates research activities in resource-rich elite universities and large technology firms. Through analysis of papers presented at the three major machine learning conferences from 2015 to 2019, Hagendorff and Meding (2021) find an increasing trend in papers presented by authors affiliated with both academia and industry, representing the intertwining of academic and corporate research settings. Similarly, Frank et al. (2019) analyse citation patterns and referencing relationships in computer science subfields that are relevant to AI. They find a consolidation, over time, of influential AI research in industry ‘hubs’ and identify a trend of decreasing reference rates from fields outside of AI to AI research, which they suggest is indicative of the broader academic community struggling to keep up with the pace of AI research.
Existing bibliometric analyses thus give some insight into the increasingly hybrid nature of knowledge production in AI research. To date, there has not, however, been adequate analysis of the extent of hybridity in the field. This article aims to unpack the features of hybrid knowledge production inherent to AI research.
Altmetric studies of the AI field
Given hybrid researchers are increasingly beholden to new expectations, roles, and responsibilities outside the academic system (Bastow et al., 2014), it is necessary to capture the reception of research from a more diverse set of constituents than is offered by bibliometric analyses alone. This is especially pronounced in AI research, where researchers simultaneously straddle the performance requirements of academically credible, commercially viable and widely interpretable work.
Patent data has been used by researchers interested in the relationship between academic publications and innovation. These studies highlight the significant commercialisation of AI research (Baruffaldi et al., 2020; Li and Lei, 2021), and the close intertwining of open science and proprietary technology in AI development (Kazuyuki, 2020). Patent analysis is used to identify patterns in a particular region, examine novelty and quality, and highlight technological needs and gaps (Abbas et al., 2014). A study of
Another avenue that generates fruitful insights on the AI field is media studies. These studies take discussion in news outlets as a key indicator of public interest and influence. Brennen et al. (2022) show that almost
Studies have also expanded to incorporate alternative metrics (‘altmetrics’) that attempt to reflect the influence of scholarly outputs beyond traditional academic venues (Bornmann, 2014). Altmetrics have emerged to measure the discussion of scholarly outputs on social media and in news media, and to make the most of the transition to online working environments by researchers, which enables the tracking of a new range of interactions with scholarly outputs (e.g. downloads and digital readership) (Haustein et al., 2014). Relatively few studies of the AI research field attempt to contextualise or extend insights garnered from bibliometric data sources with altmetrics, although some acknowledge the potential utility of this (Klinger et al., 2020). Zhang and Dafoe (2020) find that altmetric attention scores for AI publications increased dramatically between 2011 and 2017, indicating greater public attention to the field. Zhao et al. (2018) use altmetric and bibliometric data to compare the performance of publications authored solely by university affiliated researchers, industry affiliated researchers, and collaborations between the two. They find that industry publications receive more attention online than university authored papers. Our study builds on this work, by expanding the comparison to include academic, corporate, government, and medical sectors, and by expanding the breadth of altmetrics data considered to include in-patent citations as well as media mentions. Although altmetrics cannot claim the same clear meaning or credibility as bibliometrics, we suggest that careful selection of indicators within a theoretical framework permits useful insights into hybrid knowledge production.
In this study, we integrate patent and news citation data into our bibliometric analysis. Doing so enables us to consider the relationship between potential signifiers of value across the three spheres of interest (scholarly, public, and commercial), and to directly address questions of hybridity in our analysis.
A new lens on hybridity in AI research
Taking the multiple forms of research that are produced in the AI field, and the multiple criteria of evaluation that can be applied, we propose a new hybridity lens comprising two components: evaluative hybridity and generative hybridity.
The two components of the hybridity lens.
The evaluative component considers the relative performance of publications across the three spheres of interest, operationalised through academic citations (scholarly), news mentions (public), and patent mentions (commercial).
Lastly, the generative component considers the multiformity of the knowledge production features underlying high performing publications, including the collaborative configurations from which they emerge.
The aim in applying a hybridity lens is to identify the underlying characteristics of research outputs that are highly influential in the scholarly, public, and commercial spheres, and in combinations of these. We take these measures as potential signifiers of value that can be used together to understand influence in research. In this view, citations represent a measure of value (i.e. academic credibility) from the scholarly sphere, news mentions represent a measure of value from the public/media sphere (i.e. visibility amongst those outside academia), and patent mentions represent a measure of value from the economic sphere (i.e. commercial value outside academia) (Williams, 2022). We note that each of these measures is only an imperfect proxy for value. Using this lens, the remainder of this article addresses the question: what are the knowledge production features of publications that are high performing across theoretically defined categories of influence?
Methods and analysis
Data extraction and enrichment
Two primary data sources were relied upon: Scopus and Altmetric. We used Scopus to extract standard bibliometric metadata – including descriptors, classifications, and abstracts – for Elsevier’s large dataset of publications in the AI field, and to explore their academic reception (citations). 3 We enriched the Scopus dataset with data from Altmetric, an online platform that gathers mentions of outputs (via links and references) from non-academic sources (e.g. mainstream news media, patents, policy documents, social media). We limited our use of Altmetric data to Field of Research (FoR), for identifying the disciplinary classification of publications, and news and patent mentions, for exploring reception of publications in the public and commercial spheres. 4
At the time of data extraction, the Scopus dataset of AI publications contained 726,158 scholarly outputs, including published articles and pre-prints. 5 Of these, 31,134 did not contain metadata on the affiliation of authors and were thus omitted from our sample. The sample was reduced to 296,351 outputs published between 2014 and 2019 (inclusive) to allow time for citations to accrue for more recent publications, and to mitigate against poor coverage of news and patent mentions before 2014. Overall, 28,374 duplicate records (i.e. records with the same DOI) were found in the dataset. 6 After de-duplication (by retaining the record with the greatest number of citations), 267,977 scholarly outputs remained in our Scopus sample, which was then matched to the Altmetric database. In total, Altmetric returned matches for 94,686 scholarly outputs, of which 1,183 duplicate records were identified and removed. In addition, Altmetric erroneously returned
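The de-duplication step described above – retaining, for each DOI, the record with the greatest citation count – can be sketched in pandas. The column names (`doi`, `citations`) and example records are illustrative assumptions, not the actual schema of the Scopus export.

```python
import pandas as pd

# Illustrative records with a duplicated DOI; real data would come from the
# Scopus export described in the text.
records = pd.DataFrame({
    "doi": ["10.1/a", "10.1/a", "10.1/b", "10.1/c"],
    "citations": [12, 30, 5, 0],
})

# For each DOI, keep the record with the greatest number of citations.
deduped = (records.sort_values("citations", ascending=False)
                  .drop_duplicates(subset="doi", keep="first")
                  .sort_index())
```

Sorting by citations before `drop_duplicates(keep="first")` guarantees the highest-cited record survives for each DOI; the final `sort_index()` simply restores the original row order.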
In instances where a scholarly output had multiple authors, metadata associated with all authors was extracted from Scopus. For author-specific metadata (e.g. institution, region, and sector), we used a simple script to generate numerical totals for each data point (e.g. number of authors and number of institutions). Altmetric’s news data is derived from a manually curated list of news sources. Altmetric captures mentions of scholarly outputs through news posts that contain direct links to the output, and through automated text mining. 7 Altmetric’s patent citation data is derived from the IFICLAIMS patent database, and the Dimensions service links patent filings to scholarly outputs that are cited in them. 8 Altmetric’s FoR data is also provided by the Dimensions service and utilises the Australian 2008 FoR schema.
Within the final sample, the distribution of academic citations, patent citations, and news mentions reflected high academic engagement with scholarly outputs, but only sporadic news media or patent engagement. 88% of outputs received at least one academic citation. Meanwhile, only 10% received at least one patent citation, and only 4% received at least one news mention.
Identification of influential scholarly outputs
Within our sample, influential scholarly outputs were identified by reference to their relative performance on three metrics: academic citations, news mentions, and patent mentions. The 99th percentile for each metric was used as the threshold for high performance, and publications were assigned to the eight categories of influence accordingly.
Categories of influence selected for investigation.
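The categorisation logic can be sketched as follows: each publication is compared against the 99th-percentile threshold for each metric, and the combination of thresholds it clears determines its category. This is a minimal sketch under the assumption that thresholds are computed over the full sample; the category labels follow those used in the results (HighAll, HighNP, etc.).

```python
import numpy as np

def categorise(citations, news, patents, thresholds):
    """Assign one publication to a category of influence.

    thresholds: dict of 99th-percentile cut-offs for each metric.
    """
    c = citations >= thresholds["citations"]
    n = news >= thresholds["news"]
    p = patents >= thresholds["patents"]
    if c and n and p:
        return "HighAll"   # blended: citations, news, and patents
    if n and p:
        return "HighNP"    # blended: news and patents
    if c and n:
        return "HighCN"    # blended: citations and news
    if c and p:
        return "HighCP"    # blended: citations and patents
    if c:
        return "HighC"     # single: citations
    if n:
        return "HighN"     # single: news
    if p:
        return "HighP"     # single: patents
    return "Other"

# Thresholds would be computed from the full sample; illustrative counts here.
counts = {"citations": np.array([0, 1, 3, 250]),
          "news": np.array([0, 0, 0, 40]),
          "patents": np.array([0, 0, 1, 12])}
thresholds = {k: np.percentile(v, 99) for k, v in counts.items()}
```

Because the blended categories are checked before the single ones, each publication lands in exactly one of the eight mutually exclusive categories.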
Data analysis
We draw on various data analysis techniques to explore evaluative hybridity in relation to three aspects of knowledge production (research context, focus, and artefacts). The primary unit of each analysis is the individual research publication. For each aspect of knowledge production, we compare publications in the categories of influence to publications that have not been highly influential to date, which we group together under the category ‘Other’.
To assess the research context, we analysed the sector of the institution that authors are affiliated with, the Computer Science ranking of authors’ institutions, and the region of the authors. Authors were allocated a sector based on publication metadata. Rankings were taken from the global Shanghai rankings 9 and divided into five buckets: institutions ranked
To assess research focus, we analysed the FoR for publications. FoR is a hierarchical academic discipline classification system, in which the discipline of a publication is represented by a numeric code. 10 The first two digits determine the highest level of categorisation in the system, which consists of 22 broad divisions.
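Under this schema, recovering a publication’s top-level division reduces to reading the leading two digits of its FoR code. A minimal sketch, using divisions from the 2008 FoR schema that appear in this study:

```python
# Top-level FoR divisions referred to in this study (ANZSRC 2008 schema).
DIVISIONS = {
    "08": "Information and Computing Sciences",
    "09": "Engineering",
    "11": "Medical and Health Sciences",
}

def for_division(for_code: str) -> str:
    """Map a FoR code to its top-level division via its first two digits."""
    return DIVISIONS.get(for_code[:2], "Other division")

# e.g. the group code '0801' falls under division 08,
# Information and Computing Sciences.
```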
To assess research artefacts, we analysed the bibliometric metadata of publications. We analysed the source type (book, book series, conference proceedings, journal, and trade publication), open access status (yes or no), and the quartile ranking of the publication venue (using SCImago Journal Rank provided by Scopus).
Finally, to consider generative hybridity, we identified several indicators of different collaborative configurations – number of co-authors, sectors, institutions, countries, funders, and FoR – and extracted values from publication metadata. As publications are our unit of analysis, where an individual publication had multiple co-authors with the same sector, institution, or country, the shared sector was counted only once (i.e. a publication with three co-authors, two from the academic sector and one from the corporate sector was recorded as having two sectors). This allowed us to compare the hybrid configurations underlying publications in the categories of influence to those underlying the ‘Other’ category.
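The per-publication counts described above can be sketched by reducing each author-level attribute to its set of unique values, so that a shared sector, institution, or country is counted only once. The metadata field names here are illustrative assumptions:

```python
# A publication carries a list of per-author attributes (field names assumed).
publication = {
    "authors": [
        {"sector": "academic", "country": "US", "institution": "A"},
        {"sector": "academic", "country": "UK", "institution": "B"},
        {"sector": "corporate", "country": "US", "institution": "C"},
    ]
}

def count_unique(pub, field):
    """Count distinct values of an author-level field; shared values count once."""
    return len({author[field] for author in pub["authors"]})

# Two academic authors plus one corporate author -> two sectors, as in the
# worked example in the text.
n_sectors = count_unique(publication, "sector")
n_countries = count_unique(publication, "country")
n_institutions = count_unique(publication, "institution")
```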
We used multinomial logistic regression (MLR) to confirm the findings, as shown in Table 3.
Results of the multinomial logistic regression (reference: ‘Other’ category).
Results
The evaluative hybridity component
The evaluative component of the hybridity lens considers the performance of publications across three different spheres of interest (scholarly, public, and commercial), which in this study are operationalised in terms of blended categories (i.e. high performance on all, high performance on two of the three spheres) and single categories (i.e. high performance on only one sphere). The results below detail each knowledge production aspect in turn: research context, research focus, and research artefacts.
Research context
The research context that emerges for the typical publication is unsurprising. Non-highly influential publications (i.e. the ‘Other’ category) are largely first-authored by a researcher with academic affiliation (88%), whose institution is ranked outside the top 100 Computer Science institutions (62.5%), and who is based either in Asia (39%) or Europe (34%). We found minimal differences between the research context of first authors and that of all authors. As such, we report only the results of first author analysis (although details of co-author analysis can be found below in Figure 2).
The research context, however, is more complicated when considering the categories of influence. Across all the categories of influence, the proportion of publications with first authors working in the academic sector declines, whilst the proportion of publications with the first authors working in the corporate sector increases. Indeed, whilst the first authors working in the corporate sector account for only
The institutions of authors producing publications in the categories of influence also skew towards the top ranks of the Shanghai Computer Science rankings. Whilst only
Whilst the overall AI field is dominated by authors working at institutions in Europe, Asia, and North America (only
As shown in Table 3, the MLR analysis confirms the above results. Compared to academic first authors, corporate first authors are significantly more likely to be in all categories of influence except HighCN and HighN; government and medical first authors are both more likely to be in HighNP. The results show a nuanced picture of the importance of sector across the categories of influence, with only eight statistically significant independent variables, suggesting that sector matters most for influence via patents and citations. Publications produced from institutions outside the top
Research focus
The research focus for the typical AI publication is also unsurprising. Results are reported in Table 6 in Supplemental Material. Most publications are categorised under Information and Computing Science in the FoR (54% of ‘Other’ publications), followed by Engineering (9%) and Medical and Health Sciences (8%).
When considering the relationship between the categories of influence and FoR, as shown in Figure 1, a slightly different picture emerges. The Information and Computing Sciences FoR is the dominant FoR across all categories of influence except for HighN and HighCP. In HighCP, HighC, and HighN the Medical and Health Sciences FoR was equally or more dominant. The only other FoRs to have greater than a

The relationship between categories of influence and disciplines given by field of research (FoR) code.

Proportion of sector, region, and Shanghai Computer Science Ranking based on affiliations of all authors.
Research artefacts
Unsurprisingly, most AI publications are journal articles (
In terms of open access,
In terms of the quartile journal rank of publication venues, there is a roughly equal distribution of publications in the ‘Other’ category across Q1, Q2, and Q4 journals. In the categories of influence, however, the distribution of publications skews strongly towards Q1 journals. This trend is particularly strong for the categories of influence associated with the public sphere (in HighNP
The MLR analysis confirms these results, as shown in Table 3. Compared to journal articles, books, book series, conference proceedings, and trade publications are less likely to be in most categories of influence, with the exceptions of conference proceedings, which are more likely to be in HighCP and HighC, and trade publications, which are more likely to be in HighP. Compared to Q1 journals, publications from Q2, Q3, and Q4 venues are less likely to be in all categories of influence. Indeed, as expected, the pronounced number of statistically significant results here suggests that influential publications may generally share the same research artefact characteristics (i.e. most influential publications are articles in Q1 journals). Detailed statistics are reported in Table 7 in Supplemental Material.
The generative hybridity component
The generative component of the hybridity lens considers the multiformity of selected features of knowledge production for high performing publications across three different spheres of interest (scholarly, public, and commercial), operationalised as above in terms of blended categories and single categories. Whilst the generative component can be considered on the level of research context, focus, and artefacts (as we do with the evaluative component), for conciseness, we present our preliminary findings across these three aspects together.
Our analysis of generative hybridity reveals a nuanced picture of multiformity of knowledge production features in the AI field. 12
Results are reported in Table 4. In terms of the mean number of authors contributing to publications, those in the categories of influence have more authors than those in the ‘Other’ category. This was most pronounced in the high news categories – HighCN (
Application of the generative component of the hybridity lens: collaborative configurations underlying categories of influence.
The mean number of sectors represented in publications is relatively low given the maximum potential number of sectors for a publication is
In this section, rather than consider the number of regions represented in each publication, we consider the number of different countries. The mean number of countries represented in ‘Other’ publications is
Interestingly, the mean number of FoRs displays a different pattern, whereby only high news categories – HighAll (
The MLR analysis presented in Table 3 confirms these results. In particular, the MLR finds that increasing the number of authors results in greater performance in HighAll, HighCN, and HighP. Interestingly, increasing the number of institutions results in poorer performance in the scholarly and commercial spheres (HighAll, HighCP, HighC, and HighP). Increasing the number of sectors results in greater performance in all categories of influence except for HighNP. Increasing the number of countries results in greater performance in the scholarly and public spheres (HighCN, HighC, and HighN), while an increase in the number of funders is associated with high performance in all categories of influence except for HighCN. Increasing the number of FoRs results in worse performance in the HighP category.
Discussion
In this section, we discuss the results of our analysis at two levels: first, we consider what our findings reveal about the characteristics of knowledge production in the AI field; second, we consider our nascent hybridity lens in light of our findings.
Knowledge production in the AI field
Whilst the field of AI research is large and diverse, our results indicate that influence is centralised in the knowledge production practices of a small number of elite Global North institutions. Figure 2 illustrates this concentration across sectors, regions, and institutional ranking for all authors. The wider the influence of a publication (i.e. the more spheres in which it is successful), on average, the less we see academic affiliation among its authors and the more we see North American and corporate institutions represented.
We also see a pattern across categories of influence, where influential papers tend to come from small teams of researchers working in elite universities or large technology firms. This is possibly due to the presence of well-known academic ‘celebrities’ who garner interest across spheres, or to the greater access to resources these types of organisations can provide to facilitate the translation of research into public and commercial influence.
This characteristic – the influence of North American, corporate, and elite institutions – is likely to correspond to the dominance of deep learning that has been observed by others (Luitse and Denkena, 2021). State-of-the-art deep learning research is resource intensive: very large datasets, significant human resources for data preparation, and significant computing capacity are all required (Strubell et al., 2019). These resources are, increasingly, beyond the reach of most academic institutions, and thus accessible exclusively to researchers inside a very small number of corporate institutions, or researchers working in collaboration with those institutions (Jurowetzki et al., 2021; Luitse and Denkena, 2021). By way of example, in natural language processing, a significant subfield of AI in which deep learning techniques have become dominant, all of the top 10 models on the widely used GLUE benchmark (Wang et al., 2019) are associated with corporate institutions, with the majority based in the United States.
Developing a hybridity lens
The application of the new hybridity lens demonstrates two components of hybridity: evaluative and generative. The first relates to the hybridity that is observed when attention is given to indicators beyond a single sphere. The second relates to the hybrid configurations of research sites, whereby increasing the multiformity of actors and perspectives involved in knowledge production seems to generate a boost in performance.
In terms of evaluative hybridity, the difference between the four blended categories of influence and the three single categories of influence is illuminating. For example, considering the single metrics of citations, news mentions or patent mentions alone under-emphasises the contribution of specific sectors, whereas blended metrics show their capacity for simultaneous influence within and beyond the academy. Likewise, considering regional affiliation using single metrics obscures the specific patterns observable through blended categories. This suggests that particular social contexts and structures provide better support for multiple types of influence. Similarly, evaluating the Information and Computing Sciences FoR based on single metrics hides the field’s capacity to influence multiple spheres at once. This suggests that what we choose to measure and how we understand the dimensions of value matters a great deal to what is deemed to be successful research. As the expectations for research have changed over time, researchers and research organisations increasingly seek to influence a range of audiences. This necessitates not only working to create the conditions for research that is valued in different contexts, but also holding multiple evaluative principles in play simultaneously to be able to assess whether these goals have been achieved (Williams, 2022). Acknowledgement and operationalisation of evaluative hybridity is essential to gaining insights into the type of collaborative configurations that generate a higher degree of influence.
In terms of generative hybridity, which we define as the multiformity of selected features of knowledge production, our results show that publications that perform well in the scholarly, commercial and public spheres – and combinations thereof – tend to be more collaborative or networked than the typical AI publication. The analysis shows that the typical AI publication is a paywalled journal article or book series written by four authors from two different institutions. Usually, these are funded by one funder, if any, and are produced by a single sector within a single country. Strikingly, when examining the high performing categories of influence as a whole, the average publication is a journal article written by eight authors from three institutions, involving a higher number of sectors, countries, and funders compared to the typical publication. This suggests the importance of generative hybridity to research ‘success’.
There are some specific features that seem to be key in creating the conditions for influential publications. Here, the difference between high performing publications according to the sphere of interest is telling. For example, the elevated number of authors, sectors, and funders for high news categories suggests that publications have greater influence in the public sphere when there are more actors involved, perhaps due to increased capacity for media engagement provided by greater access to resources. Similarly, the higher mean number of authors, institutions, sectors, and fields of research in high patent categories suggests higher influence in the commercial sphere when a greater number of inter-sector and interdisciplinary actors are involved. This is reinforced by the increased presence of corporate sector actors in high patent categories, which suggests that industry–academic relationships boost performance. In addition to opening up greater resources for commercialisation, it may be that these types of collaborations improve the translation of technical AI knowledge into applied domain areas. The pattern also occurs in the high citation categories, where high performing publications have a higher number of authors, sectors, countries, and funders. This echoes the results of Hagendorff et al. (2021), who found that high-performing publications in terms of citations tend to be the product of academic–corporate collaborations. However, our MLR analysis shows that even once the type of sector has been accounted for, the multiformity of actors remains significant. Thus, academic–industry collaboration is one part of a broader picture of hybridity, but bringing together multiple actors through different collaboration configurations also matters in its own right.
Limitations
The central idea of our analysis is that articles in the 99th percentile for each metric can be treated as influential in the corresponding sphere …
Our analysis is also dependent on the quality of the bibliometric and altmetric data we inherit from Scopus and Altmetric. Whilst these are both considered to be best-in-class data sources, they do have known issues. Most relevantly, Altmetric is likely to underestimate the visibility given to some outputs, given that not all disciplines use DOIs to the same extent. In addition, coverage of news mentions is biased towards English-language news outlets, and Scopus’s author affiliation data is drawn from authors’ ORCID accounts, which are in turn maintained by authors themselves, and so can at times be out of date or incomplete.
The relationship between some of the context factors and the categories of influence is not independent. In particular, the Shanghai Computer Science rankings and journal quartile rankings use citation metrics in their calculations. It is unclear to what extent this explains our finding of a correlation between institutional and journal ranking and publication performance across the categories of influence.
Conclusion
Recent literature highlights the multiple logics that new knowledge is increasingly required to meet. Yet, this literature lacks nuance in two areas. The first relates to the challenge of interpreting influence across different spheres simultaneously. This article addresses this through the evaluative component of the hybridity lens, which considers performance across three spheres through operationalisation of eight blended and single categories of influence. The second relates to fragmentation in the knowledge production factors that are considered, whereby research tends to be understood in terms of its context, topic or output type rather than a holistic picture of these features. This study addresses this through the evaluative and generative components of the hybridity lens by considering a suite of elements from each knowledge production aspect, as well as using selected features to examine different collaborative configurations at play.
The article thus contributes a new theoretical and empirical understanding of hybridity as a measure of influence as well as a feature of knowledge production. It offers a new conceptualisation of two elements of hybridity, termed generative and evaluative, respectively. It provides a theoretically driven use of bibliometric and altmetric analyses that makes metrics interpretable through the lens of broader social spheres. It also opens the possibility of integrating a range of additional indicators into a more holistic framework that can illuminate the varied dimensions of research value.
In developing this understanding of hybridity, the article confirms important characteristics of influential research within the AI field. From the evaluative perspective, by comparing blended measures of influence to single measures, the article foregrounds the benefits of integrating multiple evaluative principles to draw more robust conclusions about research value. It also highlights the importance of broader social contexts and structures in supporting authors to produce research that is simultaneously influential in the scholarly, public, and commercial spheres. From the generative perspective, influential AI research is more likely to be produced in contexts where hybridity is maximised, whether it be through inter-sector, inter-funder or international collaborations.
Future research could examine the relationship between multiformity of actors and topical content, and could draw directly on abstract text and statistical topic modelling, rather than relying on FoR categories alone. It could also examine the relationship between sector and open access, noting that open access may be correlated with authors’ access to resources. Finally, future research could also attempt to validate the research presented here through empirical, comparative studies of knowledge production sites with different hybridity compositions.
Supplemental Material
Supplemental material for ‘Investigating hybridity in artificial intelligence research’ by Kate Williams, Glen Berman and Sandra Michalska, published in Big Data & Society.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Economic and Social Research Council grant ES/V004123/1, awarded to Kate Williams and Jonathan Grant.
