Abstract
Understanding the extent to which scientific research informs legislation is essential for evidence-based policymaking. This study evaluates the thematic alignment between China’s Biosecurity Law and the academic literature related to biosecurity, using Latent Dirichlet Allocation (LDA) topic modeling and bibliometric analysis. The analysis is based on the full text of the 2024 version of the Biosecurity Law and 482 peer-reviewed articles published between 1999 and 2024. The results identify six principal topics in the legislation and eight thematic clusters in the literature. A cosine similarity assessment indicates a moderate degree of alignment, with an average similarity score of 0.223. While certain topics—such as public health governance and legal risk management—demonstrate convergence, substantial gaps are observed. These include insufficient legislative attention to rural biosecurity, underrepresentation of genetically modified organism (GMO) regulation, and limited incorporation of international laboratory safety standards. Based on these findings, the study proposes targeted policy recommendations to enhance legal responsiveness, including strengthening rural epidemic control systems, improving oversight of biological materials, and aligning domestic regulations with global biosafety norms. The proposed methodological framework provides a replicable and scalable approach for assessing law–research coherence in domains such as health, environment, and biotechnology governance.
Plain Language Summary
This study looks at whether academic research has helped shape China’s Biosecurity Law, a law that aims to protect people, animals, plants, and the environment from biological threats like diseases or harmful organisms. To do this, the researcher used computer tools to analyze the text of the law and compare it with over 400 related academic articles published in China. The study identified six key areas covered by the law, such as managing epidemics in rural areas and ensuring the safe use of biological resources. It also found eight main research themes in the literature, including topics like public health, food safety, environmental protection, and disease control. By comparing these themes, the study found that some research closely aligns with what the law says—but in other areas, the law doesn’t fully reflect the insights from academic studies. This gap suggests that policymakers could benefit more from scientific findings when updating or creating laws. The study also recommends improvements like updating biosecurity education, better protecting rural communities, and working more with other countries. Overall, the research shows that academic studies can play a big role in making better, smarter laws that keep people and the environment safe.
Introduction
In an era marked by intensifying global health threats, invasive biological species, and rapidly advancing biotechnology, biosecurity has emerged as a critical domain of national and international governance. Effective biosecurity legislation serves not only as a legal safeguard but also as a mechanism for translating scientific knowledge into enforceable action. China’s Biosecurity Law (2020) represents a landmark regulatory framework aimed at addressing biosafety risks across domains such as disease control, biotechnology, environmental protection, and public health. However, the mere existence of legislation does not guarantee its responsiveness to emerging scientific challenges. This raises a timely and fundamental policy question: To what extent are legal frameworks informed by, and aligned with, contemporary scientific research?
Prior scholarship has examined biosecurity governance from multiple perspectives, including legal analysis, public administration, and environmental policy (Cao, 2021; Qin & Sun, 2019). However, these studies predominantly rely on qualitative methods—case studies, policy commentary, and doctrinal interpretation—which, while rich in contextual depth, lack the scalability and objectivity required to systematically assess alignment between legislation and research at scale. Moreover, existing work seldom interrogates the structural content of legal texts and academic literature in a comparative, data-driven manner. As a result, there remains a gap in our understanding of how closely legislation reflects the thematic priorities of the scientific community, and where potential mismatches or omissions may exist.
To address this gap, this study proposes a computational approach to evaluating the thematic alignment between China’s Biosecurity Law and the academic literature related to it. Specifically, this study applies Latent Dirichlet Allocation (LDA) topic modeling and bibliometric analysis to extract and compare topic structures from both legal and scholarly corpora. Rather than asking whether research causally shapes law, this study frames a more testable and replicable objective: to assess the degree of semantic alignment between law and research based on their latent thematic structures. This method offers a scalable and reproducible alternative to prior qualitative assessments, enabling the identification of areas where legislation either converges with or diverges from scientific discourse.
Drawing on legal documents and 482 peer-reviewed publications from CNKI (1999–2024), this study identifies six core themes within the Biosecurity Law and eight dominant themes across the academic literature. Cosine similarity is used to evaluate cross-domain topic alignment, revealing both congruent areas (e.g., risk management, legal principles) and critical gaps (e.g., rural biosecurity, laboratory trade standards). The study contributes to the growing literature on evidence-informed policymaking, offering a generalizable framework for evaluating legislative responsiveness through NLP-based text mining. It also provides policy insights for enhancing legal adaptability, research–policy integration, and cross-sectoral coordination in biosecurity governance.
Material and Methods
The research process followed a structured workflow, as illustrated in Figure 1. First, two datasets were independently constructed: one from the full text of the China Biosecurity Law and the other from academic literature on biosecurity law. Both datasets underwent preprocessing to standardize text and remove noise. Latent Dirichlet Allocation (LDA) modeling was then applied separately to each corpus to extract latent thematic structures. The resulting sets of topics were subsequently compared to identify convergences and divergences between legislative priorities and scholarly discourse. Finally, the comparison results provided the basis for evaluating alignment and discrepancies between law and research.

Technology roadmap for this paper, including data collection, preprocessing, LDA modeling and visualization.
Data Source
This study draws on two data sources: (1) the full text of China’s Biosecurity Law, and (2) peer-reviewed literature from the China National Knowledge Infrastructure (CNKI) database published between 1999 and 2024 using the keyword China Biosecurity. For the legal text, the 2024 revised version of the law was adopted, as it is structurally identical to the original 2020 enactment (10 chapters, 88 articles), with only minor editorial revisions. A clause-by-clause comparison confirmed that these revisions did not affect substantive content or article numbering, ensuring consistency and comparability across years. A sensitivity check using the 2020 version yielded identical mappings, validating robustness. Given the law’s domestic legal and political specificity, CNKI was selected as the most authoritative source for capturing relevant scholarship, as international publications rarely examine the operational structure or enforcement of the law in China (Table 1).
Data Sources’ Names and Websites Link.
Data Preprocessing
Data preprocessing steps include word segmentation, lemmatization, removal of stop words, and removal of punctuation marks, numbers, and whitespace characters. To keep the results readable, the preprocessing stage used lemmatization rather than the more commonly used stemming. To focus on substantive content and remove noise, the corpus retains only nouns and excludes uninformative structural nouns (such as section, subsection, division, part, and chapter) as well as words with fewer than three letters. These steps produce the two corpora used for modeling: the legal text and the literature. The preprocessing was carried out with the Natural Language Toolkit (NLTK) in Python. The preprocessing roadmap is shown in Figure 2.

Roadmap of preprocessing. The output of preprocessing is the corpus used for the subsequent modeling.
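The filtering steps described above can be illustrated with a minimal sketch. It is a simplified stand-in that uses only the Python standard library, not the study's NLTK pipeline: word segmentation for Chinese, part-of-speech filtering (nouns only), and lemmatization are omitted, and the stop-word list, the structural-noun list, and the `preprocess` function name are hypothetical.

```python
import re

# Hypothetical lists for illustration; the study's actual lists are not given.
STOP_WORDS = {"the", "and", "for", "shall", "with", "that", "this"}
STRUCTURAL_NOUNS = {"section", "subsection", "division", "part", "chapter"}

def preprocess(text, min_len=3):
    """Lowercase, tokenize on alphabetic runs, and filter noise tokens."""
    # Keeping only alphabetic runs drops punctuation, digits, and whitespace.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [
        t for t in tokens
        if len(t) >= min_len              # drop words shorter than three letters
        and t not in STOP_WORDS           # drop stop words
        and t not in STRUCTURAL_NOUNS     # drop structural nouns such as "chapter"
    ]

sample = "Chapter 3: The State shall establish a biosecurity risk monitoring system."
print(preprocess(sample))
# ['state', 'establish', 'biosecurity', 'risk', 'monitoring', 'system']
```

Because no part-of-speech filter is applied in this sketch, the verb "establish" survives; a noun-only filter as described in the text would remove it.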
Latent Dirichlet Allocation (LDA)
In the realm of text analysis, the “topic” refers to the central subject that the text discusses or examines. Topic modeling identifies this subject as a probability distribution over a collection of words (Griffiths & Steyvers, 2004). As an unsupervised learning technique, topic modeling fundamentally constitutes a graphical model grounded in Bayesian networks. It discerns potential topics within a corpus of text, with the derived outcomes serving for the initial analysis of the text (Blei, 2012). Topic modeling is adept at extracting themes from extensive text collections and is applicable to large datasets (Liu, 2025a). A key factor in the recent surge in the model’s popularity is its balance between complexity and interpretability, as the topics’ vocabulary is readily comprehensible.
The Latent Dirichlet Allocation (LDA) model, introduced by Blei et al. (2003), is a probabilistic and statistical approach for identifying the underlying themes within texts. It’s structured as a tri-level Bayesian model encompassing words, topics, and documents. This model has gained extensive application across various domains of text analysis and information processing since its inception (Liu, 2025c). Employing the bag-of-words paradigm, the LDA model views documents as mixtures of topics, which in turn are distributions over words, thereby establishing a hierarchical Bayesian network of “document-topic-word.” Further mathematical details and implementation specifications are available in the Supplemental materials.
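The "document-topic-word" hierarchy can be made concrete with a small sketch of the generative process that LDA assumes: each document draws a topic mixture from a Dirichlet prior, and each word is produced by first drawing a topic and then drawing a word from that topic. This illustrates the model's assumptions only, not the inference procedure used in the study; the vocabulary, topic-word probabilities, and function names are toy values chosen for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topic_word, alpha, vocab, n_words, rng):
    """Generate one document under the LDA generative assumptions."""
    theta = sample_dirichlet(alpha, rng)             # document-topic mixture
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)           # topic assigned to this word
        w = sample_categorical(topic_word[z], rng)   # word drawn from that topic
        words.append(vocab[w])
    return theta, words

rng = random.Random(0)
vocab = ["epidemic", "rural", "laboratory", "pathogen"]  # toy vocabulary
topic_word = [
    [0.50, 0.40, 0.05, 0.05],  # toy "rural epidemic" topic
    [0.05, 0.05, 0.50, 0.40],  # toy "laboratory" topic
]
theta, doc = generate_document(topic_word, alpha=[0.5, 0.5],
                               vocab=vocab, n_words=8, rng=rng)
print(theta, doc)
```

Fitting LDA runs this process in reverse: given only the observed words, it estimates the topic-word distributions and the per-document mixtures.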
Topic Popularity in Biosecurity Legislation
This study also uses the criteria of Liu (2025b) to filter the popularity of topics. These criteria consider both the trend and proportion of topics:
where
In this equation, the heat (or popularity) of a topic is calculated based on the topic’s average probability and trend score across all years. Therefore, it reflects the ongoing popularity of the topic, not just the popularity in the last year.
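The two ingredients named here, a topic's average probability and its trend across years, can be computed as in the sketch below. This is not the popularity formula of Liu (2025b): the trend is assumed, for illustration only, to be the least-squares slope of the yearly topic share, no combined heat score is computed, and the yearly values are invented.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def ols_slope(years, values):
    """Least-squares slope of values regressed on years (assumed trend measure)."""
    y_bar, v_bar = mean(years), mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den

# Toy yearly mean topic probabilities, invented for illustration only.
years = [2020, 2021, 2022, 2023]
topic_share = {
    "T_a": [0.10, 0.15, 0.20, 0.25],  # rising topic
    "T_b": [0.30, 0.25, 0.20, 0.15],  # declining topic
}

for name, shares in topic_share.items():
    print(name, round(mean(shares), 3), round(ols_slope(years, shares), 3))
```

In this toy case T_b has the larger average share but a negative slope, which is the kind of situation a criterion combining proportion and trend is meant to distinguish.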
Topic Comparison
In the realm of text mining and natural language processing, the concept of text similarity is pivotal. It’s about assessing how much two or more textual entities—be it words, sentences, or entire documents—resemble each other in terms of their content and meaning (Ferreira et al., 2013). This resemblance is often expressed through a numerical value, indicating the degree of similarity. Various methods exist to calculate text similarity, each with its own approach and application. Some focus on the literal strings of text, while others delve into the semantic layers, interpreting the meaning behind the words. Techniques like cosine similarity and Jaccard index are commonly employed, alongside more complex models like word embeddings and pre-trained language models.
This study used cosine similarity to measure how alike two entities are, irrespective of their size. It is particularly useful in high-dimensional spaces, like those used in information retrieval, where each unique term is assigned its own dimension. The similarity is calculated by taking the dot product of the term frequency vectors of two documents and dividing it by the product of their magnitudes. In general, this yields a score ranging from −1 to 1, where 1 indicates identical directionality, 0 indicates orthogonality, and −1 indicates completely opposite directionality; because term frequency vectors have no negative entries, scores for such vectors fall between 0 and 1. The score reflects the thematic overlap between the documents, with higher scores indicating greater similarity in content (Lin, 1998). The mathematical expression for cosine similarity is:
\[ \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}} \]
where A and B are the term frequency vectors of the documents, and \|A\| and \|B\| are their magnitudes (Euclidean norms).
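The calculation described in this section (the dot product divided by the product of the magnitudes) can be written in a few lines. The sketch below uses toy term-frequency vectors; returning 0.0 for an all-zero vector is a convention chosen here, not something specified in the study.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Toy term-frequency vectors over a shared four-term vocabulary.
doc_a = [2, 1, 0, 0]
doc_b = [1, 0, 1, 0]

print(round(cosine_similarity(doc_a, doc_b), 4))                    # 0.6325
print(round(cosine_similarity(doc_a, [10 * x for x in doc_b]), 4))  # 0.6325
```

The second call shows the size-independence noted in the text: scaling one vector by a constant leaves the score unchanged.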
Reproducibility and Data Availability
The LDA topic modeling in this study was implemented using an open-source Python library (gensim.models.LdaModel), with minor adaptations including TF-IDF weighting, Chinese-specific text preprocessing, and keyword translation for interpretability. Due to licensing restrictions, the full-text data retrieved from CNKI cannot be publicly shared. However, metadata (e.g., article titles, publication years, and abstracts), the legal corpus, and all code scripts used for analysis are available upon request.
Results
Preprocessing Results
This study preprocessed two datasets. The first is the text of the China Biosecurity Law; after preprocessing and data cleaning, a total of 7,708 words were obtained. The second consists of keyword search results from the CNKI website. Initially, 626 documents were retrieved; following preprocessing and data cleaning, 482 valid research papers remained. A PRISMA flow diagram is shown in Figure 3.

Dataset collection of the published literature.
Topic Finding of China Biosecurity Law
The study identifies six central themes in the China Biosecurity Law using LDA topic modeling. Figure 4 presents these themes, each grounded in the posterior word distribution of the topic. To show the relative importance of the terms, a word cloud was created for each topic. It displays the top 15 terms, ranked by their posterior probability, with the size of each term reflecting its probability.

Word clouds of the topics in China Biosecurity Law. The word clouds depicted illustrate the respective topics derived from the analysis. Within each diagram, the prominence of a word is directly proportional to its frequency, making the most significant terms stand out visually.
The analysis of China’s Biosecurity Law reveals six interrelated thematic domains. The first concerns the nexus between epidemics and rural development, reflecting how public health crises intersect with agricultural livelihoods and ecological stewardship, and how governance structures coordinate environmental management, technological intervention, and social mobilization in rural contexts. A second domain focuses on the governance of biological resources in scientific experimentation, emphasizing the ethical and sustainable use of both microbial and human-derived materials, the necessity of stringent oversight, and the integration of robust data management with national and international compliance frameworks. A third domain addresses biosecurity mechanisms and management, underpinned by scientific and technological foundations, clear institutional responsibilities, formalized procedures, and evidence-based risk assessment to ensure coordinated prevention and preparedness. Closely related is the protection of flora, fauna, and medical institutions, highlighting the interdependence of environmental and human health, the mitigation of ecological and anthropogenic threats, and the establishment of policy frameworks and contingency measures to safeguard both ecosystems and healthcare infrastructure. The fifth domain reflects the legal and procedural architecture of biosecurity, integrating local governance, specialized expertise, and global standards to regulate material handling, institutional safety, and epidemic control, supported by documentation, training, and technology. Finally, the theme of biological risks and response measures underscores the role of legal mandates, technical capacity, and coordinated action in hazard identification, prevention, and rapid response, reinforced by public compliance and enforcement mechanisms. 
Collectively, these domains depict a comprehensive biosecurity framework that balances legal authority, scientific advancement, environmental stewardship, and societal engagement. These are shown in Table 2.
Summary of six extracted topics, including thematic meanings, high-frequency representative words, and exemplar legal clauses drawn from the Biosecurity Law of China.
Topic Finding of China Biosecurity Law Literature and Topic Popularity
The research identifies eight key themes within the literature on China’s Biosecurity Law, using LDA topic modeling. The core aspects of these themes are captured in Figure 5, which is based on the distribution of words associated with each theme. A word cloud has been generated to visually emphasize the significance of the terms. This diagram displays the top 15 terms, ordered by their likelihood, with the size of each term indicating its probability.

Word clouds of the topics in literature of China Biosecurity Law. The word clouds depicted illustrate the respective topics derived from the analysis. Within each diagram, the prominence of a word is directly proportional to its frequency, making the most significant terms stand out visually.
The literature on China’s Biosecurity Law reveals eight interrelated thematic areas, each reflecting distinct yet interconnected facets of biological safety, governance, and international engagement. Public health and governance emerges as a central theme, highlighting the interplay between epidemic management, multi-level governmental coordination, legislative frameworks, and the integration of human and animal health measures. Closely linked is genetic modification and food safety, which underscores the role of biotechnology in agriculture, the necessity of stringent GMO regulations, and the balance between technological benefits, biodiversity conservation, and international harmonization of safety standards. National policy and international relations captures the strategic and legislative dimensions of biosecurity, spanning domestic lawmaking, public safety imperatives, and diplomatic engagement, with policies evolving through identifiable phases to address national and transboundary risks.
Legal principles and risks reflects the foundational legal doctrines guiding biosecurity governance, particularly at the intersection of technological innovation, ecological protection, and the rule of law, emphasizing adaptive legal approaches to emerging biotechnological and environmental challenges. Ecology and environmental protection centers on conserving biodiversity and ecosystem health as a defense against zoonotic disease transmission, combining legislative measures, technological tools, and cultural values to sustain ecological integrity. Societal impact and disease control addresses the broad consequences of infectious disease outbreaks, advocating for coordinated national and global responses, robust legal and systemic frameworks, and the integration of technology into surveillance, diagnostics, and treatment.
Agricultural technology and biosecurity focuses on the convergence of genetic and technological innovations with robust quarantine measures, legislative oversight, and continuous risk monitoring to protect agricultural productivity and prevent biological threats. Finally, laboratory standards and international trade emphasizes the importance of rigorous testing, adherence to legal frameworks, and customs enforcement in ensuring the safe cross-border movement of biological materials, with laboratories serving as critical nodes for quality assurance and pathogen detection.
Taken together, these themes outline a comprehensive and multi-scalar perspective on China’s biosecurity discourse in the academic literature—one that integrates governance, law, science, technology, environmental stewardship, and international collaboration to address the complex challenges of biosecurity in an interconnected world. These themes are summarized in Table 3, with further details available in Supplemental Table 2.
Each Topic Meaning and Related Words.
Topic Popularity
After using Latent Dirichlet Allocation (LDA) to extract topics from the literature on the Chinese Biosecurity Law, this study examines the popularity of each topic over time. This makes it possible to distinguish the most prominent topics from the less prominent ones, thereby shedding light on the interests of Chinese scholars in this domain.
In the analytical assessment presented in Table 4 and Figure 6, eight distinct thematic areas emerge, each exhibiting varying degrees of prominence based on their normalized probability and trend scores. At the forefront is Legal Principles and Risks (T4), which attains the highest popularity score (
Topic Popularity of China’s Biosecurity Law Literature.
Note. Rank 1 denotes the most popular topic, which is Legal Principles and Risks (T4).

Topic popularity of published paper dataset.
To assess the robustness of these popularity rankings, a leave-1-year-out (LOYO) analysis was performed, with results illustrated in Figure 7. The rank variation plots demonstrate that the leading topics, notably Legal Principles and Risks (T4) and Societal Impact and Disease Control (T6), consistently maintain their top positions regardless of the exclusion of any single year. Similarly, Genetic Modification and Food Safety (T2) and Laboratory Standards and International Trade (T8) exhibit relatively stable rankings, underscoring their enduring relevance. In contrast, Public Health and Governance (T1) and Agricultural Technology and Biosecurity (T7) reveal greater fluctuations, indicating that their relative prominence is more sensitive to specific years of publication activity.

Leave 1 year out topic rank variation.
Figure 8a and b further quantify this stability by reporting the mean and maximum rank variation across topics. Topics such as T3 and T4 exhibit minimal deviations, highlighting their robustness, while T1 and T7 display the highest levels of instability, with maximum rank shifts of more than six positions. Taken together, these robustness checks confirm that the core hierarchy of popular topics is not an artifact of temporal distribution, lending confidence to the validity of the observed trends.

(a) Mean rank variation of topics based on standard deviation; (b) Max rank variation.
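The leave-one-year-out check can be sketched as follows: rank the topics on all years, re-rank after dropping each year in turn, and record the largest rank shift per topic. The per-year scores are invented, and summing yearly scores stands in for the study's popularity score, which is an assumption made only to keep the example self-contained.

```python
def rank_topics(scores):
    """Map topic -> rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {topic: i + 1 for i, topic in enumerate(ordered)}

def total_scores(yearly, skip_year=None):
    """Sum each topic's score over all years except skip_year."""
    totals = {}
    for year, scores in yearly.items():
        if year == skip_year:
            continue
        for topic, s in scores.items():
            totals[topic] = totals.get(topic, 0.0) + s
    return totals

def loyo_max_rank_shift(yearly):
    """Return baseline ranks and each topic's largest rank shift under LOYO."""
    baseline = rank_topics(total_scores(yearly))
    shifts = {t: 0 for t in baseline}
    for year in yearly:
        ranks = rank_topics(total_scores(yearly, skip_year=year))
        for t in baseline:
            shifts[t] = max(shifts[t], abs(ranks[t] - baseline[t]))
    return baseline, shifts

# Toy yearly topic scores, invented for illustration only.
yearly = {
    2021: {"A": 0.50, "B": 0.20, "C": 0.30},
    2022: {"A": 0.45, "B": 0.40, "C": 0.15},
    2023: {"A": 0.50, "B": 0.15, "C": 0.35},
}
baseline, shifts = loyo_max_rank_shift(yearly)
print(baseline)  # {'A': 1, 'C': 2, 'B': 3}
print(shifts)    # {'A': 0, 'C': 1, 'B': 1}
```

Topic A keeps rank 1 whichever year is removed, while B and C swap places when 2021 or 2023 is dropped, mirroring the distinction between stable and year-sensitive topics drawn in the text.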
Topic Comparison and Legal Implications
This study also compared topic similarity to identify closely related themes between the Chinese Biosecurity Law and the research literature on it. Taking the law as the benchmark, the aim was to explore whether research by Chinese scholars in this field aligns with the basic content of the law. The comparison can offer guidance for future researchers and legislators.
The analysis in Figure 9 reveals the thematic correlations between CBL (China Biosecurity Law) and CBLL (China Biosecurity Law Literature). Overall, the alignment between these two bodies of text remains moderate, with only a few topics demonstrating close correspondence (overall similarity: 0.223). For instance, Biological Risks and Response Measures (CBL Topic 6) shows the highest similarity with Legal Principles and Risks (CBLL Topic 4), suggesting a shared focus on regulatory mechanisms and risk preparedness. This topic also aligns to a lesser extent with Societal Impact and Disease Control (CBLL Topic 6) and Agricultural Technology and Biosecurity (CBLL Topic 7), both of which emphasize the practical implementation of risk response systems.

The columns represent the topics of China’s Biosecurity Law, Topic 1 to Topic 6: Epidemic and Rural Development (T1), Biological Resources and Scientific Experiments (T2), Biological Safety Mechanisms and Management (T3), Protection of Flora and Fauna and Medical Institutions (T4), Laws and Regulations and Safety Measures (T5), Biological Risks and Response Measures (T6). The rows represent the topics of the research literature on China’s Biosecurity Law, Topic 1 to Topic 8: Public Health and Governance (T1), Genetic Modification and Food Safety (T2), National Policy and International Relations (T3), Legal Principles and Risks (T4), Ecology and Environmental Protection (T5), Societal Impact and Disease Control (T6), Agricultural Technology and Biosecurity (T7), Laboratory Standards and International Trade (T8).
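A cross-corpus comparison of this kind can be sketched by treating each topic as a sparse word-to-weight mapping, computing the cosine similarity of every law topic against every literature topic, and then summarizing the matrix. The topic contents below are toy values, and taking the overall similarity as the mean of all pairwise entries is an assumption; the study does not state how its overall score was aggregated.

```python
import math

def cosine(p, q):
    """Cosine similarity of two sparse word -> weight dictionaries."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Toy topic-word weights, invented for illustration only.
law_topics = {
    "L1": {"risk": 0.6, "response": 0.4},
    "L2": {"resource": 0.7, "experiment": 0.3},
}
lit_topics = {
    "R1": {"risk": 0.5, "law": 0.5},
    "R2": {"laboratory": 0.8, "trade": 0.2},
    "R3": {"experiment": 0.6, "resource": 0.4},
}

# One row per law topic, one entry per literature topic.
matrix = {
    l: {r: cosine(lw, rw) for r, rw in lit_topics.items()}
    for l, lw in law_topics.items()
}
# Closest literature topic for each law topic.
best_match = {l: max(row, key=row.get) for l, row in matrix.items()}
# Assumed aggregate: mean of all pairwise similarities.
overall = sum(v for row in matrix.values() for v in row.values()) / (
    len(law_topics) * len(lit_topics)
)

print(best_match)  # {'L1': 'R1', 'L2': 'R3'}
print(round(overall, 3))
```

Pairs with no shared vocabulary score exactly zero, so a low overall mean can coexist with a few strongly matched pairs.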
Another notable correspondence is observed between Biological Safety Mechanisms and Management (CBL Topic 3) and National Policy and International Relations (CBLL Topic 3), indicating that institutional coordination and global engagement have been addressed both in legislation and in academic discourse. Similarly, Protection of Flora and Fauna and Medical Institutions (CBL Topic 4) correlates with Genetic Modification and Food Safety (CBLL Topic 2), reflecting shared concerns about biodiversity protection and the regulation of bioengineered products.
However, significant disparities are also evident. Epidemic and Rural Development (CBL Topic 1) displays minimal correlation with Genetic Modification and Food Safety (CBLL Topic 2), highlighting a legislative gap in addressing biotechnological risks in rural health contexts. Likewise, Biological Resources and Scientific Experiments (CBL Topic 2) shows weak alignment with Laboratory Standards and International Trade (CBLL Topic 8), suggesting a lack of legal coverage for global biosafety norms and laboratory internationalization.
These mismatches reveal areas where legislative refinement may be warranted. Specifically, Chapter III of the Biosecurity Law, which governs the prevention and control of epidemics and animal/plant diseases, could be updated to explicitly incorporate genetically modified organisms (GMOs) and their relevance to rural public health and agricultural systems. Such an update would respond to the concerns expressed in CBLL Topic 2.
Similarly, Chapter IV, which addresses biotechnology research and applications, could be improved by integrating international lab safety standards and cross-border biosafety governance practices. These elements are actively discussed in the literature under CBLL Topic 8, yet remain underdeveloped in the current legal text.
In addition, although Chapter VI does mention invasive species and biodiversity protection (e.g., Article 60), the ecological risks explored in CBLL Topic 5—such as ecosystem degradation and ecological restoration—are not fully reflected in current provisions. Introducing operational guidelines related to ecological impact assessments and habitat restoration could strengthen this area of the law.
Overall, the analysis not only reveals thematic gaps between law and literature, but also identifies specific chapters—Chapter III, IV, and VI—that could be enhanced through better integration of emerging academic insights. Bridging these gaps may promote a more adaptive, forward-looking, and evidence-informed biosecurity governance framework in China.
Discussion
This study provides a detailed examination of China’s Biosecurity Law and related research literature by integrating machine learning with bibliometric analysis, enabling the identification of structural patterns and thematic discrepancies that may elude traditional legal analysis (Lehr & Ohm, 2017). The results reveal six core themes within the law: (1) the intersection of biosecurity and rural development, emphasizing agricultural resilience, ecological protection, and coordinated governance (Sridhar et al., 2023); (2) the balance between biological resource utilization and scientific experimentation, underscoring regulatory oversight and standardized management (D’Amato & Korhonen, 2021; Grzywacz et al., 2014); (3) mechanisms and management frameworks for biosecurity, requiring clear responsibilities, risk assessment, and timely responses; (4) the protection of biodiversity and medical institutions as interdependent components of public health (Hulme, 2021); (5) the refinement of legal regulations and safety measures, supported by international cooperation and enforcement (Linkous et al., 2021); and (6) risk monitoring and response, integrating legal mandates, technological measures, and public awareness (Hao et al., 2022).
For the literature, the topic modeling results reveal eight complementary themes: public health and governance aligned with “Healthy China 2030” (Zhang & Gong, 2019); genetic modification and food safety within global regulatory debates; national policy and international relations; legal principles and risks as the most prominent research focus; ecology and environmental protection; societal impact and disease control, closely linked to pandemic responses (Gao & Yu, 2020; Kim & Kreps, 2020; You et al., 2024); agricultural technology and biosecurity; and laboratory standards in international trade. The prominence of Legal Principles and Risks (T4) and Societal Impact and Disease Control (T6) reflects their theoretical and practical significance in advancing rule of law, governance capacity, and public health preparedness. Their popularity is reinforced by alignment with national priorities, funding concentration, and interdisciplinary relevance (Ahmed & Wahed, 2020; Bolt et al., 2021; D’Este & Robinson-García, 2023; Forestier & Kim, 2020). Lower popularity for other topics likely reflects resource allocation, expert distribution, and strategic alignment, rather than intrinsic value.
While the literature exhibits broader thematic coverage than the legal text, this should not be interpreted as evidence that scholarly research directly influences legislation. Although some regulatory domains such as synthetic biology have called for greater engagement of scientific expertise and stakeholder deliberation, no direct citations from academic publications have been identified in legislative drafts or committee reports. The connection between research and law appears correlative rather than causal, largely reflecting overlapping concerns and shared national priorities such as public health and biosafety.
Comparative perspectives reinforce this interpretation. In the European Union and the United States, researchers have documented similar misalignments between scientific discourse and regulatory frameworks. For instance, Bogner and Torgersen (2018) highlight the persistent gap between technoscientific innovation and governance structures in biotechnology, where regulatory tools often lag behind academic knowledge. Moreover, empirical studies of regulatory impact assessments in the United States have shown that scientific evidence is inconsistently cited, often depending on political salience rather than academic relevance (Costa et al., 2015). These observations suggest that the misalignments identified in China’s biosecurity domain may be generalizable rather than unique. Comparative analysis shows partial thematic alignment between the law and literature. While the law emphasizes technical and administrative provisions, the literature expands to policy, legal, and societal dimensions. Cosine similarity results highlight strong alignment between Biological Safety Mechanisms and Management (T3) and Legal Principles and Risks (T4), as well as between Laws and Regulations (T5) and Societal Impact and Disease Control (T6), and between Biological Risks (T6) and Laboratory Standards (T8). However, these alignments should be interpreted cautiously, as they reflect thematic similarity rather than directional influence from academic research to legislative content.
To address the identified gaps and improve the practical implementation of biosecurity governance, a series of policy recommendations are proposed and organized according to their feasibility and the corresponding responsible agencies. In the near term, priority should be given to increasing investment in rural biosecurity infrastructure. This effort, to be led by the Ministry of Agriculture and Rural Affairs (MARA), aims to strengthen local capacity for epidemic prevention and pest control. Simultaneously, oversight on the utilization of biological resources should be enhanced through coordinated efforts by the Ministry of Science and Technology (MOST) and the Ministry of Ecology and Environment (MEE), ensuring regulatory compliance and minimizing misuse. Public awareness is also a critical component; thus, the National Health Commission (NHC), in collaboration with the Ministry of Education (MOE), should initiate nationwide biosecurity education campaigns to improve public understanding and participation.
In the medium to long term, the Biosecurity Law should be revised to incorporate emerging scientific and technological domains, such as synthetic biology and digital bio-risk modeling. This legislative revision would require engagement from the Law Committee of the National People’s Congress (NPC), so that regulatory frameworks remain adaptive and forward-looking. In parallel, international legal harmonization should be promoted through the Ministry of Foreign Affairs (MFA) and relevant academic-policy platforms, aligning China’s biosecurity system with global standards. Finally, a more integrated governance framework involving MARA, MEE, and NHC should be constructed to facilitate coordinated risk management, cross-sectoral data exchange, and early-warning system development.
The observed divergence between legislation and academic research may stem from both technical and institutional causes. In some cases, legislative omissions may reflect political prioritization or regulatory inertia rather than a lack of scientific knowledge. Furthermore, the lack of formal channels for integrating academic input into legislative processes, such as expert consultation mechanisms or impact assessments, may limit the law’s responsiveness to evolving research findings.
Several limitations should be acknowledged. First, the literature dataset is derived exclusively from CNKI, which may introduce coverage bias due to the underrepresentation of non-mainland Chinese and international publications. Second, language bias is possible, as studies published in English or other languages were excluded. Third, the preprocessing stage, particularly word segmentation and keyword translation, may introduce errors in boundary detection or semantic clarity, especially for legal and technical terms. Fourth, the topic modeling approach employed, Latent Dirichlet Allocation (LDA), rests on the bag-of-words assumption and may therefore overlook important contextual and syntactic nuances. Lastly, while cosine similarity is used to assess the degree of alignment between legal and academic themes, no formal statistical tests were applied to determine the significance of these scores. Future studies could address this by incorporating permutation-based tests, bootstrapped confidence intervals, or regression-based approaches to evaluate the robustness of thematic similarity more rigorously. Additionally, transformer-based models (e.g., BERTopic, CTM) and cross-lingual corpora could be employed to enhance semantic depth and generalizability.
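One way a permutation-based test of this kind might look is sketched below. The sketch is illustrative only and was not part of this study's analysis. It assumes that topics are word-probability vectors over a shared vocabulary, that the test statistic is the mean of all pairwise cosine similarities, and that the null hypothesis is "the term-level correspondence between the two corpora is no better than a random pairing of terms". The function names and the toy data are hypothetical.

```python
import numpy as np

def mean_cosine(a, b):
    """Mean of all pairwise cosine similarities between rows of a and b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float((a_unit @ b_unit.T).mean())

def permutation_test(law_topics, lit_topics, n_perm=1000, seed=0):
    """One-sided permutation test on the mean pairwise cosine similarity.

    The vocabulary columns of the literature matrix are shuffled to break
    the term correspondence with the law matrix (assumed null model).
    """
    rng = np.random.default_rng(seed)
    law = np.asarray(law_topics, dtype=float)
    lit = np.asarray(lit_topics, dtype=float)
    observed = mean_cosine(law, lit)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = lit[:, rng.permutation(lit.shape[1])]
        null[i] = mean_cosine(law, shuffled)
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + int(np.sum(null >= observed))) / (1 + n_perm)
    return observed, null, p_value

# Toy data (assumption): literature topics are noisy mixtures of law topics.
rng = np.random.default_rng(1)
law_topics = rng.dirichlet(np.full(60, 0.05), size=6)
noise = rng.dirichlet(np.full(60, 0.05), size=8)
lit_topics = 0.7 * law_topics[rng.integers(0, 6, size=8)] + 0.3 * noise

observed, null, p_value = permutation_test(law_topics, lit_topics, n_perm=500)
```

A small `p_value` would indicate that the observed average similarity is unlikely under random term pairing. Other null models, such as shuffling documents between corpora before refitting the topic model, are equally defensible and considerably more expensive.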
Conclusion
This study systematically compared the thematic structures of China’s Biosecurity Law and related academic literature, offering a data-driven perspective on the alignment between legislative content and scientific discourse. Using topic modeling and cosine similarity analysis, six core legal themes and eight dominant research themes were identified. While some areas of convergence were observed, such as legal risk management and public health, noticeable gaps emerged in domains like rural biosecurity, international standards, and GMO regulation. The average cosine similarity of 0.223 indicates a moderate level of alignment, underscoring the need for more responsive and research-informed legislation. By integrating machine learning with legal and bibliometric analysis, this study provides a replicable framework for evaluating research–policy coherence. Its contributions should nonetheless be weighed against its methodological limitations, including restricted data sources, language constraints, and model assumptions. Future research may enhance this framework using cross-lingual corpora, transformer-based models, and statistical validation methods.
Supplemental Material
Supplemental material, sj-docx-1-sgo-10.1177_21582440251396540 for Does Literature Research Have Any Impact on Legislation? A Case Study of China’s Biosecurity Law by Yang Liu in SAGE Open
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Declaration of Generative AI in Scientific Writing
After completing the writing of this manuscript, the author used ChatGPT-4 solely to improve the language and enhance readability. The tool was not used to generate original content, and the author carefully reviewed and edited the output as necessary. The author takes full responsibility for the content of the published article.
Data Availability Statement
All data analyzed in this study were obtained from publicly accessible sources, including the CNKI database and the official website of the Biosecurity Law. Due to CNKI licensing restrictions, full-text articles cannot be shared, but metadata and analysis code are available from the author upon reasonable request. Specific links to data sources are provided in the Data Source section.
References
