Abstract
Background
A growing body of evidence has stimulated the application of artificial intelligence (AI) in precision medicine research for lung cancer. This trend calls for a comprehensive overview of the growing number of publications to help researchers understand the field.
Method
The bibliometric data for this analysis were extracted from the Web of Science Core Collection database. CiteSpace, VOSviewer, and an online bibliometrics website were used for the analysis.
Results
After filtering, the search yielded 4062 manuscripts, 92.27% of which were published from 2014 onwards. The main contributing countries were China, the United States, India, Japan, and Korea. The publications fell mainly within the following scientific disciplines: Radiology Nuclear Medicine, Medical Imaging, Oncology, and Computer Science. Notably, Li Weimin and Aerts Hugo J. W. L. stand out as leading authorities in this domain. In the keyword co-occurrence and co-citation cluster analysis of the publications, the knowledge base was divided into four readily interpretable clusters: screening, diagnosis, treatment, and prognosis.
Conclusion
This bibliometric study reveals that deep learning frameworks and AI-based radiomics are receiving attention. High-quality and standardized data have the potential to revolutionize lung cancer screening and diagnosis in the era of precision medicine. However, before current research can be effectively applied in clinical practice, the study highlights the need for high-quality clinical datasets, for the development of new and combined AI models, and for their consistent assessment.
Introduction
Lung cancer is a serious threat to human health. Approximately 80–90% of lung cancers are caused by smoking; the disease is also associated with secondhand smoke, radon exposure, coal combustion, occupational exposure to carcinogens and cooking fumes, and air pollution.1–3 The latest Global Cancer Statistics 2022 report shows that lung cancer was the most frequently diagnosed cancer in 2022, responsible for almost 2.5 million new cases, or 1 in 8 cancers worldwide (12.4% of all cancers globally).4 Complications arising from lung cancer severely reduce patients' quality of life and life expectancy. The 5-year survival rate after diagnosis is only 10–20%, because lung cancer is often not diagnosed until late in life and has a poor prognosis.5 During the clinical workup of lung cancer, massive multidimensional datasets are generated, including text, images, vital sign data, genetic data, and other rich data types.6,7 Thorough, iterative statistical analysis of these data and the reading of images or pathology slides to make clinical decisions lead to physician exhaustion. In addition, high false-positive and false-negative rates,8,9 cost-effectiveness,10 and other issues in daily practice pose challenges for precision medicine.11
In recent years, emerging artificial intelligence (AI) has shown potential for solving these problems. The definition of AI is quite broad; in the medical field, it is regarded as applications or technologies that can learn and recognize patterns and features from large amounts of representative data, imitating the cognitive functions associated with human thought.12,13 This information is then integrated into a domain-specific decision-making process (Figure 1). Such applications include datasets for training, preprocessing methods, algorithms for generating predictive models and speeding up model construction, and pretrained models that inherit and reuse the experience of previous generations.14 The core of AI is machine learning, which includes powerful algorithms such as deep learning, convolutional neural networks (CNNs), and decision trees.15,16 These advanced AI algorithms build models that efficiently analyze multidimensional datasets, enhance image analysis and interpretation, and provide decision support systems that help researchers and clinicians navigate complex lung cancer data and obtain valuable insights and recommendations for precision medicine.17,18 Currently, a growing number of clinical and experimental studies are revealing the value of AI in clinical decision making in lung cancer, including lung cancer screening,19 assisting in lung cancer diagnosis,20 prediction,21 and assessing treatment efficacy and prognosis.22 This will attract both newcomers and senior researchers to consider research topics in this field, creating an urgent need for a systematic description of the current state of research, development processes, and future research hotspots.

The general process of artificial intelligence model building.
Bibliometrics is widely used in the fields of medicine, architecture, and psychology.23,24 Unlike traditional reviews, which carry a degree of subjectivity, bibliometrics is an interdisciplinary discipline that analyzes knowledge carriers quantitatively using mathematical and statistical methods. It reveals important bibliometric indicators such as authoritative and productive countries, authors, journals, and institutions to identify research themes in the field; it also identifies highly cited key literature and keywords to explore research hotspots and frontier directions, and it visually presents a panoramic view of the research field. This study aims to summarize global research trends and hotspots by identifying core contributing authors, institutions, countries, and regions in the field, and by visually analyzing keywords and cited literature, to provide new perspectives on future directions for researchers and on clinical decisions for clinicians.
Methods
Database sources and search strategy
We searched the PubMed database for Medical Subject Headings (MeSH) terms to help identify search terms. The Web of Science Core Collection, which covers the publications of more than 12,000 core journals, was then selected as the source database. The retrieval date was 15 March 2024. Table 1 shows our search strategy. XL and WZ screened the results to include only original research articles and literature reviews and excluded publications that did not meet the inclusion criteria. The complete records, exported as plain text, include the article title, author names, abstract, publication date, keywords, citations, etc. The retrieved files were deduplicated in CiteSpace, giving a total of 4062 valid records with 122,308 references. Figure 2 displays the flow of data extraction and analysis.

The flow of data extraction and analysis in the study (by Figdraw).
The topic search queries.
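The export and deduplication step described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not CiteSpace's own procedure: it assumes the Web of Science plain-text layout in which each field begins with a two-letter tag (for example TI for the title and DI for the DOI), continuation lines are indented, and each record ends with an ER line. The function names, the sample records, and the rule of matching on DOI first and normalized title second are all hypothetical.

```python
def parse_wos_plaintext(text):
    """Split a WoS-style plain-text export into a list of {tag: [values]} dicts."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue                        # blank separator line
        if line.startswith(" "):            # indented: continues the previous field
            if tag is not None:
                current[tag].append(line.strip())
            continue
        code, value = line[:2], line[3:].strip()
        if code == "ER":                    # end of one record
            records.append(current)
            current, tag = {}, None
        elif code in ("FN", "VR", "EF"):    # file header and footer, not record fields
            tag = None
        else:
            tag = code
            current.setdefault(tag, []).append(value)
    return records


def dedup_key(record):
    """Assumed matching rule: DOI if present, otherwise the normalized title."""
    doi = " ".join(record.get("DI", [])).lower()
    if doi:
        return ("doi", doi)
    title = " ".join(record.get("TI", [])).lower()
    return ("title", " ".join(title.split()))


def deduplicate(records):
    """Keep the first record seen for each key."""
    seen, unique = set(), []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Two hypothetical records that differ only in line wrapping and DOI case.
SAMPLE = "\n".join([
    "FN Hypothetical export",
    "VR 1.0",
    "PT J",
    "AU Doe, A",
    "   Roe, B",
    "TI A hypothetical study of",
    "   lung nodules",
    "DI 10.0000/example.1",
    "ER",
    "",
    "PT J",
    "AU Doe, A",
    "TI A hypothetical study of lung nodules",
    "DI 10.0000/EXAMPLE.1",
    "ER",
    "",
    "EF",
])

parsed = parse_wos_plaintext(SAMPLE)
unique = deduplicate(parsed)
print(len(parsed), "records parsed,", len(unique), "kept after deduplication")
```

Matching on a case-folded DOI before falling back to the title is one common choice; a real pipeline would need to decide how to treat records that lack both fields.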
Statistical analysis
CiteSpace is a Java-based bibliometric software tool developed by Professor Chaomei Chen.25 It enables quantitative analysis of domain-specific literature (collections) to explore valuable information and knowledge about the evolution of subject areas. The CiteSpace parameters were set as follows. For the author cooperation network, the time slice was set to 1 year and Top N to 50, with all other settings kept at the system defaults. For the institutional cooperation network map and the burst keywords map, the threshold value (Top N% per slice) was set to 30.
VOSviewer, developed by Professors Van Eck and Waltman, is a literature visualization software tool. It analyzes the co-occurrence frequency of keywords and the co-citation frequency of cited literature to determine the relationships between topics, which helps clarify the research content and structure of the field.26,27 The analysis type was set to co-occurrence, the “complete count” option was selected, and the minimum number of keyword occurrences was set to five based on the research requirements. Network visualization was selected to generate the keyword co-occurrence knowledge map, and density visualization was selected to generate the co-occurrence knowledge map of the cited literature.28,29
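The keyword co-occurrence counting described above can be illustrated with a short sketch. It is not VOSviewer's implementation; it assumes that every pair of keywords appearing in the same record adds one to the link between them (one reading of the “complete count” option) and that keywords occurring in fewer than five records are dropped. The function name and the example keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records, min_occurrences=5):
    """Count keyword occurrences and pairwise co-occurrence links.

    records: iterable of keyword lists, one list per publication.
    Returns (occurrences, links); links is keyed by alphabetically sorted pairs.
    """
    normalized = [{k.strip().lower() for k in keywords} for keywords in records]

    # Number of records in which each keyword appears.
    occurrences = Counter()
    for keywords in normalized:
        occurrences.update(keywords)

    # Apply the minimum-occurrence threshold before building links.
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    links = Counter()
    for keywords in normalized:
        for a, b in combinations(sorted(keywords & kept), 2):
            links[(a, b)] += 1
    return occurrences, links


# Hypothetical keyword lists for seven publications.
example = [["Deep learning", "CT"]] * 5 + [["deep learning", "radiomics"]] * 2
occurrences, links = keyword_cooccurrence(example, min_occurrences=5)
for pair, weight in links.most_common():
    print(pair, weight)
```

In this toy input, "radiomics" appears in only two records and therefore forms no links once the threshold of five is applied.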
An online bibliometrics website (http://bibliometric.com/) was used to visualize the national cooperation network. To further assess the scientific quality of the studies, the latest impact factor (IF) and the number of citations were checked for the retrieved articles and journals.
Results
Overall trend
The first article was published in 1992.30 The past 33 years can be divided into two periods (data for 2024 end on 15 March) (Figure 3(a)). From 1992 to 2013, the number of publications was low and grew slowly, accounting for only 7.73% of all publications, with an average of 14 publications per year. From 2014 to 15 March 2024, publications accounted for 92.27% of the total, with an average of 344 publications per year, reaching 1003 publications in 2023. Overall, the number of publications on AI applications in lung cancer research has grown rapidly each year over the past decade, demonstrating the growing academic interest in the field (Figure 3(b) and (c)).

(a) Distribution of national annual publications. (b) Visual map of cross-country/regional collaborations. The thickness and quantity of boundaries between countries reflect the frequency of collaboration. (c) Geographical distribution: map of the geographical distribution based on the total number of publications in different countries/regions.
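The period shares quoted in this subsection can be cross-checked with simple arithmetic. The sketch below uses only the total of 4062 records and the 7.73% share reported in the text; the per-period counts it prints are back-calculated from those figures and are not numbers reported by the study. The variable names are illustrative.

```python
TOTAL_RECORDS = 4062             # total valid records reported in the Methods
EARLY_SHARE_PCT = 7.73           # share of publications from 1992 to 2013
EARLY_YEARS = 2013 - 1992 + 1    # 22 calendar years in the first period

# Back-calculated counts (assumption: the shares were rounded to two decimals).
early_count = round(TOTAL_RECORDS * EARLY_SHARE_PCT / 100)
late_count = TOTAL_RECORDS - early_count

# Share of the second period and mean annual output of the first period.
late_share_pct = round(100 * late_count / TOTAL_RECORDS, 2)
early_mean_per_year = early_count / EARLY_YEARS

print("early period:", early_count, "records,",
      round(early_mean_per_year, 1), "per year")
print("late period:", late_count, "records,", late_share_pct, "% of total")
```

The mean for the second period is not recomputed here because it depends on how the partial year 2024 is counted, which the text does not specify.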
Country and institution distribution
Centrality is an important indicator for evaluating the importance of nodes in a network: the higher the centrality, the greater the weight of the node in the network.31 Eighty-nine countries/regions contributed to the publications, and the four most productive countries were China, the United States, India, and South Korea. The top four countries in terms of centrality (Table 2) were the United States (0.15), India (0.14), England (0.12), and Germany (0.09). China has produced the largest number of publications since 2008, accounting for 22.63%. Many countries around the world, especially East Asian countries, have participated in and enriched research in this field since the beginning of the twenty-first century.
Top 10 productive countries and institutions.
In total, 742 institutions are involved in the development of this research field. We list the 10 most productive institutions including specific information (Table 2). Shanghai Jiao Tong University is the most productive institution, with 88 publications, indicating its great contribution to this field.
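To make the centrality values reported for countries more concrete, the sketch below computes normalized betweenness centrality on a small collaboration graph. It assumes that the centrality in question is betweenness centrality, which the text does not state explicitly, and it uses Brandes' algorithm for unweighted, undirected graphs. The node labels and edges are hypothetical and do not reproduce the study's network.

```python
from collections import deque


def betweenness_centrality(adj):
    """Normalized betweenness centrality for an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours.
    """
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths.
        order = []
        preds = {v: [] for v in nodes}
        n_paths = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        n_paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in nodes}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted twice; dividing by (n-1)(n-2)
    # removes the double count and normalizes to the range [0, 1].
    n = len(nodes)
    if n <= 2:
        return {v: 0.0 for v in nodes}
    return {v: score[v] / ((n - 1) * (n - 2)) for v in nodes}


# Hypothetical collaboration graph: one hub linked to three partners.
graph = {
    "hub": {"p1", "p2", "p3"},
    "p1": {"hub"},
    "p2": {"hub"},
    "p3": {"hub"},
}
for node, value in sorted(betweenness_centrality(graph).items()):
    print(node, round(value, 2))
```

A node scores highly when many shortest paths between other nodes pass through it, which is why a country with fewer publications can still outrank a more productive one on this measure.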
Author analysis
The top 10 most prolific authors in AI applied to lung cancer research are listed in Table 3, together with their H-index, total citations, and affiliation. The most productive author was Li, Weimin from West China Hospital, China, with 24 articles, an H-index of 44, and 380 citations. The most cited author was Aerts, Hugo J. W. L. from the Netherlands, who works at Harvard Medical School and Maastricht University and has published 21 articles with an average of 249.43 citations per article.
The top 10 prolific authors.
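For readers unfamiliar with the H-index used in the author analysis, the standard definition is the largest number h such that an author has h papers with at least h citations each. A minimal sketch follows; the citation counts are invented for illustration and do not belong to any author in Table 3.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))
```

Because the index depends on the whole distribution of citations, an author with a few very highly cited papers can have a lower H-index than one with many moderately cited papers.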
Journal analysis
The 4062 records cover 1046 journals. Table 4 lists the top 10 journals in which research on AI applications for lung cancer is mainly published. “Frontiers in Oncology” ranks first in number of publications, and “Medical Physics” ranks first in citation frequency. The average impact factor (IF) of the top 10 journals was 4.57, and the average number of citations for these journals was 1451.2. In total, the 1046 journals span 135 categories, including Radiology Nuclear Medicine, Medical Imaging, Oncology, Computer Science, and Biomedical Engineering.
The top 10 productive journals.
Highly co-cited references analysis
The top 10 most highly cited papers in AI applied to lung cancer are listed in Table 5; together they have been co-cited more than 11,000 times. The most cited article was “Computational Radiomics System to Decode the Radiographic Phenotype,” published by Van Griethuysen, JJM from the Netherlands Cancer Institute in Cancer Research in 2017 and cited 3083 times since publication. This paper described the workflow and architecture of “PyRadiomics” and demonstrated its application in characterizing lung lesions. The second-ranked paper investigated how the performance of deep CNNs trained from scratch compared with that of pre-trained CNNs fine-tuned in a layer-wise manner, specifically when applied to lung medical imaging tasks. The third-ranked paper trained a deep CNN (Inception v3) to classify adenocarcinoma, squamous cell carcinoma, and normal lung tissue. The fourth-ranked paper presented a deep learning algorithm that used a patient's current and prior computed tomography volumes to predict the risk of lung cancer.
The top 10 co-cited publications.
Analysis of keywords and co-citation clustering
Keywords directly reflect the central concepts of the literature. The more often a keyword occurs across the literature, the hotter the research topic in the field. Closely linked keywords depict the core themes and contents of the field. In addition, co-citation clustering uncovers the topics of a research field by visualizing strong co-citation relationships among a set of publications, because the references constitute the knowledge base of the field.42 We therefore grouped highly related studies to identify the central topics in the field of AI applied to lung cancer.43 Keyword co-occurrence clustering and co-citation clustering graphs were constructed in VOSviewer (Figures 4 and 5). Table 6 lists the representative keywords of each cluster together with the relevant literature to aid understanding.

Co-occurrence keyword clustering. The size of the circles represents the total frequency of keyword occurrences, the lines indicate the strength of the association between keywords, and the same color means that their co-occurrence is under the same cluster.

Co-citation clustering. Each heading is a reference, and references with relevance form color blocks that can be defined as a cluster.
Four theme clusters and their representative keywords and cited literature.
Cluster #1 (screening)
Cluster 1 focused on the application of AI in lung cancer screening, with keywords such as pulmonary nodule, molecular biomarkers, computer-aided detection, automatic detection, and false-positive reduction. The earliest application of AI was in lung cancer screening. Various factors, primarily smoking, damage lung tissue and trigger an inflammatory response, which can lead to the formation of nodules or other lesions; chest X-rays, computed tomography (CT) scans, and other imaging techniques are common preventive screening tools.44
Cluster #2 (diagnosis)
Cluster 2 focused on the application of AI in the diagnosis of lung cancer, with keywords such as diagnosis, computer-aided diagnosis, image segmentation, feature, and CT. AI has made significant strides in lung cancer diagnosis and has enabled noninvasive detection. Thanks to the widespread clinical use of whole-section imaging and of imaging techniques applied to tissue sections, a wealth of high-resolution pathology images and medical images is available. These images can be used to train AI models in pathology tasks such as lung nodule segmentation, cancer cell identification, and cancer type classification.45
Cluster #3 (treatment)
Cluster 3 focused on the application of AI in the treatment of lung cancer, with keywords such as cell lung cancer, radiotherapy, therapy, image classification, and immunotherapy. AI makes it possible to model treatment decisions through computer systems that draw on staging, tumor location, histology, and genetic changes, thereby aiding the interpretation of crucial information about a patient's disease. By providing pertinent evidence, AI assists doctors in formulating treatment plans and improves the efficiency of clinical decision making for patients.
Cluster #4 (prognosis)
Cluster 4 focused on the application of AI in the prognosis of lung cancer, with keywords such as survival, prognostic factor, risk, risk factor, benign, prognostic value, and survival prediction. Multiple factors are associated with lung cancer prognosis; however, improving prognostic outcomes based solely on these factors can be inefficient and subjective.46 Predictive models for lung cancer combined with AI can effectively improve the survival of patients with lung cancer by predicting treatment outcomes and shaping personalized clinical care plans.47
To show the trends of the four clusters, Figure 6 gives a quantitative visualization of the annual publications for each cluster. The results of this visualization align with the findings from the cluster analysis. This may be attributed to AI facilitating the integration of multimodal data, including imaging, genomics, and clinical data, leading to a more comprehensive and accurate assessment of lung cancer risk and prognosis. Conversely, the treatment cluster has the fewest published articles because of its complex characteristics. Advances in screening and diagnosis are likely to have a large impact on the foundation of AI applications in lung cancer.

Timeline of publications in four clusters.
Burst keywords analysis
A burst detection module in CiteSpace identifies significant changes in keyword frequency within a period, indicating whether a topic is rising or declining. A keyword with a strong burst indicates rapid growth in interest among researchers. Burst analysis reveals how research topics and themes emerge, evolve, and decline, and how research hotspots shift. As shown in Figure 7, the top 25 keywords in AI applications for lung cancer from 1992 to 2024 have evolved dynamically. CNN, CT image, radiogenomics, COVID-19, generative adversarial network, artificial intelligence, deep learning, immunotherapy, deep, and immune checkpoint inhibitor are likely to be the research hotspots in the future.

Twenty-four keywords with strong bursts. Time interval is represented by the blue line, and burst keywords by the red line.
Discussion
The great potential of AI in lung cancer research has led more researchers to consider research topics in this field. This bibliometric study, using information technology as a medium, presents current research results on AI applications in lung cancer through accurate and intuitive bibliometric indicators and knowledge maps, providing a more comprehensive and objective reference for the evolution, scientific evaluation, and trend prediction of research topics.
The study reveals that the number of publications on the use of AI in lung cancer increased from 1992 to 2024. The United States and China dominate in number of publications among the top 10 countries. East Asian countries have an advantage in publication volume, which may be explained by the high demand for medical resources from their populations and by their institutions' focus on academic collaboration and knowledge exchange. The United States occupies a leadership position in the global cooperation network. Developed countries collaborate more frequently and produce more publications. This phenomenon may be related to strong economic support, well-developed healthcare systems, and excellent hardware and software, which together meet the demands of clinical research, big data collection, and AI model building.48,49 This is a good trend for the wider application of AI in lung cancer.
Five of the top 10 institutions are located in the United States, suggesting that US research institutions are critical to the field of AI applications in lung cancer and may be conducting deeper and more pioneering work. Interestingly, five of the top 10 are comprehensive universities, such as Shanghai Jiao Tong University, the University of California system, and the Chinese Academy of Sciences. This indicates that multidisciplinary crossover and interinstitutional collaboration can increase the productivity and impact of research.
“Frontiers in Oncology” has the highest number of publications, whereas “Medical Physics” has the potential to produce more high-quality papers in the future. The top 10 journals all have high impact factors, citation counts, and JCR divisions and are considered core journals. Remarkably, 116 JCR categories are covered, implying that multidisciplinary collaboration underpins the flourishing of the field.
The top 10 highly cited references mostly appeared post-2016, with research topics including the proposal and application of multiple novel deep learning frameworks, and the application of AI in lung cancer screening and diagnosis. The presentation of new AI models can often contribute to the flourishing of the field.
From the perspective of keyword co-occurrence and co-citation clusters, the knowledge base of AI applications in lung cancer is divided into four readily understood clusters: screening, diagnosis, treatment, and prognosis. The annual publication volumes of the four clusters reveal that more research addresses lung cancer screening and diagnosis. Novel AI models are being actively applied in research, screening, and the diagnostic interpretation of image information. This is attributed to the increasing number of patients undergoing lung cancer screening and early-stage diagnosis, as well as the lower occurrence of complications compared with patients in advanced stages. These factors facilitate patient cooperation with researchers and patients' understanding of how these data are used. Standardized data collection ensures the quality and consistency of the data, enabling models to analyze and interpret diverse image data more reliably. These advancements point to substantial breakthroughs in the future and may revolutionize early clinical detection and characterization of lung cancer.
The analysis of burst keywords shows a clear evolution in the application of AI in lung cancer research. The early stage was exploratory because of the limitations of computer hardware and software technology. The United States is at the forefront of research on AI applications in lung cancer, and researchers' attention focused on the development and selection of algorithmic models, with the artificial neural network being the most popular algorithm, and on exploring the feasibility of automated AI detection in cancer, mainly through screening of lung nodules in chest CT scans and computer-aided study of the P53 tumor suppressor gene. The burst keywords include automated detection, P53, solitary pulmonary nodule, carcinoma, cancer, and computer-aided detection.
Limitations
This study has some limitations. First, it includes only literature retrieved from a specific database with specific keywords. Second, it covers only literature written in English, which may lead to the oversight of research findings published in other languages. However, we believe that these limitations are unlikely to have altered the overall trends identified in this study.
Conclusion
This bibliometric analysis reveals a global expansion of research on the application of AI in lung cancer. The substantial increase in publications after 2014 reflects the growing importance of this research field. This study identifies the top institutions, researchers, and journals worldwide involved in the application of AI in lung cancer research. Shanghai Jiao Tong University is the most productive institution, Aerts, Hugo J. W. L. is the most influential author, and “Frontiers in Oncology” is the most active journal. Key research areas include screening, diagnosis, treatment, and prognosis. Research hotspots identified include lung nodules, hepatocellular carcinoma, computer-aided diagnosis, image analysis, and the consistency of AI algorithms. In summary, this study provides insights into current trends, key contributors, and research hotspots for AI applications in lung cancer. These findings contribute to the understanding of the field and provide valuable guidance for future AI research in precision medicine for lung cancer and other cancers.
Footnotes
Acknowledgements
The authors would like to thank Fuyuan He and Xue Pan of the School of Pharmacy, Hunan University of Chinese Medicine, for providing the research idea.
Contributorship
YW, XL, FH, and XP were involved in conceptualization; YW and WZ in data curation, visualization, and writing – original draft; YW, WZ, and LT in formal analysis; YW and XL in methodology; WL and PH in project administration; XL and XP in supervision; and SH, FH, and XP in writing – review & editing. All authors have read and agreed to the published version of the manuscript.
Data availability
All data generated or analyzed in this study were obtained from the Web of Science Core Collection database.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the National Natural Science Foundation of China (grant no. 82274215), the Changsha Science and Technology Plan Project (kq2208192), the Hunan Provincial Department of Education Project (22B0379), the Hunan University of Traditional Chinese Medicine University-level graduate innovation project (2022CX75), the Hunan Provincial Health Commission general project (D202313058493), and the Pharmaceutical Open Fund of Domestic First-class Disciplines (cultivation) of Hunan Province.
