Abstract
Background:
Artificial intelligence (AI) in skin cancer is a promising research field to assist physicians and to provide support to patients remotely. Physicians’ awareness of new developments in AI research is important to define the best practices and scope of integrating AI-enabled technologies within a clinical setting.
Objectives:
To analyze the characteristics and trends of AI skin cancer publications from dermatology journals.
Methods:
AI skin cancer publications were retrieved in June 2022 from the Web of Science. Publications were screened by title, abstract, and keywords to assess eligibility, followed by a full-text review. Publications were divided into nonmelanoma skin cancer (NMSC), melanoma, and skin cancer studies. The primary measured outcome was the number of citations. The secondary measured outcomes were articles’ general characteristics and features related to AI.
Results:
A total of 168 articles were included: 25 on NMSC, 77 on melanoma, and 66 on skin cancer. The most common types of skin cancers were melanoma (134, 79.8%), basal cell carcinoma (61, 36.3%), and squamous cell carcinoma (45, 26.9%). All articles were published between 2000 and 2022, with 49 (29.2%) of them being published in 2021. Original studies that developed or assessed an algorithm predominantly used supervised learning (66, 97.0%) and deep neural networks (42, 67.7%). The most used imaging modalities were standard dermoscopy (76, 45.2%) and clinical images (39, 23.2%).
Conclusions:
Most publications focused on developing or assessing screening technologies with mainly deep neural network algorithms. This indicates the pressing need for dermatologists to label or annotate images used by novel AI systems.
Introduction
The rise of artificial intelligence (AI) is transforming medicine by increasing accessibility to healthcare and by optimizing disease management worldwide.1,2 AI is a broad term that encompasses a vast range of technologies designed to mimic human intelligence to accomplish a defined task (Figure 1). An important subfield of AI is machine learning (ML), which uses algorithms and statistical models to learn from past data to achieve a desired outcome such as classifying images based on disease presence. The subset of ML that is used to analyze and understand content in images and video is referred to as computer vision. Since the early 2000s, deep learning (DL) has been revolutionizing the field of image recognition by replicating the functioning of human neurons through neural layers or networks to identify and predict correlations in large datasets.3 Datasets can be composed of any type of information (eg, images or text) and are used to train and test an algorithm. Notably, DL is a common ML technique used in medical specialties requiring visual analysis, such as dermatology, ophthalmology, radiology, and pathology. However, while versatile and effective, DL algorithms require large quantities of imaging data to train upon, which is currently a pressing challenge in medicine.4-6

Figure 1. Definitions of common AI terms. AI, artificial intelligence.
The training of ML algorithms occurs in diverse learning environments: supervised learning (SL), unsupervised learning (UL), and reinforcement learning (RL). In SL, the algorithm is provided with images that have already been labelled or annotated by experts (eg, images are already classified according to the presence or absence of disease). In this scenario, the algorithm is given pathological and normal images to learn by example. Next, in UL, imaging databases are not previously labelled or identified by experts. In this case, the algorithm identifies on its own the characteristics that differentiate the images and groups them into clusters according to these observations. Ultimately, the research team draws a conclusion based on the results. Last, within RL, the model learns from its execution to optimize its algorithm and reinforce its path to a desired outcome. The tuning of the model is decided based on a reward signal, which indicates whether an outcome was positive or negative.
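The difference between SL and UL can be sketched on toy data. The example below is only an illustration and is not taken from any reviewed study: each lesion is reduced to a single hypothetical numeric feature, the supervised model is a simple nearest-centroid classifier, and the unsupervised step is a 1-dimensional k-means with 2 clusters. RL is omitted because it requires an interactive environment and a reward signal.

```python
from statistics import mean

def fit_supervised(features, labels):
    """Supervised learning: expert labels are given; learn one centroid per label."""
    centroids = {}
    for label in set(labels):
        centroids[label] = mean(f for f, l in zip(features, labels) if l == label)
    return centroids

def predict(centroids, feature):
    """Assign the label whose centroid is closest to the feature value."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))

def cluster_unsupervised(features, iterations=10):
    """Unsupervised learning: no labels; split the data into 2 clusters (1-D k-means)."""
    centers = [min(features), max(features)]  # simple deterministic initialisation
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for f in features:
            nearest = 0 if abs(f - centers[0]) <= abs(f - centers[1]) else 1
            groups[nearest].append(f)
        # Keep the previous center if a cluster happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Hypothetical feature values (eg, an asymmetry score) with expert labels.
features = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

model = fit_supervised(features, labels)   # learns from labelled examples
prediction = predict(model, 0.7)           # classifies a new, unlabelled lesion
clusters = cluster_unsupervised(features)  # groups lesions without any labels
```

In the supervised case the output is a named class, whereas the unsupervised case only returns unnamed groups that the research team must interpret afterwards.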
Promising research in dermatology has shown the potential of using AI technologies as diagnostic and screening tools for physicians and patients.4,7-11 Multiple studies have reported the efficacy of AI algorithms in identifying skin diseases and differentiating benign from malignant lesions.12-18 In dermatopathology, AI algorithms are being developed to optimize image assessment and disease classification of histopathology slides.19,20 AI can also be used to facilitate the management of patients and to increase access to healthcare. For instance, teledermatology, an emerging field within dermatology, is increasing access to healthcare for patients living in remote areas.21-24 In fact, it was estimated that 53% of dermatology cases can first be assessed through teledermatology.23 All these diverse applications of AI in dermatology are expected to optimize patient care inside and outside the clinical workflow.25-27
Historically, AI in dermatology was centred on skin cancer detection,5 and it has since expanded to other diseases such as eczema and psoriasis.1 As a leader in AI research in dermatology, skin cancer is one of the most advanced fields, going beyond clinical usage. The emergence of mobile applications that detect, track, and predict patients’ skin cancer outcomes is a novel avenue for disease management.7,15,28-31 The strong interest in utilizing AI in skin cancer stems from the fact that skin cancer is one of the most prevalent types of cancer globally, while being easily treatable when detected early.32 Because of this, the use of AI technologies to detect and treat skin cancer early becomes critical in the management of this disease. However, despite promising findings and advances, AI is still in its early stages in dermatology.33,34 To date, no literature tracks the features of AI skin cancer articles published in dermatology journals. Analyzing AI skin cancer studies could help understand the evolution and future direction of this leading subfield in dermatology and in medicine as a whole. Additionally, identifying potential hurdles to the implementation of AI in dermatology could facilitate and foster collaboration between the dermatology community and AI researchers, as well as developers. Therefore, the purpose of this study is to analyze and review the evolving trends of AI skin cancer studies that are published in dermatology journals.
Material and Methods
AI publications on skin cancer, namely nonmelanoma skin cancers (NMSCs), melanoma, and studies including both, were retrieved on June 1, 2022, from the Web of Science (WoS). All types of publications were searched in 3 different queries using the WoS Category (WC) keyword “dermatology.” (WC is defined as the WoS journal category.35) All publications were thus issued from dermatology journals. Topic subjects (TS) were tailored to each query. (TS in WoS is defined as a topic term present in the title, the abstract, the author keywords, and/or Keywords Plus.35) The first query identified the NMSC studies with the following keywords: ((WC = (Dermatology)) AND TS = (artificial intelligence OR deep learning OR machine learning)) AND TS = (non-melanoma skin cancer OR squamous cell carcinoma OR basal cell carcinoma OR Merkel cell carcinoma OR cutaneous T-cell lymphoma OR Kaposi sarcoma OR dermatofibrosarcoma protuberans). Forty-five publications were retrieved. The second query identified the melanoma skin cancer studies with the following keywords: (WC = (dermatology)) AND TS = (machine learning OR artificial intelligence OR deep learning) AND TS = (melanoma OR melanoma skin cancer). A total of 124 publications were retrieved. To capture articles that used only “skin cancer” as a keyword, the third query identified skin cancer publications with the following keywords: (WC = (dermatology)) AND TS = (machine learning OR artificial intelligence OR deep learning) AND TS = (skin cancer). A total of 107 publications were retrieved. Across the 3 queries, 276 publications were retrieved. Ninety-four duplicates and 2 publications unavailable online were removed. As seen in similar bibliometric studies, articles’ titles, abstracts, and keywords were screened, followed by a full-text review.36-40 To be eligible, the focus of the publication needed to be on AI and skin cancer.
Two independent reviewers (LL and MJC) screened 180 publications by title, abstract, and keywords, followed by a full-text reading. In case of disagreements, authors with experience in AI and/or skin cancer could be consulted (MLK and PL). After reaching consensus, 11 publications unrelated to AI and 1 article that was not published in a dermatology journal were excluded. A total of 168 publications were analyzed: 25 NMSC studies, 77 melanoma studies, and 66 skin cancer studies including both. A bibliometric analysis was performed. The following characteristics were collected from all publications: type of skin cancer, year of article publication, number of citations, study design, type of database(s) used, imaging modalities, article language, document type, and research area.
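The selection flow described above can be checked with simple arithmetic. The sketch below uses only the counts stated in the Methods; the variable names are illustrative.

```python
# Publications retrieved by each of the 3 WoS queries.
retrieved_per_query = {"NMSC": 45, "melanoma": 124, "skin cancer": 107}
total_retrieved = sum(retrieved_per_query.values())

# 94 duplicates and 2 publications unavailable online were removed.
screened = total_retrieved - 94 - 2

# 11 publications unrelated to AI and 1 outside dermatology journals were excluded.
analyzed = screened - 11 - 1

# Final distribution of the analyzed publications across the 3 groups.
included_per_group = {"NMSC": 25, "melanoma": 77, "skin cancer": 66}
groups_match_total = sum(included_per_group.values()) == analyzed
```

The intermediate values (276 retrieved, 180 screened, 168 analyzed) agree with the figures reported in the text.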
Among original studies developing a screening algorithm, the following AI characteristics were collected: type of ML process, algorithm performance metrics, types of algorithms and tasks, and numbers of participants, images, and lesions. We classified ML processes into 3 types: SL, UL, or RL. Algorithm performance metrics included the maximum area under the receiver operating characteristic curve (AUROC) and maximum accuracy, sensitivity, and specificity. If there were multiple results for a single metric, only the best score was retained because it demonstrates an algorithm’s ideal performance in the context of its study. When a neural network had more than 1 intermediary layer, we labelled that neural network as a DL algorithm. For the type of algorithm tasks, image classification was defined as associating an image with its correct label (eg, identifying images with cancer). Binary classification was defined as a classification task where there are only 2 labels (eg, classifying malignant from benign lesions). In contrast, multiclass classification was defined as a classification task where there were more than 2 labels (eg, a situation where a computer algorithm tries to classify 3 different types of cancers). Among studies which developed or assessed a screening algorithm, a 1-way ANOVA test was performed to compare the means between the 3 skin cancer groups (NMSC, melanoma, and skin cancer studies) for the following variables: algorithm performance metrics, number of participants, lesions, images, and citations. The statistical significance of
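As a rough illustration of the metrics and the test named in this section, the sketch below computes accuracy, sensitivity, specificity, AUROC, and a 1-way ANOVA F statistic on made-up toy data. It is not the analysis code used in this study: all values are hypothetical, AUROC is computed with the pairwise-ranking definition, and only the F statistic is shown because a P value additionally requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for labels coded 1 (malignant) / 0 (benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def anova_f(groups):
    """1-way ANOVA F statistic: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))

# Hypothetical ground truth and model scores for 8 lesions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # binary classification at a 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
area = auroc(y_true, scores)

# When a study reports several values for one metric, only the best is retained.
best_reported_auroc = max([0.86, 0.91, 0.89])

# Hypothetical per-study values for the 3 groups compared by ANOVA.
f_statistic = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

Retaining only the maximum score, as in `best_reported_auroc`, reflects each algorithm under its most favourable reported conditions, which is worth keeping in mind when comparing groups.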
Results
Type of Articles
A total of 168 articles on AI and skin cancer were retrieved: 25 focusing on NMSCs [predominantly basal cell carcinoma (13, 52.0%) and squamous cell carcinoma (10, 40.0%)], 77 on melanoma studies, and 66 on both (data available at https://data.mendeley.com/datasets/r2rfktpj4c). The most common types of skin cancers were melanoma (134, 79.8%), basal cell carcinoma (61, 36.3%), and squamous cell carcinoma (45, 26.9%). According to the reprint address countries, the most productive countries were the United States (50, 30.0%), Australia (16, 10.0%), and Germany (13, 8.0%; Supplemental Figure S1). All articles were published between 2000 and 2022, with 49 (29.2%) of them being published in 2021 (Supplemental Figure S2). The top 4 journals were
The mean numbers of citations for NMSCs, melanoma, and skin cancer studies were 17 [standard deviation (SD): 21], 25 (SD: 29), and 29 (SD: 49), respectively. There were 95 (56.5%) original studies, followed by 57 (33.9%) reviews, 8 (4.8%) editorial materials, 4 (2.4%) commentary articles, and 4 (2.4%) book chapters. The most used imaging modalities were standard dermoscopy (76, 45.2%), clinical images (39, 23.2%), reflectance confocal microscopy (17, 10.1%), total body digital photography (17, 10.1%), optical coherence tomography (12, 7.1%), macroscopic images (8, 4.8%), histology slides (7, 4.2%), and whole-slide images (6, 3.6%).
AI Variables of Original Research Studies That Developed or Assessed a Screening Algorithm
Original research studies that developed or assessed a screening algorithm were analyzed (Supplemental Table S1). NMSC studies had a larger proportion of original research studies that developed or assessed a screening algorithm (20, 80.0%), compared to melanoma (28, 41.8%) and skin cancer studies (19, 28.8%). Among 68 studies, 66 used SL (97.0%). SL was the predominant ML process in all 3 groups: 95.0% (15) for NMSCs, 96.4% (27) for melanoma, and 100.0% (19) for skin cancer studies. Among 62 studies, the most common types of algorithms were deep neural networks (42, 67.7%), decision trees (11, 17.7%), and support vector machines (8, 12.9%). Skin cancer studies used the most neural network algorithms (15, 93.8%), followed by NMSC (16, 80.0%) and melanoma studies (11, 42.3%). The most common types of algorithm tasks were image classification (39, 62.9%), computer vision (35, 56.5%), binary classification (25, 40.3%), and multiclass classification (21, 33.9%). Computer vision was the most frequent algorithm task for NMSC (10, 55.6%) and skin cancer studies (12, 70.6%), while image classification was the most frequent algorithm task for melanoma studies (23, 85.2%). There were no significant differences between NMSCs, melanoma, and skin cancer studies for algorithm performance metrics (AUROC:
Discussion
The objective of this article was to identify the characteristics and trends of AI skin cancer articles published in dermatology journals. Melanoma (134, 79.8%) was the most represented skin cancer, followed by basal cell carcinoma (61, 36.3%) and squamous cell carcinoma (45, 26.9%). The high prevalence of articles that used image classification (39, 62.9%) and computer vision (35, 56.5%) reinforces the main use of AI for screening pathologies through image recognition.41,42
The Growth of AI in Dermatology
Most articles (143, 83.6%) were published from 2018 onward, which shows a growing trend in AI applied to the field of skin cancer. However, the mean number of citations (25, SD = 36) was lower than expected. As a reference, the mean number of citations in the top 100 most cited skin cancer publications was 558.5.43 This citation gap may be explained by the very limited number of dermatologists working in AI, which in turn may reflect the fact that AI is not formally taught during residency training. Of note, most authors of AI skin cancer articles that developed or assessed algorithms were non-dermatologists, which shows a lack of physician representation in this field of research.44
The Need for Dermatologists in the Deployment of AI Technologies
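The overall citation mean quoted above can be related to the per-group values reported in the Results. The sketch below is a consistency check rather than the authors' computation: it assumes that the group means and SDs were calculated over all 25, 77, and 66 articles and that the SDs are sample SDs. Because the reported values are rounded, only approximate agreement is expected.

```python
from math import sqrt

# (number of articles, mean citations, SD) per group, as reported in the Results.
groups = {"NMSC": (25, 17, 21), "melanoma": (77, 25, 29), "skin cancer": (66, 29, 49)}

total_n = sum(n for n, _, _ in groups.values())

# Overall mean is the article-weighted average of the group means.
pooled_mean = sum(n * m for n, m, _ in groups.values()) / total_n

# Overall SD combines within-group and between-group sums of squares.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
ss_between = sum(n * (m - pooled_mean) ** 2 for n, m, _ in groups.values())
pooled_sd = sqrt((ss_within + ss_between) / (total_n - 1))
```

Under these assumptions, the pooled mean rounds to 25 and the pooled SD falls within about 2 citations of the reported 36.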
Our findings reveal that 97.0% (66 out of 68) of original research studies developing or assessing a screening algorithm primarily employed SL, while 67.7% (42 out of 62) utilized deep neural networks. As SL refers to the use of pre-annotated images to train upon, the use of SL emphasizes the crucial need for qualified skin experts to label or annotate images (eg, benign vs malignant lesions) to generate sufficient and validated databases. However, the increasing use of DL represents a major challenge for dermatologists and other clinicians, namely due to the unsolved black box dilemma. The black box dilemma can be defined as the inability to understand how an algorithm arrives at its answer (ie, which features are important within intermediate layers).45 DL algorithms analyze information through their connected neural layers. However, the relationships between these layers cannot currently be interpreted. As technologies relying on DL algorithms become more prevalent in dermatology practice, digital health training would be pertinent to inform healthcare professionals of the underlying mechanisms behind these technologies.46
Overview of Challenges in AI Skin Cancer Research
Among 81 studies that specified the database(s) used to develop at least 1 algorithm, 60 (74.0%) used private databases. This finding highlights potential issues, namely access to data, reproducibility of results, and verification of database quality.
Data access and scarcity
Access to imaging data is a current challenge since algorithms require large quantities of imaging data (eg, generally over 10,000) to learn from.34 As a reference, we report a total number of images of 38,516 (SD: 163,940) among 49 studies which developed or assessed an algorithm to screen pathologies. As neural network and DL algorithms need a tremendous number of images to train upon,47 the increasing use of these types of algorithms suggests that this issue will persist. Collaboration between clinicians, hospitals, and engineering teams will be crucial to ensure enough training and validation data.34 A potential solution to overcome data sharing challenges is federated learning, which allows an algorithm to be trained across the databases of multiple institutions without transferring the data.48,49
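The principle of federated learning can be sketched with a toy model. The example below assumes a federated-averaging scheme, in which each institution trains locally and a coordinating server averages the resulting weights in proportion to local sample counts; this is one common approach and not necessarily the one described in the cited references. The institutions, data points, and single-parameter model (y = weight × x) are all hypothetical.

```python
def local_update(weight, data, lr=0.1, epochs=20):
    """Train y = weight * x on one site's private data with gradient descent."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_round(global_weight, sites):
    """One round: sites train locally; only weights and sample counts leave each site."""
    updates = [(local_update(global_weight, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    # The server averages the local weights, weighted by each site's sample count.
    return sum(w * n for w, n in updates) / total

# Hypothetical private datasets held by 3 institutions; the true relation is y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)],
    [(2.5, 5.0)],
]

weight = 0.0
for _ in range(5):
    weight = federated_round(weight, sites)
```

The raw `(x, y)` pairs are only ever read inside `local_update`, which stands in for computation performed within each institution; the shared model still converges toward the common relation.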
Data quality and representation
The quality of databases is crucial to any AI study since databases are the algorithm’s source of learning. It is important to assess the quality of databases before training because if biases are present, the algorithm will replicate them. Biases in dermatology databases, such as the underrepresentation of certain ethnic and demographic groups, including patients with skin of color and Indigenous patients, have been reported.50-53 Researchers should be aware of existing biases in certain databases and trained models as well as steps that can be undertaken to limit them.50,51,54,55
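One simple way to look for such biases is to compare how each group is represented in a database and how a model performs within each group. The sketch below is a minimal illustration on invented records: the `group`, `label`, and `prediction` fields and the group names are hypothetical placeholders, not fields of any real dermatology database.

```python
from collections import Counter, defaultdict

def representation(records):
    """Share of images contributed by each group."""
    counts = Counter(r["group"] for r in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def sensitivity_by_group(records):
    """Sensitivity (true-positive rate) computed separately within each group."""
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        if r["label"] == 1:  # only malignant lesions enter the sensitivity
            positives[r["group"]] += 1
            true_positives[r["group"]] += int(r["prediction"] == 1)
    return {group: true_positives[group] / positives[group] for group in positives}

def make_records(group, labels, predictions):
    """Helper that builds toy records for one group."""
    return [{"group": group, "label": l, "prediction": p} for l, p in zip(labels, predictions)]

# Invented data: group A is well represented, group B is not.
records = (
    make_records("A", [1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 1])
    + make_records("B", [1, 1], [1, 0])
)

shares = representation(records)
sensitivities = sensitivity_by_group(records)
```

In this toy case, group B contributes a quarter of the images and has half the sensitivity of group A, the kind of gap that an aggregate metric alone would hide.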
Limitations
We only retrieved articles published in dermatology journals, which limits the scope of this study. Articles were retrieved on June 1, 2022; thus, the total number of publications in 2022 is not accurately represented. Some article variables were not found, especially in AI algorithm performance metrics. This could be explained by a lack of standardization of methodology and result reporting in AI studies.56,57 Performance metrics were collected by taking the best score for each metric (AUROC, accuracy, sensitivity, specificity). In future studies, automated computer scripts or DL could be used to extract variables and verify data,58 as a portion of the results in this article were curated manually by reading through the articles that were accessible at the time of writing. Using proven DL methods or automated computer scripts, along with relevant checks to ensure the accuracy of computer-based results, would potentially reduce human error in processing and analyzing data.
Future Directions
While AI remains in its early stages in dermatology, AI in skin cancer is a growing research field, particularly through computer vision and image classification. Mobile applications equipped with image classification models are becoming prevalent, and it becomes crucial to ensure that the emerging use cases of screening applications are safe and regulated, while outputting diagnostics that are representative across different demographic groups.59-62 Most importantly, the increasing prevalence of SL in dermatology is indicative of the urgent need for dermatologists to annotate images and be involved in the validation of imaging databases. As the applications of AI become more widespread in clinical practice and in dermatology patients’ daily lives, it becomes essential for clinicians to have access to updated documentation to bridge everyday clinical practice with the integration of such AI-powered tools.62
Supplemental Material
Supplemental material (sj-docx-1-cms-10.1177_12034754241229361, sj-tif-2-cms-10.1177_12034754241229361, and sj-tif-3-cms-10.1177_12034754241229361) for The State of Artificial Intelligence in Skin Cancer Publications by Maxine Joly-Chevrier, Anne Xuan-Lan Nguyen, Laurence Liang, Michael Lesko-Krleza and Philippe Lefrançois in Journal of Cutaneous Medicine and Surgery
Footnotes
Acknowledgements
None.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: PL received grants from the Lady Davis Institute for Medical Research, from the Jewish General Hospital Foundation, and from the Jewish General Hospital Department of Medicine, from the Fonds de recherche du Québec—Santé (#312768 and #324151) for this work.
Supplemental Material
Supplemental material for this article is available online.
References
