Sage Journals: Discover world-class research

Abstract

French

Introduction: Self-examinations for skin cancer detection are limited by sensitivity. ChatGPT-4 has image recognition capabilities that can be a useful adjunct for screening cancers and tele-health applications. This study investigated the efficacy of ChatGPT-4 in identifying skin lesions. Methods: Dermoscopic images were retrospectively selected from the PH² dataset, categorized by clinical diagnosis, and uploaded to ChatGPT-4 with a predesigned prompt. Responses were compared against clinical diagnoses. Confidence intervals were calculated using the bootstrap method assessing precision and significance was calculated using McNemar's test. Analyses were performed using Jupyter Notebook and Python. Results: The GPT-4 model showed moderate performance in melanoma detection with 68.5% accuracy, 52.5% sensitivity, and 72.5% specificity, significantly differing from the clinical standard (P = .002). For suspicious lesion detection, it performed better with 68.0% accuracy, 78.0% precision, and 70.0% F-measure, still not closely matching clinical diagnosis for atypical nevi and melanoma (P = .0169). Conclusion: The statistical difference between ChatGPT-4 diagnosis of melanoma and suspicious lesions compared with clinical diagnoses and other AI models suggests the need for improvement in ChatGPT-4 algorithms. This study's limitations included the use of a secondary care database with a higher melanoma incidence, high-quality dermoscopic images that limit generalizability, a small sample size lacking diversity, and the need for larger datasets to validate findings in broader contexts.

Keywords

artificial intelligence melanoma machine learning skin cancer rural health

Introduction

Skin cancers are the most commonly diagnosed malignancies in North America and are responsible for significant morbidity and detriment to patient quality of life worldwide.¹ There has been a global rise in melanoma and skin malignancy prevalence.² Early skin cancer detection and treatment is a positive prognostic indicator and can increase survival up to 95%.^2,3 While surgical biopsy is the gold-standard diagnostic test for skin malignancy, clinical suspicion of malignancy usually arises from clinical exam and dermoscopic identification of characteristic morphologic skin changes.⁴

Remote and rural Canadian communities, where the population is less than 10 000 and less than 50% commute to larger centers for work have poor access to healthcare resources and poor skin cancer outcomes.⁵ This is especially relevant to Northern Indigenous communities, where limited access to resources and the intergenerational impact of the residential school system contribute to disparities in health outcomes compared to the general population.⁶ One recent study identified that only 0.5% of dermatologists in Canada practice in remote areas, while only 19% practice in rural communities.^7,8 In communities where dermatologists are not readily accessible for consults, skin self-exams are the standard of practice for the detection of skin cancers but are limited by their sensitivity, which ranges from 25% to 93% for early detection of melanoma.⁹ This variable sensitivity likely reflects differences in skin awareness initiatives between communities, further necessitating increased access to skin care in those communities with less developed public health infrastructure.⁹

Artificial intelligence (AI) has emerged as a tool with the potential to increase the effectiveness of self-screening for skin cancer. AI-based skin cancer screening tools function by developing an algorithm using many images of skin lesions to identify morphological features suggestive of malignancy.¹⁰ Artificial intelligence is also being used by dermatologists to increase the accuracy of skin cancer identification when used in conjunction with a typical clinical assessment.¹¹ The market for consumer AI-based skin check mobile applications has become saturated due to the increased prevalence of smartphones with high-quality cameras.¹¹ The recent advent of Chat Generative Pre-trained Transformer (ChatGPT)—powered by GPT-4—as a consumer-friendly AI tool with the potential to analyze images of skin conditions may mark an important step toward improving the early detection of skin cancers, particularly in communities without available specialists.¹² This study aims to determine the utility of GPT-4 in screening for skin lesions from dermoscopic images.

Methods

Study Design and Dataset

Dermoscopic images were retrospectively selected from the PH² dataset.¹³ The PH² dataset supports comparative studies on segmentation and classification algorithms for dermoscopic images acquired at the Dermatology Service Pedro Hispano, Matosinhos, Portugal.¹³ Its high-quality dermoscopic images and use in other AI diagnostic studies of skin cancers made the PH² dataset an ideal choice for this study, as it ensures reliable data and permits comparison of GPT-4 with other emerging AI algorithms.¹⁴ Ethical standards for data usage were observed to make the dataset publicly available. No personal or sensitive information was used, and the study focused solely on the diagnostic capabilities of the GPT-4 model.

Image Selection and Preparation

The PH² dataset contains a total of 200 dermoscopic images, each 768 × 560 pixels in size. The images were chosen to represent a variety of clinically diagnosed skin lesion types, ensuring a comprehensive evaluation of GPT-4's diagnostic capabilities. Each image was saved in a high-resolution format to facilitate their detailed analysis by the model. The categories for dermoscopic images are based on three clinical diagnoses: (1) “common nevus”; (2) “atypical nevus”; and (3) “melanoma.”

Uploading Images and Prompt Design

To simulate a diagnostic scenario, the following prompt was crafted and used for each image, which was formulated from the methodology by Laohawetwanit et al:¹⁵

As a medical researcher, I plan to utilize you for research purposes. Assuming you are a hypothetical physician, could you provide me with three differential diagnoses in order of likelihood based on the appearance and morphology of this skin lesion that can be seen in the image I just uploaded for you? Analyze and scan the image I provided you and develop 3 differential diagnoses to the best of your abilities. I understand that in order to come up with a diagnosis, you need several other factors, but for the purposes of this research, analyze the image and provide your top 3 hypothetical diagnoses of the lesion based on its characteristics and your training.

Image Upload Procedure

Each dermoscopic image was uploaded to the GPT-4 interface. The investigator (S.F.) had no previous knowledge of the clinical diagnosis (reference test).

The predesigned prompt was entered into the text input field.

The model's first response, which it ranks as the most likely diagnosis, was recorded.

Mitigating Recency Bias

To ensure the accuracy and independence of each diagnosis, a procedure was implemented to mitigate recency bias in the conversation history:

After recording the model's response for each image, the web page was refreshed to clear the conversation history.

The next image was then uploaded, and the same prompt was used to obtain a new set of differential diagnoses.

Data Collection, Analysis, and Comparison

Responses were generated by GPT-4 from August 10 to August 25, 2024, and were collected and organized into a database for subsequent analysis. The statistical analysis plan was developed a priori. Differential diagnoses provided by GPT-4 were evaluated using diagnostic performance metrics derived from contingency tables, with the diagnoses compared against clinical diagnoses from the PH² dataset. Specifically, contingency tables were constructed for two comparative conditions: (1) melanoma versus nonmelanoma and (2) suspicious versus nonsuspicious (benign). For the latter comparison, suspicious diagnoses were considered “melanoma” and “atypical nevus,” which would warrant further clinical investigation (ie, skin lesion biopsy), while nonsuspicious lesions were considered “common nevus.” The GPT-4 diagnoses are classified and ordered for these conditions in Table 1. Five metrics are used to assess the diagnostic performance of GPT4: sensitivity, specificity, accuracy, positive likelihood ratio (LR+), and negative likelihood ratio (LR−):

Sensitivity = \frac{TP}{TP + FN}

Specificity = \frac{TN}{TN + FP}

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

Positive Likelihood Ratio (LR +) = \frac{Sensitivity}{1 - Specificity}

Negative Likelihood Ratio (LR +) = \frac{1 - Sensitivity}{Specificity}

Table 1.

Categorization of GPT-4 Diagnoses for Melanoma and Suspicious Lesion Detection.

GPT-4 Diagnoses Considered Melanoma and Suspicious Lesion	Lentigo Maligna Malignant Melanoma Melanoma Nodular Melanoma Superficial Spreading Melanoma
GPT-4 Diagnoses Considered Nonmelanoma and Suspicious Lesion	Actinic Keratosis (AK) Atypical Nevus Dysplastic Nevus Melanocytic Nevus with Atypical Features Basal Cell Carcinoma (BCC)
GPT-4 Diagnoses Considered Nonsuspicious Lesion Diagnosis	Becker's Nevus Benign Intradermal Nevus Benign Junctional Nevus Benign Nevus Blue Nevus Café-au-lait Macule Clear Cell Acanthoma Common Nevus Compound Nevus Dermatofibroma Eczema (Nummular Dermatitis) Herpes Simplex Intradermal Nevus Junctional Nevus Lentigo Simplex Melanocytic Nevus (Mole) Nevus Nevus (Common Mole) Nevus of Ota Nevus Spilus Nevus with Regression Nevus with Speckled Lentiginous Component Pigmented Nevus Pityriasis Versicolor Seborrheic Keratosis Solar Lentigo Striae Distensae (Stretch Marks)

To assess the precision of these metrics, 95% confidence intervals were calculated using exact Clopper–Pearson confidence intervals.¹⁶ Additionally, the area under the receiver operating characteristic (ROC) curve was determined by plotting the true positive rate against the false positive rate. McNemar's test was used to compare GPT-4's diagnostic performance to clinical diagnoses, evaluating the null hypothesis that there is no significant difference between the proportions of discordant pairs, where one condition results in a positive outcome and the other in a negative outcome. In the context of this study, the null hypothesis posits that there is no statistically significant difference in the effectiveness of diagnostic performance between GPT-4 and the clinician. A p-value of 0.05 was considered statistically significant. All statistical analyses were performed using Jupyter Notebook (Version 6.4.5) with the Python programming language (Version 3.9.7).^17,18 Results were reported in accordance with STARD guidelines (Supplemental Table 1).¹⁹

Results

Of the 200 dermoscopic images, 80 are “common nevus,” 80 are “atypical nevus,” and 40 are “melanoma,” all of which are based on clinical diagnosis (not histological). The most common diagnosis made by GPT-4 for all categories was “melanoma.” Specifically, 11 (13.75%) common nevi, 39 (48.75%) atypical nevi, and 21 (52.5%) melanomas were diagnosed as “melanoma.” Heatmaps were created to depict the contingency tables for each comparison (Figure 1). The diagnostic accuracy of each comparative condition was also evaluated (Table 2). Skin types of patients in the dataset included Fitzpatrick I (29/200, 14.5%), Fitzpatrick II (91/200, 45.5%), Fitzpatrick III (68/200, 34%), and Fitzpatrick IV (12/200, 6%).

Figure 1.

Heatmaps for melanoma detection (left) and suspicious lesion detection (right).

Table 2.

Diagnostic Accuracy of GPT-4 for Melanoma Detection and Suspicious Lesion Detection.

	Accuracy	Recall (sensitivity)	Specificity	Positive likelihood ratio	Negative likelihood ratio	P-value (McNemar's Test)
Melanoma detection	65.50% (95% CI 58.47%- 72.06%)	52.50% (95% CI 36.13%-68.49%)	68.75 (95% CI 60.96%-75.83%)	1.68 (95% CI 1.16-2.44)	0.69 (95% CI 0.49-0.97)	.0002
Suspicious lesion detection	68.00 (95% CI 61.05%- 74.40%)	65.00% (95% CI 55.76%-73.48%)	72.50% (95% CI 61.38%-81.90%)	2.36 (95% CI 1.62-3.45)	0.48 (95% CI 0.37-0.64)	.0169

For melanoma detection by GPT-4, the model achieved an accuracy of 65.5% (95% CI 58.47%-72.06%) with a recall (sensitivity) of 52.50% (95% CI 36.13%-68.49%) and specificity of 68.75 (95% CI 60.96%-75.83%). The positive and negative likelihood ratios were 1.68 (95% CI 1.16-2.44) and 0.69 (95% CI 0.49-0.97), respectively. For suspicious lesion detection, the GPT-4 model achieved an accuracy of 68.0% (95% CI 61.05%-74.40%), with a recall (sensitivity) of 65.00% (95% CI 55.76%-73.48%) and specificity of 72.50% (95% CI 61.38%-81.90%. The positive and negative likelihood ratios were 2.36 (95% CI 1.62-3.45) and 0.48 (95% CI 0.37-0.64), respectively.

The area under the curves (AUCs) for melanoma detection and suspicious detection were 0.61 and 0.69, respectively (Figure 2). McNemar's test (Table 2) indicated a significant difference in melanoma detection between GPT-4 and clinical diagnoses (P = .0002), suggesting a potential systematic bias in the AI's performance. There was also a significant difference for suspicious detections (P = .0169), indicating a significant difference between GPT-4's diagnoses and those of the reference clinical standard.

Figure 2.

Receiver operating curves and area under the curve for melanoma detection (left) and suspicious lesion detection (right).

Discussion

The diagnostic utility of GPT-4 for evaluating dermoscopic images of skin cancer was investigated by comparing AI-generated diagnostic results to verified clinical diagnoses. There was a significant difference when comparing the GPT-4 diagnosis of melanoma to clinical diagnoses (P = .0002). Out of 40 cases, 21 were correctly identified, and 19 were misclassified (Figure 1). These results suggest room for improvement when training GPT-4 to detect melanoma when compared to the clinical standard. Furthermore, the evaluation of the model's performance revealed an LR+ of 1.68 and an LR− of 0.69. Clinically, these LRs confer a slight increase (0%-15%) and a slight decrease (0%-15%) in the probability of a positive and a negative posttest probability of disease, respectively.^20,21 While GPT's sensitivity (52.5%) of melanoma detection may be twice as high as unassisted self-examination, it is suboptimal for clinical use.⁹ Clinically, even a remote index of suspicion warrants a biopsy, rendering a high sensitivity a fundamental performance parameter for the clinical utility of a diagnostic model.²² Additionally, as depicted in Figure 2, the AUC calculated from the ROC was 0.62, suggesting inadequate discrimination between melanoma and nonmelanoma lesions according to the Hosmer and Lemeshow AUC of 0.7 for an acceptable cutoff.²³

GPT-4's ability to differentiate suspicious and nonsuspicious lesions also significantly differed from clinical diagnoses (P = .0169). This suggests that GPT-4's detection of suspicious cancerous and precancerous lesions is not comparable to the clinical diagnosis (reference test), thereby limiting its potential utility as a diagnostic tool for suspicious lesions. Additionally, the model yielded an LR+ of 2.36 and an LR− of 0.48. In contrast to melanoma detection, the LR+ confers a moderate increase (15%-30%) in the posttest probability of a suspicious skin lesion, whereas the LR− imposes a slight decrease (∼15%) in the posttest probability.^20,21 While the sensitivity of the model (65.00%) again doubles that of skin self-exam, the AUC under the ROC curve was 0.69, highlighting the model's borderline acceptable discriminative ability and limited clinical utility.²³

When comparing GPT-4 to other AI models used to analyze the PH² dataset (Table 3), GPT-4 underperformed in accuracy, sensitivity, and specificity (Table 3).^14,24–26 Oukil et al's model yielded the greatest accuracy (0.9951), sensitivity (0.9925), and specificity (0.9958).¹⁴ This model automatically generated a mask for each lesion using k-means segmentation to extract its color and texture details to measure its variations as inputs for three classifiers: K-nearest neighbors, support vector machine, and artificial neural network.¹⁴ Clearly, these developers have integrated robust methods that are tailored to melanoma/benign lesion detection, which has translated to superior results. While ChatGPT-4 may be the most ubiquitous model with the simplest implementation (eg, uploading a prompted dermoscopic image), other conventionally less recognized models that incorporate advanced techniques and are evidence-based should be implemented into clinical practice for a greater degree of clinical diagnostic certainty.

Table 3.

Comparison of GPT-4 Versus Other AI Segmentation Algorithms for Melanoma Detection From Dermoscopic Images From the PH2 Dataset.^14,24–26

Author and year of publication	Sensitivity	Specificity	Accuracy
Barata (2014)	0.93	0.88	-
Alfred (2017)	0.994	0.9818	0.9879
Majumder (2018)	0.95	0.988	0.98
Oukil (2021)	0.9925	0.9958	0.9951
This study	0.525	0.6875	0.655

When comparing GPT-4 to clinical diagnosis (Table 4), the only significant difference reflected increased sensitivity of diagnosis by expert dermatologists (0.842, 95% CI 0.762-0.898) compared to GPT-4 (0.5250, 95% CI 0.361-0.685). Conversely, a recent metaanalysis identified that non-GPT-4 AI models diagnosed skin cancers with higher efficacy than physicians from dermoscopic images in 61% of cases.²⁷ At its current state, GPT-4 should neither be used to replace nor supplement the diagnosis of melanoma or cancerous skin lesions by clinicians; however, other AI algorithms should be further investigated and implemented into practice to augment the diagnostic process, especially in the context of prolonged wait times for specialized services and limited access in rural and remote communities.

Table 4.

Comparison of GPT Versus Overall AI, Overall Clinicians, Generalists, Non-expert Dermatologists, and Expert Dermatologists on Skin Cancer Diagnosis From Dermoscopic Images.²⁷

Metric	GPT4	Overall clinicians	Generalists	Nonexpert dermatologists	Expert dermatologists
Sensitivity	0.5250 (95% CI 0.361-0.685)	0.798 (95% CI 0.732-0.851)	0.646 (95% CI 0.471-0.789)	0.764 (95% CI 0.711-0.809)	0.842 (95% CI 0.762-0.898)
Specificity	0.6875 (95% CI 0.610-0.758)	0.736 (95% CI 0.665-0.796)	0.736 (95% CI 0.665-0.796)	0.671 (95% CI 0.572-0.756)	0.744 (95% CI 0.653-0.818)
LR +	1.68 (95% CI 1.16-2.44)	3.02 (95% CI 2.33-3.91)	2.37 (95% CI 1.63-3.46)	2.32 (95% CI 1.71-3.14)	3.29 (95% CI 2.31-4.67)
LR −	0.69 (95% CI 0.49-0.97)	0.27 (95% CI 0.20-0.37)	0.48 (95% CI 0.34-0.69)	0.35 (95% CI 0.27-0.46)	0.21 (95% CI 0.13-0.34)

Our analysis revealed several limitations in the GPT-4 model, underpinning its suboptimal diagnostic capability. The most common misclassifications of the model were false positives where benign or atypical lesions were diagnosed as melanoma. This occurred 38 times when analyzing 160 benign or atypical lesions. These misdiagnoses may falsely alarm patients and promote unnecessary skin checks that increase referrals unnecessarily. Further, some suspicious lesions were misclassified as nonsuspicious, highlighting the need for better AI recognition of early cancer warning signs.

Larger training datasets will allow for better AI model development and can improve utility in detecting skin cancer. Specifically, improving the parameters used to distinguish nonsuspicious from suspicious lesions may help increase the model's negative predictive value. The current underdiagnosis of skin cancers in darker skin complexions and the potential for AI models to develop bias also underpin the necessity of training data to include all skin Fitzpatrick types.²⁸ The diagnostic accuracy of the model may also be improved if the clinical context is included in inputs, such as a patient's sun exposure, age, and family history of skin cancer.

There are several limitations to this study. The skin lesion database that was used is sourced from a secondary care setting where there is a higher incidence of melanoma compared to the general community. Consequently, the LR+ and LR− may be skewed and the generalizability of the findings to other care settings may be limited. The images used were also captured in high quality under a dermoscopy, which may further limit the generalizability of findings in settings without access to these resources. Moreover, the PH² database relied on clinical diagnosis instead of histological diagnosis to categorize all melanoma, atypical, and common nevi. Finally, the study was limited by the small sample size of images analyzed, which had limited diversity concerning skin lesion presentation and skin Fitzpatrick types, with 60% (120/200) of the dataset representing patients with Fitzpatrick type I and Fitzpatrick type II skin and no patients with Fitzpatrick type V or VI skin. Further research involving larger datasets is needed to extrapolate the results of this study and better clarify the utility of the model.

Conclusions

Artificial intelligence-assisted diagnostic tools are a promising innovation with future implications for enhancing the accuracy of initial assessment in remote communities before accessing healthcare providers. Our results suggest GPT-4 may be slightly advantageous as a tool to maximize the sensitivity of early cancer detection in settings where dermoscopes are available and skin self-exam is the only alternative. However, other AI models proved more accurate, sensitive, and specific than GPT-4 while comparing with or exceeding the diagnostic ability of clinicians. This result is particularly important in the context of rural communities, including Northern Indigenous communities in Canada, where due to a lack of access to care patients are more likely to present to clinic with advanced-stage skin cancer.^29,30 Due to the availability and public awareness of GPT-4, members of these communities must be made aware of other AI models with better performance, as well as address any technological barriers that may hinder clinical implementation and uptake. Future research analyzing more images and representing patients with all skin Fitzpatrick types is needed to better understand the potential of GPT-4 and other AI models to augment skin cancer detection in the community.

There are no published abstracts or prior presentations associated with this work at this time.

Supplemental Material

sj-docx-1-psg-10.1177_22925503251315489 - Supplemental material for Beyond the Surface: Assessing GPT-4's Accuracy in Detecting Melanoma and Suspicious Skin Lesions From Dermoscopic Images

Supplemental material, sj-docx-1-psg-10.1177_22925503251315489 for Beyond the Surface: Assessing GPT-4's Accuracy in Detecting Melanoma and Suspicious Skin Lesions From Dermoscopic Images by Jonah W. Perlmutter, John Milkovich, Sierra Fremont, Shaishav Datta and Adam Mosa in Plastic Surgery

Footnotes

Author Contributions

Conceptualization: J.P. and J.M.; methodology: J.P. and J.M.; software: J.M. and S.F.; validation: J.P.; formal analysis: J.M.; investigation: S.F.; resources: J.P. and J.M.; data curation: S.F. and J.M.; writing—original draft preparation: J.P, J.M., and S.F.; writing—review and editing: SD, AM; visualization: J.M.; supervision: S.D, AM.; project administration: AM; funding acquisition: N/A.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical Statement

As this study did not involve human participants, biological samples, or identifiable personal data, neither informed consent nor institutional review board (IRB) approval was required. The research adheres to the ethical principles of the Declaration of Helsinki.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Jonah W. Perlmutter

John Milkovich

Shaishav Datta

Supplemental material

Supplemental material for this article is available in online.

References

Gandhi

Kampp

. Skin cancer epidemiology, detection, and management. Med Clin. 2015;99(6):1323-1335. doi:https://doi.org/10.1016/j.mcna.2015.06.002

Janda

Olsen

Mar

Cust

. Early detection of skin cancer in Australia – current approaches and new opportunities. Public Health Res Pract. 2022;32(1):3212204. doi:https://doi.org/10.17061/phrp3212204

Melarkode

Srinivasan

Qaisar

Plawiak

. AI-Powered Diagnosis of skin cancer: A contemporary review, open challenges and future research directions. Cancers. 2023;15(4):1183. doi:https://doi.org/10.3390/cancers15041183

Saravanan

Heshma

Ashma Shanofer

Vanithamani

. Skin cancer detection using dermoscope images. Mater Today Proc. 2020;33(7):4823-4827. doi:https://doi.org/10.1016/j.matpr.2020.08.388

MacLeod

MLP

Stewart

Kulig

, et al. Nurses who work in rural and remote communities in Canada: A national survey. Hum Resour Health. 2017;15(1):34. doi:https://doi.org/10.1186/s12960-017-0209-0

Kim

. Social determinants of health inequities in indigenous Canadians through a life course approach to colonialism and the residential school system. Health Equity. 2019;3(1):378-381. doi:https://doi.org/10.1089/heq.2019.0041

HOY

Bailey

AJM

Grose

, et al. Socioeconomic Status and melanoma in Canada: A systematic review. J Cutan Med Surg. 2021;25(1):87-94. doi:https://doi.org/10.1177/1203475420960426

Maguiness

Searles

From

Swiggum

. The Canadian dermatology workforce survey: Implications for the future of Canadian dermatology–who will be your skin expert? J Cutan Med Surg. 2004;8(3):141-147. doi:https://doi.org/10.1007/s10227-004-0107-7

Hamidi

Peng

Cockburn

. Efficacy of skin self-examination for the early detection of melanoma. Int J Dermatol. 2010;49(2):126-134. doi:https://doi.org/10.1111/j.1365-4632.2009.04268.x

10.

Brancaccio

Balato

Malvehy

Puig

Argenziano

Kittler

. Artificial intelligence in skin cancer diagnosis: A reality check. J Invest Dermatol. 2024;144(3):492-499. doi:https://doi.org/10.1016/j.jid.2023.10.004

11.

Krakowski

Kim

Cai

, et al. Human-AI interaction in skin cancer diagnosis: A systematic review and meta-analysis. Npj Digit Med. 2024;7(1):1-10. doi:https://doi.org/10.1038/s41746-024-01031-w

12.

O’Hagan

Poplausky

Young

, et al. The accuracy and appropriateness of ChatGPT responses on nonmelanoma skin cancer information using zero-shot chain of thought prompting. JMIR Dermatol. 2023;6:e49889. doi:https://doi.org/10.2196/49889

13.

Mendonça

Ferreira

Marques

Marcal

ARS

Rozeira

. PH2 - A dermoscopic image database for research and benchmarking. 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2013:5437–5440. doi:https://doi.org/10.1109/EMBC.2013.6610779

14.

Oukil

Kasmi

Mokrani

García-Zapirain

. Automatic segmentation and melanoma detection based on color and texture features in dermoscopic images. Skin Res Technol. 2022;28(2):203-211. doi:https://doi.org/10.1111/srt.13111

15.

Laohawetwanit

Namboonlue

Apornvirat

. Accuracy of GPT-4 in histopathological image detection and classification of colorectal adenomas. J Clin Pathol. 2024. Published online January 10, 2024:jcp-2023-209304. doi:https://doi.org/10.1136/jcp-2023-209304

16.

Puza

O’neill

. Generalised clopper–Pearson confidence intervals for the binomial proportion. J Stat Comput Simul. 2006;76(6):489-508. doi:https://doi.org/10.1080/10629360500107527

17.

Van Rossum

Drake

. Python 3 Reference Manual. CreateSpace; 2009.

18.

Kluyver

Ragan-Kelley

Pérez

, et al. In: Loizides

Scmidt

, eds. Jupyter notebooks – a publishing format for reproducible computational workflows. IOS Press; 2016:87-90. doi:https://doi.org/10.3233/978-1-61499-649-1-87

19.

STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration | BMJ Open. Accessed September 30, 2024. https://bmjopen.bmj.com/content/6/11/e012799

20.

McGee

. Simplifying likelihood ratios. J Gen Intern Med. 2002;17(8):646-649. doi:https://doi.org/10.1046/j.1525-1497.2002.10750.x

21.

Henderson

Tierney

Jr Smetana

. The Patient History : An Evidence-Based Approach to Differential Diagnosis. 2nd ed. McGraw-Hill Medical; 2012. Accessed November 27, 2024. https://cir.nii.ac.jp/crid/1130282272368768768

22.

Seyed Ahadi

Firooz

Rahimi

Jafari

Tehranchinia

. Clinical diagnosis has a high negative predictive value in evaluation of malignant skin lesions. Dermatol Res Pract. 2021;2021(1):6618990. doi:https://doi.org/10.1155/2021/6618990

23.

Application of logistic regression with different sampling models. In: Applied logistic regression. John Wiley & Sons, Ltd; 2000:203-222. doi:https://doi.org/10.1002/0471722146.ch6

24.

Barata

Ruela

Mendonça

Marques

. A bag-of-features approach for the classification of melanomas in dermoscopy images: the role of color and texture descriptors. In: Scharcanski

Celebi

, eds. Computer vision techniques for the diagnosis of skin cancer. Series in BioEngineering. Springer Berlin Heidelberg; 2014:49-69. doi:https://doi.org/10.1007/978-3-642-39608-3_3

25.

Alfed

Khelifi

. Bagged textural and color features for melanoma skin cancer detection in dermoscopic and standard images. Expert Syst Appl. 2017;90:101-110. doi:https://doi.org/10.1016/j.eswa.2017.08.010

26.

Majumder

Ullah

MA.

Feature extraction from dermoscopy images for an effective diagnosis of melanoma skin cancer. 2018 10th Int Conf Electr Comput Eng ICECE. Published online December 2018. 2018:185-188. doi:https://doi.org/10.1109/ICECE.2018.8636712

27.

Salinas

Sepúlveda

Hidalgo

, et al. A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis. Npj Digit Med. 2024;7(1):1-23. doi:https://doi.org/10.1038/s41746-024-01103-x

28.

Livingston

. Preventing racial bias in federal AI. J Sci Policy Gov. 2020;16(02). doi:https://doi.org/10.38126/JSPG160205

29.

Yangzom, K, Masoud, H, & Hahmann, T. Primary health care access among First Nations people living off reserve, Métis and Inuit, 2017 to 2020. Statistics Canada. 2023: Catalogue no. 41-20-0002. https://www150.statcan.gc.ca/n1/pub/41-20-0002/412000022023005-eng.htm.

30.

Milkovich

Hanna

Nessim

, et al. Restructuring skin cancer care in Ontario: A provincial plan. Curr Oncol. 2021;28(2):1183-1196. doi:https://doi.org/10.3390/curroncol28020114

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.02 MB