Abstract

Background:

Teledermoscopy (TDS) has emerged as an efficient tool for diagnosing skin lesions. In Sweden, double reading is the standard of care, but risk factors for misdiagnosis or mismanagement using single reader evaluations (SRE) are not well studied. This study aimed to assess the accuracy of SRE compared with the gold standard in TDS.

Methods:

This retrospective cohort study involved 1,997 TDS referrals sent from general practitioners to dermatologists in Stockholm, Sweden, selected based on dermoscopic diagnoses. All referrals underwent double reader evaluations (DRE). Each case was reassessed by a single external assessor, blinded to the DRE result. Based on predefined rules, a gold standard for the most correct diagnosis was established. Diagnostic accuracy and risk factors for misdiagnosis were evaluated. The trial was registered on ClinicalTrials.gov (ID NCT05033678).

Results:

Primary diagnosis by SRE agreed with the gold standard on benign-malignant classification in 84% of cases. Discordance was linked to lower diagnostic confidence and more frequent recommendations for further intervention. SRE achieved a benign-malignant sensitivity and specificity of 84% each (95% confidence intervals: 81–87% and 82–86%, respectively). The risk of overdiagnosis increased 96-fold (odds ratio, with “very confident” as the reference) when assessors reported being “very unconfident.” Of a total of 311 melanomas, melanoma in situ, lentigo maligna, and severely dysplastic nevi, 62 were not recognized in the SRE primary diagnosis. However, 50 of these misdiagnosed lesions still received accurate management recommendations.

Conclusions:

The confidence level of TDS assessors heavily influences diagnostic accuracy. Therefore, when diagnostic confidence is perceived as moderate or low, additional interventions should be considered.

Introduction

The incidence of skin cancer is increasing worldwide.¹ This poses a burden on the already strained health care system due to the rising demand for skin lesion assessments. In many countries, patients with suspicious skin lesions initially seek medical advice at primary care centers. General practitioners (GPs) can successfully diagnose and manage common types of skin lesions, but they do not have the same level of expertise as dermatologists.² Previous studies have shown that the diagnostic concordance between GPs and dermatologists is 45–54% in a clinical setting.^3,4 To improve diagnostic accuracy in primary care, streamlining the referral and triage process from GPs to dermatologists could be beneficial.

Communication technologies (e.g., computers, the internet, and smartphones) are especially suitable for dermatology given the visual nature of the specialty.^5,6 This is being utilized in teledermoscopy (TDS) (i.e., electronic referrals including clinical and dermoscopic images obtained in primary care and assessed by dermatologists). The introduction of smartphone-compatible dermoscopes has made TDS a feasible alternative for distant assessments of skin lesions.⁷ Nevertheless, previous studies have shown that the diagnostic accuracy of TDS is somewhat impaired compared to face-to-face consultations by dermatologists.^8–10

Following a pilot study on TDS, the Swedish Regional Cancer Centres recommend double reader evaluations (DRE) (i.e., two dermatologists assessing each TDS referral) to increase diagnostic accuracy.¹¹ However, double reading in TDS is not common practice internationally.¹² We hypothesized that there are circumstances in which relying solely on single reader evaluations (SRE) in TDS might lead to mismanagement of lesions. Hence, the overall aim of this study was to improve skin cancer triage, with the specific objective of evaluating the accuracy of SRE in TDS and providing evidence for when SRE alone may not ensure proper management of skin lesions.

Methods

COHORT

This cohort study was conducted at the Department of Dermatology, Skåne University Hospital, Lund, Sweden, in accordance with the 2015 Standards for Reporting of Diagnostic Accuracy Studies (STARD) criteria.¹³ Regional ethical review board approvals (2020-04763 and 2021-01817) were acquired. The trial was registered on ClinicalTrials.gov with the ID NCT05033678, and the study protocol can be requested from the corresponding author. The cohort was derived from a previously established teledermoscopic database. Images were captured using a Heine iC1 dermoscope (Heine Optotechnik, Herrsching, Germany) attached to an iPhone (Apple, Cupertino, CA, USA).

Inclusion criteria were TDS referrals, including pigmented and nonpigmented lesions, sent between 2017 and 2021 from GPs to assessors in the Stockholm health care region.¹⁴ The referrals were registered on the web platform Dermicus® (Dermicus, Gothenburg, Sweden). Based on a power calculation, 2,000 cases were extracted with the following distribution of dermoscopic diagnoses (with priority for the most recent cases): 30% nevi, 19% melanomas (MM), MM in situ, lentigo maligna, and severely dysplastic nevi (henceforth referred to as the MM group lesions), 10% basal cell carcinomas (BCC), 8% squamous cell carcinomas (SCC), 23% seborrheic keratoses, 7% dermatofibromas, and 3% vascular lesions.

Every TDS referral included standardized information on patient age, gender, lesion location, first appearance of the lesion, and any perceived change or symptoms. The DRE was performed as part of ongoing health care and involved two TDS assessors, at least one of whom had extensive experience, independently reviewing the clinical and dermoscopic images (Fig. 1). Subsequently, the assessors discussed each case in a chat function until consensus was reached. The joint referral response included a dermoscopic description, a primary diagnosis, and recommended management. All cases were then prepared for SRE by concealing the previous DRE consultation report, before being randomly assigned to one of 13 assessors (0 to >15 years of dermoscopic experience) in the Skåne health care region. SRE assessors provided information on image quality (poor, reduced, or good), primary and differential diagnoses, level of diagnostic confidence (very unconfident, unconfident, fairly confident, confident, or very confident), and management recommendations.

Fig. 1.

Example of clinical and dermoscopic images attached to teledermoscopy referrals. (a) Image that provides an overview of lesion location. (b) Clinical image, showing the lesion and surrounding skin. (c) Polarized dermoscopic image. (d) Nonpolarized dermoscopic image.

STATISTICAL METHODS

Statistical analyses were performed using Stata version 18.0 (StataCorp LLC, College Station, TX, USA). The primary outcome was the diagnostic accuracy of SRE compared to the gold standard (see below). The secondary outcomes were the accuracy of management in SRE and risk factors for misdiagnosis in TDS. For the purpose of analysis, diagnoses were categorized as follows: “nevus” (including all subtypes), “MM group lesions,” “lentigo solaris+seborrheic keratosis+benign lichenoid keratosis,” “verruca,” “dermatofibroma,” “SCC+SCC in situ+keratoacanthoma+actinic keratosis,” “BCC,” “vascular lesion,” “dermatitis+dermatosis,” “benign tumor/inflammation,” “unclear but likely benign,” or “unclear but likely malignant.” Accurate management for malignant lesions included “biopsy,” “excision,” “referral to dermatology clinic,” “initiation of standardized care pathway melanoma,” “dermoscopic follow-up in 3–6 months,” and, for actinic keratosis, “topical treatment.” p-Values <0.05 were considered statistically significant.

A gold standard for the most correct diagnosis of a lesion was constructed hierarchically from the diagnosis given by, in order of priority, (1) histopathology, (2) dermoscopic follow-up after 3–6 months, or (3) DRE (Fig. 2). Cases were grouped by benign-malignant concordance or discordance between the SRE primary diagnosis and the gold standard, and the groups were compared with respect to the assessors’ mean diagnostic confidence (Student’s t-test) and the frequency of recommending an intervention (Pearson’s chi-squared test).

Fig. 2.

Flowchart illustrating the hierarchical rules and construction of the most correct diagnosis, that is, the gold standard diagnosis, for each teledermoscopic case.
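The hierarchical rule described above can be illustrated with a short Python sketch. This is not the authors' implementation: the `Case` record and its field names are hypothetical, and Fig. 2 may contain additional rules that are not captured here. The sketch only shows the priority order histopathology, then dermoscopic follow-up, then DRE.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Case:
    """Hypothetical referral record; field names are assumptions."""
    histopathology: Optional[str] = None         # histopathological diagnosis, if any
    dermoscopic_follow_up: Optional[str] = None  # diagnosis at 3-6 month follow-up, if any
    dre: Optional[str] = None                    # double reader consensus diagnosis


def gold_standard(case: Case) -> Tuple[str, str]:
    """Return (source, diagnosis) from the highest-ranked available source."""
    ranked = [
        ("histopathology", case.histopathology),
        ("dermoscopic follow-up", case.dermoscopic_follow_up),
        ("DRE", case.dre),
    ]
    for source, diagnosis in ranked:
        if diagnosis is not None:
            return source, diagnosis
    raise ValueError("case has no diagnosis from any source")


# Histopathology outranks the DRE diagnosis when both are present.
example = Case(histopathology="MM in situ", dre="nevus")
print(gold_standard(example))
```

A case with only a DRE diagnosis falls through to the last rule, which mirrors how most lesions in this cohort (those without histopathology or confirmed follow-up) obtained their gold standard diagnosis.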

Sensitivity, specificity, positive predictive value, and negative predictive value were calculated to determine the accuracy of benign-malignant classification in SRE when compared with the gold standard. The following diagnoses were categorized as malignant: MM group lesions, BCC, SCC, SCC in situ, keratoacanthoma, actinic keratosis, and lesions categorized as “unclear but likely malignant.” The same analyses were conducted for MM recognition (defined as MM group lesions/not MM group lesions). In addition, receiver operating characteristic (ROC) curves were plotted for both outcome measures.

Multivariate logistic regression models were constructed to investigate the correlation between misdiagnosis in SRE and various patient, assessor, lesion, or image quality characteristics. Lesion locations were categorized based on the expected degree of sunlight exposure (Table 1). Two different outcome variables were used: overdiagnosis of benign lesions (using correctly diagnosed benign lesions as a reference) and underdiagnosis of malignant lesions (using correctly diagnosed malignant lesions as a reference).

Table 1.

Patient and Lesion Characteristics in the Teledermoscopic Cohort

	TELEDERMOSCOPY REFERRALS n = 1,997
Age, years, mean (range)	52 (17–99)
Gender, n (%)
Men	763 (38)
Women	1,234 (62)
Lesion location, n (%)^a
Back+chest+stomach+shoulder	1,078 (54)
Face+ear+neck+lips	347 (17)
Scalp	71 (3.6)
Thigh+hip+upper arm+glutes+pubic hair+sole	289 (14)
Lower arm+lower leg+palm	211 (11)
Unknown	1 (0.1)
Primary diagnosis by gold standard, n (%)^b
Nevus (including all subtypes)	666 (33)
Lentigo solaris+seborrheic keratosis+benign lichenoid keratosis	572 (29)
Dermatofibroma	100 (5.0)
Vascular lesion	39 (2.0)
Unclear, but likely benign	22 (1.1)
SCC+SCC in situ+keratoacanthoma+actinic keratosis	102 (5.1)
BCC	185 (9.3)
MM in situ+lentigo maligna+severely dysplastic nevus	183 (9.2)
Invasive MM	128 (6.4)

^a Categorized based on the expected degree of sunlight exposure.

^b Gold standard constructed using (1) histopathology, (2) dermoscopic follow-up in 3–6 months, or (3) double reader evaluation.

BCC, basal cell carcinoma; MM, malignant melanoma; SCC, squamous cell carcinoma.

Results

PATIENT, ASSESSOR, LESION, AND IMAGE QUALITY CHARACTERISTICS

Three cases were lost due to incorrect data entry. Thus, the final cohort consisted of 1,997 TDS referrals (Table 1). The mean patient age was 52 years (range 17–99), and the majority were women (62%). Histopathological diagnosis (highest-ranked correct diagnosis) was available for 568 lesions (28%). A review of all cases that were recommended for dermoscopic follow-up in 3–6 months by DRE (n = 153) revealed that most cases lacked information on follow-up. Only 31 lesions had confirmed dermoscopic follow-up (TDS or face-to-face consultation at a dermatology clinic), and in all cases, the initial DRE primary diagnosis was confirmed. Moreover, 17 out of the 153 lesions had received a histopathological diagnosis after the DRE. In Fig. 3, additional data from the SRE assessors are shown. Most referrals (54%) were evaluated by assessors with ≤ 5 years of dermoscopic experience. Assessors rated their diagnostic confidence as confident or very confident in about 50% of the cases. Only 3.6% of the clinical and dermoscopic images were of poor quality. Further analysis revealed that assessors recommended an additional intervention in 93% of cases with poor image quality. In comparison, further intervention was recommended in 63% of cases with reduced image quality and 50% of cases with good image quality.

Fig. 3.

Additional data from single reader evaluation assessors evaluating teledermoscopy referrals (n = 1,997).

CORRELATION OF BENIGN-MALIGNANT DISCORDANCE WITH DIAGNOSTIC CONFIDENCE AND MANAGEMENT

When comparing SRE primary diagnosis to the gold standard, 1,684 cases (84%) had benign-malignant concordance (Table 2). In benign-malignant concordant assessments, the assessor’s diagnostic confidence (scale 1–5) was significantly higher (mean 3.7) compared with discordant assessments (mean 2.7) (p-value < 0.001). When the assessor reported being “very confident,” there was benign-malignant concordance to the gold standard in 500 out of 509 cases (98%). This can be compared with 81% in “fairly confident” and 64% in “very unconfident.” Moreover, assessors were more inclined to recommend no further intervention in cases with benign-malignant concordance compared with discordance (50% vs. 7.4%, p-value < 0.001).

Table 2.

Comparison of Benign-Malignant Concordant/Discordant Cases by Single Reader Evaluation Primary Diagnosis and the Gold Standard

	CONCORDANT CASES n = 1,684	DISCORDANT CASES n = 313	p-VALUE
Diagnostic confidence, n (%)
Very unconfident	23 (1.4)	13 (4.2)	0.001
Unconfident	194 (12)	121 (39)	<0.001
Fairly confident	534 (32)	126 (40)	0.003
Confident	433 (26)	44 (14)	<0.001
Very confident	500 (30)	9 (2.9)	<0.001
Diagnostic confidence (scale 1–5), mean	3.7	2.7	<0.001
Image quality, n (%)
Poor	55 (3.3)	16 (5.1)	0.11
Reduced	529 (31)	129 (41)	0.001
Good	1,100 (65)	168 (54)	<0.001
Recommended intervention by SRE, n (%)
No further intervention	843 (50)	23 (7.4)	<0.001
Dermoscopic follow-up in 3–6 months	128 (7.6)	22 (7.0)	0.72
Biopsy	128 (7.6)	46 (15)	<0.001
Excision	267 (16)	99 (32)	<0.001
Referral to dermatology clinic for assessment	73 (4.3)	36 (12)	<0.001
Referral to dermatology clinic for treatment	16 (1.0)	3 (1.0)	0.99
Initiation of standardized care pathway melanoma	185 (11)	77 (25)	<0.001
Topical treatment^a	27 (1.6)	2 (0.6)	0.19
Other	17 (1.0)	5 (1.6)	0.36

Gold standard constructed using (1) histopathology, (2) dermoscopic follow-up in 3–6 months, or (3) double reader evaluation.

^a Imiquimod, 5-fluorouracil, cortisone, and so on.

SRE, single reader evaluation.
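The concordance percentages and mean confidence scores quoted in the text can be recomputed from the counts in Table 2. The sketch below is illustrative only; it assumes the 1–5 scale runs from "very unconfident" (1) to "very confident" (5), which matches the order given in the Methods but is not stated explicitly.

```python
# Rows from Table 2: (label, assumed scale value, concordant n, discordant n).
confidence_counts = [
    ("very unconfident", 1, 23, 13),
    ("unconfident", 2, 194, 121),
    ("fairly confident", 3, 534, 126),
    ("confident", 4, 433, 44),
    ("very confident", 5, 500, 9),
]


def concordance_rates(rows):
    """Share of concordant cases within each confidence level."""
    return {label: conc / (conc + disc) for label, _, conc, disc in rows}


def mean_confidence(rows, column):
    """Weighted mean of the scale value; column 2 = concordant, 3 = discordant."""
    total = sum(row[column] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total


rates = concordance_rates(confidence_counts)
for label, rate in rates.items():
    print(f"{label}: {rate:.0%} concordant")

print(f"mean confidence, concordant: {mean_confidence(confidence_counts, 2):.1f}")
print(f"mean confidence, discordant: {mean_confidence(confidence_counts, 3):.1f}")
```

Under that scale assumption, the weighted means round to the 3.7 and 2.7 reported in Table 2.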

DIAGNOSTIC ACCURACY OF SRE

Diagnostic agreement on the primary diagnosis between SRE and the gold standard was 68%. Detailed information on the diagnostic accuracy of SRE is presented in Table 3. Using benign-malignant categorization, SRE achieved a sensitivity and specificity of 84%. The sensitivity for recognition of MM group lesions was 80%, while the specificity was 90%. Cross-tabulation of SRE by the results of the gold standard is available in Supplementary Table S1. Fig. 4 presents ROC curves for SRE regarding benign-malignant classification and MM recognition. Subanalyses of sensitivity and specificity based on assessors’ dermoscopic experience only had a marginal effect on the results (Supplementary Table S2).

Fig. 4.

Receiver operating characteristic (ROC) curves for single reader evaluations, illustrating benign-malignant classification (a) and melanoma recognition (melanoma group lesions/not melanoma group lesions) (b). Melanoma group lesions included melanomas, melanoma in situ, lentigo maligna, and severely dysplastic nevi.

Table 3.

Sensitivity, Specificity, and Positive and Negative Predictive Values of Single Reader Evaluations When Assessing Teledermoscopy Referrals

	SINGLE READER EVALUATIONS n = 1,997
Benign/malignant diagnosis, % (95% CI)
Sensitivity	84 (81–87)
Specificity	84 (82–86)
Positive predictive value	70 (66–73)
Negative predictive value	93 (91–94)
MM recognition,^a % (95% CI)
Sensitivity	80 (75–84)
Specificity	90 (89–92)
Positive predictive value	60 (55–65)
Negative predictive value	96 (95–97)

^a Defined as MM group lesions (invasive MM, MM in situ, lentigo maligna, and severely dysplastic nevi)/not MM group lesions.

CI, confidence interval; MM, malignant melanoma.
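As a worked example, the benign/malignant figures in Table 3 can be recomputed from the 2×2 counts given in the Table 4 footnotes (504 malignant lesions correctly classified, 94 called benign; 1,180 benign lesions correctly classified, 219 called malignant). The sketch below uses the Wilson score interval for the 95% CI; this is an assumption, since the article does not state which interval method was used in Stata.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# 2x2 counts taken from the Table 4 footnotes.
tp, fn = 504, 94      # malignant by gold standard: called malignant / called benign
tn, fp = 1180, 219    # benign by gold standard: called benign / called malignant

metrics = {
    "sensitivity": (tp, tp + fn),
    "specificity": (tn, tn + fp),
    "positive predictive value": (tp, tp + fp),
    "negative predictive value": (tn, tn + fn),
}

for name, (k, n) in metrics.items():
    low, high = wilson_ci(k, n)
    print(f"{name}: {k / n:.0%} (95% CI {low:.0%}-{high:.0%})")
```

The point estimates round to the values in Table 3. Interval bounds may differ slightly from the published ones if a different method (for example, exact binomial) was used.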

MISDIAGNOSED LESIONS AND RECOMMENDED MANAGEMENT

There were 311 MM group lesions in the gold standard (Table 1), with 294 lesions histopathologically confirmed as invasive MM (n = 115), MM in situ (n = 119), lentigo maligna (n = 1), and severely dysplastic nevus (n = 59). Histopathological reports were unavailable for 17 lesions classified as MM group lesions by DRE. In SRE primary diagnosis, 62 MM group lesions were misdiagnosed, out of which 55 (89%) had histopathological confirmation and 7 (11%) were diagnosed by DRE (Supplementary Table S3). Importantly, the SRE differential diagnosis was an MM group lesion in 71% of these cases, and 81% received accurate management recommendations. Topical treatment was advised for one lesion, and the remaining 11 (3.5% of all MM group lesions) were recommended no further intervention. Four of the dismissed lesions were invasive MM, and in three of these cases, the SRE was conducted by an assessor with < 5 years of experience (additional information is available in Supplementary Table S4). When excluding MM group lesions, there were still 40 malignant lesions that had been misdiagnosed as benign. However, accurate management was recommended in 28 (70%) cases. In the remaining 12 cases (6 BCCs, 3 actinic keratoses, and 3 SCC in situ), no further intervention was recommended.

In contrast, 219 out of 1,399 benign lesions (16%) were misdiagnosed as malignant and managed accordingly. Further analysis of these lesions revealed that the assessors’ mean diagnostic confidence was 2.7, whereas it was 3.6 in the entire dataset. Image quality was rated as poor in 5.0% of the cases (compared with 3.6% in the entire dataset). Moreover, 212 lesions (15% of all benign lesions) were correctly classified as benign, but an additional intervention (excluding dermoscopic follow-up in 3–6 months) was still recommended. In this group, the assessors’ mean diagnostic confidence was 2.6, and 13% of images were of poor quality.
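The proportions in this section follow directly from the reported counts. The short sketch below recomputes them as a consistency check; the variable names are illustrative, and all counts are taken from the text above.

```python
# Counts reported in the text.
mm_total = 311                 # MM group lesions in the gold standard
mm_missed_primary = 62         # not recognized in SRE primary diagnosis
mm_no_intervention = 11        # missed and recommended no further intervention
benign_total = 1399            # benign lesions in the gold standard
benign_called_malignant = 219  # benign lesions misdiagnosed as malignant
benign_overmanaged = 212       # correctly benign, but intervention recommended


def pct(part, whole):
    """Percentage of `part` in `whole`."""
    return 100 * part / whole


print(f"MM group missed in primary diagnosis: {pct(mm_missed_primary, mm_total):.0f}%")
print(f"MM group dismissed without intervention: {pct(mm_no_intervention, mm_total):.1f}%")
print(f"Benign misdiagnosed as malignant: {pct(benign_called_malignant, benign_total):.0f}%")
print(f"Benign overmanaged: {pct(benign_overmanaged, benign_total):.0f}%")

combined = benign_called_malignant + benign_overmanaged
print(f"Benign misdiagnosed or overmanaged: {pct(combined, benign_total):.0f}%")
```

The combined figure corresponds to the share of benign lesions that were either misdiagnosed as malignant or correctly diagnosed but still recommended further intervention.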

RISK FACTORS FOR OVERDIAGNOSIS AND UNDERDIAGNOSIS

The results of multivariate logistic regression analyses are presented in Table 4. For each additional year of patient age, the risk of overdiagnosis increased by 2%, while the risk of underdiagnosis decreased by 4%. Male gender was associated with an increased risk of overdiagnosis (odds ratio [OR] 1.5; 95% confidence interval [CI] 1.1–2.1). The risk of misdiagnosis correlated strongly with assessors’ decreasing diagnostic confidence. Using “very confident” as a reference, being “very unconfident” increased the risk of overdiagnosis 96 times and underdiagnosis nine times. Lesion location, dermoscopic experience, and image quality were not convincingly associated with the risk of an erroneous diagnosis.

Table 4.

Multivariate Logistic Regression Analyses Assessing Risk of Overdiagnosis and Underdiagnosis in Single Reader Evaluations

EXPOSURE VARIABLES	RISK OF OVERDIAGNOSIS^a OR (95% CI)	RISK OF UNDERDIAGNOSIS^b OR (95% CI)
Age	1.02 (1.01–1.03)	0.96 (0.95–0.98)
Gender
Women	Reference	Reference
Men	1.5 (1.1–2.1)	1.1 (0.7–1.7)
Lesion location
Back+chest+stomach+shoulder	Reference	Reference
Face+ear+neck+lips	0.6 (0.4–1.0)	1.2 (0.6–2.1)
Scalp	0.5 (0.2–1.3)	0.9 (0.2–4.3)
Thigh+hip+upper arm+glutes+pubic hair+sole	1.1 (0.7–1.8)	1.3 (0.6–2.6)
Lower arm+lower leg+palm	0.9 (0.5–1.1)	1.1 (0.5–2.2)
Dermoscopic experience, years
≥11	Reference	Reference
6–10	0.4 (0.2–0.7)	1.5 (0.8–2.9)
≤5	0.7 (0.5–1.1)	0.7 (0.4–1.3)
Quality of images
Good	Reference	Reference
Reduced	1.0 (0.7–1.4)	1.3 (0.8–2.1)
Poor	0.4 (0.2–0.9)	0.8 (0.2–2.8)
Diagnostic confidence
Very confident	Reference	Reference
Confident	7.4 (2.8–19)	2.7 (0.9–8.6)
Fairly confident	22 (8.7–55)	3.9 (1.3–11)
Unconfident	67 (26–172)	8.5 (2.8–26)
Very unconfident	96 (25–374)	8.7 (1.4–53)

Bold letters indicate statistically significant results (p < 0.05).

^a Defined as benign lesions that were misdiagnosed as malignant (n = 219), using correctly diagnosed benign lesions as reference (n = 1,180).

^b Defined as malignant lesions that were misdiagnosed as benign (n = 94), using correctly diagnosed malignant lesions as reference (n = 504).

CI, confidence interval; OR, odds ratio.
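To make the per-year age effect in Table 4 more concrete, the sketch below compounds the reported odds ratios over a larger age difference. It assumes age entered the model as a continuous variable that is linear on the log-odds scale (the usual logistic regression setup, implied by "for each additional year" but not stated explicitly). It also uses the rounded odds ratios from the table, so the results are approximate.

```python
def odds_ratio_over(per_unit_or, units):
    """Odds ratio for a difference of `units`, given a per-unit odds ratio.

    In a logistic model with a linear term, log-odds add up per unit,
    so odds ratios multiply: OR(units) = OR(1) ** units.
    """
    return per_unit_or ** units


or_over_per_year = 1.02   # overdiagnosis, per year of age (Table 4)
or_under_per_year = 0.96  # underdiagnosis, per year of age (Table 4)

for years in (1, 10, 20):
    over = odds_ratio_over(or_over_per_year, years)
    under = odds_ratio_over(or_under_per_year, years)
    print(f"{years:>2} years: overdiagnosis OR {over:.2f}, underdiagnosis OR {under:.2f}")
```

Under these assumptions, a 10-year age difference corresponds to roughly 22% higher odds of overdiagnosis and roughly a third lower odds of underdiagnosis.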

Discussion

This study found that SRE in TDS achieves moderate to high diagnostic accuracy regarding benign-malignant classification and MM recognition. Nevertheless, 18 out of 311 MM group lesions were not recognized in primary or differential diagnosis by SRE. Furthermore, there was a high proportion of benign lesions that were misdiagnosed or mismanaged. Mismanagement of benign lesions occurred more frequently when assessors expressed low confidence in their diagnosis or when the image quality was deemed poor. The risk of an erroneous diagnosis strongly correlated with assessors’ decreasing diagnostic confidence. Investigating methods to increase diagnostic accuracy in TDS holds significant value, and double reading, already employed in several medical specialties such as radiology and pathology, is a feasible and advantageous option.

Vestergaard et al. reported benign-malignant sensitivities (85% and 86%) and specificities (82% and 83%) when two independent SRE were performed on 600 TDS referrals.⁹ The gold standard was constructed using diagnosis from histopathology (37%), follow-up visit (12%), or face-to-face consultation (51%).⁹ This is similar to the results of our study, in which the SRE benign-malignant sensitivity and specificity were both 84%. An Estonian study by Koop et al. presented a somewhat higher sensitivity (90%) and specificity (93%) for MM recognition in TDS.¹⁵ However, diagnostic accuracy was calculated based on a combination of diagnosis and management plan data. MacLellan et al. analyzed the diagnostic accuracy of a teledermoscopist in diagnosing MM (including in situ) compared to histopathological diagnosis. The authors found a sensitivity of 90% and a specificity of 66%.¹⁶ In our study, the sensitivity and specificity for MM recognition were 80% and 90%, respectively. In a primary care setting study, GPs in Spain and Italy classified lesions (using dermoscopy) as banal or suggestive of skin cancer and achieved a sensitivity of 79% and a specificity of 72%.¹⁷ Moreover, Menzies et al. found that GPs using dermoscopy had a sensitivity of 53% and a specificity of 89% for identifying MM.¹⁸ When comparing the diagnostic accuracy of TDS and dermoscopy conducted by GPs, it becomes reasonable to argue that employing TDS enhances the triage of skin lesions.

Although the diagnostic accuracy of SRE in TDS might be regarded as relatively high in this study, we also found a considerable number of malignant and benign lesions that were either misdiagnosed or mismanaged. As cases were selected based on registered dermoscopic diagnosis, the prevalence of MM group lesions (n = 311, 16%) was not representative of the natural ratio (6.7%) in our teledermoscopic database. Nevertheless, 20% of all MM group lesions were misdiagnosed by SRE primary diagnosis, and 5.8% were still misdiagnosed when including the assessor’s differential diagnosis. In a study by Koelink et al., GPs misdiagnosed 5 out of 13 (38%) MM while using a dermoscope.¹⁹ Vestergaard et al. reported that 6 out of 23 (26%) histopathologically confirmed MM (including in situ) had been misdiagnosed as benign in TDS.⁹ In this study, most misdiagnosed MM group lesions were recommended an intervention that increased the probability of a correct diagnosis, but 11 lesions (3.5% of all MM group lesions) were recommended no further intervention. Our results are slightly better than those of two recently published studies in which 7–9% of MM (including in situ) were dismissed without accurate management plans in TDS.^9,16 Mismanaged benign lesions in TDS are also important to consider as they lead to unnecessary interventions and, by extension, societal costs. In SRE, 31% of all benign lesions were either misdiagnosed as malignant or correctly diagnosed as benign but still recommended further intervention.

Our findings emphasize the benefits of greater specificity in TDS, achievable through various approaches, including support from diagnostic machine learning.^20,21 While prior research has primarily focused on the diagnostic accuracy of artificial intelligence tools in experimental settings, a recent systematic review highlights the need for more studies on potential advantages in clinical practice.^20–27 Currently, double reading is an advantageous option, as it is used in several other medical specialties with scientific evidence showing improved diagnostic accuracy, primarily through increased sensitivity.^28–30 Two studies evaluating double reading of dermoscopy-reflectance confocal microscopy images found improved sensitivity and management safety.^28,30

Moreover, Tschandl et al. demonstrated that the mean correct rating of dermoscopic images significantly increased when using collective evaluations by three to five assessors (74%) compared with an individual assessor (65%).²¹ Other benefits of DRE are continuous peer-to-peer training, quality control, and shared medicolegal responsibility. Arguably, double reading could offer a low-cost way of increasing diagnostic accuracy, since an extra TDS assessment only adds a few minutes.³¹ However, in some cases even DRE may not suffice to reach the correct diagnosis and/or management, necessitating additional interventions such as face-to-face consultations or biopsies. To what extent DRE in TDS improves diagnostic accuracy and offers a cost-effective alternative to SRE needs further evaluation.

The strong correlation between the risk of misdiagnosis and assessors’ decreasing diagnostic confidence is supported by two previous studies.^9,31 Our analysis revealed that decreasing diagnostic confidence primarily correlates to the risk of overdiagnosis (i.e., benign lesions being misdiagnosed as malignant). We hypothesize that assessors are more likely to overcall when in doubt due to fear of missing malignant lesions. This is further strengthened by the finding that assessors’ mean diagnostic confidence was lower when benign lesions were either misdiagnosed as malignant or recommended unnecessary interventions. We also found that each additional year of patient age increased the risk of overdiagnosis and decreased the risk of underdiagnosis. One explanation could be that assessors are inclined to diagnose lesions as malignant in older age groups, knowing that skin cancer is more common in the elderly.³² The gender distribution in this study population was unequal, with 62% being women, reflecting the unequal gender distribution in the cohort of TDS referrals.^33,34 Nonetheless, male gender was associated with an increased risk of overdiagnosis, contradicting a previous study that found no correlation between gender and the risk of misdiagnosis.³¹ Moreover, poor image quality was significantly associated with a decreased risk of overdiagnosis yet simultaneously linked with recommending additional interventions. Our findings are in concordance with a study by van der Heijden et al., showing that good quality images are associated with higher accuracy regarding diagnosis and, in particular, management.⁸ This further emphasizes the importance of good image quality in TDS.

The main strength of this study is the relatively large cohort of 1,997 authentic TDS referrals. There are also several limitations to this study. First, referrals were selected based on registered diagnosis, which resulted in a somewhat skewed distribution of diagnoses. Second, only about a third of the lesions had histopathological confirmation. Last, when interpreting our results, consideration should be given to the artificial circumstances in which the SRE was performed compared with the ongoing health care that resulted in the gold standard and, likewise, the different local management customs in these two settings.

In conclusion, diagnostic accuracy in SRE heavily depends on the assessor’s confidence in the diagnosis. When assessors perceive their diagnostic confidence as moderate or low, another intervention (such as DRE, biopsy, or face-to-face consultation) is necessary. Furthermore, low image quality in TDS triggers unnecessary interventions in benign lesions, highlighting the importance of developing tools to improve image quality and standardization.

Footnotes

Acknowledgments

The authors thank Johan Palmgren, Karim Saleh, Fredrik Johansson, Johan Kappelin, and Teo Helkkula for their help with data collection. Many thanks to the statisticians at Kliniska Studier Forum Söder.

Authors’ Contributions

C.N.: Conceptualization, methodology, validation, data curation, formal analysis, investigation, writing—original draft, and visualization. H.K., B.P., C.S., S.L., A.P.M., J.I., and A.D.: Validation, data curation, and writing—review and editing. J.L., L.U.I., N.R., and K.S.: Methodology, resources, data curation, validation, and writing—review and editing. K.N.: Methodology, data curation, writing—review and editing, and funding acquisition. Å.I.: Conceptualization, methodology, validation, data curation, writing—review and editing, supervision, and funding acquisition.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Disclosure Statement

The authors declare that they participate in a research project with Chalmers Industry and Dermicus, the company that provides the digital platform Dermicus. However, Dermicus has not contributed to or had any influence on the performance of this study. Å.I. has received speaker and consulting honoraria from Galderma Sweden, Perrigo Sweden, MSD Sweden, and Biofrontera Sweden. Not related to the current study, K.N. has during the last three years received speaker honoraria from Galderma Sweden, LEO Pharma, Novartis Sweden, and UCB Pharma and has served on one advisory board for MSD.

Funding Information

The study was funded by Hudfonden, Märta Wrinklers stiftelse för främjande av medicinsk forskning, S.R Gorthon foundation, the Krapperup foundation, and Swedish governmental funding of clinical research (ALF).

Supplementary Material

Supplementary Table S1

Supplementary Table S2

Supplementary Table S3

Supplementary Table S4

References

1. Leiter, Keim, Garbe. Epidemiology of skin cancer: Update 2019. Adv Exp Med Biol, 2020; 1268:123–139; doi: 10.1007/978-3-030-46227-7_6

2. Regional Cancer Centres. Slutrapport Teledermatoskopi Mellan Primärvårds- och Hudspecialist. Available from: https://cancercentrum.se/globalassets/cancerdiagnoser/hud/stockholm-gotland/slutrapport-teledermatoskopi-mellan-primarvards–och-hudspecialist.pdf [Last accessed: October 15, 2024].

3. Morrison, O’Loughlin, Powell. Suspected skin malignancy: A comparison of diagnoses of family practitioners and dermatologists in 493 patients. Int J Dermatol, 2001; 40(2):104–107; doi: 10.1046/j.1365-4362.2001.01159.x

4. Tran, Chen, Lim, et al. Assessing diagnostic skill in dermatology: A comparison between general practitioners and dermatologists. Australas J Dermatol, 2005; 46(4):230–234; doi: 10.1111/j.1440-0960.2005.00189.x

5. Tensen, van der Heijden, Jaspers MWM, et al. Two decades of teledermatology: Current status and integration in national healthcare systems. Curr Dermatol Rep, 2016; 5:96–104; doi: 10.1007/s13671-016-0136-7

6. World Health Organization. Telemedicine: Opportunities and Developments in Member States. Available from: http://apps.who.int/iris/bitstream/handle/10665/44497/9789241564144_eng.pdf?sequence=1; 2010. [Last accessed: October 15, 2024].

7. Börve, Terstappen, Sandberg, et al. Mobile teledermoscopy-there’s an app for that! Dermatol Pract Concept, 2013; 3(2):41–48; doi: 10.5826/dpc.0302a05

8. van der Heijden, Thijssing, Witkamp, et al. Accuracy and reliability of teledermatoscopy with images taken by general practitioners during everyday practice. J Telemed Telecare, 2013; 19(6):320–325; doi: 10.1177/1357633x13503437

9. Vestergaard, Prasad, Schuster, et al. Diagnostic accuracy and interobserver concordance: Teledermoscopy of 600 suspicious skin lesions in Southern Denmark. J Eur Acad Dermatol Venereol, 2020; 34(7):1601–1608; doi: 10.1111/jdv.16275

10. Warshaw, Lederle, Grill, et al. Accuracy of teledermatology for pigmented neoplasms. J Am Acad Dermatol, 2009; 61(5):753–765; doi: 10.1016/j.jaad.2009.04.032

11. Regional Cancer Centres (RCC). Tidig Upptäckt av Hudcancer med Teledermatoskopi. Available from: https://cancercentrum.se/globalassets/vara-uppdrag/prevention-tidig-upptackt/hudcancer/rcc-rapport_tidig_upptackt_teledermatoskopi_11dec18.pdf [Last accessed: October 15, 2024].

12. Deda, Goldberg, Jamerson, et al. Dermoscopy practice guidelines for use in telemedicine. NPJ Digit Med, 2022; 5(1):55; doi: 10.1038/s41746-022-00587-9

13. Bossuyt, Reitsma, Bruns, et al.; STARD Group. STARD 2015: An updated list of essential items for reporting diagnostic accuracy studies. Clin Chem, 2015; 61(12):1446–1452; doi: 10.1373/clinchem.2015.246280

14. Schultz, Ivert, Lapins, et al. Lead time from first suspicion of malignant melanoma in primary care to diagnostic excision: A cohort study comparing teledermatoscopy and traditional referral to a dermatology clinic at a tertiary hospital. Dermatol Pract Concept, 2023; 13(1):e2023018; doi: 10.5826/dpc.1301a18

15. Koop, Kruus, Hallik, et al. A country-wide teledermatoscopy service in Estonia shows results comparable to those in experimental settings in management plan development and diagnostic accuracy: A retrospective database study. JAAD Int, 2023; 12:81–89; doi: 10.1016/j.jdin.2023.02.019

16. MacLellan, Price, Publicover-Brouwer, et al. The use of noninvasive imaging techniques in the diagnosis of melanoma: A prospective diagnostic accuracy study. J Am Acad Dermatol, 2021; 85(2):353–359; doi: 10.1016/j.jaad.2020.04.019

17. Argenziano, Puig, Zalaudek, et al. Dermoscopy improves accuracy of primary care physicians to triage lesions suggestive of skin cancer. J Clin Oncol, 2006; 24(12):1877–1882; doi: 10.1200/jco.2005.05.0864

18. Menzies, Emery, Staples, et al. Impact of dermoscopy and short-term sequential digital dermoscopy imaging for the management of pigmented lesions in primary care: A sequential intervention trial. Br J Dermatol, 2009; 161(6):1270–1277; doi: 10.1111/j.1365-2133.2009.09374.x

19. Koelink, Vermeulen, Kollen, et al. Diagnostic accuracy and cost-effectiveness of dermoscopy in primary care: A cluster randomized clinical trial. J Eur Acad Dermatol Venereol, 2014; 28(11):1442–1449; doi: 10.1111/jdv.12306

20. Esteva, Kuprel, Novoa, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017; 542(7639):115–118; doi: 10.1038/nature21056

21. Tschandl, Rinner, Apalla, et al. Human-computer collaboration for skin cancer recognition. Nat Med, 2020; 26(8):1229–1234; doi: 10.1038/s41591-020-0942-0

22. Chanda, Hauser, Hobelsberger, et al.; Reader Study Consortium. Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma. Nat Commun, 2024; 15(1):524; doi: 10.1038/s41467-023-43095-4

23. Haenssle, Fink, Schneiderbauer, et al.; Reader study level-I and level-II Groups. Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol, 2018; 29(8):1836–1842; doi: 10.1093/annonc/mdy166

24. Maron, Utikal, Hekler, et al. Artificial intelligence and its effect on dermatologists’ accuracy in dermoscopic melanoma image classification: Web-based survey study. J Med Internet Res, 2020; 22(9):e18091; doi: 10.2196/18091

25. Tschandl, Codella, Akay, et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: An open, web-based, international, diagnostic study. Lancet Oncol, 2019; 20(7):938–947; doi: 10.1016/s1470-2045(19)30333-x

26. Tschandl, Rosendahl, Akay, et al. Expert-level diagnosis of nonpigmented skin cancer by combined convolutional neural networks. JAMA Dermatol, 2019; 155(1):58–65; doi: 10.1001/jamadermatol.2018.4378

27. Krakowski, Kim, Cai, et al. Human-AI interaction in skin cancer diagnosis: A systematic review and meta-analysis. NPJ Digit Med, 2024; 7(1):78; doi: 10.1038/s41746-024-01031-w

28. Łudzik, Witkowski, Roterman-Konieczna, et al. Improving diagnostic accuracy of dermoscopically equivocal pink cutaneous lesions with reflectance confocal microscopy in telemedicine settings: Double reader concordance evaluation of 316 cases. PLoS One, 2016; 11(9):e0162495; doi: 10.1371/journal.pone.0162495

29. Pow, Mello-Thoms, Brennan. Evaluation of the effect of double reporting on test accuracy in screening and diagnostic imaging studies: A review of the evidence. J Med Imaging Radiat Oncol, 2016; 60(3):306–314; doi: 10.1111/1754-9485.12450

30. Witkowski, Łudzik, Arginelli, et al. Improving diagnostic sensitivity of combined dermoscopy and reflectance confocal microscopy imaging through double reader concordance evaluation in telemedicine settings: A retrospective study of 1000 equivocal cases. PLoS One, 2017; 12(11):e0187748; doi: 10.1371/journal.pone.0187748

31. Ferrándiz, Ojeda-Vila, Corrales, et al. Internet-based skin cancer screening using clinical images alone or in conjunction with dermoscopic images: A randomized teledermoscopy trial. J Am Acad Dermatol, 2017; 76(4):676–682; doi: 10.1016/j.jaad.2016.10.041

32. Niino, Matsuda. Age-specific skin cancer incidence rate in the world. Jpn J Clin Oncol, 2021; 51(5):848–849; doi: 10.1093/jjco/hyab057

33. Gao, Swetter, Hawryluk, et al. Screening motivations among participants of the American Academy of Dermatology’s SPOT skin cancer screening program from 2018 to 2019: A cross-sectional analysis. J Am Acad Dermatol, 2023; 88(3):674–676; doi: 10.1016/j.jaad.2022.06.1194

34. Ingvar, Nielsen, Ingvar. Factors for not performing total body skin examinations in primary care in association with teledermoscopy. BMC Prim Care, 2023; 24(1):76; doi: 10.1186/s12875-023-02034-4


When Are Single Reader Evaluations Insufficient in Teledermoscopic Assessments? Analyses of a Retrospective Cohort Study
