Abstract
Background:
Teledermoscopy (TDS) has emerged as an efficient tool for diagnosing skin lesions. In Sweden, double reading is the standard of care, but risk factors for misdiagnosis or mismanagement in single reader evaluations (SRE) have not been well studied. This study aimed to assess the accuracy of SRE in TDS compared with a gold standard.
Methods:
This retrospective cohort study involved 1,997 TDS referrals sent from general practitioners to dermatologists in Stockholm, Sweden, selected based on dermoscopic diagnoses. All referrals underwent double reader evaluations (DRE). Each case was reassessed by a single external assessor, blinded to the DRE result. Based on predefined rules, a gold standard for the most correct diagnosis was established. Diagnostic accuracy and risk factors for misdiagnosis were evaluated. The trial was registered on ClinicalTrials.gov (ID NCT05033678).
Results:
Primary diagnosis by SRE agreed with the gold standard on benign-malignant classification in 84% of cases. Discordance was associated with lower diagnostic confidence and more frequent recommendations for further intervention. SRE achieved a benign-malignant sensitivity and specificity of 84% (95% confidence interval: 81–87% and 82–86%, respectively). The odds of overdiagnosis were 96 times higher when assessors reported being “very unconfident” rather than “very confident.” Of 311 lesions classified as melanoma, melanoma in situ, lentigo maligna, or severely dysplastic nevus, 62 were not recognized in the SRE primary diagnosis; however, 50 of these misdiagnosed lesions still received an accurate management recommendation.
Conclusions:
The confidence level of TDS assessors heavily influences diagnostic accuracy. Therefore, when diagnostic confidence is perceived as moderate or low, additional interventions should be considered.
Introduction
The incidence of skin cancer is increasing worldwide. 1 The resulting rise in demand for skin lesion assessments places a further burden on already strained health care systems. In many countries, patients with suspicious skin lesions first seek medical advice at primary care centers. General practitioners (GPs) can successfully diagnose and manage common types of skin lesions, but they do not have the same level of expertise as dermatologists. 2 Previous studies have shown that the diagnostic concordance between GPs and dermatologists is 45–54% in a clinical setting. 3,4 Streamlining the referral and triage process from GPs to dermatologists could therefore help improve diagnostic accuracy in primary care.
Given the visual nature of dermatology, the specialty is especially well suited to communication technologies such as computers, the internet, and smartphones. 5,6 This is exploited in teledermoscopy (TDS) (i.e., electronic referrals containing clinical and dermoscopic images that are obtained in primary care and assessed by dermatologists). The introduction of smartphone-compatible dermoscopes has made TDS a feasible alternative for remote assessment of skin lesions. 7 Nevertheless, previous studies have shown that the diagnostic accuracy of TDS is somewhat lower than that of face-to-face consultations with dermatologists. 8–10
Following a pilot study on TDS, the Swedish Regional Cancer Centres recommend double reader evaluations (DRE) (i.e., two dermatologists assessing each TDS referral) to increase diagnostic accuracy. 11 However, double reading in TDS is not common practice internationally. 12 We hypothesized that there are circumstances in which relying solely on single reader evaluations (SRE) in TDS might lead to mismanagement of lesions. Hence, the overall aim of this study was to improve skin cancer triage, with the specific objectives of evaluating the accuracy of SRE in TDS and identifying when SRE alone may not ensure proper management of skin lesions.
Methods
COHORT
This cohort study was conducted at the Department of Dermatology, Skåne University Hospital, Lund, Sweden, in accordance with the Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015 criteria. 13 Regional ethical review board approvals (2020-04763 and 2021-01817) were acquired. The trial was registered on ClinicalTrials.gov with the ID NCT05033678, and the study protocol can be requested from the corresponding author. The cohort was derived from a previously established teledermoscopic database. Images were captured using a Heine iC1 dermoscope (Heine Optotechnik, Herrsching, Germany) attached to an iPhone (Apple, Cupertino, CA, USA).
Inclusion criteria were TDS referrals, including pigmented and nonpigmented lesions, sent between 2017 and 2021 from GPs to assessors in the Stockholm health care region. 14 The referrals were registered on the web platform Dermicus® (Dermicus, Gothenburg, Sweden). Based on a power calculation, 2,000 cases were extracted with the following distribution of dermoscopic diagnoses (with priority for the most recent cases): 30% nevi, 19% melanomas (MM), MM in situ, lentigo maligna and severely dysplastic nevi (henceforth referred to as the MM group lesions), 10% basal cell carcinomas (BCC), 8% squamous cell carcinomas (SCC), 23% seborrheic keratoses, 7% dermatofibromas, and 3% vascular lesions.
Every TDS referral included standardized information on patient age, gender, lesion location, first appearance of the lesion, and any perceived change or symptoms. The DRE was performed as part of ongoing health care and involved two TDS assessors, at least one with extensive experience, independently reviewing the clinical and dermoscopic images (Fig. 1). The assessors then discussed each case in a chat function until consensus was reached. The joint referral response included a dermoscopic description, a primary diagnosis, and recommended management. All cases were then prepared for SRE by concealing the previous DRE consultation report, before being randomly assigned to one of 13 assessors (0 to >15 years of dermoscopic experience) in the Skåne health care region. SRE assessors provided information on image quality (poor, reduced, or good), primary and differential diagnoses, level of diagnostic confidence (very unconfident, unconfident, fairly confident, confident, or very confident), and management recommendations.

Fig. 1. Example of clinical and dermoscopic images attached to teledermoscopy referrals.
STATISTICAL METHODS
Statistical analyses were performed using Stata version 18.0 (StataCorp LLC, College Station, TX, USA). The primary outcome was the diagnostic accuracy of SRE compared with the gold standard (see below). The secondary outcomes were accurate management in SRE and risk factors for misdiagnosis in TDS. For the purpose of analysis, diagnoses were categorized as follows: “nevus” (including all subtypes), “MM group lesions,” “lentigo solaris+seborrheic keratosis+benign lichenoid keratosis,” “verruca,” “dermatofibroma,” “SCC+SCC in situ+keratoacanthoma+actinic keratosis,” “BCC,” “vascular lesion,” “dermatitis+dermatosis,” “benign tumor/inflammation,” “unclear but likely benign,” or “unclear but likely malignant.” Accurate management for malignant lesions included “biopsy,” “excision,” “referral to dermatology clinic,” “initiation of standardized care pathway melanoma,” “dermoscopic follow-up in 3–6 months,” and, for actinic keratosis, “topical treatment.” p-Values < 0.05 were considered statistically significant.
A gold standard for the most correct diagnosis of each lesion was constructed from, in order of priority, (1) histopathology, (2) dermoscopic follow-up after 3–6 months, or (3) DRE (Fig. 2). Cases were grouped by benign-malignant concordance or discordance between the SRE primary diagnosis and the gold standard, and the groups were compared with respect to the assessors’ mean diagnostic confidence (Student’s t-test) and the frequency of recommending an intervention (Pearson’s chi-squared test).
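The priority order of the gold standard can be sketched as a small function. This is a minimal sketch assuming that each source either yields a diagnosis or is missing; the function and parameter names are hypothetical, and any further rules in the Fig. 2 flowchart (for example, how conflicting sources were reconciled) are not modelled here.

```python
from typing import Optional


def gold_standard(histopathology: Optional[str],
                  follow_up: Optional[str],
                  dre: str) -> tuple[str, str]:
    """Return (diagnosis, source) from the highest-ranked available source.

    Hypothetical helper illustrating the three-level priority only.
    """
    # Priority 1: a histopathological diagnosis, when a report exists.
    if histopathology is not None:
        return histopathology, "histopathology"
    # Priority 2: the diagnosis confirmed at dermoscopic follow-up after 3-6 months.
    if follow_up is not None:
        return follow_up, "dermoscopic follow-up"
    # Priority 3: the consensus diagnosis from the double reader evaluation.
    return dre, "DRE"


print(gold_standard(None, None, "nevus"))
```

A case without histopathology or follow-up therefore falls back on the DRE diagnosis, which is why DRE-based cases sit at the bottom of the hierarchy.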

Fig. 2. Flowchart illustrating the hierarchical rules and construction of the most correct diagnosis, that is, the gold standard diagnosis, for each teledermoscopic case.
Sensitivity, specificity, positive predictive value, and negative predictive value were calculated to determine the accuracy of benign-malignant classification in SRE when compared with the gold standard. The following diagnoses were categorized as malignant: MM group lesions, BCC, SCC, SCC in situ, keratoacanthoma, actinic keratosis, and lesions categorized as “unclear but likely malignant.” The same analyses were conducted for MM recognition (defined as MM group lesions/not MM group lesions). In addition, receiver operating characteristic (ROC) curves were plotted for both outcome measures.
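As an illustration of these definitions, the sketch below binarizes the diagnostic categories listed above and computes the four measures from a 2×2 table. The counts come from the Table 4 footnotes (219 overcalled and 1,180 correctly classified benign lesions; 94 missed and 504 correctly classified malignant lesions). The Wilson score interval is a swapped-in choice for illustration only: the text does not state which confidence interval method was used in Stata. All names are hypothetical.

```python
import math

# Categories treated as malignant in the benign-malignant analysis.
MALIGNANT = {
    "MM group lesions", "BCC", "SCC", "SCC in situ",
    "keratoacanthoma", "actinic keratosis", "unclear but likely malignant",
}


def is_malignant(category: str) -> bool:
    """Map a diagnostic category to the binary benign/malignant outcome."""
    return category in MALIGNANT


def accuracy(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Point estimates from a 2x2 table of SRE versus the gold standard."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant lesions called malignant
        "specificity": tn / (tn + fp),  # benign lesions called benign
        "ppv": tp / (tp + fp),          # SRE "malignant" calls that are malignant
        "npv": tn / (tn + fn),          # SRE "benign" calls that are benign
    }


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (illustrative choice)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


stats = accuracy(tp=504, fn=94, tn=1180, fp=219)
sens_ci = wilson_ci(504, 504 + 94)
spec_ci = wilson_ci(1180, 1180 + 219)
print({k: round(v, 3) for k, v in stats.items()}, sens_ci, spec_ci)
```

Run as written, sensitivity and specificity both round to 0.84, and the Wilson intervals come out at roughly 0.81–0.87 and 0.82–0.86; other interval methods (e.g., exact Clopper-Pearson) would give slightly different bounds.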
Multivariate logistic regression models were constructed to investigate associations between misdiagnosis in SRE and patient, assessor, lesion, and image quality characteristics. Lesion locations were categorized based on the expected degree of sunlight exposure (Table 1). Two different outcome variables were used: overdiagnosis of benign lesions (using correctly diagnosed benign lesions as a reference) and underdiagnosis of malignant lesions (using correctly diagnosed malignant lesions as a reference).
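Because each outcome variable has its own reference group, every case enters at most one of the two models. A minimal sketch of that coding, with a hypothetical function name; it covers only the outcome definition, not the covariates or the model fitting, which were done in Stata.

```python
from typing import Optional


def regression_outcomes(gold_malignant: bool,
                        sre_malignant: bool) -> dict[str, Optional[int]]:
    """Code one case for the two logistic models.

    1 = event, 0 = reference group, None = case does not enter that model.
    """
    if gold_malignant:
        # Underdiagnosis model (malignant lesions only): missed = 1, correct = 0.
        return {"overdiagnosis": None, "underdiagnosis": 0 if sre_malignant else 1}
    # Overdiagnosis model (benign lesions only): overcalled = 1, correct = 0.
    return {"overdiagnosis": 1 if sre_malignant else 0, "underdiagnosis": None}


print(regression_outcomes(gold_malignant=False, sre_malignant=True))
```

Tallying these codes over the cohort should reproduce the group sizes in the Table 4 footnotes (219 vs. 1,180 for overdiagnosis; 94 vs. 504 for underdiagnosis).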
Table 1. Patient and Lesion Characteristics in the Teledermoscopic Cohort
Categorized based on the expected degree of sunlight exposure.
Gold standard constructed using (1) histopathology, (2) dermoscopic follow-up in 3–6 months, or (3) double reader evaluation.
BCC, basal cell carcinoma; MM, malignant melanoma; SCC, squamous cell carcinoma.
Results
PATIENT, ASSESSOR, LESION, AND IMAGE QUALITY CHARACTERISTICS
Three cases were lost due to incorrect data entry. Thus, the final cohort consisted of 1,997 TDS referrals (Table 1). The mean patient age was 52 years (range 17–99), and the majority were women (62%). Histopathological diagnosis (the highest-ranked source for the gold standard) was available for 568 lesions (28%). A review of all cases recommended for dermoscopic follow-up in 3–6 months by DRE (n = 153) revealed that most lacked information on follow-up. Only 31 lesions had confirmed dermoscopic follow-up (TDS or face-to-face consultation at a dermatology clinic), and in all of these, the initial DRE primary diagnosis was confirmed. Moreover, 17 of the 153 lesions had received a histopathological diagnosis after the DRE. Additional data from the SRE assessors are shown in Fig. 3. Most referrals (54%) were evaluated by assessors with ≤ 5 years of dermoscopic experience. Assessors rated their diagnostic confidence as confident or very confident in about 50% of cases. Only 3.6% of the clinical and dermoscopic images were of poor quality. Further analysis revealed that assessors recommended an additional intervention in 93% of cases with poor image quality, compared with 63% of cases with reduced image quality and 50% of cases with good image quality.

Fig. 3. Additional data from single reader evaluation assessors evaluating teledermoscopy referrals (n = 1,997).
CORRELATION OF BENIGN-MALIGNANT DISCORDANCE WITH DIAGNOSTIC CONFIDENCE AND MANAGEMENT
When comparing SRE primary diagnosis to the gold standard, 1,684 cases (84%) had benign-malignant concordance (Table 2). In benign-malignant concordant assessments, the assessor’s diagnostic confidence (scale 1–5) was significantly higher (mean 3.7) compared with discordant assessments (mean 2.7) (p-value < 0.001). When the assessor reported being “very confident,” there was benign-malignant concordance to the gold standard in 500 out of 509 cases (98%). This can be compared with 81% in “fairly confident” and 64% in “very unconfident.” Moreover, assessors were more inclined to recommend no further intervention in cases with benign-malignant concordance compared with discordance (50% vs. 7.4%, p-value < 0.001).
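For readers unfamiliar with the Pearson chi-squared test used for the intervention comparison, the sketch below implements it for a 2×2 table with the standard library only. The cell counts are hypothetical approximations back-calculated from the rounded percentages above (50% of 1,684 ≈ 842; 7.4% of 313 ≈ 23) and are not the study's raw data. No continuity correction is applied; with one degree of freedom, the upper-tail p-value equals erfc(√(χ²/2)).

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test without continuity correction for [[a, b], [c, d]].

    Returns (statistic, p-value) for 1 degree of freedom.
    """
    table = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # Chi-squared with 1 df is a squared standard normal, so P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: concordant, discordant; columns: no further intervention, intervention.
# Approximate, hypothetical counts reconstructed from rounded percentages.
stat, p = pearson_chi2_2x2(842, 1684 - 842, 23, 313 - 23)
print(f"chi2 = {stat:.1f}, p = {p:.2e}")
```

With these approximate counts, p falls far below 0.001, in line with the reported result.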
Table 2. Comparison of Benign-Malignant Concordant/Discordant Cases by Single Reader Evaluation Primary Diagnosis and the Gold Standard
Gold standard constructed using (1) histopathology, (2) dermoscopic follow-up in 3–6 months, or (3) double reader evaluation.
Imiquimod, 5-fluorouracil, cortisone, etc.
SRE, single reader evaluation.
DIAGNOSTIC ACCURACY OF SRE
Diagnostic agreement on the primary diagnosis between SRE and the gold standard was 68%. Detailed information on the diagnostic accuracy of SRE is presented in Table 3. Using benign-malignant categorization, SRE achieved a sensitivity and specificity of 84%. The sensitivity for recognition of MM group lesions was 80%, while the specificity was 90%. Cross-tabulation of SRE by the results of the gold standard is available in Supplementary Table S1. Fig. 4 presents ROC curves for SRE regarding benign-malignant classification and MM recognition. Subanalyses of sensitivity and specificity based on assessors’ dermoscopic experience only had a marginal effect on the results (Supplementary Table S2).

Fig. 4. Receiver operating characteristic (ROC) curves for single reader evaluations, illustrating benign-malignant classification and MM recognition.
Table 3. Sensitivity, Specificity, and Positive and Negative Predictive Values of Single Reader Evaluations When Assessing Teledermoscopy Referrals
Defined as MM group lesions (invasive MM, MM in situ, lentigo maligna, and severely dysplastic nevi)/not MM group lesions.
CI, confidence interval; MM, malignant melanoma.
MISDIAGNOSED LESIONS AND RECOMMENDED MANAGEMENT
There were 311 MM group lesions in the gold standard (Table 1), with 294 lesions histopathologically confirmed as invasive MM (n = 115), MM in situ (n = 119), lentigo maligna (n = 1), and severely dysplastic nevus (n = 59). Histopathological reports were unavailable for 17 lesions classified as MM group lesions by DRE. In SRE primary diagnosis, 62 MM group lesions were misdiagnosed, out of which 55 (89%) had histopathological confirmation and 7 (11%) were diagnosed by DRE (Supplementary Table S3). Importantly, the SRE differential diagnosis was an MM group lesion in 71% of these cases, and 81% received accurate management recommendations. Topical treatment was advised for one lesion, and the remaining 11 (3.5% of all MM group lesions) were recommended no further intervention. Four of the dismissed lesions were invasive MM, and in three of these cases, the SRE was conducted by an assessor with < 5 years of experience (additional information is available in Supplementary Table S4). When excluding MM group lesions, there were still 40 malignant lesions that had been misdiagnosed as benign. However, accurate management was recommended in 28 (70%) cases. In the remaining 12 cases (6 BCCs, 3 actinic keratoses, and 3 SCC in situ), no further intervention was recommended.
In contrast, 219 out of 1,399 benign lesions (16%) were misdiagnosed as malignant and managed accordingly. Further analysis of these lesions revealed that the assessors’ mean diagnostic confidence was 2.7, whereas it was 3.6 in the entire dataset. Image quality was rated as poor in 5.0% of the cases (compared with 3.6% in the entire dataset). Moreover, 212 lesions (15% of all benign lesions) were correctly classified as benign, but an additional intervention (excluding dermoscopic follow-up in 3–6 months) was still recommended. In this group, the assessors’ mean diagnostic confidence was 2.6, and 13% of images were of poor quality.
RISK FACTORS FOR OVERDIAGNOSIS AND UNDERDIAGNOSIS
The results of multivariate logistic regression analyses are presented in Table 4. For each additional year of patient age, the odds of overdiagnosis increased by 2%, while the odds of underdiagnosis decreased by 4%. Male gender was associated with an increased risk of overdiagnosis (odds ratio [OR] 1.5; 95% confidence interval [CI] 1.1–2.1). The risk of misdiagnosis correlated strongly with decreasing diagnostic confidence. Using “very confident” as the reference, being “very unconfident” increased the odds of overdiagnosis 96-fold and the odds of underdiagnosis ninefold. Lesion location, dermoscopic experience, and image quality were not convincingly associated with the risk of an erroneous diagnosis.
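Because logistic regression is linear in the log-odds, a per-year odds ratio compounds multiplicatively over larger age differences. The sketch assumes per-year odds ratios of 1.02 and 0.96, inferred from the rounded 2% and 4% figures above; the exact estimates are in Table 4, and the function name is hypothetical.

```python
def odds_ratio_over(or_per_unit: float, units: float) -> float:
    """Odds ratio for a difference of `units` in a covariate entered linearly.

    log(OR) adds up per unit, so the odds ratio multiplies: OR_k = OR_1 ** k.
    """
    return or_per_unit ** units


# Assumed per-year odds ratios (rounded values inferred from the text).
over_10y = odds_ratio_over(1.02, 10)   # overdiagnosis, 10-year age difference
under_10y = odds_ratio_over(0.96, 10)  # underdiagnosis, 10-year age difference
print(round(over_10y, 3), round(under_10y, 3))
```

Under these rounded values, a 10-year age difference corresponds to roughly 22% higher odds of overdiagnosis and about one third lower odds of underdiagnosis, provided the age effect is linear across the age range.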
Table 4. Multivariate Logistic Regression Analyses Assessing Risk of Overdiagnosis and Underdiagnosis in Single Reader Evaluations
Bold values indicate statistically significant results (p < 0.05).
Defined as benign lesions that were misdiagnosed as malignant (n = 219), using correctly diagnosed benign lesions as reference (n = 1,180).
Defined as malignant lesions that were misdiagnosed as benign (n = 94), using correctly diagnosed malignant lesions as reference (n = 504).
CI, confidence interval; OR, odds ratio.
Discussion
This study found that SRE in TDS achieves moderate to high diagnostic accuracy regarding benign-malignant classification and MM recognition. Nevertheless, 18 out of 311 MM group lesions were not recognized in primary or differential diagnosis by SRE. Furthermore, there was a high proportion of benign lesions that were misdiagnosed or mismanaged. Mismanagement of benign lesions occurred more frequently when assessors expressed low confidence in their diagnosis or when the image quality was deemed poor. The risk of an erroneous diagnosis strongly correlated with assessors’ decreasing diagnostic confidence. Investigating methods to increase diagnostic accuracy in TDS holds significant value, and double reading, already employed in several medical specialties such as radiology and pathology, is a feasible and advantageous option.
Vestergaard et al. reported benign-malignant sensitivities (85% and 86%) and specificities (82% and 83%) when two independent SRE were performed on 600 TDS referrals. 9 The gold standard was constructed using diagnoses from histopathology (37%), follow-up visits (12%), or face-to-face consultations (51%). 9 This is similar to the results of our study, in which the SRE benign-malignant sensitivity and specificity were both 84%. An Estonian study by Koop et al. presented a somewhat higher sensitivity (90%) and specificity (93%) for MM recognition in TDS. 15 However, diagnostic accuracy was calculated based on a combination of diagnosis and management plan data. MacLellan et al. analyzed the diagnostic accuracy of a teledermoscopist in diagnosing MM (including in situ) compared with histopathological diagnosis and found a sensitivity of 90% and a specificity of 66%. 16 In our study, the sensitivity and specificity for MM recognition were 80% and 90%, respectively. In a primary care study, GPs in Spain and Italy used dermoscopy to classify lesions as banal or suggestive of skin cancer and achieved a sensitivity of 79% and a specificity of 72%. 17 Moreover, Menzies et al. found that GPs using a dermoscope had a sensitivity of 53% and a specificity of 89% for identifying MM. 18 When comparing the diagnostic accuracy of TDS with that of dermoscopy performed by GPs, it is reasonable to argue that employing TDS enhances the triage of skin lesions.
Although the diagnostic accuracy of SRE in TDS might be regarded as relatively high in this study, we also found a considerable number of malignant and benign lesions that were either misdiagnosed or mismanaged. As cases were selected based on registered dermoscopic diagnosis, the prevalence of MM group lesions (n = 311, 16%) was not representative of their natural proportion (6.7%) in our teledermoscopic database. Nevertheless, 20% of all MM group lesions were misdiagnosed in the SRE primary diagnosis, and 5.8% remained misdiagnosed when the assessor’s differential diagnosis was included. In a study by Koelink et al., GPs misdiagnosed 5 out of 13 (38%) MM while using a dermoscope. 19 Vestergaard et al. reported that 6 out of 23 (26%) histopathologically confirmed MM (including in situ) had been misdiagnosed as benign in TDS. 9 In this study, most misdiagnosed MM group lesions received a recommendation for an intervention that increased the probability of a correct diagnosis, but 11 lesions (3.5% of all MM group lesions) were recommended no further intervention. Our results are slightly better than those of two recently published studies in which 7–9% of MM (including in situ) were dismissed in TDS without an accurate management plan. 9,16 Mismanaged benign lesions in TDS are also important to consider, as they lead to unnecessary interventions and, by extension, societal costs. In SRE, 31% of all benign lesions were either misdiagnosed as malignant or correctly diagnosed as benign but still recommended further intervention.
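The percentages quoted in this paragraph follow directly from counts reported in the Results. The short sketch below reproduces them; the variable names are illustrative only.

```python
# Denominators reported in the Results.
MM_GROUP_TOTAL = 311
BENIGN_TOTAL = 1399

proportions = {
    "MM group missed in primary diagnosis": 62 / MM_GROUP_TOTAL,
    "MM group missed in primary and differential": 18 / MM_GROUP_TOTAL,
    "MM group with no further intervention": 11 / MM_GROUP_TOTAL,
    # Overcalled as malignant (219) plus correctly benign but still referred on (212).
    "benign misdiagnosed or mismanaged": (219 + 212) / BENIGN_TOTAL,
}

for label, share in proportions.items():
    print(f"{label}: {share:.1%}")
# MM group missed in primary diagnosis: 19.9%
# MM group missed in primary and differential: 5.8%
# MM group with no further intervention: 3.5%
# benign misdiagnosed or mismanaged: 30.8%
```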
Our findings emphasize the benefit of greater specificity in TDS, which may be achieved through various approaches, including support from diagnostic machine learning. 20,21 While prior research has primarily focused on the diagnostic accuracy of artificial intelligence tools in experimental settings, a recent systematic review highlights the need for more studies of their potential advantages in clinical practice. 20–27 Currently, double reading is an advantageous option; it is used in several other medical specialties, where scientific evidence shows improved diagnostic accuracy, primarily through increased sensitivity. 28–30 Two studies evaluating double reading of dermoscopy-reflectance confocal microscopy images found improved sensitivity and management safety. 28,30
Moreover, Tschandl et al. demonstrated that the mean rate of correct ratings of dermoscopic images increased significantly with collective evaluations by three to five assessors (74%) compared with an individual assessor (65%). 21 Other benefits of DRE include continuous peer-to-peer training, quality control, and shared medicolegal responsibility. Arguably, double reading could offer a low-cost way to increase diagnostic accuracy, since an extra TDS assessment adds only a few minutes. 31 However, in some cases even DRE may not suffice to reach the correct diagnosis and/or management, necessitating additional interventions such as face-to-face consultations or biopsies. The extent to which DRE in TDS improves diagnostic accuracy and offers a cost-effective alternative to SRE needs further evaluation.
The strong correlation between the risk of misdiagnosis and assessors’ decreasing diagnostic confidence is supported by two previous studies. 9,31 Our analysis revealed that decreasing diagnostic confidence primarily correlates with the risk of overdiagnosis (i.e., benign lesions being misdiagnosed as malignant). We hypothesize that assessors are more likely to overcall when in doubt for fear of missing malignant lesions. This hypothesis is further supported by the finding that assessors’ mean diagnostic confidence was lower when benign lesions were either misdiagnosed as malignant or recommended unnecessary interventions. We also found that each additional year of patient age increased the risk of overdiagnosis and decreased the risk of underdiagnosis. One explanation could be that assessors are inclined to diagnose lesions as malignant in older patients, knowing that skin cancer is more common in the elderly. 32 The gender distribution in this study population was unequal (62% women), reflecting the unequal gender distribution in the cohort of TDS referrals. 33,34 Nonetheless, male gender was associated with an increased risk of overdiagnosis, contradicting a previous study that found no correlation between gender and the risk of misdiagnosis. 31 Moreover, poor image quality was significantly associated with a decreased risk of overdiagnosis yet simultaneously linked with recommending additional interventions. Our findings are in concordance with a study by van der Heijden et al. showing that good-quality images are associated with higher accuracy regarding diagnosis and, in particular, management. 8 This further emphasizes the importance of good image quality in TDS.
The main strength of this study is the relatively large cohort of 1,997 authentic TDS referrals. The study also has several limitations. First, referrals were selected based on registered diagnosis, which resulted in a somewhat skewed distribution of diagnoses. Second, only 28% of the lesions had histopathological confirmation. Last, when interpreting our results, consideration should be given to the artificial circumstances in which the SRE was performed compared with the ongoing health care that produced the gold standard, as well as the different local management customs in these two settings.
In conclusion, diagnostic accuracy in SRE depends heavily on the assessor’s confidence in the diagnosis. When assessors perceive their diagnostic confidence as moderate or low, another intervention (such as DRE, biopsy, or face-to-face consultation) is necessary. Furthermore, low image quality in TDS triggers unnecessary interventions in benign lesions, highlighting the importance of developing tools to improve image quality and standardization.
Footnotes
Acknowledgments
The authors thank Johan Palmgren, Karim Saleh, Fredrik Johansson, Johan Kappelin, and Teo Helkkula for their help with data collection. Many thanks to the statisticians at Kliniska Studier Forum Söder.
Authors’ Contributions
C.N.: Conceptualization, methodology, validation, data curation, formal analysis, investigation, writing—original draft, and visualization. H.K., B.P., C.S., S.L., A.P.M., J.I., and A.D.: Validation, data curation, and writing—review and editing. J.L., L.U.I., N.R., and K.S.: Methodology, resources, data curation, validation, and writing—review and editing. K.N.: Methodology, data curation, writing—review and editing, and funding acquisition. Å.I.: Conceptualization, methodology, validation, data curation, writing—review and editing, supervision, and funding acquisition.
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Disclosure Statement
The authors declare that they participate in a research project with Chalmers Industry and Dermicus, the company that provides the digital platform Dermicus. However, Dermicus has not contributed to or had any influence on the performance of this study. Å.I. has received speaker and consulting honoraria from Galderma Sweden, Perrigo Sweden, MSD Sweden, and Biofrontera Sweden. Unrelated to the current study, K.N. has received speaker honoraria from Galderma Sweden, LEO Pharma, Novartis Sweden, and UCB Pharma during the last three years and has served on one advisory board for MSD.
Funding Information
The study was funded by Hudfonden, Märta Wrinklers stiftelse för främjande av medicinsk forskning, S.R Gorthon foundation, the Krapperup foundation, and Swedish governmental funding of clinical research (ALF).
Supplementary Material
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
References
