Abstract
Background:
Inconsistent diagnostic test accuracies of immunohistological staining for squamous cell carcinoma (SQC) of the lung have been frequently reported. There have been few meta-analyses of the diagnostic accuracies of the immunohistochemical markers.
Methods:
A systematic review and meta-analysis were performed following standard guidelines for systematic reviews of diagnostic test accuracy. Immunohistochemical markers (p40, p63, CK5/6, and DSC3) were evaluated as index tests for SQC. The diagnostic odds ratio (DOR) was obtained using the DerSimonian–Laird random-effects model. Summary estimates of sensitivity and specificity were calculated using a bivariate model. The protocol registration ID is UMIN000041664.
Results:
The meta-analysis included 85 of the 1353 first-screened articles. The total number of patients was 17,893, comprising 6151 SQC cases and 11,742 non-squamous non-small-cell lung cancer cases. The DOR was better for p40 (377, 95% confidence interval (CI) = 213–644, I2 = 0%) than for CK5/6 (120, 95% CI = 78–184, I2 = 2.5%), p63 (70, 95% CI = 55–88, I2 = 9.1%), and DSC3 (94, 95% CI = 35–250, I2 = 3.7%). Summary estimates of sensitivity and specificity were as follows: p40, sensitivity 0.92 (95% CI = 0.89–0.95) and specificity 0.94 (95% CI = 0.93–0.96); p63, sensitivity 0.92 (95% CI = 0.90–0.94) and specificity 0.83 (95% CI = 0.80–0.86); CK5/6, sensitivity 0.90 (95% CI = 0.87–0.93) and specificity 0.91 (95% CI = 0.89–0.93); and DSC3, sensitivity 0.81 (95% CI = 0.73–0.88) and specificity 0.95 (95% CI = 0.85–0.98).
Conclusion:
P40 had the best DOR for diagnosing SQC among non-small-cell lung carcinomas. Despite its lower sensitivity, DSC3 had the best specificity among the four markers and might be useful to rule in the diagnosis of SQC.
Introduction
Lung cancer is the leading cause of cancer-related death. In 2017, there were 2.2 million incident cases of lung cancer and 1.9 million deaths. 1 Lung cancer is divided histologically into two main subtypes: small-cell lung carcinoma (SCLC) and non-small-cell lung carcinoma (NSCLC), accounting for 15% and 85% of all cases, respectively. NSCLC is further classified into three main types: squamous cell carcinoma (SQC), adenocarcinoma (ADC), and large-cell carcinoma. SQC accounts for 25–30% of all lung cancer cases. 2 ADC and large-cell carcinoma are usually called non-squamous non-small-cell lung cancer (NSQ-NSCLC).
The selection of medical management is based on the histological subtype. 3 Although the majority of anti-cancer agents have similar efficacy for NSQ-NSCLC and SQC of the lung, drugs such as pemetrexed and bevacizumab are effective only for patients with NSQ-NSCLC. 4,5 Based on molecular advances and the clinical demand for accurate subclassification of lung cancer, the World Health Organization (WHO) updated the Classification of Tumors of the Lung, Pleura, Thymus, and Heart in 2015; the update emphasized the expanded use of immunohistochemical techniques even for the diagnosis of SQC and NSQ-NSCLC and explicitly included some immunohistochemical markers. 6 The incidence of large-cell cancer of the lung has been decreasing since 2015 because these immunohistochemical markers can discern the difference between poorly differentiated SQC and ADC. 7
Numerous immunohistochemical and immunocytochemical markers have been explored to distinguish between pulmonary SQC and NSQ-NSCLC. p40, p63, cytokeratin 5/6 (CK5/6), and desmocollin-3 (DSC3) have been frequently used in the diagnosis of SQC. 8 Sensitivity and specificity are key metrics to understand the diagnostic test accuracy of immunohistochemical staining techniques. To the best of our knowledge, no systematic review has evaluated the diagnostic test accuracy of SQC immunohistochemical markers. The current systematic review and meta-analysis aimed to summarize data from the previous studies of diagnostic test accuracy of immunohistochemical markers used for the diagnosis of SQC.
Methods
Study overview
The protocol of this systematic review and meta-analysis of diagnostic test accuracy was prepared following standard guidelines for systematic reviews of diagnostic test accuracy and registered on the website of the University Hospital Medical Information Network Clinical Trials Registration (UMIN000041664). 9,10 Approval of the Institutional Review Board was not required because of the nature of this study. The PRISMA checklist is shown in Supplementary Table 1.
Study search
Four major online databases, PubMed, Web of Science, Cochrane, and Embase, were searched on January 31, 2020. The following search strategy was used for PubMed: ((p40 OR deltaNp63 OR ΔNP63) OR (p63 OR DBR16.1) OR (ck5/6 OR Cytokeratin 5/6) OR (desmocollin 3 OR desmocollin-3 OR DSC3 OR DSC-3) OR (TTF1 OR TTF-1 OR Thyroid transcription factor-1 OR Thyroid transcription factor 1) OR (NapsinA OR Napsin A OR TA02 OR aspartic protease) OR (CK7 OR cytokeratin7 OR cytokeratin 7)) AND (sensitivity and specificity) AND (NSCLC OR lung OR pulmonary OR bronchial OR pleural OR respiratory OR bronchoscopy) AND (NSCLC OR adenocarcinoma OR squamous OR squamous-cell OR non-small OR non small). Details of the search strategy are shown in Supplementary Table 2.
Two authors (SK and NH) independently screened the titles and abstracts and carefully evaluated full text to select eligible articles; in cases of discrepancy, they reached a consensus through discussion. Review articles and included original articles were hand-searched (HC and NH) for additional research papers that met the inclusion criteria.
Study selection
Full articles, brief reports, and conference abstracts published in any language that provided data on both the sensitivity and specificity of immunohistochemical markers for diagnosing lung SQC were included. Articles that provided data on only sensitivity or only specificity were excluded because bivariate analysis is not applicable to such data. 9 A case–control study design that consisted of patients with ADC and SQC was accepted, although a case–control design may be considered to carry a risk of bias according to Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2). 11
The target population was patients with NSCLC. Commonly used pathological criteria were accepted along with WHO 2015 criteria. A study that collectively evaluated both NSCLC and SCLC was excluded since such a study did not fit the clinical question. Studies focusing on non-pulmonary cancers and metastatic lung cancers of non-pulmonary origin were also excluded. Similarly, studies that compared NSCLC subtypes and mesothelioma were not accepted. Studies including patients with only ADC or SQC diagnosis were considered two-gate studies, and studies including all NSCLC patients were considered one-gate studies.
Specimens from outside the lung, such as lymph nodes and pleural effusion, were also accepted. Immunocytochemical staining using lung cytology or pleural effusion cell blocks was accepted along with immunohistochemical staining. Small samples from cell blocks, lymph nodes, and pleural effusion were classified as biopsy specimens.
Target immunohistochemical markers included p40, p63, CK5/6, and DSC3 for SQC. Immunohistochemical techniques using any commercially available antibodies and non-commercial antibodies were accepted. The reference test had to be a pathological diagnosis by pathologists.
Risk of bias
QUADAS-2 was applied to assess the risk of bias in each study. 11
Outcomes
Sensitivity, specificity, area under the curve (AUC), and the diagnostic odds ratio (DOR) were evaluated. If two or more cutoffs were applied in an original article, weakly, moderately, and strongly positive results were all considered positive. To diagnose SQC, both SQC and adenosquamous carcinoma were counted as SQC because adenosquamous carcinoma has a squamous cell component, whereas large-cell carcinoma and NSCLC not otherwise specified were not counted as SQC.
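For reference, the outcome measures above follow from the 2×2 table of each study. The following is a sketch in standard notation; TP, FP, FN, and TN (true positives, false positives, false negatives, and true negatives for SQC against the pathological reference diagnosis) are symbols introduced here for illustration and do not appear in the original text.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP},
\]
\[
\text{DOR} = \frac{TP \times TN}{FP \times FN}
= \frac{\text{Sensitivity} / (1 - \text{Sensitivity})}{(1 - \text{Specificity}) / \text{Specificity}}.
\]
```

A DOR of 1 indicates a test that does not discriminate, and larger values indicate better discrimination.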
Data extraction
Two review authors, SK and NH, independently extracted data, including the name of the first author, publication year, publication country, types of immunohistochemical markers, numbers of patients with positive results, numbers of patients evaluated, and QUADAS-2-related information.
Statistics
A bivariate model was used to obtain pooled sensitivity and specificity and to draw a summary receiver operating characteristic (SROC) curve. 12 The DOR was obtained using the DerSimonian–Laird random-effects model. The DOR was calculated with the ‘madauni’ command, and sensitivity, specificity, and AUC were pooled with the ‘reitsma’ command (both from the ‘mada’ package for R). AUCs were interpreted as follows: ⩾0.97, excellent; 0.93–0.96, very good; 0.75–0.92, good; and 0.5–0.74, fair. 13 The threshold for significance was set at 0.05. Heterogeneity evaluated using the I2 statistic was interpreted as follows: I2 = 0%, no heterogeneity; I2 > 0% but <25%, minimal heterogeneity; I2 ⩾ 25% but <50%, mild heterogeneity; I2 ⩾ 50% but <75%, moderate heterogeneity; and I2 ⩾ 75%, strong heterogeneity. 14
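To make the pooling step concrete, the following Python sketch shows one common way to apply DerSimonian–Laird random-effects pooling to log DORs and to compute I2. It is an illustration only and not the code used in this study, which relied on R. The function names, the 0.5 continuity correction, and the three example 2×2 tables are assumptions made for the sketch and are not data from the included studies.

```python
import math

def log_dor(tp, fp, fn, tn, correction=0.5):
    """Log diagnostic odds ratio and its variance for one 2x2 table.

    A continuity correction is added to every cell (an assumption of
    this sketch) so that tables containing zeros remain usable.
    """
    tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    y = math.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return y, var

def dersimonian_laird(effects, variances):
    """Pool effects with the DerSimonian-Laird random-effects estimator.

    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures the dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2

# Hypothetical (TP, FP, FN, TN) tables, for illustration only.
tables = [(90, 5, 10, 95), (45, 3, 5, 47), (80, 10, 8, 102)]
ys, vs = zip(*(log_dor(*t) for t in tables))
pooled, se, tau2, i2 = dersimonian_laird(ys, vs)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"Pooled DOR {math.exp(pooled):.1f} (95% CI {low:.1f}-{high:.1f}), I2 = {i2:.1f}%")
```

Pooling is done on the log scale because the log DOR is approximately normally distributed; the pooled value and its confidence limits are exponentiated for reporting.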
Results
Study search and study characteristics
A total of 1353 articles, including 1336 articles identified through database searches and 17 articles identified by hand search, were retrieved; 999, 229, and 85 articles remained after removing duplicates, screening, and full-article reading, respectively (Figure 1). Finally, 85 reports, comprising 75 full-length articles and 10 conference abstracts, were included (Table 1). All were written in English except for one article written in Chinese. Prospective study designs were adopted in four articles, and the other 81 were retrospective studies. Of the 85 reports, 28 were from the United States, nine were from China, six each were from Germany and Japan, and five each were from Turkey and the United Kingdom. Of the 17,893 patients included in this study, 6151 had SQC based on the pathological diagnosis, and 11,742 had NSQ-NSCLC. Surgical specimens were assessed in 34 studies, and 31 studies evaluated biopsy specimens, whereas 10 studies collected both surgical and biopsy samples. Ten reports did not specify the specimen type. Fifty-one studies were two-gate case–control studies that enrolled SQC and ADC, and the other 34 studies were one-gate studies that enrolled NSCLC specimens. The WHO classification of lung cancer pathology was used in 67 articles, and the other 18 studies did not mention classification criteria. The cutoff value for immunohistochemical positivity was 1% in 29 studies, 5% in 6 studies, and 10% in 15 studies; 35 studies did not report cutoff values.

Figure 1. Flow diagram of this study.
Table 1. Characteristics of included studies.
Ad, adenocarcinoma; B, biopsy; CA, conference abstract; FA, full article; NS, not specified; NSCLC, non-small-cell lung cancer; Pros, prospective study; Retro, retrospective study; S, surgery; SQC, squamous cell carcinoma; WHO, World Health Organization.
The antibody clones used for each immunohistochemical marker are shown in Supplementary Table 3. Although different clones were used across studies, more than half of the studies used the same clone for each marker: a polyclonal antibody, 4A4, D5-16B4, and DSC3-U114 for p40, p63, CK5/6, and DSC3, respectively. The risk of bias assessment is shown in Figure 2. There were 45 studies with a high risk of patient selection bias, and 26 studies showed an unclear risk of selection bias. Twelve studies had a high risk and 24 studies had an unclear risk of bias in the reference standard. No study showed bias in patient selection applicability concerns, index test, index test applicability concerns, reference standard applicability concerns, and flow and timing.

Figure 2. Selection bias of studies.
Diagnostic accuracy of p40
Thirty-four studies with 6788 samples yielded a DOR of 377 (95% confidence interval (CI) = 213–644; I2 = 0%) and an AUC of 0.976. This AUC suggested that p40 had ‘excellent’ diagnostic test accuracy for SQC (Figure 3(a), Table 2). 13 The summary estimates of sensitivity and specificity were 0.92 (95% CI = 0.89–0.95) and 0.94 (95% CI = 0.92–0.96), respectively. The one-gate subgroup analysis including 14 studies found similar DOR, AUC, sensitivity, and specificity of 477 (95% CI = 154–1479; I2 = 0%), 0.976, 0.92 (95% CI = 0.88–0.95), and 0.94 (95% CI = 0.92–0.97), respectively (Table 2). The paired forest plots of sensitivity and specificity for each study of p40 are shown in Supplementary Figure 1. Fagan’s nomogram for p40 is shown in Supplementary Figure 5. For p40, the positive likelihood ratio (LR+) was 15.3 and the negative likelihood ratio (LR−) was 0.085. In this example, with a pretest probability of 90%, the posttest probability is 99.3% after a positive test and 46% after a negative test.
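The arithmetic behind Fagan’s nomogram can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used to draw the supplementary figure: the function names are hypothetical, and the inputs are the rounded pooled sensitivity (0.92) and specificity (0.94) for p40, so small differences from values read off the nomogram are to be expected.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def posttest_probability(pretest, lr):
    """Convert a pretest probability to a posttest probability via odds."""
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Rounded pooled estimates for p40 and the 90% pretest probability from the text.
lr_pos, lr_neg = likelihood_ratios(0.92, 0.94)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")
print(f"Posttest probability after a positive test: {posttest_probability(0.90, lr_pos):.3f}")
print(f"Posttest probability after a negative test: {posttest_probability(0.90, lr_neg):.3f}")
```

Working in odds rather than probabilities is what makes the update a single multiplication, which is the relationship the nomogram encodes graphically.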

Figure 3. Diagnostic accuracy of IHC markers: (a) p40, (b) p63, (c) CK5/6, and (d) DSC3.
Table 2. Summary of diagnostic accuracy of markers.
AUC, area under the curve; CI, confidence interval; DOR, diagnostic odds ratio.
Diagnostic accuracy of p63
Data of 11,898 samples from 66 reports suggested a DOR of 70 (95% CI = 55–88; I2 = 9.1%) and an AUC of 0.942, which means that p63 had ‘very good’ diagnostic test accuracy for SQC (Figure 3(b), Table 2). The summary estimates of sensitivity and specificity were 0.92 (95% CI = 0.90–0.94) and 0.83 (95% CI = 0.80–0.86), respectively. One-gate subgroup analyses focusing on studies including all NSCLC were performed, and DOR, AUC, sensitivity, and specificity were 92 (95% CI = 57–148; I2 = 0%), 0.950, 0.89 (95% CI = 0.87–0.94), and 0.88 (95% CI = 0.83–0.91), respectively. Sensitivity and specificity for studies of p63 are shown in paired forest plots in Supplementary Figure 2.
Diagnostic accuracy of CK5/6
Forty-nine studies with 8962 specimens yielded a DOR of 120 (95% CI = 78–184; I2 = 2.5%) and an AUC of 0.957. This AUC value suggests that CK5/6 had ‘very good’ diagnostic test accuracy for SQC (Figure 3(c), Table 2). Using the data from the 49 cohorts, the summary estimates of sensitivity and specificity were 0.90 (95% CI = 0.87–0.92) and 0.91 (95% CI = 0.88–0.93), respectively. A one-gate subgroup analysis including 21 cohorts of NSCLC yielded DOR, AUC, sensitivity, and specificity of 131 (95% CI = 62–282.4; I2 = 13.8%), 0.957, 0.89 (95% CI = 0.82–0.93), and 0.92 (95% CI = 0.89–0.94), respectively. The paired forest plots of sensitivity and specificity for each study of CK5/6 are shown in Supplementary Figure 3.
Diagnostic accuracy of DSC3
The diagnostic test accuracy of DSC3 was examined in 2664 samples of ADC and SQC in 10 cohorts. The DOR was 93.9 (95% CI = 35.3–249.7; I2 = 3.7%), and the AUC was 0.909. The sensitivity and specificity were 0.81 (95% CI = 0.73–0.88) and 0.95 (95% CI = 0.85–0.98), respectively (Figure 3(d), Table 2). DSC3 showed ‘good’ diagnostic accuracy, though in a relatively limited number of studies compared with the other markers. Only two cohorts included NSCLC; in this one-gate subgroup analysis, the DOR, AUC, sensitivity, and specificity were 198 (95% CI = 77.4–506.4; I2 = 0%), 0.899, 0.76 (95% CI = 0.63–0.85), and 0.98 (95% CI = 0.90–0.99), respectively. The paired forest plots of sensitivity and specificity for each study of DSC3 are shown in Supplementary Figure 4.
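The AUC labels used in the four subsections above can be reproduced with a simple lookup, sketched here in Python. Two assumptions are made: the 0.75–0.92 band is labeled ‘good’, consistent with the description of DSC3, and the bands are treated as contiguous by using only their lower bounds, so that a value such as 0.965 falls into ‘very good’. The names `AUC_BANDS` and `interpret_auc` are hypothetical.

```python
# Lower bound of each band, from highest to lowest (contiguous bands assumed).
AUC_BANDS = [
    (0.97, "excellent"),
    (0.93, "very good"),
    (0.75, "good"),
    (0.50, "fair"),
]

def interpret_auc(auc):
    """Map an AUC to its qualitative label; values below 0.5 are not classified."""
    for lower, label in AUC_BANDS:
        if auc >= lower:
            return label
    return "not classified"

# Pooled AUCs reported in the Results for the overall analyses.
pooled_auc = {"p40": 0.976, "p63": 0.942, "CK5/6": 0.957, "DSC3": 0.909}
for marker, auc in pooled_auc.items():
    print(f"{marker}: AUC {auc:.3f} ({interpret_auc(auc)})")
```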
Discussion
The diagnostic test accuracies of the immunohistochemical markers p40, p63, CK5/6, and DSC3 for SQC were systematically reviewed. In our analysis, p40 showed the best DOR and AUC among the four markers, and the meta-analysis provides evidence supporting the use of p40 as the first choice in the diagnostic algorithm for SQC, as in current guidelines. 6,15 Given that the AUCs of p63 and CK5/6 were at least 0.93, suggesting ‘very good’ diagnostic test accuracy, 16 both p63 and CK5/6 are reasonable options for the diagnosis of SQC, as suggested by some guidelines. 17,18 DSC3 did not have ‘very good’ diagnostic accuracy; however, it had the highest specificity and may be useful for ruling in SQC when p40 and some markers for ADC are all positive. These findings support the recommendation to use p40 in the diagnosis of SQC made by the International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society (IASLC/ATS/ERS), 17 the 2015 WHO classification of lung tumors, 6 and the European Society for Medical Oncology (ESMO). 18
The detailed diagnostic accuracies of the immunohistochemical markers differed slightly between the one-gate and two-gate analyses; p40, p63, CK5/6, or DSC3 may be expressed in, for example, large-cell neuroendocrine carcinoma (LCNEC) or other non-ADC NSCLC. Nevertheless, the ranking of the markers by diagnostic accuracy was the same as in the overall analysis. The studies used in this meta-analysis compared SQC with ADC or NSQ-NSCLC. The accuracy of these markers for identifying metastases to the lungs or salivary gland–type carcinomas therefore remains unclear, and the results should be interpreted only as the markers’ ability to separate SQC from ADC.
The combination of TTF1 and p40 is recommended to distinguish SQC from ADC among NSCLC specimens. TTF1 single positivity suggests ADC of the lung, and p40 single positivity indicates SQC. When TTF1 and p40 are both positive, the specimen should be further stained with highly specific markers such as Napsin A and DSC3, a protein found in desmosomes. 19 Conversely, when TTF1 and p40 are both negative, another sensitive marker for ADC, such as CK7, should be added. CK7 cannot be regarded as an ADC-specific marker because a significant proportion of SQCs are positive for CK7; nevertheless, the addition of CK7 or a broad keratin in TTF1/p40-negative NSCC without clear morphology is recommended. 20 Additional sensitive markers for SQC are also required, and p63 and CK5/6 are candidate additional immunohistochemical stains. p63 is more sensitive than CK5/6 for the diagnosis of SQC. Nonetheless, since p40 is the N-terminally truncated isoform of p63, 21 the IHC results of p40 and p63 correlate with each other. CK5/6, intermediate-sized basic keratins with a molecular mass of 58 kDa, 22 have a different immunostaining target from p40. Therefore, although p63 was slightly more sensitive than CK5/6, CK5/6 might be a better additional marker when TTF1 and p40 are double-negative.
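The staining sequence discussed in this paragraph can be summarized as a small decision function. The Python sketch below only restates the logic of the text for illustration; it is not a validated clinical algorithm, and the function name and returned strings are hypothetical.

```python
def suggest_next_step(ttf1_positive, p40_positive):
    """Restate the TTF1/p40 staining logic discussed above (illustrative only)."""
    if ttf1_positive and not p40_positive:
        return "favor ADC"
    if p40_positive and not ttf1_positive:
        return "favor SQC"
    if ttf1_positive and p40_positive:
        # Double-positive: add highly specific markers.
        return "add highly specific markers (e.g. Napsin A, DSC3)"
    # Double-negative: add sensitive markers for each subtype.
    return "add sensitive markers (e.g. CK7 for ADC, CK5/6 or p63 for SQC)"

for ttf1 in (True, False):
    for p40 in (True, False):
        print(f"TTF1={ttf1}, p40={p40}: {suggest_next_step(ttf1, p40)}")
```

The double-negative branch lists both CK5/6 and p63, although the paragraph argues that CK5/6 may be the better choice because its staining target is independent of p40.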
The largest number of studies of SQC IHC markers was conducted for p63, followed by CK5/6, p40, and DSC3. CK5/6 and p63 were the previous standards for diagnosing SQC, whereas p40 and DSC3 have been investigated since around 2011. Although there were fewer studies of p40 and DSC3, both markers were evaluated in a large number of samples. Across all analyses, heterogeneity was absent or minimal (I2 < 25%).
This study has several limitations. First, QUADAS-2 showed that many of the included studies had a two-gate design, and a high risk of patient selection bias was observed. However, the results of the subgroup analyses focusing on one-gate studies were compatible with those from two-gate studies. Second, we included studies published from 2001 onward, and diagnostic standards differed across periods. A total of 36 studies showed a high or unclear risk of bias in the reference standard. Third, although more than half of the studies used the same clones of immunohistochemical markers, the use of different clones and protocols might have introduced bias.
Conclusion
P40 was the only marker with an ‘excellent’ AUC for diagnosing SQC among NSCLC. Both CK5/6 and p63 showed ‘very good’ AUCs; however, CK5/6 may have slightly better diagnostic test accuracy. Despite its lower sensitivity, DSC3 had the best specificity among the four markers, and it might be useful to rule in the diagnosis of SQC.
SK and HC contributed to the study search, quality check, data extraction, and drafting. NH worked on the study search, quality check, data extraction, and analysis as a principal investigator. IK, YH, NK, SF, and TK worked on the interpretation of data and the revision process. All the authors gave final approval.
Supplemental Material
sj-docx-1-tam-10.1177_17588359211065152 – Supplemental material for Immunohistochemical markers to diagnose primary squamous cell carcinoma of the lung: a meta-analysis of diagnostic test accuracy
Supplemental material, sj-docx-1-tam-10.1177_17588359211065152 for Immunohistochemical markers to diagnose primary squamous cell carcinoma of the lung: a meta-analysis of diagnostic test accuracy by Hao Chen, Seigo Katakura, Nobuyuki Horita, Ho Namkoong, Ikuma Kato, Yu Hara, Nobuaki Kobayashi, Satoshi Fujii and Takeshi Kaneko in Therapeutic Advances in Medical Oncology
Supplemental Material
sj-docx-2-tam-10.1177_17588359211065152 – Supplemental material for Immunohistochemical markers to diagnose primary squamous cell carcinoma of the lung: a meta-analysis of diagnostic test accuracy
Supplemental material, sj-docx-2-tam-10.1177_17588359211065152 for Immunohistochemical markers to diagnose primary squamous cell carcinoma of the lung: a meta-analysis of diagnostic test accuracy by Hao Chen, Seigo Katakura, Nobuyuki Horita, Ho Namkoong, Ikuma Kato, Yu Hara, Nobuaki Kobayashi, Satoshi Fujii and Takeshi Kaneko in Therapeutic Advances in Medical Oncology
Footnotes
Author contribution
Conflict of interest statement
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Data availability statement
The raw data are available by email on reasonable request to the corresponding author. Email is horitano@yokohama-cu.ac.jp.
Supplemental material
Supplemental material for this article is available online.
References
