Abstract
Introduction
Since the 1940s, immunohistochemistry (IHC) has been used to characterize neoplasms according to the expression of cellular proteins. 1 To date, the conventional manual method of scoring IHC-stained sections is performed by well-trained histopathologists through direct observation of tissue under the light microscope. 2 Scoring should be done carefully so that the entire tumor area is examined, not only the positive areas. In this setting, the pathologist divides the microscopic field into regions, scores each region for the intensity of positivity, adds the percentage of positivity from each region, and divides the sum by the total number of microscopic fields examined. 3
In recent years, PD-L1 (programmed cell death-ligand 1) IHC staining has become mandatory before starting PD-L1 immunotherapy.4,5 For monoclonal antibodies such as clone 22C3, which stain membranous PD-L1 (mPD-L1), the pathologist must be well trained to identify cells that are considered positive for PD-L1. Any discernible membrane staining, either partial or complete, is recognized as distinct from cytoplasmic staining, regardless of staining intensity. In samples with a low percentage of positive cells, it is very difficult to unambiguously distinguish “true positive” protein staining from “false positive” artifacts. Causes of discrepancy in scoring between histopathologists include nonspecific staining at the edge of small biopsy specimens, artifacts, weak staining of PD-L1 in areas around hemorrhage, staining of PD-L1 in apoptotic cells, and heterogeneity of cells in the tissue.6,7 As a result, the IHC staining must be scored independently by more than one pathologist in a blinded, randomized manner, and then the results for each specimen are compared and tested for reproducibility. 8 Recent investigations have demonstrated that manual scoring has become harder with the development of different PD-L1 antibody assays, detection systems, platforms, scoring algorithms, and cut-offs, which has left no standardized methodology for measuring PD-L1 expression. This creates more pressure to achieve reliability and reproducibility,8,9 in addition to erroneous perceptions of the strength of membrane and cytoplasmic staining and cognitive traps that cause rounding of scores. 10 Therefore, high interobserver and interlaboratory variability occurs, especially in heterogeneous tissue samples. Increasing scoring discrepancies lead to false perceptions and biased scores. 11 The pressure is greater during high workloads, when manual scoring is time-consuming, requires large backup storage, and struggles to meet the growing needs of medical care systems. 12
Technology is increasingly ingrained in our daily lives. Despite technical advances in many areas, its use remains limited in cancer screening, staging, and treatment. 13 ImageJ, previously known as NIH (National Institutes of Health) Image, was developed in 1987. 14 It is free software developed by the NIH, and technological advancements in it have led to significant changes in the medical field. Using ImageJ, IHC-stained sections can be scored: a region in the tissue section is first selected manually and then measured automatically. 4 ImageJ has been widely used because it is user-friendly and because many resources are available online, such as plugins and macros that extend its functionality, computer vision, and machine learning. 15 Our main goal in this work is therefore to achieve fast, cheap, and efficient reading of IHC results with minimal human interaction using computer-aided systems known as semi-automated methods. These consist of a camera integrated with a conventional microscope that captures images of a small fraction of the tissue through the objective lenses, allowing both static image capture and live image transmission; the images are then transferred to bio-image analysis software, here ImageJ. 16 Tumor scores based on the expression of the cellular proteins examined are measured automatically and compared with scoring by well-trained pathologists through direct observation under a microscope. 2
In a recent study, an ImageJ Java-based algorithm was used to score melanoma sections after IHC staining for PD-L1. No significant difference between automated image analysis scoring and pathologist scoring was obtained. 17 Since studies evaluating PD-L1 scoring using ImageJ in cancer are scarce, this study compares the conventional manual method (ie, scoring by expert histopathologists) with the semi-automated method (human-to-machine interaction) using ImageJ software after IHC staining for mPD-L1. Although several artificial intelligence software packages are available for scoring IHC sections, their high cost makes them unaffordable for many users, especially in low-income countries and for researchers who wish to interpret their IHC readings but do not have enough experience in scoring sections.
In the present work, no new plugins were used with ImageJ. ImageJ uses the open-source, computerized pixel-profiling plugin IHC Profiler, which assigns cytoplasmic, membranous, or nuclear expression of various prognostic biomarkers across various types of cancers and normal tissues, enabling automated quantitative IHC analysis and scoring of a given image. 18 The analysis is performed through the cell counter and area measurement, which depend on the staining intensity pattern and contrast and involve isolating the color information from histological RGB images containing multiple stains that highlight different cellular structures in the tissue. 19 Therefore, this research tests the ability of the free version of ImageJ to perform IHC scoring.
Materials and Methods
Ethical Consideration
The present study was approved by the Institutional Review Board (IRB) of the King Hussein Cancer Center (KHCC) (IRB # 21 KHCC 165 for head and neck squamous cell carcinoma [HNSCC] and IRB # 21 KHCC 074 for diffuse large B-cell lymphoma [DLBCL] specimens). All specimens used in the present study were archived paraffin-embedded tumor tissue specimens with no direct patient identifier.
Patients and Tumor Specimens
For HNSCC, tissue specimens were obtained from patients who underwent surgical resection of the primary lesions before receiving chemotherapy and/or radiotherapy. Patients with a history of another neoplasm were excluded. Patients with DLBCL whose biopsies were taken at the time of relapse were also excluded.
A total of 120 cases belonging to adult patients of both sexes diagnosed with primary HNSCC or DLBCL were initially selected. Of these, 90 cases (30 HNSCC and 60 DLBCL) were retained; the remaining cases were dropped because they met the exclusion criteria or because sections were lost during sectioning due to poor tissue processing.
Tissue Microarray Construction and IHC Staining
Tissue microarray blocks were constructed from archived paraffin-embedded tissue specimens of all primary HNSCC and DLBCL. IHC staining was performed as previously described. 20 Anti-human PD-L1 monoclonal mouse antibody clone 22C3 (Dako, Agilent, Carpinteria, CA) was used. After diaminobenzidine (DAB) staining, sections were counterstained with hematoxylin. Human placenta and human tonsil tissue sections were used as positive controls. For negative controls, the primary antibody was omitted and replaced with buffer solution in tissue sections from the same sample under the same conditions.
PD-L1 IHC expression in the primary tumors was evaluated using a light microscope. Conventional manual scoring was performed by experienced pathologists, who took the average score of 5 nonoverlapping fields according to the combined positive score (CPS), where CPS = (number of PD-L1-positive tumor cells and immune cells / total number of viable tumor cells) × 100. The same specimens were scored using the ImageJ semi-automated scoring system (free ImageJ Fiji software, version 1.53t; WS Rasband, NIH, Bethesda, MD) downloaded from https://ImageJ.net/software/fiji/downloads.
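As an illustration of the CPS calculation described above, the following minimal Python sketch computes the score for one field and averages it over several fields. The function names and the cell counts are hypothetical and are not taken from the study; the sketch only applies the formula as stated in the text.

```python
from statistics import mean


def combined_positive_score(positive_tumor_cells, positive_immune_cells,
                            viable_tumor_cells):
    """CPS for one field: positive tumor + immune cells over viable tumor cells, x 100."""
    if viable_tumor_cells <= 0:
        raise ValueError("at least one viable tumor cell is required")
    positive = positive_tumor_cells + positive_immune_cells
    return 100.0 * positive / viable_tumor_cells


def mean_cps(fields):
    """Average CPS over nonoverlapping fields.

    `fields` is a list of (positive_tumor, positive_immune, viable_tumor) tuples.
    """
    return mean(combined_positive_score(*field) for field in fields)


# Hypothetical counts for two fields (illustrative values only).
example_fields = [(30, 10, 200), (10, 0, 100)]
print(mean_cps(example_fields))
```

In practice the list would hold one tuple per field counted by the pathologist, 5 fields in the protocol used here.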
Section photographs were obtained by 2 methods: (1) Camera images were captured using an MC 170 HD Leica camera (Switzerland) attached to a Leica microscope with a computer system running LAS EZ software. Five fields were photographed using 20 × and 40 × objective lenses. (2) Whole slide images (WSI) of the same specimens were scanned with the Leica Aperio AT2 scanner (Vista, USA) under 20 × magnification. Each image was analyzed in ImageJ using the following options: brightness/contrast, red green blue (RGB), color space selection (color deconvolution) to enhance image resolution, and hue saturation brightness (HSB). The minimum threshold value was set to zero and the maximum threshold was adjusted so that the background signal was removed without removing the true DAB signal. Then, the area percent was obtained. Details of using ImageJ for image analysis are available in the Supplemental Material.
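The thresholding step can be sketched in a few lines of Python. This is not the ImageJ implementation. It assumes that the DAB channel has already been separated (for example by color deconvolution) into an 8-bit grayscale image in which darker pixels carry more stain. The function name, the toy image, and the cut-off of 100 are illustrative assumptions.

```python
def area_percent(channel, max_threshold, min_threshold=0):
    """Percentage of pixels whose intensity lies in [min_threshold, max_threshold].

    `channel` is a 2D list of 8-bit intensities (0 = darkest, 255 = brightest).
    With the lower bound fixed at 0, raising `max_threshold` includes lighter
    pixels, so it has to stay below the background intensity.
    """
    total = 0
    selected = 0
    for row in channel:
        for value in row:
            total += 1
            if min_threshold <= value <= max_threshold:
                selected += 1
    if total == 0:
        raise ValueError("empty image")
    return 100.0 * selected / total


# Toy 2 x 2 "DAB channel": two dark (stained) and two bright (background) pixels.
toy_channel = [[10, 200],
               [50, 250]]
print(area_percent(toy_channel, max_threshold=100))
```

The choice of the upper cut-off is the operator-dependent part of the procedure, which is why the method is described as semi-automated.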
Statistical Analysis
GraphPad Prism Version 6 was used for statistical analysis. The normality of the distribution of the variables (IHC scores after staining for PD-L1) was checked by the Shapiro-Wilk test. None of the tested variables was normally distributed; therefore, nonparametric analysis was used for all parameters. The Friedman test followed by Dunn's multiple comparisons post hoc test was used to compare the different scoring methods, including conventional, RGB, brightness/contrast, and HSB. The Friedman test was also used to compare scores obtained by the conventional method, by ImageJ for camera-captured images, and by ImageJ for WSI obtained using a scanner. The Wilcoxon matched-pairs test was used to compare scores obtained by (1) ImageJ from camera-captured microscopic photos taken at 40 × versus 20 × magnification power and (2) the average of ImageJ readings of 3 fields versus 5 fields. Spearman's correlation coefficient was obtained after a 2-tailed bivariate correlation analysis for all variables in this study. A P value ≤ .05 was considered significant for all statistical tests.
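For readers who wish to reproduce the correlation step outside GraphPad Prism, Spearman's coefficient can be sketched in plain Python as the Pearson correlation of the ranks, with tied values sharing their average rank. This is an illustrative sketch, not the Prism implementation; the function names and the paired scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least 2 values each")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired scores (manual vs. image analysis), illustrative only.
manual = [5, 20, 35, 60, 80]
automated = [4, 18, 30, 55, 78]
print(spearman_r(manual, automated))
```

Because the coefficient depends only on rank order, two methods can correlate almost perfectly while one reads systematically lower than the other, which is why the paired tests are reported alongside it.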
Results
Scoring PD-L1 IHC-Stained DLBCL Specimens
No significant difference was found between the scores of the conventional method and ImageJ scores obtained using the options “RGB” or “Brightness/Contrast” for DLBCL sections. Scores obtained with the option “Brightness/Contrast” in ImageJ did not differ significantly from those obtained with the option “RGB.” On the other hand, a significant difference was found between the conventional and HSB methods (P < .0001). Similarly, a significant difference was obtained between the RGB and HSB methods (P < .0001) as well as between the brightness/contrast and HSB methods (P < .05).
A strong positive correlation was found between RGB and conventional scoring methods (r = .9789, P < .0001; Figure 1) and between brightness and conventional scoring (r = .9588, P < .0001; Figure 2B). Similarly, a strong positive correlation between RGB and brightness was obtained (r = .9342, P < .0001; Figure 1). However, a weaker positive correlation was observed between conventional and HSB (r = .8548, P < .0001; Figure 1). A weaker positive correlation was also seen between HSB and brightness (r = .8696, P < .0001) and between RGB and HSB (r = .8324, P < .0001).

Figure 1. (A) IHC scores for DLBCL obtained by different scoring methods. (B-D) Spearman's correlation coefficient for conventional scoring versus ImageJ scoring methods for DLBCL.

Figure 2. (A) IHC scores for HNSCC obtained by different scoring methods. (B-D) Spearman's correlation coefficient for conventional scoring versus ImageJ scoring methods for HNSCC.
Scoring PD-L1 IHC-Stained HNSCC Specimens
No significant difference was found between the conventional method and the “Brightness/Contrast” method. On the other hand, a significant difference was found between the conventional method and RGB (P < .0001). Similarly, a significant difference was found between the conventional method and HSB (P < .0001) (Figure 2). A strong positive correlation was found between the conventional method and the “Brightness/Contrast” method (r = .9848, P < .0001) and between the conventional method and RGB method (r = .9466, P < .0001) as well as between conventional method and “HSB” method (r = .9762, P < .0001), Figure 2B to D.
Comparing ImageJ Analysis for the Same Specimen Using Different Magnification Powers
A significant difference in scores was obtained using images taken under 40 × and 20 × magnification powers (P < .0001). Scores obtained by 20 × magnification were higher than those obtained by 40 × magnification. A strong positive correlation between them was found (r = 0.9891, P < .0001), Figure 3.

Figure 3. (A) ImageJ scores using 20 × magnification power versus 40 × for diffuse large B-cell lymphoma (DLBCL) specimens. (B) Spearman's correlation coefficient between 20 × and 40 × magnification power for DLBCL.
Comparing Conventional, ImageJ of Scanner WSI and ImageJ of Camera-Captured Image
A significant difference was found between conventional scoring and ImageJ scoring of scanner WSIs (P < .01), with a positive Spearman's correlation coefficient (r = .9899, P < .0001), Figure 4A. Similarly, a significant difference between camera-captured image scoring and conventional scoring was obtained (P < .0001), with a positive Spearman's correlation coefficient (r = .9902, P < .0001), Figure 4B. Also, scores obtained from scanner images and from camera-captured images were significantly different from each other (P < .0001).

Figure 4. (A) Spearman's correlation coefficient of scanner WSI scoring versus conventional scoring for DLBCL. (B) Spearman's correlation coefficient of camera-captured image scoring versus conventional scoring for DLBCL. (C) Comparison between camera and scanner-captured image scoring.
Comparing ImageJ Analysis for the Same Specimen Using the Average of Scores of 5 Fields Versus 3 Fields
A significant difference between the average of scores of 5 fields and that of 3 fields was obtained (P < .0001), with a strong positive correlation (r = .9980, P < .0001), Figure 5.

Figure 5. (A) ImageJ scores using the average of 5 fields versus 3 fields for diffuse large B-cell lymphoma (DLBCL) specimens. (B) Spearman's correlation coefficient between the 5-field and 3-field averages for DLBCL.
Discussion
One of the most frequently used methods for quantifying PD-L1 expression in tumor cells is IHC. However, since the assessment of positivity levels may be subjective, it can be challenging to accurately characterize a positive result, which is crucial because it predicts a patient's response to anti-PD1/PD-L1 therapies in a variety of cancers.
Two tissue types were scored for PD-L1 positivity in this study, namely DLBCL and HNSCC. In DLBCL, no significant difference was found between conventional and ImageJ scores obtained using the options RGB or brightness/contrast. On the other hand, ImageJ faced some challenges in analyzing HNSCC because of tissue heterogeneity, since this tissue contains a variety of different cell types: a significant difference was found between the conventional method and ImageJ scores using RGB or HSB but not the brightness/contrast option. In another study using QuPath software, scoring of HNSCC specimens indicated that there was no significant difference between conventional CPS (manual method) and semi-automated image analysis, suggesting that semi-automated image analysis could possibly replace a manually determined score in treatment decision-making. 10 A possible explanation for the difference in outcomes between our study and theirs is that, in their study, the WSIs were annotated by trained pathologists, with manual region-of-interest selection in QuPath, so that HNSCC tumor tissue could be detected while fragmented tissue and gaps were excluded and visual and cognitive traps were avoided.
An advantage of semi-automated image analysis scoring over conventional scoring is that it measures DAB color staining in the entire annotated tumor region as a continuous variable with a single density score, whereas pathologist scoring provides only a visual estimate of PD-L1 expression in the same tumor region.21–23 Another study suggested that automated image analysis of the Dako 22C3 IHC assay yielded slightly lower percentages of positive cells compared with conventional scores. 24 This agrees with our results for DLBCL using the HSB method and for HNSCC using both RGB and HSB methods.
In our study, ImageJ scores from microscopic images taken at different magnification powers were compared to check whether changing the magnification power would affect the results. At higher magnification, cells appear larger and more distinct. However, it is much harder to determine the overall area percentage of a specimen stained with DAB, since a single field is not representative of the entire sample. On the other hand, lower magnification resulted in less distinguishable image details while larger parts of the specimen were viewed. In our study, a significant difference in ImageJ scores between the 40 × and 20 × objective lenses was detected despite a strong positive correlation between the scores obtained at the 2 magnification powers (r = .9891, P < .0001). Scores with the 20 × objective lens were slightly higher than expected, as the number of positive cells per field was larger. In other studies, scoring under higher magnification was preferred in evaluating weakly stained tissue in non-small cell lung cancer (NSCLC) 25 and gastric cancer specimens. 26
WSI scanners are not affordable for all facilities. Accordingly, scanner images are often replaced with camera-captured images, whose quality depends on the type of camera. According to our results, a significant difference between camera-captured image scores and conventional scores was observed, with a strong positive correlation coefficient (r = .9902, P < .0001). Likewise, a significant difference between scanner image scores and conventional scores was observed, with a strong positive correlation coefficient (r = .9899, P < .0001).
For camera-captured images, the average of the scores of 5 nonoverlapping fields was taken. However, this is difficult to achieve with small biopsies or with specimens that have sectioning errors, such as folding, cracking, or edge effects. In our study, when areas of tissue folding or artifacts were excluded, fewer fields remained. Nevertheless, for a valid CPS, a minimum of 100 viable tumor cells must be present in the section being evaluated for the specimen to be considered adequate for PD-L1 scoring. 27
According to our results, a significant difference between scores of camera-captured images of 3 fields and 5 fields was observed. Therefore, the average of at least 5 fields must be used to obtain a reliable result representative of the specimen.
Different analysis methods available in ImageJ were applied to the same specimens in our study to identify the method that gives results closest to the conventional scores. No significant difference was observed between the conventional score and ImageJ RGB or brightness/contrast scores for DLBCL specimens. Similar results were obtained by another study on PD-L1 staining in melanoma specimens, with no significant difference between conventional and semi-automated methods. 17 However, for HNSCC, only the brightness/contrast method was not significantly different from the manual method. Another study showed a significant difference between semi-automated image analysis scoring using an analysis known as AQUA (automated quantitative analysis) and conventional scoring in NSCLC. This was attributed to the inability of the software to distinguish between cytosolic, nuclear, and membrane staining. 28
Automated scoring of PD-L1 was reported also using different software other than ImageJ. Using QuPath, no significant difference was found between the automated image analysis scoring algorithm and pathologist HNSCC CPS scores 10 or in urothelial carcinomas. 21 Similarly, no significant difference between the automated FDA-cleared Aperio Imagescope IHC Membrane Image Analysis software scoring algorithm and pathologist scoring in gastric cancer was found. 26 Likewise, no significant difference between the automated image analysis scoring algorithm and pathologist scores in pancreatic cancer was obtained using PyRadiomics software. 29 On the other hand, more precise results were obtained in automated image analysis than pathologist scores for lung squamous cell carcinoma using Res50-UNet and MicroNet. 22
Upon comparison of the different imaging procedures, scores obtained using the conventional method and the ImageJ brightness/contrast option showed no significant difference in either DLBCL or HNSCC specimens. Brightness/contrast adjustment is part of the preprocessing phase; it alters image pixel intensity to remove background signal and improve the signal-to-noise ratio, especially in blurred images. However, when selecting a region of interest, the counterstain (hematoxylin) needs to produce enough contrast to stain the tissues without interfering with the chromogen precipitate. When adjusting brightness/contrast, care is needed so that the image appears more vibrant and sharper and the distinction between light and dark areas is enhanced. Enhancing brightness/contrast by adjusting the threshold is particularly needed in highly heterogeneous tissues.22,26,30 The importance of adjusting brightness was pointed out in a study that compared scores obtained using different PD-L1 antibodies with different staining patterns, that is, membranous, nuclear, or cytoplasmic; the strongest correlation with semi-automated image analysis scores was observed in the cytoplasmic domain, where darker PD-L1 staining allowed for easier quantification. 31
Specimen scores obtained by the conventional method and by the HSB option of semi-automated image analysis differed significantly in both types of cancer studied, especially when scores were high. HSB is a color model for representing image color in semi-automated image analysis. It is also known as HSV (hue, saturation, value) and is closely related to HSL (hue, saturation, lightness). Hue is expressed as an angle (0-360) that determines the color, saturation is expressed as a percentage (0%: complete grey; 100%: most vibrant color) that determines color purity or intensity, and brightness represents the overall lightness or darkness of the color.
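The HSB representation described above can be illustrated with the `colorsys` module from the Python standard library, which converts RGB to HSV (the same model as HSB). This sketch is independent of ImageJ; the function name and the example pixel values are illustrative assumptions.

```python
import colorsys


def rgb_to_hsb(red, green, blue):
    """Convert 8-bit RGB to (hue in degrees, saturation in %, brightness in %)."""
    hue, saturation, value = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
    return hue * 360.0, saturation * 100.0, value * 100.0


# Pure red: hue 0 degrees, fully saturated, full brightness.
print(rgb_to_hsb(255, 0, 0))

# A neutral grey has zero saturation whatever its brightness.
print(rgb_to_hsb(128, 128, 128))

# An arbitrary brownish pixel, chosen only to resemble a DAB-like color.
print(rgb_to_hsb(150, 90, 40))
```

Separating hue from saturation and brightness in this way is what allows a stain color to be selected by hue range rather than by intensity alone.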
Conclusion
Semi-automated image analysis using ImageJ could possibly replace a manually determined score for homogeneous tissue such as DLBCL. Camera-captured images can be used instead of WSI scanner images provided that at least 5 fields are used for scoring. In the case of small biopsy specimens where only 2 to 3 fields are available for examination, accurate results would be difficult to obtain and would not give a true representation of the specimen. If highly heterogeneous tissue, tissue peeling, tissue folding, or edge effects are present, it is advised to use the conventional method rather than ImageJ scoring. For nonspecific and weakly stained images, results can be improved through image adjustment and enhancement in ImageJ.
Future studies are needed to compare ImageJ scoring of PD-L1 expression with other software packages (such as QuPath or MATLAB). Comparing the scoring of PD-L1 expression using different PD-L1 antibodies and different tissues is also required.
Supplemental Material
Supplemental material, sj-docx-1-tct-10.1177_15330338241242635 for Correlation Between ImageJ and Conventional Manual Scoring Methods for Programmed Death-Ligand 1 Immuno-Histochemically Stained Sections by Rand Suleiman Al Taher, Manal A. Abbas, Khalid Halahleh and Maher A. Sughayer in Technology in Cancer Research & Treatment
Footnotes
Acknowledgments
The authors would like to thank Esraa Al-Khateeb and Abdallah Bader for their help in specimen collection.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The present study was approved by the Institutional Review Board (IRB) of the King Hussein Cancer Center (KHCC) (IRB # 21 KHCC 165 for head and neck squamous cell carcinoma (HNSCC) and IRB # 21 KHCC 074 for DLBCL specimens). All specimens used in the present study were archived paraffin-embedded tumor tissue specimens with no direct patient identifier.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
