Abstract
Background/Aims
We developed an observer disfigurement severity scale for neurofibromatosis type 1–related plexiform neurofibromas to assess change in plexiform neurofibroma–related disfigurement and evaluated its feasibility, reliability, and validity.
Methods
Twenty-eight raters, divided into four cohorts based on neurofibromatosis type 1 familiarity and clinical experience, were shown photographs of children in a clinical trial (NCT01362803) at baseline and 1 year on selumetinib treatment for plexiform neurofibromas (
Results
Mean baseline disfigurement severity scores for plexiform neurofibroma were similar for the selumetinib group (6.23) and controls (6.38). The mean paired difference between pre- and on-treatment ratings was −1.01 (less disfigurement) in the selumetinib group and 0.09 in the control group (
Conclusion
This study demonstrates that our observer-rated disfigurement severity score for plexiform neurofibroma was feasible and reliable and that it documented improvement in disfigurement in participants with plexiform neurofibroma shrinkage. Prospective studies in larger samples are needed to validate this scale further.
Background/aims
Neurofibromatosis type 1 (NF1) is a genetic disorder characterized by diverse, progressive cutaneous, neurological, skeletal, and neoplastic manifestations.1,2 Patients with NF1 have an increased risk of developing nervous system tumors including plexiform neurofibromas (PN) (20%–50%), optic pathway gliomas (15%–20%), and malignant peripheral nerve sheath tumors (8%–13%).3–5 PNs are histologically benign nerve sheath tumors that can cause significant morbidity and grow most rapidly in young children.6–8 Disfigurement, defined as a visible and negative alteration in appearance resulting from disruption of skin, soft tissue, or bony structures, is one of the most common PN-related symptoms. 9 In one review of 59 children and young adults with NF1 and PNs, 43 (73%) were reported to have PN-related disfigurement. 10 PN-related disfigurement varies in location and degree, worsens over time as tumors grow, and has the potential to pose formidable obstacles to the medical and psychosocial well-being of patients.11–16
Until recently, surgery was the only potential standard treatment of PNs. However, many PNs are inoperable due to their infiltrative nature. 17 The phase 2 trial of the mitogen-activated protein kinase kinase (MEK) 1/2 inhibitor selumetinib for children with NF1 and inoperable PNs with clinically significant PN-related morbidity (SPRINT, NCT01362803) showed confirmed partial response (≥20% decrease in PN volume) on selumetinib in 68% of children (34 of 50), with most of the responsive tumors decreasing in size by 20%–50% from baseline. 18 Since the publication of the SPRINT trial, other MEK inhibitors (e.g. binimetinib, trametinib, mirdametinib) have shown similar efficacy in shrinking PNs, as has the oral tyrosine kinase inhibitor cabozantinib, with additional trials underway or planned.19–22
Given that current treatment options result in only moderate decreases in PN volume, assessment for clinical benefit of medical interventions is critical. The SPRINT trial prospectively utilized patient-reported and functional assessments to show that patients had clinically meaningful improvements beyond tumor shrinkage. 18 These assessments were critical in the Food and Drug Administration (FDA) approval of selumetinib as the first medication for the treatment of inoperable, symptomatic PNs in children with NF1. 23 One of these measures, the Global Impression of Change scale, measured perceived changes with treatment compared to baseline. 24 The quantitative results on an item of the Global Impression of Change indicated that the mean child-reported and parent-reported scores for the child’s “tumor-related problems other than pain” were “much improved” after approximately 1 year on selumetinib. In the comment section of the Global Impression of Change about changes they observed while on treatment, 56% of parents and 38% of children described improved appearance of their PNs. 18 Although 88% of SPRINT participants had PN-related disfigurement, the study was unable to directly assess for objective changes in this morbidity, as there is no standardized or validated method for doing so in people with NF1. 18
Another population in which disfigurement causes significant morbidity is patients with head and neck cancer. 9 A reliable, validated nine-point disfigurement scale was created for observers to assess the degree of disfigurement in these patients. This scale had high inter-rater reliability, and observer ratings of disfigurement were significantly correlated with patients’ ratings of their own disfigurement. 9 We modified this scale to develop an observer-rated disfigurement severity score for plexiform neurofibroma (DSS-PN) in NF1.
The objectives of this study were to assess the initial feasibility, reliability, and validity of the DSS-PN. In addition, by testing the DSS-PN on longitudinal photographs of participants with PNs from the SPRINT study, we evaluated if selumetinib treatment was associated with improvement in PN-related disfigurement.
Methods
Data were collected through an Institutional Review Board (IRB) approved protocol (NCT04879160), and participants were required to understand and be willing to sign a written assent or informed consent document, as applicable, prior to enrollment. Raters were required to review and sign a confidentiality form prior to enrollment.
Selection of participants with PNs
Twenty of the 50 children with NF1 and PN-related clinical morbidity enrolled on the SPRINT (NCT01362803) phase 2 study were included based on the process shown in Figure 1. With the addition of four control participants with NF1 not on treatment and obtained from the National Cancer Institute (NCI) natural history study for PNs (NCT00924196), raters were asked to review and rate photographs of a total of 24 children and young adults with PNs. Details on photograph selection and editing are presented in Supplemental Appendix S1.

Flow diagram for inclusion of photographs of SPRINT participants.
Rater selection
Raters were divided equally into four cohorts based on medical training background and NF1 familiarity. Cohort 1A were clinicians with NF1 familiarity, cohort 1B were clinicians without NF1 familiarity, cohort 2A were non-clinicians with NF1 familiarity, and cohort 2B were non-clinicians without NF1 familiarity. Clinicians were defined as those professionally involved with direct patient care who have a career in the medical field (including physicians, nurses, advanced practice practitioners, and occupational/physical/speech therapists). NF1 familiarity for non-clinicians was defined as those who either have NF1 themselves, have a first- or second-degree relative with NF1, or are otherwise closely associated with a person with NF1. NF1 familiarity for clinicians was defined as those who self-identify as having a clinical practice which routinely includes patients with NF1. A full list of eligibility criteria is presented in Supplemental Appendix S2.
Recruitment
Clinician and non-clinician raters familiar with NF1 were identified through NF1-related professional and advocacy organizations. Clinician and non-clinician raters without familiarity of NF1 were identified by recruitment advertisements posted on the National Institutes of Health (NIH) campus, website, and social media platforms through the NIH recruitment office or by word of mouth.
Measure
The DSS-PN is a scale with numbers from 0 to 10, with 0 being “not at all disfigured” and 10 being “very disfigured.” Raters assess the severity of disfigurement from photographs to produce an overall disfigurement severity score. The study team, with input from people with NF1 and experts in patient-reported outcome measure development, modeled the DSS-PN after the validated nine-point disfigurement scale developed for patients with head and neck cancer. 9 The scale was modified to an 11-point scale to be consistent with the Numeric Rating Scale-11, a measure often used to rate pain intensity and other domains.25,26
In addition to the DSS-PN, there were two questions that the raters answered for each of the slides they were shown. The first asked which photo set per participant had worse disfigurement. The second asked raters to rank the factors that contributed most to a change in disfigurement score if one was present: area (size of affected body region), distortion (change in normal body shape), or discoloration (change in normal skin tone).
Rating procedure
Prior to the session, the raters were sent a form to complete that included questions on demographics, NF1 familiarity, and employment (Supplemental Appendix S3). Each rater completed the study via a secure virtual teleconference with the same study investigator. The same instructions were provided to each of the 28 raters via a standardized script (Supplemental Appendix S4). Raters first were provided sample photographs showing participants with PNs with a varied range of disfigurement and associated ratings (as assessed by the study team) to provide anchors and demonstrate the rating process (Supplemental Appendix S5). Raters also were provided a copy of the DSS-PN questionnaire (Supplemental Appendix S6). For each participant, a PowerPoint™ slide with a set of pictures was presented to the raters (Figure 2). The order of baseline and 1-year pictures and the order of the slides for each rater were randomized. The raters were blinded to which participants were treated and which were control. The raters were given a minimum of 10 s to look at each slide, and then, with photographs still visible, they answered all the questions for that participant. After completing all slides, the raters were asked to score the ease to understand and ease to complete the questionnaire on a 1–5 scale (1 = very easy, 5 = very hard) and to provide additional qualitative feedback (Supplemental Appendix S7). The study investigator (whose face was not visible to the rater) recorded all responses electronically. The entire procedure was completed by each rater within a 1-h session. To assess for intra-rater reliability, a repeat rating session was conducted 4 months after the initial evaluation. A randomly selected subset of ten of the same SPRINT photographs and two of the same control photographs previously used was evaluated by a randomly selected subset of five raters from each cohort.
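The randomization described above could be sketched as follows. This is only an illustration under stated assumptions: the participant identifiers, the per-rater seed, and the function name `build_presentation_order` are hypothetical, and the source does not say what tool was used to randomize the slides. For simplicity the sketch treats every participant as having a baseline and a 1-year photo set, which does not apply to the controls.

```python
import random

def build_presentation_order(participant_ids, seed):
    """Return a shuffled slide order and, per slide, which photo set is shown as A or B."""
    rng = random.Random(seed)          # hypothetical per-rater seed, keeps the order reproducible
    slides = list(participant_ids)
    rng.shuffle(slides)                # randomize the order of the slides
    order = []
    for pid in slides:
        timepoints = ["baseline", "1 year"]
        rng.shuffle(timepoints)        # randomize which timepoint appears as set A
        order.append({"participant": pid, "set_A": timepoints[0], "set_B": timepoints[1]})
    return order

# Hypothetical identifiers for the 24 photographed participants.
participants = [f"P{i:02d}" for i in range(1, 25)]
rater_order = build_presentation_order(participants, seed=7)
```

Seeding per rater is a design choice made here so that a session could be reconstructed later; it is not described in the source.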

Example photographs of SPRINT participants with plexiform neurofibroma.
Statistical analysis
For each participant’s photograph, both the mean of the raters’ scores at each timepoint and the change in scores were calculated as the paired difference for each participant between mean score at baseline and mean score at 1 year. For controls, as both sets of photos were of the same timepoint, set B was arbitrarily assigned as “baseline” and set A as “1 year” for analysis.
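As a minimal sketch of the paired-difference calculation, the snippet below averages the rater scores at each timepoint and subtracts the baseline mean from the 1-year mean, so that a negative value indicates less disfigurement. The ratings, participant identifiers, and the function name `paired_differences` are invented for illustration and are not study data.

```python
from statistics import mean

# Hypothetical ratings: ratings[participant][timepoint] -> list of rater scores (0-10).
ratings = {
    "P01": {"baseline": [7, 6, 8], "1 year": [5, 6, 6]},
    "P02": {"baseline": [4, 5, 3], "1 year": [4, 4, 4]},
}

def paired_differences(ratings):
    """Mean 1-year score minus mean baseline score for each participant."""
    return {
        pid: mean(tp["1 year"]) - mean(tp["baseline"])
        for pid, tp in ratings.items()
    }

changes = paired_differences(ratings)   # negative values mean less disfigurement at 1 year
```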
To assess inter-rater reliability, weighted kappa statistics were compared between pairs of cohorts using a Wilcoxon rank sum test. To assess for intra-rater reliability, each individual rater’s scores from the first and second rating sessions were compared. The difference between the absolute scores at each participant’s timepoint (baseline and 1 year) and the difference between the baseline and 1 year score at the two rating sessions were compared using a Wilcoxon signed-rank test.
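To illustrate what a weighted kappa measures on an 11-point scale, here is a sketch of Cohen's weighted kappa for two raters. It assumes linear disagreement weights, |i − j| / 10, which the source does not specify; the source also does not state how kappa was computed across more than two raters within a cohort. The function name and example scores are hypothetical.

```python
from collections import Counter

def weighted_kappa(scores_a, scores_b, categories=range(11)):
    """Cohen's kappa with linear disagreement weights for two raters (assumed weighting)."""
    cats = list(categories)
    n = len(scores_a)
    if n == 0 or n != len(scores_b):
        raise ValueError("need two equal-length, non-empty score lists")
    span = cats[-1] - cats[0]
    observed = Counter(zip(scores_a, scores_b))          # joint counts of (score_a, score_b)
    margin_a, margin_b = Counter(scores_a), Counter(scores_b)
    # Observed mean weighted disagreement.
    obs_dis = sum(abs(i - j) / span * c for (i, j), c in observed.items()) / n
    # Weighted disagreement expected if the two raters scored independently.
    exp_dis = sum(
        abs(i - j) / span * margin_a[i] * margin_b[j]
        for i in cats for j in cats
    ) / (n * n)
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single identical score")
    return 1 - obs_dis / exp_dis

# Hypothetical scores from two raters for the same five photographs.
rater_1 = [6, 7, 3, 9, 5]
rater_2 = [5, 7, 4, 9, 6]
kappa = weighted_kappa(rater_1, rater_2)
```

With linear weights, a one-point disagreement is penalized far less than a five-point disagreement, which suits an ordinal scale where near-misses are expected.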
To assess validity, we primarily analyzed whether achieving a partial response to selumetinib (≥20% decrease in PN volume) was associated with an improvement in PN-related disfigurement using a Wilcoxon rank sum test. To determine if selumetinib treatment is associated with improvement in PN-related disfigurement from baseline to 1 year, a Wilcoxon signed-rank test was used.
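For readers unfamiliar with the signed-rank test, the following sketch computes an exact two-sided Wilcoxon signed-rank test on paired scores. It is an illustration only: the source does not state which software was used or whether exact or approximate p-values were reported, and the function name and example scores are hypothetical. Zero differences are dropped and tied absolute differences receive average ranks.

```python
def signed_rank_exact(baseline, follow_up):
    """Exact two-sided Wilcoxon signed-rank test; returns (W+, p-value)."""
    diffs = [f - b for b, f in zip(baseline, follow_up) if f != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ordered = sorted(abs(d) for d in diffs)

    def doubled_rank(value):
        # Twice the average rank, so tied ranks stay integers.
        first = ordered.index(value) + 1
        last = first + ordered.count(value) - 1
        return first + last

    ranks2 = [doubled_rank(abs(d)) for d in diffs]
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    # Null distribution of W+: each rank is included with probability 1/2.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    extreme = abs(2 * w_plus2 - total2)      # distance of observed W+ from its null mean
    tail = sum(c for s, c in enumerate(counts) if abs(2 * s - total2) >= extreme)
    return w_plus2 / 2, tail / 2 ** n

# Hypothetical mean scores for six participants at baseline and 1 year.
w_plus, p_value = signed_rank_exact([6, 7, 5, 8, 9, 10], [5, 5, 2, 4, 4, 4])
```

In this invented example every score decreases, so W+ is 0 and only the two most extreme sign patterns out of 64 are as extreme as the observed one.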
For raters, the two clinical cohorts’, two non-clinical cohorts’, two NF1 familiar cohorts’, and two non-NF1 familiar cohorts’ mean scores were compared using a Wilcoxon rank sum test. In addition to the formal statistical comparisons described above, descriptive statistics were used to summarize the data throughout. Additional statistical analyses are presented in Supplemental Appendix S8.
Results
Rater characteristics
A total of 28 raters (median age = 35.5 years; range = 29–68) were enrolled in the study. There were more female than male raters (n = 17, 61%), with a mix of races (Table 1).
Characteristics of raters and participants with plexiform neurofibroma.
Gender is reported for raters, and biological sex is reported for participants.
Participants with PN characteristics
Twenty participants from SPRINT (median age = 9.8 years; range = 5.5–17.4) and four from the NF1 natural history study (median age = 21.8 years; range = 15.8–24.5) were enrolled. Participants had a median baseline tumor volume of 617 mL (range = 17.3–3314) with various locations of their target PNs. Some participants had visible characteristics overlying their PNs, including acne, discoloration, scarring, or scoliosis (Supplemental Appendix S9). Most SPRINT participants (
Disfigurement ratings
For the SPRINT participants, in 91.6% of the 560 rater/photo pairs, raters perceived the 1-year photo to have either less disfigurement than the baseline photo (75.2%) or the same degree of disfigurement (16.4%), with only 8% rating the on-treatment photograph as having more disfigurement than the baseline. The mean of the raters’ baseline scores ranged from 2.50 to 9.39. Mean baseline scores for the treatment group and control group were 6.23 and 6.38, respectively. In the treatment group, ratings indicated that most participants (

Disfigurement ratings of participants.
Validity of disfigurement rating scale
There was a meaningful difference in the change in DSS-PN ratings between those in the treatment group (mean change = −1.01 points, 95% confidence interval = −1.31 to −0.72) compared to those in the control group (mean change = 0.09 points; 95% confidence interval = −0.31 to 0.49) (

Change in disfigurement score in relation to tumor volume percent change.
Reliability of rater scores
Within each of the four cohorts of raters, there was moderate to substantial agreement among the raters (weighted kappa statistics of 0.46, 0.52, 0.53, 0.66; scores between 0.40 and 0.80 can be interpreted as indicating moderate to substantial agreement) (Table 2). Overall, there was no meaningful difference in the reliability of the ratings of non-clinicians with and without NF1 familiarity (weighted kappa statistics of 0.53 and 0.52;
Comparison of rater cohorts.
NF1: neurofibromatosis type 1.
There also was agreement by each of 20 raters who repeated the session several months later with the same photographs. In 79.3% of instances, the second DSS-PN ratings agreed with their previous scores within one point, and only 3.8% differed by >3 points. When looking at the change in scores for an individual patient pre- and post-treatment by each rater, in 89.6% of instances, there was agreement within one point between the two sessions by that rater, and only 0.42% differed by >3 points. In addition, within each rater cohort, the distribution of the mean difference between rater sessions for the changes in ratings from baseline to 1 year on treatment did not differ from zero for the set of 10 selumetinib participants re-evaluated. This result means that the same degree of change in disfigurement was scored at both rating sessions (all differences of changes between 0.10 and 0.16, with
Comparison of rater cohorts
All four groups consistently identified disfigurement in the photos (Supplemental Appendix S10), in addition to some degree of improvement in the disfigurement of the treated participants after 1 year (Table 2). When comparing scores between specific cohorts, there was no difference in the overall change scores between clinicians with and without NF1 familiarity (
Association of DSS-PN ratings with rater and participant variables
The change in scores after 1 year of treatment was found to have a moderately strong correlation with baseline age of participant with PNs (
Feasibility of questionnaire
Twenty-seven of the 28 raters (96%) found the scale to be either very easy or somewhat easy to understand. In addition, 64% found the scale to be either very easy or somewhat easy to complete (Supplemental Appendix S12). One common challenge for completion was the perceived overlap of the definitions of “area” and “distortion” when ranking factors that affected disfigurement. In addition, raters found that small changes in zoom and lighting between two sets of photographs also made direct comparison difficult.
Conclusion
While recent advancements have led to the development of treatments that shrink PNs in children with NF1, it is critical to assess whether tumor shrinkage results in clinical benefit for patients. 18 In this study, we developed the first observer disfigurement rating scale for PNs in people with NF1.
Our findings demonstrate that our 11-point DSS-PN for symptomatic inoperable pediatric PNs has moderately to substantially reliable results within raters with similar NF1 and clinical experience levels. In addition, in over 75% of ratings, the on-treatment photograph was identified as having less disfigurement than the baseline by raters who were blinded to the timepoint of the photos. The degree of improvement in DSS-PN rating was moderately well correlated with tumor volumetric responses to treatment with selumetinib. Together, these findings indicate that improvement in disfigurement with treatment is detectable by an observer using this scale and suggest that the measure may be valid for this indication. The scale also was found to be easy to understand and complete.
The DSS-PN had moderate or substantial inter-rater reliability within each of the four external rater cohorts. However, there were important differences between scores from different rater cohorts. Clinicians tended to rate disfigurement severity higher than non-clinicians and indicated a larger difference in DSS-PN ratings between baseline and 1 year on treatment than non-clinicians, implying that clinicians may be able to detect a change in appearance more sensitively, or that non-clinicians need to see more change before perceiving an improvement in disfigurement. Also, among clinicians, those without NF1 familiarity were more consistent in their ratings, potentially because they lacked outside experience that could affect their scores and therefore relied more heavily on the anchor photographs provided. These results highlight that in any future study utilizing this scale, it will be important for the external raters to have a consistent level of clinical experience and NF1 familiarity. The scale also had relatively good intra-rater reliability, which further supports its potential utility as a measure in future clinical trials.
Across all raters, the main factor that contributed to a change in DSS-PN ratings over time was a change in the degree of distortion in normal body shape. The only physical factor other than the PNs that seemed to impact DSS-PN ratings was scoliosis, with the two participants with scoliosis having worse absolute DSS-PN ratings at pre-cycle 13 than the other participants, indicating that this scale may not be able to differentiate between disfigurement caused by scoliosis and disfigurement caused by PNs.
There was a moderately strong correlation between improvement in DSS-PN rating and percentage tumor shrinkage, indicating that this measure may be used to detect tumor change with treatment in patients with this morbidity. In addition, the fact that raters were able to successfully distinguish between the control participants and those who received treatment also supports the ability of this measure to identify differences in disfigurement in the setting of a clinical trial. In contrast, there was no association between the change in DSS-PN rating and whether participants or their caregivers specifically commented on a change of appearance in the Global Impression of Change scale. However, since the Global Impression of Change did not directly solicit feedback about disfigurement from all participants, it is difficult to interpret this lack of correlation. This finding suggests the need to also include a systematic patient-reported assessment of disfigurement in PN trials.
Overall, these data indicate that selumetinib treatment, which previous studies have shown leads to magnetic resonance imaging (MRI) volumetric response of PNs and improvements in pain and quality of life, also results in an observable improvement in PN-related disfigurement.18,27 This change in score is moderately well correlated with age enrolled on treatment, with younger children having a larger change in DSS-PN rating, indicating that younger children may benefit more from treatment in terms of improvement in disfigurement than older ones. As disfigurement is noted to be one of the most common morbidities of PNs, these findings could have implications regarding clinical benefit in future therapies of this disease.
Limitations of the study included the lack of standardization of the lighting and distance from the participants with PNs at different timepoints, as the photographs came from multiple clinical trial sites. While all attempts were made to standardize all photographs, there were times that certain aspects of the photo were not able to be aligned. In addition, in some younger participants, it was relatively apparent that they had aged between the photographs, which may have biased the results for these individuals. As MEK inhibitors are known to lighten hair color, those who are familiar with this drug class may have been able to identify which facial photos were taken 1 year into treatment, which also could have biased results. For future prospective studies, finding ways to better de-identify these participants would be beneficial.
As most participants with PNs were Caucasian, we could not evaluate if skin color could affect the perceived change in disfigurement. In addition, none of the participants on the selumetinib trial had tumor progression at 1 year; therefore, we could not assess whether our scale can detect worsening rather than improvement in disfigurement over time. However, the fact that the raters were blinded to which photographs represented the baseline and on-treatment timepoints but were, in general, able to successfully identify improvement with treatment does indicate that the scale may be sensitive to detect change with treatment in this population. Additional prospective studies are needed to assess changes in disfigurement over time for patients both on and off treatment.
Many raters also found it difficult to distinguish between “area affected” by the PNs and “distortion of normal body shape” by the PNs as the reason for change in DSS-PN rating. Therefore, in the next iteration of this measure, these two concepts will not be separated. The study additionally would have been enhanced by having better matched controls to the study participants in age and by having a larger number of controls. Although we recognize that PN tends to grow faster in younger patients, closer age matching was not feasible as we specifically selected control participants who were old enough to provide their own consent. However, controls were otherwise similar in sex, tumor volume, and PN location and were found to have similar baseline DSS-PN ratings to the treated participants.
Although the ability to objectively and externally rate disfigurement has been shown to be important in other conditions, ultimately those best able to assess their tumor-related disfigurement are the patients and caregivers themselves. There is not currently a standardized patient-reported outcome measure for disfigurement in these participants against which these results could be compared, although such scales are in development. If such a rating scale were developed, future studies comparing these results against those from our measure would be important for validating the clinical meaningfulness of this tool.
In conclusion, our data show that this 11-point observer DSS-PN for pediatric patients with NF1-related PNs was feasible for use across all groups and had moderate to substantial inter- and intra-rater reliability within rater cohorts. In addition, these results indicate that there was a measurable improvement in disfigurement after 1 year of treatment with selumetinib. Additional studies will be needed to validate the clinical significance of this measurable change, including using the DSS-PN in future PN studies to assess changes in disfigurement for patients on other PN-directed therapies.
Supplemental Material
sj-docx-1-ctj-10.1177_17407745231206402 – Supplemental material for Development and pilot validation of a novel disfigurement severity scale for plexiform neurofibromas in children with neurofibromatosis type 1
Supplemental material, sj-docx-1-ctj-10.1177_17407745231206402 for Development and pilot validation of a novel disfigurement severity scale for plexiform neurofibromas in children with neurofibromatosis type 1 by Liny John, Gurbani Singh, Eva Dombi, Pamela L Wolters, Staci Martin, Andrea Baldwin, Seth M Steinberg, Jessica Bernstein, Patricia Whitcomb, Dominique C Pichard, Anne Dufek, Andy Gillespie, Kara Heisey, Miriam Bornhorst, Michael J Fisher, Brian D Weiss, AeRang Kim, Brigitte C Widemann and Andrea M Gross in Clinical Trials
Footnotes
Acknowledgements
The authors sincerely thank the participants with NF1 and raters who participated in our study. The authors would like to thank the members of the Response Evaluation in Neurofibromatosis and Schwannomatosis (REiNS) working group for providing patient and caregiver insight as we developed our disfigurement questionnaire. In addition, the authors acknowledge Dr Russell Hall and Dr Susan Halabi from Duke University for their valuable suggestions on the design and analysis of this study.
Author contributions
L.J., G.S., E.D., P.L.W., S.M., M.B., M.J.F., B.D.W., A.R.K., B.C.W., and A.M.G. contributed to conceptualization. L.J., A.G., and A.M.G. contributed to data curation. L.J., S.M.S., and A.M.G. contributed to formal analysis. L.J. and A.M.G. contributed to investigation. L.J., G.S., E.D., P.L.W., S.M., B.C.W., and A.M.G. contributed to methodology. L.J., G.S., J.B., and A.M.G. contributed to resources. B.C.W. and A.M.G. contributed to supervision. L.J., S.M.S., and A.M.G. contributed to visualization. L.J., S.M.S., and A.M.G. contributed to writing – original draft. L.J., G.S., E.D., P.L.W., S.M., A.B., S.M.S., J.B., P.W., D.C.P., A.D., A.G., K.H., M.B., M.J.F., B.D.W., A.R.K., B.C.W., and A.M.G. contributed to writing – review and editing.
Declaration of conflicting interests
Dr B.C.W. and Dr A.M.G. have unpaid advisory roles for AstraZeneca and SpringWorks. Dr M.J.F. has paid advisory roles for AstraZeneca and SpringWorks and research support from AstraZeneca, Array BioPharma, and Exelixis for plexiform neurofibroma clinical trials. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. Dr M. B. has a paid advisory role on the external advisory board for the Koselugo Registry through Alexion.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Intramural Research Program of the National Institutes of Health; the Center for Cancer Research, National Cancer Institute (NCI).
Ethical approval
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the National Cancer Institute (protocol code NCT04879160, approved 19 April 2022).
Informed consent
Informed consent was obtained from all participants involved in the study. For those participants whose photographs were included in this publication, written informed consent to publish this paper was obtained from the participant (or their legal guardians when applicable).
Trial registration
ClinicalTrials.gov registration: NCT04879160.
Supplemental material
Supplemental material for this article is available online.
References
