Abstract
Keywords
Introduction
Lymphoedema, also known as chronic oedema, is a fluctuating and progressive condition with poor health outcomes if left unmanaged (Sung et al., 2022). Lymphoedema is widespread (Quere et al., 2019) and under-recognised (Keast et al., 2015), with prevalence conservatively estimated at 4:1000 (Moffatt et al., 2017). Not only does the region affected increase in size as a result of the accumulated lymphatic fluid, but local tissue undergoes extensive and progressive architectural changes, including inflammation, adipose tissue deposition, dermal thickening and the development of fibrosis (Azhar et al., 2020). These myriad changes are both seen (dermal changes) and unseen (subcutaneous changes). The various effects necessitate a thorough assessment of a range of outcomes at each clinic visit to estimate treatment response and progression (Shaitelman et al., 2015).
Observation (Sierla et al., 2020b) and interviews (Sierla et al., 2020a) with lymphoedema therapists revealed significant variance in how lymphoedema-related changes are assessed, including variability in outcomes and outcome measures selected, measurement protocols followed, methods of analysis, and descriptors used to report change in upper limb lymphoedema. A reliance on subjective assessment, primarily observed and palpated changes, in the absence of objective measures, was evident (Sierla et al., 2020b). This is problematic for the comparison of progress between appointments due to limitations in the therapists’ and patients’ ability to recall how the lymphoedema ‘looked’ and ‘felt’ (Czerniec et al., 2010; Powers et al., 2017; Sierla et al., 2020a). Without objective measures, the ability to reliably evaluate progression of lymphoedema and/or treatment effect is limited, and without uniformity in how change is reported, the ability to compare results is also limited (Finnane et al., 2015).
Investigation into the breadth of outcome measurements reported in the literature (Sierla et al., 2018) and observation of therapists treating lymphoedema (Sierla et al., 2020b) revealed five commonly reported lymphoedema-specific assessment domains: (i) size, by volume or circumference; (ii) extracellular fluid volume, by bioimpedance spectroscopy (BIS); (iii) observed changes; (iv) palpated changes and (v) symptoms as reported by the patient. This study sought consensus on how clinician-reported domains are reported.
In addition to the lymphoedema-specific assessment domains reported in the literature, observation of lymphoedema therapists revealed additional domains frequently assessed during routine care (Sierla et al., 2020b). Examples of these additional domains include height, weight, cellulitis history and arm dominance. These additional domains may be relevant because they (i) contribute to the risk analysis related to lymphoedema development and progression such as surgical history, or number of episodes of cellulitis; or (ii) contribute to data analysis, such as the relationship between arm dominance or body mass index (BMI) and circumference measurement differences.
A strategy to shift the field towards a more standardised approach is to seek consensus on a uniform dataset to capture and report change in upper limb lymphoedema and collect this dataset within routine care. Digital transformation provides a window to attain more uniformity in practice by guiding workflow and prescribing what and how data are entered (EXPH, 2019; Laka et al., 2022). Unfortunately, electronic medical record systems and clinic management systems have not resolved some of the specific challenges for the field of lymphoedema, particularly the collection and analysis of circumference measurements.
Circumference measurements are the most commonly used outcome measure by therapists and those researching lymphoedema (Sierla et al., 2018, 2020b). The current practice of collecting and analysing circumference data using a paper record, with mental arithmetic to investigate interlimb difference, is time consuming, prone to error, and encourages data comparison between the current and the most recent appointment, rather than over time (Sierla et al., 2020a, 2020b). Transferring the storage of circumference measures from paper to a digital system can enhance the utility of the raw circumference data through data processing and display. For example, interlimb difference can be calculated and auto-populated, a mathematical conversion to volume or interlimb ratio can be automated, or the interlimb ratio could be graphed. It is expected these efficiencies and the value imparted by data visualisation would incentivise uptake of the system as a whole.
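The automated calculations described above can be sketched as follows. This is a minimal illustration only: the function names and the sample measurements are invented, and the truncated-cone (frustum) formula is an assumed choice of volume conversion, since the conversion method is not specified here.

```python
import math


def interlimb_differences(affected_cm, unaffected_cm):
    """Point-by-point circumference difference (affected minus unaffected), in cm."""
    if len(affected_cm) != len(unaffected_cm):
        raise ValueError("both limbs need the same number of measurement points")
    return [a - u for a, u in zip(affected_cm, unaffected_cm)]


def limb_volume_ml(circumferences_cm, interval_cm):
    """Limb volume from serial circumferences taken at a fixed interval.

    Assumption: each segment is treated as a truncated cone, with
    V = h * (C1^2 + C1*C2 + C2^2) / (12 * pi).
    """
    return sum(
        interval_cm * (c1 ** 2 + c1 * c2 + c2 ** 2) / (12 * math.pi)
        for c1, c2 in zip(circumferences_cm, circumferences_cm[1:])
    )  # cm^3, numerically equal to mL


def percent_difference(affected, unaffected):
    """Interlimb difference as a percentage of the unaffected limb."""
    return (affected - unaffected) / unaffected * 100


def interlimb_ratio(affected, unaffected):
    """Interlimb ratio (affected divided by unaffected)."""
    return affected / unaffected


# Invented circumferences (cm) at 10 cm intervals, for illustration only.
affected = [18.0, 22.0, 26.0, 28.0, 31.0]
unaffected = [17.5, 20.5, 24.0, 26.5, 29.0]

differences = interlimb_differences(affected, unaffected)
summed_difference = sum(differences)  # 7.5 cm

vol_affected = limb_volume_ml(affected, interval_cm=10)
vol_unaffected = limb_volume_ml(unaffected, interval_cm=10)
volume_percent = percent_difference(vol_affected, vol_unaffected)
volume_ratio = interlimb_ratio(vol_affected, vol_unaffected)
```

The same `percent_difference` function accepts either volumes or summed circumferences, which is what allows results from different measurement protocols to be placed on a common scale.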
Finally, data quality is an important consideration, with an agreed uniform approach and common language contributing to the completeness, comprehensiveness and consistency of the dataset. A uniform approach to data collection at point of care is particularly helpful when a patient moves between therapists, and for the potential to leverage data collection at the point of care for service evaluation and research (Ehsani-Moghaddam et al., 2021; Kilkenny and Robinson, 2018). However, a uniform approach requires consensus on how change is measured and reported. This investigation sought to attain this consensus.
The aim of this study was to identify a comprehensive dataset for reporting change in upper limb lymphoedema, agreed by therapists and researchers specialising in the field of lymphoedema, as a fundamental step in creating a clinical support system.
Methods
A two-stage process was used. Stage 1 involved synthesis of the dataset to propose for the modified Delphi study while Stage 2 was the modified Delphi study. The Delphi technique is an iterative survey technique in which a group of experts are asked to anonymously respond to a series of questionnaires. Delphi uses a group communication process, seeking convergence of opinion for a specific question (Hsu and Sandford, 2007), drawing on the stability of group opinion rather than on individuals’ opinions (Ziglio and Adler, 1996). The Delphi process is often used for content generation. However, here a modified Delphi process was used in which the initial dataset was gathered from prior investigations (Sierla et al., 2018, 2020a, 2020b).
Stage 1: Synthesis of clinician-reported outcomes from prior studies
The breadth of outcomes used to report change in upper limb lymphoedema was captured in a systematic review (Sierla et al., 2018) and qualitative studies, both observational and involving semi-structured interviews with therapists treating lymphoedema (Sierla et al., 2020a, 2020b). These items were combined under four primary domains: size; extracellular fluid volume; observed change; and palpated change. Combined, there were 25 different objective outcome measures and 42 descriptors for changes observed or palpated. Although there was crossover between the outcomes captured in the literature and those in the clinic, there was still an unwieldy number of outcome measures to consider. This dataset was not only too large to build into a clinical support system, but also too large to propose in a questionnaire.
As some outcomes or descriptors were only reported once or twice in the prior studies, an analysis of frequency of use was undertaken to narrow the dataset. A pragmatic decision was made to include only those items reported by at least three sources (see Box 1). This frequency was selected to exclude single site observations. An outcome was included if it was used by at least three therapists in the observational study, or three articles in the systematic review, or a combination of these. Items not specific to lymphoedema were similarly required to be reported by a minimum of three sources for inclusion (see Box 1). The resultant dataset provided the basis for Round 1 of the modified Delphi study.
Box 1. Refining the dataset for consideration in the Delphi study (from prior studies).
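The frequency-of-use rule applied in Stage 1 can be expressed as a short sketch. The source identifiers and the example reports below are invented for illustration; only the rule itself (retain an item reported by at least three sources, counting therapists and articles together) comes from the text.

```python
MIN_SOURCES = 3  # threshold stated in the text


def items_meeting_threshold(reports, min_sources=MIN_SOURCES):
    """Return items reported by at least `min_sources` distinct sources.

    `reports` is an iterable of (source_id, item) pairs, where a source is
    either a therapist from the observational study or an article from the
    systematic review. Repeat reports from one source count once.
    """
    sources_per_item = {}
    for source_id, item in reports:
        sources_per_item.setdefault(item, set()).add(source_id)
    return sorted(
        item for item, sources in sources_per_item.items()
        if len(sources) >= min_sources
    )


# Hypothetical reports: circumference appears in three distinct sources,
# tonometer in only one (reported twice by the same article).
example_reports = [
    ("article_01", "circumference"),
    ("therapist_01", "circumference"),
    ("therapist_02", "circumference"),
    ("article_02", "tonometer"),
    ("article_02", "tonometer"),
]
retained = items_meeting_threshold(example_reports)
```

Counting distinct sources rather than raw mentions is what excludes single-site observations, as intended by the threshold.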
Stage 2: The modified Delphi study
Participants
A purposive sample of participants (n = 70) was selected. These experts were English-speaking senior lymphoedema therapists and/or researchers undertaking research in the area of lymphoedema. Diversity of experience was sought with the intent to explore any new outcomes or terms that may have been missed in the prior studies. Twenty-one participants were identified from publications of their lymphoedema-specific research to ensure the use of data for research purposes was a consideration. Professional representation was also a consideration with participants invited from nursing and medicine, occupational therapy and physiotherapy professions, with contacts obtained from the National Lymphoedema Practitioner Register. A minimum of 5 years’ experience was necessary for inclusion. Questionnaires were sent to national (Australia; n = 49) and international (n = 21) experts with invitees from 10 nations.
Modified Delphi protocol
Participants invited to take part in the Delphi process by email (n = 70) were given 2 weeks to complete the questionnaire. At the end of this time, those who had not responded were sent a reminder and provided a 1-week extension to complete the questionnaire. The responses were analysed, and the Round 2 questionnaire was developed in the 2-week period following the closure of the first questionnaire. The Round 2 questionnaire was sent to those who provided a full response in Round 1 (n = 43). The same process used in Round 1 was followed. The questionnaire was collected and managed using REDCap electronic data capture tools (Harris et al., 2009), hosted at The University of Sydney.
Modified Delphi – Round 1
In Round 1, demographic and professional background data of respondents were collected. Respondents were asked to consider each item of the proposed dataset independently on a five-point Likert Scale, from (1) not required to (5) essential for the record (Stone et al., 2018). The threshold for consensus was set low, with an a priori decision that consensus was attained when 70% of respondents rated an item as three or above on the Likert Scale. Where fewer than 30% of respondents rated an item at three or above, the item was removed from the dataset. Any item falling between these thresholds was to be re-presented in Round 2. Consensus was sought for: (i) objective and subjective outcome measures for inclusion; (ii) how objective outcome measures are reported and displayed; and (iii) descriptors used to represent subjective changes. To ensure nothing was missed, respondents in Round 1 were asked to identify in free text fields any items or descriptors that they felt were missing. These were added to the dataset for consideration in Round 2.
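The Round 1 decision rule can be illustrated with a small sketch. The function name and return labels are hypothetical, and treating exactly 70% as meeting the threshold is an assumption, as the boundary case is not spelled out.

```python
def classify_item(ratings, rating_cutoff=3, include_pct=70, exclude_pct=30):
    """Apply the Round 1 consensus rule to one item's Likert ratings (1 to 5).

    Integer arithmetic is used so that boundary cases are not affected by
    floating-point rounding. Assumption: exactly 70% counts as consensus.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    endorsing = sum(1 for r in ratings if r >= rating_cutoff)
    total = len(ratings)
    if endorsing * 100 >= include_pct * total:
        return "include"
    if endorsing * 100 < exclude_pct * total:
        return "remove"
    return "re-present in Round 2"


# Invented ratings from ten respondents, for illustration only.
strong_support = [5, 4, 4, 3, 3, 5, 4, 3, 2, 1]   # 8 of 10 rated three or above
weak_support = [1, 2, 2, 1, 1, 2, 3, 4, 1, 2]     # 2 of 10 rated three or above
split_support = [3, 4, 5, 3, 3, 2, 1, 2, 1, 2]    # 5 of 10 rated three or above
```

Under this rule an item is only carried forward to Round 2 when opinion is genuinely divided, which matches the outcome reported for this study, where every item was resolved in Round 1.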
Modified Delphi – Round 2
A second questionnaire (Round 2) was sent to all respondents from Round 1 (n = 43). This questionnaire included the additional items proposed by participants in Round 1. A closed-ended yes or no response was requested for any new items proposed and 70% consensus was necessary for these items to be included in the final dataset. This simplified strategy was implemented as the strength of consensus was such that the first round was also dichotomised (i.e. no items were re-presented in Round 2 – all were either confirmed by >70% or excluded by <30%).
In addition to confirming the necessary outcome measures and descriptors, the Round 2 questionnaire asked the preferred method for reporting and displaying circumference data to represent interlimb size differences or changes over time. Options for reporting included interlimb circumference differences summed, interlimb difference expressed as a percentage or ratio, single limb volume and interlimb volume difference, which are all frequently used to report changes in size or volume in the literature. Visual examples of how these changes in size might be displayed (from interval circumference measures) were provided for consideration in the Round 2 questionnaire (Figure 1). Participants were not limited to a single choice; however, they were asked to additionally identify their preference.

Figure 1. An example provided for how size change could be displayed (Round 2).
Data analysis
For the data gathered in Round 1, the percentage of the group endorsing each item was calculated. The mean, standard deviation of the mean, and the median for the relevance of inclusion for each item were determined. This analysis was used to determine the value experts placed on each item for inclusion in the dataset, and the spread of opinion. The data were analysed using IBM SPSS statistics (Version 26) predictive analytics software. In Round 2, as participants were limited to indicating only whether an item was to be included, the percentage of the group endorsing the item was reported.
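The study's analysis was run in SPSS; the following is only an equivalent sketch using the Python standard library, shown to make the per-item summary concrete. The function name and the ratings are invented, and the use of the sample standard deviation is an assumption.

```python
import statistics


def summarise_item(ratings, rating_cutoff=3):
    """Per-item Round 1 summary: percentage endorsing, mean, SD and median.

    Assumption: the standard deviation is the sample standard deviation
    (n - 1 denominator), as computed by statistics.stdev.
    """
    endorsing = sum(1 for r in ratings if r >= rating_cutoff)
    return {
        "percent_endorsing": endorsing / len(ratings) * 100,
        "mean": statistics.mean(ratings),
        "sd": statistics.stdev(ratings),
        "median": statistics.median(ratings),
    }


# Invented Likert ratings for a single item, for illustration only.
summary = summarise_item([3, 4, 5, 4, 4])
```

The percentage determines inclusion, while the mean, standard deviation and median describe how strongly and how uniformly respondents valued the item.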
Ethics approval
Consent was embedded at the beginning of the questionnaire and provided by all participants. Ethics approval was granted by the Sydney Local Health District Human Research Ethics Committee (HREC/15/RPAH/411).
Results
Stage 1 – Synthesis of outcomes
The breadth of outcome measures and descriptors seen in the literature and in use by therapists was narrowed in an analysis of frequencies. This process narrowed the number of objective outcome measures from 25 to 9 (Box 1) with infrequently observed/reported measurement tools such as calliper and ultrasound measures for dermal thickness, tonometer for tissue resistance, and Moisture Meter™ for spot impedance measurement excluded. The list of descriptors for observed and palpated changes (combined) was also narrowed from 42 to 17.
Stage 2 – The modified Delphi study
In Round 1, responses were received from 43 of the 70 invitees (61%). These participants described themselves as physiotherapists (n = 25), occupational therapists (n = 8), nurses (n = 3), researchers (n = 3), medical practitioners (n = 2) and other (n = 2). Responses were received from Australia (n = 33), UK (n = 3), Europe (n = 2), US (n = 3), Canada (n = 1) and New Zealand (n = 1). These 43 respondents were invited for the Round 2 questionnaire with 40 providing a response (93%).
Consensus was achieved for 96% of the items proposed in Round 1 and 41% of the items proposed in Round 2. The items that reached consensus, and the average value for each item, are listed in Table 1. For circumference measurements, options of recording measures at regular intervals along the arm and at anatomical landmarks, typically taken for compression garment prescription, reached the threshold for inclusion. The four options for reporting circumference measures that reached consensus were: (i) calculation of single point interlimb difference; (ii) summed interlimb differences; (iii) volume derived by mathematical conversion; and (iv) interlimb difference as a percentage. Items lost in Round 1 were the use of a ratio to represent interlimb size difference and a ratio to represent interlimb fluid differences using BIS. Inclusion of a graded pitting test and record of segmental bioimpedance did not reach consensus.
Table 1. Final dataset.
ROM: range of movement; VAS: visual assessment scale.
Participants were invited to identify any items they felt were absent from the proposed dataset in Round 1. This introduced 11 new descriptors for palpated and observed changes for consideration in Round 2. One item suggested for inclusion from Round 1 was a visual assessment scale to indicate (i) the intensity of and (ii) the distress caused by each reported symptom. Inclusion of the VAS for patient-reported symptoms was confirmed in Round 2. Interlimb difference as a percentage was identified as the preferred outcome measure for reporting changes in size (Figure 2). Although not lymphoedema-specific, a date and time stamp, the opportunity to record weight, and joint range of movement were introduced for inclusion in the dataset. Of the 11 new descriptors proposed in Round 1, five were added to the final dataset (Figures 3 and 4). Descriptive terms that did not reach consensus through this process were: (i) erythema (instead of colour difference), erythema (in addition to colour difference), clothing/garment marking and lymph blockage apparent from clothing/garments for observed changes; and (ii) rubbery, woody, congested, thickening (instead of fibrosis) and thickening (as well as fibrosis) for palpated tissue descriptors. The terms added and removed through each stage of the analysis are shown in Figures 3 and 4.

Figure 2. Preference for display of size.

Figure 3. Descriptors for observed tissue change.

Figure 4. Descriptors for palpated tissue change.
Final free text comments from Round 2 revealed some themes raised for consideration regarding the development of a clinical support system. Four respondents encouraged a need for brevity and/or simplicity to encourage uptake of a clinical support system. A further three respondents recommended limiting the number of descriptors for subjective changes but providing an opportunity for the therapist to add their own terms. For display of visual and palpated changes, participants also proposed the option to upload photographs and/or localise the changes using a body chart, in addition to identifying the observed and palpated changes with descriptive terms.
Discussion
A core dataset for the assessment of upper body lymphoedema was identified, based on consensus from expert researchers and therapists working in the field of lymphoedema. As the proposed dataset was derived and synthesised from an in-depth analysis of assessment and reporting from research (Sierla et al., 2018) and the clinical environment (Sierla et al., 2020a, 2020b), it is perhaps not surprising that 96% of the items proposed in Round 1 of the Delphi process reached consensus.
While the measurement of limb circumference was the most used objective measure in clinical and research settings (De Vrieze et al., 2019; ISL, 2020; Sierla et al., 2018), inconsistencies in how these data were collected, analysed and reported have been a barrier for sharing and comparing data (Ezzo et al., 2015; Sierla et al., 2018). For example, the domain of size change was reported in 19 different absolute and relative ways in the literature (Sierla et al., 2018). For the first time, a consensus process has identified percentage interlimb difference as preferred for reporting change in size. Percentage interlimb difference can be calculated from either volume or circumference data, ensuring results can be compared regardless of the measurement tool used or protocol followed. Equally, a ratio to represent interlimb difference could meet this need; however, this did not reach consensus, likely due to lack of familiarity with this approach. Percentage and limb volume calculation were discussed as desirable during therapist interviews; however, the time required to manually calculate these data was reported as the barrier to their use (Sierla et al., 2020a). Data processing can automate these calculations and remove this barrier. Importantly, a graphed display of percentage interlimb difference and limb volume can provide a more readable longitudinal view, providing the form and context necessary to support clinical reasoning and to inform patients of their progress and treatment response. Furthermore, standardising this commonly collected outcome measure at point of care would enable longitudinal and cross-sectional comparisons of these data for service evaluation and research.
Limb volume or percentage interlimb difference represent whole limb differences without localisation of where the differences in size occur. From clinic observations, it was evident that the interlimb circumference differences at specific locations were used to target treatment and assess change (Sierla et al., 2020b). The Delphi process agreed that these data were desirable, and that a graphed display of bilateral circumference measures and point differences along the limb was also desirable. Within the digital platform, value could be added by providing a flag when an interlimb difference exceeds a normative range. These ranges were derived from previous research that determined normative population data for interlimb differences at 10 cm intervals (Dylke et al., 2016).
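A flag of this kind could be implemented along the following lines. This is a sketch only: the function name is hypothetical, and the threshold values are placeholders chosen for illustration, not the published normative values, which would need to be taken from the cited research.

```python
def flag_points(differences_cm, thresholds_cm):
    """Return measurement positions where the interlimb difference exceeds
    the normative upper limit supplied for that position.

    Both arguments map a position (cm from a chosen landmark) to a value in
    cm. Positions without a supplied threshold are not flagged.
    """
    flagged = []
    for position, difference in differences_cm.items():
        limit = thresholds_cm.get(position)
        if limit is not None and difference > limit:
            flagged.append(position)
    return sorted(flagged)


# PLACEHOLDER thresholds (cm), invented for illustration; not normative data.
placeholder_thresholds = {10: 1.0, 20: 1.0, 30: 1.0, 40: 1.0}

# Invented interlimb differences (affected minus unaffected, cm).
point_differences = {10: 0.5, 20: 1.5, 30: 2.0, 40: 0.8}

flagged_positions = flag_points(point_differences, placeholder_thresholds)
```

Keeping the thresholds as data rather than hard-coding them means the limits can be revised as normative evidence is updated, without changing the flagging logic.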
The assessment of joint ROM was identified as desirable in Round 1, and confirmed in Round 2, as necessary for inclusion in the dataset. Joint ROM and additional items about issues such as cording or fixed scarring, may be relevant to patient management; however, they are not lymphoedema-specific. These additional items are relevant for upper limb lymphoedema management as they may influence, or be influenced by, the presence of lymphoedema and are therefore clinically relevant. This is distinct from the lymphoedema-specific dataset. Changes in lymphoedema presentation due to treatment or progression of the condition were detectable with assessments from the lymphoedema-specific dataset only, hence their relevance and specificity for comparative effectiveness research.
Subjective assessment was more commonly observed in the clinic than in the research environment (Sierla, 2020). Patient-reported symptoms were highly valued, with the inclusion of a VAS for the patient to identify both the level of intensity and distress for each symptom (Portenoy et al., 1994; Ridner and Dietrich, 2015) reaching consensus in Round 2. The option of free text, to enable therapists to add relevant terms for observed and palpated changes and for symptom reports, was raised in the final comments from Round 2.
A consideration for this study was that while the number of participants recruited for this research was appropriate for a Delphi study (Akins et al., 2005), the majority (n = 33) were Australian, due, in part, to the contacts known to the team. Therefore, findings are likely skewed toward an Australian practice model. This is particularly relevant for BIS. Three international respondents did not have experience in its use, in contrast with only one Australian lacking experience with this assessment. Most respondents described themselves as therapists (40/43), rather than researchers, although 21 of those invited were identified from their research publications in the field. Therefore, the dataset may reflect considerations for patient care more than those for research. This is appropriate for the purpose intended, namely the development of a clinical support tool. As noted in the Methods, two different strategies were used for rating consensus. Due to the strength of consensus in Round 1, both were dichotomised such that an item was included where consensus was above 70% in both rounds.
While this approach sought consensus on a core dataset for upper limb lymphoedema, the reality is that this only represents our understanding at a given time. The pathophysiology of the lymphatics is a rapidly evolving field, and the outcomes of interest and our language to describe change continue to evolve in parallel. The benefit of a digitised support system is that it can become a learning health environment (Kalra et al., 2017). By enabling the collection of real-world data and implementing new findings, a learning health environment can evolve and grow alongside the profession. For example, the confirmed list of terms to describe palpated change can include ‘other’ for therapists to add their own terms. It is simple to then monitor the terms currently in use and any new terms coming into common use and amend the list to reflect this change. Or, as previously noted, normative thresholds can be used to flag when objective measures move beyond the norm for detection. Establishing consensus on a core dataset to represent current practice provides a launching pad for this important development to begin.
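Monitoring free-text terms entered under 'other' could be as simple as the sketch below. The function name, the minimum-use cut-off and the example entries are all hypothetical; they are not drawn from the study data.

```python
from collections import Counter


def emerging_terms(free_text_entries, confirmed_terms, min_uses=3):
    """Count free-text descriptors that are not on the confirmed list and
    have been used at least `min_uses` times (a hypothetical cut-off).

    Entries are compared after trimming whitespace and lower-casing.
    """
    confirmed = {term.strip().lower() for term in confirmed_terms}
    counts = Counter(
        entry.strip().lower() for entry in free_text_entries if entry.strip()
    )
    return {
        term: uses for term, uses in counts.items()
        if term not in confirmed and uses >= min_uses
    }


# Invented free-text entries, for illustration only.
entries = ["Rubbery", "rubbery ", "woody", "rubbery", "fibrosis", ""]
candidates = emerging_terms(entries, confirmed_terms=["Fibrosis"])
```

A periodic review of the returned terms would give the profession a data-driven prompt for when the confirmed descriptor list needs amending.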
The findings of this study will inform the development of a clinical support system for lymphoedema. This platform will be presented for a systematic evaluation of technology acceptance and usability with clinicians using a ‘Think Aloud’ protocol as a next step in this body of work.
Conclusion
A core dataset for the assessment of upper body lymphoedema has been agreed through consensus from clinicians and research experts. This dataset includes both objective clinician-reported outcome measures and descriptors for observed and palpated changes. The dataset extends beyond lymphoedema-specific outcomes to include domains that impact the condition (e.g. joint ROM, BMI) or inform data analysis (e.g. dominance). To reduce clinical variance and enhance communication, this dataset provides a recommended language for reporting change in lymphoedema and provides the foundation for developing a clinical support system for lymphoedema and a learning health environment for the future.
Footnotes
Acknowledgements
Thank you to all respondents who shared their expertise in establishing this dataset and to Sydney Research for funding this research through their Health Informatics Scholarship.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr Robyn Sierla declares she was employed by Sydney Local Health District while writing this paper. All research for this paper was undertaken independently from her employer as part of her doctoral degree at The University of Sydney. All other authors declare no conflicts of interest.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Sydney Research provided funding support to Dr Robyn Sierla, quarantining time to lead this investigation.
Data availability statement
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
