Abstract
Background:
The number of children worldwide requiring palliative care services is increasing due to advances in medical care and technology. The use of outcome measures is important to improve the quality and effectiveness of care.
Aim:
To systematically identify health-related quality-of-life outcome measures that could be used in paediatric palliative care and examine their feasibility of use and psychometric properties.
Design:
A systematic literature review and analysis of psychometric properties.
Data sources:
PsycINFO, Medline and EMBASE were searched from 1 January 1990 to 10 December 2014. Hand searches of the reference lists of included studies and relevant reviews were also performed.
Results:
From 3460 articles, 125 papers were selected for full-text assessment. A total of 41 articles met the eligibility criteria and examined the psychometric properties of 22 health-related quality-of-life measures. Evidence was limited as at least half of the information on psychometric properties per instrument was missing. Measurement error was not analysed in any of the included articles and responsiveness was only analysed in one study. The methodological quality of included studies varied greatly.
Conclusion:
There is currently no ‘ideal’ outcome assessment measure for use in paediatric palliative care. The domains of generic health-related quality-of-life measures are not relevant to all children receiving palliative care and some domains within disease-specific measures are only relevant for that specific population. Potential solutions include adapting an existing measure or developing more individualized patient-centred outcome and experience measures. Either way, it is important to continue work on outcome measurement in this field.
The use of outcome assessment tools is important to measure quality and effectiveness of care.
The population of children requiring paediatric palliative care services is diverse.
There are no outcome assessment tools validated specifically for use within paediatric palliative care.
This is the first review to systematically identify existing health-related quality-of-life outcome measures for use in paediatric palliative care.
The paper finds that there is currently no ‘ideal’ outcome assessment tool for use in paediatric palliative care.
As with adults, both outcome and experience measures are important to achieve and maintain best care for children and young people.
Understanding a child or young person’s own goals for care and treatment – alongside more standardized outcomes – is likely to be most valuable.
Background
Palliative care for children encompasses children with a variety of life-limiting and life-threatening conditions. Life-limiting conditions are diseases for which there is no reasonable hope of cure and that will ultimately be fatal. 1 Life-threatening conditions are those for which curative treatment may be possible but can fail, for example, cancer. Worldwide, more children are living longer with such conditions due to advances in medical care. Paediatric palliative care (PPC) is about helping children and their families deal with their medical condition while enabling them to live life to the fullest. 2 Palliative care for children and young people (CYP) is an active and total approach to care that begins at the point of diagnosis and continues throughout the child’s life, death and beyond. 3 The scope of PPC is broad: PPC services care for CYP with a wide variety of illnesses, many of which are extremely rare.3,4 Children may receive care from these services for many years, so it is imperative that they are supported to live life to their fullest potential.
Health-related quality of life (HRQOL) has been described as a subjective, multidimensional and dynamic construct that comprises physical, psychological and social functioning. 5 HRQOL measurement instruments must cover the physical, mental and social health dimensions delineated by the World Health Organization (WHO). 6 Given that one of the goals of PPC is to improve HRQOL, service providers, researchers, fundraisers and policy makers will want to measure HRQOL and determine how effectively services achieve this.
There are no paediatric HRQOL measures that have been successfully validated for use within PPC. One study attempted to validate the widely used Pediatric Quality of Life Inventory 4.0 (PedsQL™ 4.0) in children with a variety of life-limiting conditions, but found that the instrument did not have adequate psychometric properties in this population. 7 Therefore, within PPC, two possibilities exist: devising a completely new HRQOL instrument, or revising and validating an existing one. A review of existing HRQOL measures is essential before deciding which course of action to take.
The aim of this systematic literature review is to examine the measurement properties of existing HRQOL instruments for use in CYP up to 18 years old. It also assesses the feasibility of using these measures in the CYP palliative care population in terms of completion time, response options, recall period, format, domains and whether the measure is completed by the parent, a professional or the CYP.
Methods
This systematic literature review was performed in accordance with Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines. 8
Identification of studies
PsycINFO, Medline and EMBASE were searched from 1 January 1990 to 10 December 2014. Experts in the field were asked to suggest any further measures. The search terms combined keywords such as child, adolescent and teenager with terms used to find studies on the measurement properties of HRQOL measures.9,10 The search was restricted to English-language publications because of practical constraints within the research team. Reference lists of included articles were also searched for further publications. Box 1 shows the search strategy used.
Search strategy.
Inclusion/exclusion criteria
A study was included if it met the following inclusion criteria:
Full-text article;
Written in the English language;
Examining one or more measurement properties of an instrument that measured physical, mental and social aspects of HRQOL as delineated by WHO; 6
The study population were under 18 years old;
The measure was generic or disease specific;
Disease-specific instruments had to assess HRQOL in an illness considered to be life-limiting or life-threatening;3,4
Studies of generic measures were included only if some or all of the population in the study had a life-limiting illness;
Included measures could be completed by the CYP, parent or clinician.
Study selection
The results of the search were thoroughly checked and full manuscripts of all studies whose title/abstract seemed to meet the selection criteria were retrieved. Independent reviewers (L.H.C. and G.L.) examined these full-text articles and made the final decision as to whether they were included.
Data extraction
The methodological quality of included studies was rated using the COnsensus based Standards for the selection of health Measurement INstruments (COSMIN) checklist. 11 This checklist contains nine boxes, each dealing with one measurement property. Each box contains 5–18 items (98 items in total) that can be used to assess whether a study on a specific measurement property meets the standard for good methodological quality. The checklist evaluates the following measurement properties: internal consistency, reliability, measurement error, content validity, construct validity (structural validity, hypothesis testing and cross-cultural validity), criterion validity and responsiveness. There are three additional boxes: one to assess the methodological quality of studies on interpretability, one to assess the generalizability of results and one containing extra methodological standards for studies that use item response theory (IRT), giving 12 boxes in total. Each item is scored on a 4-point rating scale (poor, fair, good or excellent). 12 An overall score for the methodological quality of a study is determined for each measurement property separately by taking the lowest rating of any item in the box (‘worst score counts’). 11
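The ‘worst score counts’ rule amounts to taking the minimum over ordered ratings. The minimal sketch below illustrates it; the names `Rating` and `box_score` are hypothetical, and the item ratings are invented for demonstration rather than taken from any included study.

```python
from enum import IntEnum


class Rating(IntEnum):
    """COSMIN 4-point item rating; a lower value means worse quality."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4


def box_score(item_ratings):
    """Methodological quality of one study for one measurement property (one box).

    Applies 'worst score counts': the box takes the lowest rating of any item.
    """
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    return min(item_ratings)


# Hypothetical ratings for the items of one reliability box.
reliability_items = [Rating.EXCELLENT, Rating.GOOD, Rating.EXCELLENT, Rating.FAIR]
print(box_score(reliability_items).name)  # FAIR
```

A single weak item therefore caps the rating for the whole property, however well the other items were addressed.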
Box 2 gives definitions of these measurement properties.
Definitions of measurement properties.
Synthesis of results
To summarize the evidence on the measurement properties of each included instrument, results were combined across studies, taking into account the number and methodological quality of the studies and the consistency of their results. A method similar to that proposed by the Cochrane Back Review Group was used (Table 1). 17
Levels of evidence for overall quality of measurement property.
Source: van Tulder et al. 17
+: positive results; −: negative results.
The overall rating of each measurement property is ‘positive’, ‘negative’ or ‘indeterminate’, accompanied by levels of evidence. These criteria were originally meant for systematic reviews of clinical trials but have been used in reviews on measurement properties. 12 To assess whether results of measurement properties were positive, negative or indeterminate, criteria based on Terwee et al. 18 were used (Table 2).
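Table 1’s contents are not reproduced in the text, so the sketch below assumes the version of these rules commonly used in COSMIN-based reviews: strong evidence from consistent results in several good studies or one excellent study, moderate evidence from several fair studies or one good study, limited evidence from one fair study, conflicting evidence when results disagree and unknown evidence when only poor studies exist. Table 1 may differ in detail, and `level_of_evidence` is a hypothetical name.

```python
from collections import Counter


def level_of_evidence(studies):
    """Combine studies on one measurement property of one instrument.

    studies: list of (quality, result) pairs; quality is 'poor', 'fair', 'good'
    or 'excellent' and result is '+' or '-'. Returns e.g. '+++', '--', '±' or '?'.
    The thresholds are an assumed COSMIN-style adaptation, not Table 1 itself.
    """
    usable = [(q, r) for q, r in studies if q != "poor"]
    if not usable:
        return "?"                       # only poor-quality studies
    results = {r for _, r in usable}
    if len(results) > 1:
        return "±"                       # conflicting findings
    sign = results.pop()
    counts = Counter(q for q, _ in usable)
    if counts["excellent"] >= 1 or counts["good"] >= 2:
        return sign * 3                  # strong
    if counts["good"] == 1 or counts["fair"] >= 2:
        return sign * 2                  # moderate
    return sign                          # limited: a single fair study


print(level_of_evidence([("good", "+"), ("fair", "+")]))  # ++
```

Poor-quality studies are set aside before checking consistency, which mirrors the idea that they cannot contribute to a level of evidence.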
Quality criteria for measurement properties.
Source: Terwee et al. 18
MIC: minimal important change; SDC: smallest detectable change; LOA: limits of agreement; ICC: intraclass correlation coefficient; AUC: area under the curve; +: positive rating; ?: indeterminate; −: negative rating.
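A few of the Table 2 criteria can be expressed as simple rating functions. This sketch assumes the thresholds commonly attributed to Terwee et al.: Cronbach’s alpha between 0.70 and 0.95, ICC or weighted kappa of at least 0.70, and at least 75% of hypotheses confirmed. It is simplified, because the full criteria also consider factor analysis and sample size. The function names are hypothetical.

```python
def rate_internal_consistency(alpha):
    """'+' if Cronbach's alpha is 0.70-0.95, '-' otherwise, '?' if not reported."""
    if alpha is None:
        return "?"
    return "+" if 0.70 <= alpha <= 0.95 else "-"


def rate_reliability(icc):
    """'+' if ICC (or weighted kappa) >= 0.70, '-' otherwise, '?' if not reported."""
    if icc is None:
        return "?"
    return "+" if icc >= 0.70 else "-"


def rate_hypothesis_testing(n_confirmed, n_tested):
    """'+' if at least 75% of pre-specified hypotheses are confirmed, '-' otherwise."""
    if n_tested == 0:
        return "?"
    return "+" if n_confirmed / n_tested >= 0.75 else "-"


print(rate_internal_consistency(0.97))  # -
print(rate_reliability(0.82))           # +
print(rate_hypothesis_testing(3, 4))    # +
```

An alpha above 0.95 is rated negative because it suggests redundant items rather than better measurement.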
This assessment of measurement properties was then considered alongside the feasibility of using each measure in PPC, in terms of completion time, response options, recall period, format and domains.
Research ethics committee/institutional review board (IRB) approvals were not required as this was a systematic review of pre-existing evidence.
Results
Paper selection
The search strategy identified 3451 articles, and a further 9 were found via reference searching. Of these, 125 were selected for full-text review based on title and abstract, and 41 were included in the review. Figure 1 shows a flow chart of the article selection process, and Table 3 shows the general characteristics of included studies.

PRISMA flow chart.
Summary of included studies.
CP-QOL: cerebral palsy quality of life; HRQOL: health-related quality of life; DMD: Duchenne muscular dystrophy; SMA: spinal muscular atrophy.
Mixture of healthy, chronically ill and acutely ill children.
Two papers reporting same data.
Summary of results
The 41 included articles evaluated 22 HRQOL measures for use with children aged 0–18 years. Two papers reported results from the same study, so they were analysed only once.19,20 All included measures were originally developed to be completed in paper format. Five of the included measures were generic and 17 were disease specific. Of the disease-specific measures, three were for use with children with cardiac disease, three for cerebral palsy, six for cancer, one for brain tumours, three for epilepsy and one for neuromuscular disease. Four measures were child completed, four were parent completed, thirteen had both parent and child forms and one had both child and clinician forms. Completion time ranged from 2 to 25 min, and the number of items ranged from 6 to 87. Recall periods ranged from the current moment to 1 month. Table 3 summarizes the included studies; where data are missing, they were not available.
None of the included studies analysed measurement error or cross-cultural validity. One study reported on responsiveness and one on criterion validity. 31 The COSMIN panel defines criterion validity as ‘the degree to which the scores of an instrument are an adequate reflection of a gold standard’. 16 As outcome measures focus on perceptions that may be subjective, they usually lack a gold standard. The only exception would be a shorter version of a measure developed from an already validated longer one, in which case the longer version could be considered the gold standard. Therefore, the results of this analysis of criterion validity have not been included here.
The methodological quality of included studies, shown in Table 4, ranged from poor to excellent. Table 5 shows the synthesis of results per outcome measure with levels of evidence, which ranged from strong to unknown.
Methodological quality of included measures.
Two papers reporting on the same data.
Data synthesis.
+++ or – – –: strong evidence for positive/negative results; ++ or – –: moderate evidence for positive/negative results; ±: conflicting evidence; ?: unknown due to poor methodological quality.
Internal consistency was tested in 34 of the included studies. However, 40% of these did not check the unidimensionality of the scale, which led to a poor methodological quality rating. 59
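Cronbach’s alpha, the usual internal consistency statistic, is only interpretable for a unidimensional scale, which is why a prior dimensionality check matters. The sketch below computes alpha from raw item scores; it does not itself test unidimensionality. The function name and the response data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: one row per respondent, one column per item (complete data).
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    The formula assumes the scale is unidimensional; it does not test this.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*item_scores))                  # one tuple per item
    item_var_sum = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses from three respondents to a two-item scale.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Population variances are used for both the items and the total; because the same denominator appears in numerator and denominator of the ratio, sample variances would give the same alpha.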
Test–retest reliability was assessed in 14 of the studies. Only 38% of these had a sample size of at least 100, which is needed for an excellent quality rating. 11 Only one study described how missing items were handled; if not handled appropriately, missing items could lead to over- or underestimation of reliability.
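Test–retest reliability of continuous scores is usually summarized with an intraclass correlation coefficient. As an illustration, the sketch below computes a two-way ICC for absolute agreement on a single measurement (ICC(A,1) in McGraw and Wong’s notation); the choice of this ICC model is an assumption, and the included studies may have used others. It requires complete rows, so decisions on missing items must be made, and reported, before this step. The function name and data are hypothetical.

```python
def icc_agreement(scores):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    scores: one row per respondent, one column per occasion (e.g. test, retest).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two respondents and two occasions")
    if any(len(row) != k or None in row for row in scores):
        raise ValueError("every respondent needs a score on every occasion")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(col) / n for col in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between respondents
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores in which every retest is one point higher than the test.
print(round(icc_agreement([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667
```

Because this model counts systematic differences between occasions as error, a uniform one-point shift lowers the ICC even though the rank order is preserved.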
Content validity was assessed in 13 of the included studies. The main methodological flaws were small numbers of participants in focus groups, pilot studies and cognitive testing, and failure to involve children, parents and professionals in the process. Ideally, all three groups should be involved to ensure that the items are relevant and that no important items are missing.
Structural validity can be assessed by factor analysis or by IRT tests of dimensionality. 12 Structural validity was assessed in 14 of the studies, and confirmatory or exploratory factor analysis was generally used appropriately. For structural validity testing, a sample size of 5–7 times the number of items (and greater than 100) is recommended; 11 this was achieved in 93% of the studies. However, 64% of the studies did not describe missing items or how they were handled.
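The sample-size rule of thumb above is simple arithmetic. The sketch below applies it as stated in the text (at least 5–7 respondents per item and more than 100 overall); the function names are hypothetical.

```python
def required_sample_size(n_items, multiplier=5, minimum=100):
    """Smallest n with at least `multiplier` respondents per item and n > minimum."""
    return max(multiplier * n_items, minimum + 1)


def structural_validity_sample_ok(n_respondents, n_items, multiplier=5, minimum=100):
    """True if the sample meets the rule of thumb for factor analysis."""
    return n_respondents >= required_sample_size(n_items, multiplier, minimum)


# The longest included measure had 87 items; the shortest had 6.
print(required_sample_size(87, 5), required_sample_size(87, 7))  # 435 609
print(required_sample_size(6, 7))                                 # 101
```

For short measures the overall minimum dominates, whereas for long ones the per-item multiplier drives the requirement into the hundreds.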
Only one study analysed responsiveness. 58 It did not calculate correlations between change scores; a paired t-test was used instead, so its methodology was rated as poor. The sample size was also inadequate, and there was no description of how missing items were handled.
Discussion
Main findings
To the authors’ knowledge, this is the first systematic review of outcome measures that could potentially be used in PPC. The aim of this review was to examine the feasibility of use of measures, as well as the methodological quality of analysis of measurement properties of included studies. The review identified 22 measures, 5 generic and 17 disease specific, which could potentially be useful. The disease-specific measures included those for use in children with cardiac disease,23,38–40,51 cerebral palsy,24,27,52 cancer,34–37,41,50,58 brain tumours, 49 epilepsy31,32,56 and neuromuscular disease.53–55 All measures were initially developed in the English language. None of the measures were developed for use in CYP receiving palliative care. All were developed to be completed in paper format, predominantly by the CYP and/or their parent.
The PedsQL™ Generic Core Scale was the measure whose measurement properties were most widely analysed. It is unique in that it contains a generic core scale and various disease-specific modules that can be administered alongside it.
Quality of assessment of measurement properties
None of the studies on measurement properties in this review achieved a rating of fair or higher methodological quality for every property assessed. Most of the studies showed positive results (except for parent test–retest reliability in two studies and hypothesis testing in one; see Table 5). However, evidence is mainly limited, and at least half of the information on measurement properties per questionnaire is missing. Because the methodological quality of the included studies varied greatly, these results should be treated with caution.
Internal consistency, reliability, content validity and hypothesis testing were widely assessed in the papers. Only one study analysed responsiveness. 58 It is imperative that any measure used to assess HRQOL is responsive to change, particularly in PPC, where a child’s condition can change frequently and sometimes rapidly. Measurement error was not assessed in any of the included studies, even though reliability and measurement error can both be calculated from the same data. 12 Because 14 of the included studies assessed reliability, measurement error could easily have been reported.
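To illustrate how measurement error follows from reliability data, one common approach derives the standard error of measurement (SEM) from the score standard deviation and the ICC, then the smallest detectable change (SDC) for an individual. SEM = SD × √(1 − ICC) is only one of several ways to estimate SEM, and the SD and ICC values below are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement, SEM = SD * sqrt(1 - ICC)."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1 - icc)


def smallest_detectable_change(sem, z=1.96):
    """SDC for an individual: z * sqrt(2) * SEM (difference of two measurements, 95%)."""
    return z * math.sqrt(2) * sem


# Hypothetical 0-100 scale with SD 15 and test-retest ICC 0.80.
sdc = smallest_detectable_change(sem_from_icc(15, 0.80))
print(round(sdc, 1))  # 18.6
```

Under these assumed values, a change of fewer than about 19 points in one child could not be distinguished from measurement error. Comparing the SDC with the minimal important change indicates whether a measure can detect clinically meaningful change.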
Feasibility of use of included measures
In adult palliative care, there are concerns regarding the use and relevance of outcome measures, including the method of administration and whether the patient, carer or professional completes the measure. 60 These concerns are probably just as applicable to PPC. Many children requiring palliative care services are non-verbal or too unwell to complete the tools themselves and therefore rely on reports from their carers and/or professionals. The method of administration also matters: different modes may be appropriate depending on the type and stage of a CYP’s illness. The PedsQL™ is the only measure included in this review that has been validated across different modes of administration. 48 Multi-group confirmatory factor analysis showed strong factorial invariance across three modes of administration (mail, in-person and telephone survey). With mobile technology now widely available, new ways of collecting data, such as online or via an app, should be considered, as these may be more acceptable to CYP and their carers and easier to access.
Within PPC, as in adult palliative care, there is debate about who should complete outcome measures. Most children with life-limiting and life-threatening illnesses are cared for at home by their parents, so a clinician-completed measure is not always ideal. HRQOL is generally understood as a latent construct that cannot be directly observed; it comprises the individual’s subjective perception and evaluation of their life, as well as their subjective well-being and affective mood. 61 Wherever possible, the child’s self-report of HRQOL should be sought. Within this population, however, some children will be too young or too unwell to complete a measure, and a parent- or proxy-completed measure will need to be used. A total of 19 of the 22 measures included in this review contain parent reports. Of the studies that examined correlation between child and parent scores, three found moderate correlation.37,38,49 One study showed poor correlation in the psychological and emotional subscales. 52 These results are consistent with previous studies showing that parent–child correlation is higher for observable constructs, such as physical aspects, and lower for non-observable constructs, such as emotional problems. 62
Recall periods in the included studies ranged from the current moment to 1 month. Research has shown that children as young as 8 years can use a 4-week recall period accurately. 63 However, HRQOL measures with shorter recall periods are likely to elicit more accurate responses. 64 Most of the disease-specific measures had shorter recall periods. This is likely to suit children with palliative care needs, whose symptoms often change frequently and can affect their HRQOL.
A variety of response options were used in the included measures. The most common was a Likert scale, with 3 to 9 response points. It has been recommended that fewer response options be used for younger children because they tend to choose responses at the extremes. 63 There is also little evidence that young children can respond effectively to Likert scales. 64 Where reported, completion time ranged from 2 to 25 minutes. Shorter measures are preferable in PPC because children are likely to tire easily. Shorter parent-completed measures are also preferable because parents already carry the burden of caring for their sick child.
HRQOL instruments may be either generic or disease specific. 7 Generic measures are useful for comparing general quality of life across different populations. Because they are also used with healthy children, they are more likely to have been validated in large samples, but they may lack sensitivity in sick CYP. Disease-specific instruments, on the other hand, are used to compare quality of life within a given condition. They are assumed to be more sensitive to the implications of different illnesses and may be more appropriate for evaluating interventions or treatments among CYP with the same illness. 62 The drawback is that they do not allow HRQOL to be compared across groups of CYP with different illnesses, which is essential in a discipline as wide and varied as PPC. The measures included in this review contain varying numbers of domains, but all cover the core constructs of HRQOL (physical, emotional and social). Some domains in the generic HRQOL measures may not be relevant to the PPC population; for example, a school environment domain may be irrelevant for a child near the end of life. One of the included studies aimed to validate the PedsQL™ in children with life-limiting illnesses. 7 Confirmatory factor analysis did not support the construct validity of the PedsQL™ in this group of children, implying that the structure of HRQOL in children with life-limiting illnesses may differ from that hypothesized for other populations. Most of the generic measures included in this review do not capture the impact of life-limiting illness on daily functioning and well-being.
Implications for research
As discussed above, it is questionable whether any of the included generic measures, such as the PedsQL™ and the Child Health Questionnaire, would be valid in the PPC population without adaptation, given concerns about construct validity. The Memorial Symptom Assessment Scales (MSAS) for children could potentially be useful in PPC.34,35 Although they capture many of the domains relevant to PPC, they would need testing for validity and reliability in this population. Without adaptation, they are unlikely to be useful in a non-cancer population, because they include a question about hair loss, which is specific to this group of children. The methodological quality of studies on the MSAS was fair throughout. Other disease-specific measures included in this review may be useful in PPC. For example, the PedsQL™ Neuromuscular module was designed for use in children with spinal muscular atrophy and muscular dystrophy, both of which are life-limiting conditions.53,54,57 However, in the three studies included in this review, most of the assessment of its psychometric properties was rated as fair or poor. It is unlikely that any of the included measures would have acceptable measurement properties across the entire range of children receiving PPC services, as the population is so diverse.
None of the measures included in this review meet all the requirements for use in the PPC population. The generic measures do not capture the full impact of living with a life-limiting illness and often have recall periods that could be considered too long for a child whose condition may be changing frequently. The disease-specific measures contain domains that are relevant only to CYP with specific illnesses, so they cannot be used to compare children with different conditions. One potential solution is to revise an existing instrument; an alternative is to develop a completely new measure. It is questionable whether either approach can produce an HRQOL outcome measure for a population as diverse as PPC. Children have many different types of illness, some of which are extremely rare, and each illness comes with its own set of physical, psychological and emotional needs. Not all items in a measure will be equally useful for children with different life-limiting conditions. Findings from other studies suggest that static models, in which all items are administered to all respondents, increase measurement error and decrease precision. 7 The use of IRT together with computerized adaptive testing (CAT) may better assess HRQOL in this population. 7 Alternatively, individualized rather than standardized measurement tools may offer a solution. 7 The Schedule for the Evaluation of Individual Quality of Life (SEIQoL) has been shown to be valid and reliable in a population of terminally ill adult cancer patients. 65
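To make the contrast with static administration concrete, the sketch below shows the core loop of CAT. It assumes dichotomous items calibrated under a Rasch model, which is a simplification: real HRQOL item banks with polytomous response options would use models such as the graded response model. After each response, the sketch updates an expected a posteriori estimate of the latent trait and then administers the unused item that is most informative at that estimate. The item bank, difficulties and simulated respondent are all hypothetical.

```python
import math


def p_endorse(theta, b):
    """Rasch (one-parameter logistic) probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at theta: p * (1 - p)."""
    p = p_endorse(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses, grid=None):
    """Expected a posteriori (EAP) theta and posterior SD, with a N(0, 1) prior.

    responses: list of (difficulty, answer) pairs; answer 1 = endorsed, 0 = not.
    """
    if grid is None:
        grid = [g / 10 for g in range(-40, 41)]        # theta from -4 to 4
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)                     # unnormalized prior density
        for b, x in responses:
            p = p_endorse(t, b)
            w *= p if x == 1 else 1.0 - p              # likelihood of each answer
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(item_bank, answer, max_items=5, target_se=0.5):
    """Administer items adaptively until SE <= target_se or max_items is reached.

    item_bank: dict of item id -> difficulty; answer: callable item id -> 0 or 1.
    """
    remaining = dict(item_bank)
    responses, administered = [], []
    theta, se = eap_estimate(responses)
    while remaining and len(administered) < max_items and se > target_se:
        item = max(remaining, key=lambda i: item_information(theta, remaining[i]))
        b = remaining.pop(item)
        responses.append((b, answer(item)))
        administered.append(item)
        theta, se = eap_estimate(responses)
    return theta, se, administered


# Hypothetical six-item bank and a simulated respondent.
bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0, "q6": 0.5}
theta, se, used = run_cat(bank, lambda item: 1 if bank[item] < 0.5 else 0)
print(used, round(theta, 2), round(se, 2))
```

Each child answers only the items most informative for them, which is how adaptive testing aims to reduce both respondent burden and imprecision.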
Two relatively new concepts in healthcare, patient-reported experience measures (PREMs) and patient-centred outcome measures (PCOMs), may also benefit CYP and their families receiving palliative care services, but more research in this area is required. PCOMs put patients and their families/carers at the heart of deciding which goals are most valuable for an individual, rather than leaving clinicians to decide what is best. 66 PREMs measure patient experience with the goal of improving services. It is desirable to combine measures of experience with measures of outcome to obtain a rounded view of the quality of care. 67
Strengths and limitations
This review has several strengths. To the authors’ knowledge, it is the first review to examine the measurement properties and feasibility of using existing outcome measures in the PPC population. The review was comprehensive: the search strategy identified more than 3000 articles for potential inclusion, and over 40 papers were systematically appraised and compared.
This review also has several limitations. First, it is never possible to be sure that all relevant studies have been identified. Second, the COSMIN checklist is based on expert group opinion, and although its inter-rater agreement is adequate, inter-rater reliability for many COSMIN items is poor, which has been attributed to differences in the interpretation of checklist items. 68 Third, selected articles were restricted to the English language. Finally, it was sometimes unclear whether certain criteria on the COSMIN checklist had not been performed or had simply not been reported, so it was not possible to distinguish between poor reporting and poor quality.
Conclusion
Although there is no ‘ideal’ HRQOL measure for use in PPC at the moment, it is important to continue developing and researching measures in this area.
Outcome measurement in PPC is rarely carried out, and as yet there are no HRQOL measures specific to this population. In light of new developments in PREMs and PCOMs, it may be desirable to develop a combination of measures that capture outcomes important to the individual child and family, as well as their satisfaction with their experience of the services that deliver care. The purpose of measuring quality of life and outcomes in CYP receiving PPC is potentially fourfold: to improve clinical care, to audit and evaluate services, to support research and to inform commissioners and secure funding. 60
Footnotes
Acknowledgements
This work was undertaken towards an MSc in Palliative Care at the Cicely Saunders Institute, King’s College London. This article presents independent research funded, in part, by the NIHR Collaboration for Leadership in Applied Health Research & Care (CLAHRC) Funding scheme, through CLAHRC South London. The views expressed in this publication are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research or the Department of Health. CLAHRC South London is part of the National Institute for Health Research (NIHR) and is a partnership between King’s Health Partners, St George’s, University of London, and St George’s Healthcare NHS Trust.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Apart from NIHR CLAHRC support for the senior and administrative input as acknowledged above, this research received no additional funding from commercial, public or not-for-profit sectors.
