Abstract
Goal Attainment Scaling is an assessment instrument to evaluate interventions on the basis of individual, patient-specific goals. The attainment of these goals is mapped in a pre-specified way to attainment levels on an ordinal scale, which is common to all goals. This approach is patient-centred and allows one to integrate the outcomes of patients with very heterogeneous symptoms. The latter is of particular importance in clinical trials in rare diseases because it enables larger sample sizes by including a broader patient population. In this paper, we focus on the statistical analysis of Goal Attainment Scaling outcomes for the comparison of two treatments in randomised clinical trials. Building on a general statistical model, we investigate the properties of different hypothesis testing approaches. Additionally, we propose a latent variable approach to generate Goal Attainment Scaling data in a simulation study, to assess the impact of model parameters such as the number of goals per patient and their correlation, the choice of discretisation thresholds and the type of design (parallel group or cross-over). Based on our findings, we give recommendations for the design of clinical trials with a Goal Attainment Scaling endpoint. Furthermore, we discuss an application of Goal Attainment Scaling in a clinical trial in mastocytosis.
1 Introduction
For diseases with very heterogeneous courses or stages, where symptoms differ substantially between patients, the evaluation of new treatments can be challenging when no standardised outcome measure applicable to all affected patients is available. This is of special concern in rare diseases, where separate clinical trials in homogeneous subgroups of patients are not feasible because of the small number of patients available. Examples of such heterogeneous disorders are mitochondrial DNA diseases, where the same underlying mitochondrial defect may have a wide range of symptoms, varying from coordination disturbance and muscle weakness to developmental delay and hearing loss. A drug that targets the mechanism underlying the disease could lead to an improvement in groups of patients with very heterogeneous symptoms. However, an outcome measure such as a walking or a hearing test will only be able to describe improvements in the subgroup of patients that are affected by the corresponding symptom. Restricting a clinical trial to such specific subgroups with homogeneous phenotypes may lead to sample sizes that are too small, given the low disease prevalence of 9.2 in 100,000. 1
Another example is Duchenne muscular dystrophy, a disabling and life-threatening X-linked recessive genetic disorder that primarily affects males.2,3 It results from defects in the gene for dystrophin, a structural protein required to maintain muscle integrity. The first signs of Duchenne are increasingly abnormal ambulation due to proximal muscle weakness. Problems with falling while walking, standing up from a supine position or climbing stairs are typically encountered before the age of 8. By 10–14 years of age, most boys with the disease are restricted to a wheelchair. Apart from walking abnormalities, including stride length and cadence, major disease manifestations are impairments in upper and lower extremity movements and strength, such as elbow flexion, elbow extension, knee flexion, knee extension and shoulder abduction, as well as in endurance and cardiorespiratory status. An outcome measure currently often used for ambulatory Duchenne patients is the 6-min Walk Test. This endpoint, however, has been criticised because it is restricted to ambulatory patients and provides insufficient information in case of loss of ambulation.4–6
A general tool to quantify the treatment benefit in a population with very heterogeneous symptoms is Goal Attainment Scaling (GAS), introduced by Kiresuk and Sherman. 7 It has been proposed as a patient-centred outcome measure capturing the treatment effect across a range of manifestations. GAS has been used as an endpoint in rehabilitation research,8,9 in geriatric trials to measure changes in the health status of frail elderly patients,10,11 to evaluate health care,12,13 educational programs 14 and psychosocial interventions, 15 but rarely in comparative clinical drug trials to assess the effect of an experimental treatment compared to a control. 16 Examples of such randomised controlled trials are studies assessing botulinum toxin for the treatment of patients with upper limb spasticity17–19 and a trial to evaluate donepezil 20 for the treatment of Alzheimer’s disease.
GAS endpoints are assessed in a procedure with several steps. First, patients formulate one or more goals together with a treating physician. Such goals can be, for example, to improve the maximum walking distance, to improve independence in a selected activity of daily living such as eating or to be able to use a computer mouse. Typically, the choice of goals is a process in which both the patient and the investigator take part. The investigator supports the trial participant to identify goals that are most relevant for the disease, are feasible, and can be measured objectively. In some settings, each patient and/or the caregiver is interviewed and the investigators define the goals for the specific patient based on this interview. Note that especially for diseases with very heterogeneous manifestations, the chosen goals may be unique to the patient. Furthermore, the number of chosen goals may vary across patients, a result of the goal setting step. In addition to the goals, criteria defining attainment levels for each goal are specified. The number of attainment levels is the same for all goals and often a scale with five measurement levels from −2 to 2 is chosen as suggested in the original article about GAS by Kiresuk and Sherman. 7 Finally, patients can optionally choose weights for the goals to differentiate between goals of different relevance or importance. This concludes the goal setting steps. After the treatment intervention, at a given follow-up time, the assessment of the goal attainment levels for each patient is performed according to the pre-specified assessment criteria.
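The goal setting and assessment steps described above can be pictured as a small data structure. The following is a minimal sketch only: the class names `Goal` and `PatientGAS`, the example goals and the example weights are hypothetical and not taken from the paper; the five-level scale from −2 to 2 and the optional weights follow the description in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Five attainment levels from -2 to 2, as suggested by Kiresuk and Sherman.
LEVELS = (-2, -1, 0, 1, 2)


@dataclass
class Goal:
    """One patient-specific goal (hypothetical container for illustration)."""
    description: str
    weight: float = 1.0           # optional importance weight, fixed at goal setting
    level: Optional[int] = None   # attainment level, filled in at follow-up

    def assess(self, level: int) -> None:
        """Record the attainment level observed at follow-up."""
        if level not in LEVELS:
            raise ValueError(f"attainment level must be one of {LEVELS}")
        self.level = level


@dataclass
class PatientGAS:
    """All goals of one patient; the number of goals may differ between patients."""
    patient_id: str
    goals: List[Goal] = field(default_factory=list)

    def mean_score(self, weighted: bool = False) -> float:
        """Per-subject mean attainment, optionally weighted by the goal weights."""
        scored = [g for g in self.goals if g.level is not None]
        if not scored:
            raise ValueError("no assessed goals for this patient")
        if weighted:
            total_weight = sum(g.weight for g in scored)
            return sum(g.weight * g.level for g in scored) / total_weight
        return sum(g.level for g in scored) / len(scored)


# Goal setting (before treatment): three illustrative goals, one weighted higher.
patient = PatientGAS("P01", [
    Goal("improve maximum walking distance", weight=2.0),
    Goal("eat independently"),
    Goal("use a computer mouse"),
])

# Follow-up assessment according to the pre-specified criteria.
for goal, observed in zip(patient.goals, (1, -1, 2)):
    goal.assess(observed)

print(patient.mean_score())               # unweighted per-subject mean
print(patient.mean_score(weighted=True))  # weighted per-subject mean
```

Keeping the weights on the goal record, separate from the attainment level, mirrors the requirement that goals, criteria and weights are fixed before any outcome is observed.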
Figure 1 gives a schematic illustration of the application of GAS as endpoint in a double-blind randomised clinical trial. To avoid bias, the goals, the criteria for the assessment of their attainment, and the weights are chosen before the patient is randomised to one of the treatment groups. This can prevent systematic differences in the choice of goals between treatment groups even if patients and/or physicians cannot be fully blinded. However, blinding is important for the assessment of goal attainment at the time of follow-up to avoid imbalances between treatment groups. If blinding of patients and physicians is not possible, the validity of the assessment can be improved if the assessment of goal attainments is performed by an assessor blinded to the assigned treatment.
Figure 1. Illustration of the application of a GAS endpoint in a randomised clinical trial. Supported by an investigator, each patient chooses disease-related goals he or she wants to attain. The type and the number of goals are chosen individually and can correspond to different dimensions of symptoms. In the example, the patient chooses two goals related to physical functioning and one goal related to bodily pain but no goal related to the other dimensions. For each goal, criteria defining specific attainment levels are specified before the intervention. Additionally, goals can optionally be weighted to quantify differences in importance and relevance of the goals. After the goals are set, patients are randomised and allocated to the treatment groups. After a predefined follow-up period, the attainment level for each goal is assessed according to the pre-specified criteria.
The statistical analysis and interpretation of GAS endpoints is challenging because the goals of each patient may be unique and the number of goals across patients may vary. As a consequence, current practice and opinions differ substantially.9,21–23
An important advantage of GAS as an endpoint for clinical trials in rare diseases is the potential to include patients with very heterogeneous disease manifestations. This allows one to broaden the pool of potential trial participants and to speed up recruitment. Furthermore, the involvement of patients in the choice of goals can increase the relevance of the endpoint to the patient which is an important factor of patient-focused drug development (see literature 24 for a recent discussion). However, practical challenges, such as the training and time required for goal setting as well as a lack of scientific literature on study design and analysis methods may be an obstacle in the application of GAS as an endpoint in clinical trials. Recently, several systematic literature reviews addressing the psychometric properties of the GAS scale, i.e. its validity, reliability and responsiveness have been performed16,25 but only general, qualitative recommendations9,21,26 regarding the statistical analysis of GAS endpoints are available.
In this article we address the statistical analysis and study design of comparative clinical trials with a GAS endpoint. In particular, we study the statistical properties of different analysis methods and explore the impact of the number of goals per patient, the distribution of effect sizes across goals, the correlation of goal attainment levels and other factors that may have an influence on the power and type I error rate of statistical tests. To this end, in Section 2.1 we propose a probabilistic model for GAS data in clinical trials. In Sections 2.2, 2.3 and 2.4, analysis methods to demonstrate a treatment effect in randomised clinical trials with a GAS endpoint are introduced, and in Section 2.5 models accounting for goal-specific weights are discussed. In Section 2.6 the robustness of the testing procedures with regard to the model assumptions is explored. In Section 3 we introduce a hierarchical model to simulate GAS data and report the results of a simulation study investigating the power and type I error rate of the considered analysis methods under a range of scenarios. The extension of the testing procedures to cross-over trials is discussed in Section 4. In Section 5 an example from a clinical trial is presented. Finally, in Section 6 we discuss the limitations of the approach and give some recommendations for the analysis and design of studies with a GAS endpoint.
2 A probabilistic model for GAS and hypothesis tests
2.1 Data model
Consider a randomised parallel group trial comparing an experimental treatment to a control with respect to a GAS endpoint. Let m denote the total number of subjects and for each subject
Let Fg denote the distribution of
Let
We make two assumptions on the distributions F0, F1:
(A) The distribution of ni is independent of gi. (B) For
Assumption (A) states that the number of goals per patient is equally distributed in both treatment groups. This can be achieved if, for example, the goals are set before randomisation. Assumption (B) implies that the expected mean attainment of goals is independent of the number of goals a patient sets. In Section 2.6 we discuss the validity of statistical hypothesis tests in settings where these assumptions are not satisfied.
We consider several testing approaches to test the null hypothesis H0 that take the dependence between observations from the same patient into account.
2.2 T-Test and Mann–Whitney U Test on per-subject means
A two-sample Welch’s t-test applied to the per-subject means
2.3 Generalised estimating equations (GEE) approach
The variance of
Assuming a working covariance structure with equal correlations ρx for all pairs
Following the GEE approach, both ρx and
Now, the Wald test statistic to test (1) is given by
Two important special cases are covered by the GEE approach. For the case
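To illustrate how an exchangeable working correlation changes the contribution of each patient, the sketch below computes a group mean from the estimating equation with a common mean and a known working correlation. This is an illustration under stated assumptions, not the paper's procedure: the correlation `rho` is treated as fixed rather than estimated, no sandwich variance or Wald test is computed, and the function names are hypothetical. Under these assumptions each per-subject mean receives the weight n_i / (1 + (n_i − 1)·rho), where n_i is the number of goals of patient i.

```python
from typing import List, Sequence


def exchangeable_weight(n_goals: int, rho: float) -> float:
    """Inverse-variance weight (up to a constant) of a per-subject mean of
    n_goals equally correlated scores with working correlation rho."""
    if n_goals < 1:
        raise ValueError("each patient needs at least one goal")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho is assumed to lie in [0, 1] in this sketch")
    return n_goals / (1.0 + (n_goals - 1) * rho)


def weighted_group_mean(subject_scores: Sequence[Sequence[int]], rho: float) -> float:
    """Solve the estimating equation for a common group mean given rho."""
    weights: List[float] = [exchangeable_weight(len(s), rho) for s in subject_scores]
    means: List[float] = [sum(s) / len(s) for s in subject_scores]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


# Illustrative data: one patient with a single goal, one patient with four goals.
group = [[2], [0, 1, 0, -1]]

print(weighted_group_mean(group, rho=0.0))  # every goal counts equally
print(weighted_group_mean(group, rho=1.0))  # every patient counts equally
print(weighted_group_mean(group, rho=0.5))  # intermediate weighting
```

The two limiting values of `rho` show the range of this weighting: with `rho = 0` the estimate is the pooled mean over all goals, and with `rho = 1` it is the plain average of the per-subject means.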
2.4 Standardised means (Kiresuk and Sherman)
In their initial proposal of GAS, Kiresuk and Sherman 7 proposed to standardise the per-subject means of the goal attainment levels
Several authors have applied tests based on the standardised means,8,15,20 with t-tests or Wilcoxon tests. For parallel group comparisons, the t-test tests the null hypothesis that the difference of means of the standardised per patient means is non-positive. Under assumptions (A) and (B) this is equivalent to the null hypothesis H0 (formulated for the non-standardised means), i.e.
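As an illustration of the standardisation, the sketch below implements the widely cited Kiresuk and Sherman T-score, T = 50 + 10·Σ w_k x_k / sqrt((1 − ρ)·Σ w_k² + ρ·(Σ w_k)²). It is an assumption that this form matches the expression used in this section, since the formula itself is not reproduced here; the default ρ = 0.3 is the value conventionally used in the GAS literature, and the function name `t_score` is hypothetical.

```python
import math
from typing import Optional, Sequence


def t_score(levels: Sequence[int],
            weights: Optional[Sequence[float]] = None,
            rho: float = 0.3) -> float:
    """Kiresuk-Sherman style T-score for one patient (illustrative sketch).

    levels  : attainment levels of the patient's goals (e.g. -2..2)
    weights : optional goal weights; equal weights if omitted
    rho     : assumed common correlation between goal scores
    """
    if not levels:
        raise ValueError("at least one goal is required")
    if weights is None:
        weights = [1.0] * len(levels)
    if len(weights) != len(levels):
        raise ValueError("levels and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, levels))
    sum_w = sum(weights)
    sum_w_sq = sum(w * w for w in weights)
    scale = math.sqrt((1.0 - rho) * sum_w_sq + rho * sum_w ** 2)
    return 50.0 + 10.0 * weighted_sum / scale


# Two patients with the same per-subject mean (1.0) but different numbers of goals.
print(t_score([1]))     # one goal
print(t_score([1, 1]))  # two goals
```

The two calls return different standardised scores although both patients have the same mean attainment, which shows that the standardisation depends on the number of goals a patient has set.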
2.5 Goal-specific weights
All the above testing approaches are based on per-subject averages of the individual goal attainment scores
Now, as discussed in Sections 2.2 and 2.4, the t-test can be applied to the means
The weighted tests, however, in general test a different null hypothesis than the unweighted test. Let
In any case, to ensure the control of the type I error rate, the weights need to be chosen independently of the treatment assignment and thus before randomisation. For example, they can be chosen at the time the individual goals are set. Weights can either be used to reflect patient preferences regarding the different importance of the goals or the weights could be chosen to maximise the power of the test, by giving more weight to goals where a larger treatment effect (i.e., larger difference to the control group) is expected. Furthermore, we assume that assumption (B) with
2.6 Robustness of the testing procedures
The type I error rate control of the testing procedures above may be compromised if the assumptions defined in Section 2.1 are not satisfied.
If only condition (A) is relaxed and the distribution of the number of goals per patient may differ between the treatment groups, the per patient means
The t-test for the standardised means (Section 2.4) does, however, not control the type I error rate. To see this, consider a simple example: Assume that in the control group each subject chooses one goal, while in the treatment group two goals are chosen. Furthermore, let
Consider now the case that only assumption (B) is not satisfied such that the attainment of goals depends on the number of goals a patient sets while the distribution of the number of goals is equal across treatment groups. This may result, for example, if there is a trend where patients choosing more goals tend to choose increasingly challenging goals. Also in this setting, the per patient means
However, if (A) but not (B) holds, the GEE and standardised means tests are valid tests for the stronger null hypothesis
Finally, if neither assumption (A) nor (B) holds, Welch’s t-test based on the patient-wise means will still be valid, due to asymptotic normal distribution of the
Type I error rate control of the testing procedures dependent on the validity of assumptions (A) and (B).
✓: (asymptotic) control of the type I error rate for the test of H0, (✓): Type I error control for the stronger null hypothesis
3 Simulation study
3.1 A data generating model for GAS data
To investigate the properties of different analysis approaches for GAS endpoints, we introduce a data generating model for the observed goal attainment scores Xik based on continuous latent variables Yik. They allow one to parametrise the treatment effect as well as the between- and within-patient variability. Let
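The general idea of generating discrete goal attainment scores from continuous latent variables can be sketched as follows. This is not the paper's exact parametrisation: the random-intercept form of the latent variable, the cut points in `THRESHOLDS`, the constant effect `delta` for all goals, and the uniform distribution of the number of goals between 1 and `n_max` are all simplifying assumptions made for illustration, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Assumed cut points mapping a latent value to the levels -2, -1, 0, 1, 2.
THRESHOLDS = (-1.5, -0.5, 0.5, 1.5)


def discretise(y: float, thresholds: Sequence[float] = THRESHOLDS) -> int:
    """Map a continuous latent score to an attainment level from -2 to 2."""
    level = -2
    for cut in thresholds:
        if y > cut:
            level += 1
    return level


def simulate_patient(n_goals: int, delta: float, rho: float,
                     rng: random.Random) -> List[int]:
    """Latent Y_ik = delta + sqrt(rho) * a_i + sqrt(1 - rho) * e_ik with
    independent standard normal a_i (patient effect) and e_ik (goal effect),
    so that latent scores of one patient have correlation rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    patient_effect = rng.gauss(0.0, 1.0)
    return [
        discretise(delta + rho ** 0.5 * patient_effect
                   + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0))
        for _ in range(n_goals)
    ]


def simulate_trial(m_per_group: int, delta: float, rho: float, n_max: int,
                   seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Simulate a parallel group trial; returns (control, treated) score lists."""
    rng = random.Random(seed)
    control: List[List[int]] = []
    treated: List[List[int]] = []
    for group, effect in ((control, 0.0), (treated, delta)):
        for _ in range(m_per_group):
            n_goals = rng.randint(1, n_max)  # number of goals, assumed uniform
            group.append(simulate_patient(n_goals, effect, rho, rng))
    return control, treated


control, treated = simulate_trial(m_per_group=20, delta=1.0, rho=0.5, n_max=5)
print(control[:3], treated[:3])
```

Because the number of goals is drawn in the same way in both groups and independently of the treatment effect, this sketch satisfies assumptions (A) and (B) of Section 2.1 by construction.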
3.2 Simulation scenarios
We performed a simulation study to compare the power of the considered hypothesis tests to detect a treatment effect in a GAS endpoint in a parallel group design with equal per group sample sizes
Besides the unweighted case, we considered two choices of goal specific weights (see Section 2.5): (i) a setting where the weights vik (
We compared three testing procedures: the Welch t-test based on the per-subject mean scores (see Section 2.2), the GEE test (see Section 2.3) and the t-test based on the standardised means (see Section 2.4). All simulations were performed in R. The GEE models were fitted with the R package geepack. 29
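The simulations in the paper were run in R. As a language-neutral illustration of the first of the three procedures, the following sketch computes the Welch t statistic and the Welch–Satterthwaite degrees of freedom from per-subject means using only the Python standard library. The function name and the example data are hypothetical, and no p-value is computed; one could obtain it from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t_on_subject_means(control: Sequence[Sequence[int]],
                             treated: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Welch t statistic (treated minus control) on per-subject mean scores.

    Each element of `control` / `treated` is the list of goal attainment
    levels of one patient. Returns (t, degrees_of_freedom).
    """
    if len(control) < 2 or len(treated) < 2:
        raise ValueError("each group needs at least two patients")
    x0 = [mean(scores) for scores in control]   # per-subject means, control
    x1 = [mean(scores) for scores in treated]   # per-subject means, treatment
    v0 = variance(x0) / len(x0)                 # squared standard errors
    v1 = variance(x1) / len(x1)
    t = (mean(x1) - mean(x0)) / math.sqrt(v0 + v1)
    df = (v0 + v1) ** 2 / (v0 ** 2 / (len(x0) - 1) + v1 ** 2 / (len(x1) - 1))
    return t, df


# Illustrative data: patients with different numbers of goals.
control = [[0, -1], [-1], [0, 0, 1]]
treated = [[1, 2], [1], [0, 1, 2]]
t_stat, df = welch_t_on_subject_means(control, treated)
print(t_stat, df)
```

Reducing each patient to a single mean before testing is what makes the observations independent across patients, regardless of how the goal scores within a patient are correlated.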
3.3 Results of the simulation study
Due to the discretisation of the continuous variables, the dependence of the expected value of the goal attainment scores Xik on δ is not linear in the treatment group but levels off for large δ due to a ceiling effect (see Figure 2(a)). Correspondingly, their variance decreases to zero for increasing δ. As noted above, also the correlation between the goal attainment scores of patients in the treatment group depends on the effect size. Figure 2(b) shows that the correlations of
Figure 2. (a) The expected value of the goal attainment scores Xik in the treatment group μ1 and their variance σ as function of δ in the reference scenario described in Section 3.2. (b) The correlations of the continuous and discretised goal scores
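The ceiling effect can be reproduced with a short calculation. The sketch below assumes a single latent score that is normally distributed with mean δ and unit variance and is discretised at the cut points in `THRESHOLDS`; both the cut points and the unit variance are illustrative assumptions rather than the paper's reference scenario, and the function names are hypothetical.

```python
from statistics import NormalDist
from typing import List, Tuple

LEVELS = (-2, -1, 0, 1, 2)
THRESHOLDS = (-1.5, -0.5, 0.5, 1.5)  # assumed cut points between the levels


def level_probabilities(delta: float, sd: float = 1.0) -> List[float]:
    """Probability of each attainment level when the latent score is N(delta, sd^2)."""
    cdf = NormalDist(mu=delta, sigma=sd).cdf
    cuts = [0.0] + [cdf(c) for c in THRESHOLDS] + [1.0]
    return [upper - lower for lower, upper in zip(cuts, cuts[1:])]


def mean_and_variance(delta: float) -> Tuple[float, float]:
    """Expected value and variance of the discretised score for effect size delta."""
    probs = level_probabilities(delta)
    m = sum(level * p for level, p in zip(LEVELS, probs))
    v = sum((level - m) ** 2 * p for level, p in zip(LEVELS, probs))
    return m, v


for delta in (0.0, 1.0, 2.0, 3.0, 4.0):
    m, v = mean_and_variance(delta)
    print(f"delta={delta:.1f}  mean={m:.3f}  variance={v:.3f}")
```

In this sketch the mean cannot exceed the highest level 2, so successive increases in δ yield ever smaller increases in the expected score while the variance shrinks towards zero.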
Figure 3(a) shows that for the reference scenario the power of the GEE approach is largest, followed by the t-test for standardised means and the t-test for the raw means. For example, for
Figure 3. The power (a, 10^4 simulation runs) and type I error rate (b, 10^5 simulation runs) of the t-test based on the per-subject means (mean), the t-tests based on the standardised per-subject means (Kiresuk), and the test based on the GEE model (GEE) for the reference scenario.
For sufficiently large sample sizes, the type I error rate is controlled for all three procedures. However, the GEE approach is liberal for small sample sizes: for a sample size of m = 20, the type I error rate for
If goal specific weights are applied that are independent of the effect sizes (Case (i) in Section 3.2) the power drops for all three testing procedures. However, if we assume that the weights are chosen according to equation (8) (Case (ii) in Section 3.2), the power increases compared to the tests based on the unweighted goal attainment scores (see Figure 4(a)). The latter scenario serves as benchmark only, as it requires that the exact effect sizes of each goal are known in advance. If estimates based on historical data are used instead, the increase in power may be smaller.
Figure 4. (a) The power of the three considered testing procedures without weighting, with patient preference weighting (Case (i) in Section 3.2) and with treatment effect weighting (Case (ii) in Section 3.2) in the reference scenario for
(a) The power of the t-test for standardised means for the case where correlations
Figure 4(b) shows the impact of the maximum number of goals per patient on the power for the reference scenario. As expected, the power increases with increasing nmax and the increase is more pronounced for lower correlations ρ0.
To assess the robustness of the procedures, we performed simulations for several alternative scenarios and modifications of the procedure: To investigate the impact of the inclusion of goals on which the treatment has no effect, we consider a scenario where bi follows a mixture distribution with point mass
(a) The power for the GEE approach for discrete goal attainment scales with
4 Extension to cross-over designs
4.1 Hypothesis tests for cross-over designs
For the investigation of treatments with short-term effects on a chronic condition, cross-over designs may provide advantages to parallel group designs. 30 In a two-armed cross-over trial, each patient is exposed to both treatments in randomised order, over a sequence of treatment periods. The outcome variables are observed at the end of each period. Effect estimates and hypothesis tests are then based on within-patient comparisons and precision and power are determined by the within-patient variance rather than the typically larger between-patient variance determining the properties of a parallel group design. Therefore, a cross-over design may lead to a higher power or require a smaller sample size than a parallel group design.
Consider a cross-over trial with a goal attainment scaling endpoint, where each patient chooses a single set of goals and attainment categories which are used in both treatment periods. Such a design can be useful for the investigation of symptomatic treatments in stable, chronic diseases where a short-term endpoint is available.
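When the same goals are assessed in both treatment periods, the within-patient comparison can be sketched as a paired t-test on the differences of per-subject means. The example below is a minimal illustration: the function name and the data are hypothetical, and period or carry-over effects are deliberately ignored.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_on_subject_means(scores_control: Sequence[Sequence[int]],
                              scores_treated: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Paired t statistic on within-patient differences of mean attainment.

    scores_control[i] and scores_treated[i] hold the attainment levels of the
    same goals of patient i under control and under treatment.
    Returns (t, degrees_of_freedom).
    """
    if len(scores_control) != len(scores_treated):
        raise ValueError("both treatments must be observed for every patient")
    if len(scores_control) < 2:
        raise ValueError("at least two patients are required")
    diffs = [mean(t) - mean(c) for c, t in zip(scores_control, scores_treated)]
    m = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(m))
    return t_stat, m - 1


# Illustrative data for four patients, each assessed under both treatments.
under_control = [[0, -1], [-1], [0, 0, 1], [-2, -1]]
under_treatment = [[1, 0], [0], [1, 1, 1], [0, -1]]
t_stat, df = paired_t_on_subject_means(under_control, under_treatment)
print(t_stat, df)
```

Because each difference is formed within one patient, between-patient variability in overall attainment cancels out, which is the source of the efficiency gain described in Section 4.1.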
4.2 Extension of the testing approaches to cross-over designs
The three hypothesis tests proposed in Section 2 can be extended to a two groups–two periods cross-over design:
4.2.1 Paired t-test based on per-subject means
For patient
4.2.2 Paired t-test based on standardised per-subject means
For patient i under treatment g, let
4.2.3 GEE approach 1
The GEE approach reduces to an intercept only GEE model applied to the differences
4.2.4 GEE approach 2
The GEE approach could also be applied to model
Remarks: (i) With the exception of GEE approach 1, the above hypothesis tests can be extended to adjust for co-variables such as the treatment period. To this end, the paired t-tests are replaced by a linear model for
4.3 A data generating model for cross-over trials
Data for the cross-over trial are modelled similarly as for the parallel group design based on continuous goal attainment scores defined by
In Section 5, we compare the cross-over design to the parallel group design in an example.
5 Example: Assessment of the efficacy of recombinant human diamine oxidase in mastocytosis
In November 2014 the Medical University of Vienna sought scientific advice concerning planned clinical trials for the treatment of mastocytosis patients with the active substance recombinant human diamine oxidase (rhDAO) in order to obtain Marketing Authorisation. With a prevalence of less than 3 in 10,000 (EMA/OD/75/2014), mastocytosis is considered an orphan disease. It is characterised by an excess of mast cells in various organs of the body, and patients with recurrent anaphylaxis are rarer still.
Due to the diversity of symptoms of mastocytosis patients, it is especially difficult to perform clinical studies with a single standardised endpoint. Symptoms include a broad spectrum ranging from minor inconveniences compromising the quality of life like flushing (redness), pruritus (itching), urticaria, abdominal pain (cramps), nausea, vomiting, heartburn, palpitations (tachycardia), dyspnoea (difficulty breathing) and hypotension (low blood pressure) to its most severe form with life-threatening hypersensitivity reactions (anaphylaxis). All these symptoms have as a common cause an excess of activated mast cells in various parts of the body. About 70% of mastocytosis patients feel that they suffer from a disability caused by their disease. 31
Table 2. Example of a Goal Attainment Scale for a mastocytosis patient taken from the request for scientific advice at the EMA.
In the protocol assistance provided by the European Medicines Agency (EMA), it was recommended that the study focus on patients with stable cutaneous and indolent systemic mastocytosis and that all patients be well educated about their disease, factors triggering symptoms and the use of rescue therapy. The EMA suggested performing an open (multinational) trial in this population, where patients and/or emergency physicians inject rhDAO, but using a differentiated or individualised approach in defining suitable endpoints, as triggers and presentation of anaphylaxis can vary substantially between patients. The outcome of such a trial could be a series of cases treated and observed under a common, well-defined protocol including a number of amendments or notifications to fit individual patient needs. Additionally, the EMA suggested further exploring the feasibility of a within-patient, placebo-controlled study on top of standard of care in those patients considered suitable for self-administration of rhDAO.
Note that the exemplary goal definition (Table 2) as proposed in the request for scientific advice can be refined by defining separate goals for tolerable temperatures and pressures. Furthermore, quantitative thresholds for temperature and pressure defining the attainment levels could be specified.
Operating characteristics of a parallel group trial with 30 patients compared to a cross-over trial with 15 patients for the example in Section 5 with expected effect size δ = 1.
Note: The power (10^4 simulation runs) and type I error rate (T1E, 10^5 simulation runs) are given for the unweighted (no w), patient preference weighted (patient w), and treatment effect weighted (effect w) scores. Note that under the null hypothesis, the treatment effect weighted and unweighted case coincide.
As expected, for a correlation of goal attainment levels within patients of
6 Discussion
We propose a probabilistic model for data from a GAS endpoint that can be useful to assess the appropriateness of different analysis approaches and to calculate the sample size for trials with a GAS endpoint. The proposed model covers a number of important features of these endpoints, such as the varying number of goals per patient and the individual choice of goals for each patient (reflected by modelling the treatment effects as a random variable). We focused on parallel group superiority trials and considered three testing procedures: t-tests on per-patient mean scores, t-tests on standardised mean scores, and an analysis by a GEE approach. Randomised treatment allocation and the choice of goals and weights before randomisation avoid systematic imbalances between the treatment groups. In particular, systematic differences in the quality, attainability and importance of goals are prevented.
A clinical interpretation of the tested null hypothesis as well as the considered alternative hypotheses is challenging because the GAS endpoint depends not only on the preferences of the patients included in the trial but also on the scope of goals available to the patients and the process of choosing individual goals. The goals and patient preference weights, which are chosen before randomisation, can be considered as a baseline characteristic of the patient, such as age or sex. The observed value of a GAS endpoint can be regarded as an attribute of the patient under the specific treatment, similar to the observed value of a traditional clinical endpoint. Therefore, observing a significant treatment effect in the mean (weighted) goal attainment in a randomised trial allows for the conclusion that, under the same process of selecting goals (and weights), on average the (weighted) goal attainment levels of other patients from the same population will be higher under treatment than under control. If the study sample cannot be assumed to be a random sample, but patients are randomised between treatment groups, a conditional interpretation is possible and study results may be generalised to a population with properties matching those of the study sample. The issue of conditional and unconditional inference is not specific to GAS endpoints but has been discussed for clinical trials with traditional clinical endpoints as well, as the applicability of the random sampling model for clinical trials has been challenged. Note that, at least asymptotically, conditional re-randomisation tests and unconditional tests often coincide (see e.g. Sections 4.1 and 4.2 of literature34,35 for a recent discussion). It is well understood for clinical trials with a traditional endpoint that the interpretation of trial results is only meaningful if the trial population is sufficiently specified by appropriate inclusion/exclusion criteria based on baseline characteristics.
Furthermore, the possibility to generalise trial results may depend on how well the trial population including their goal preferences reflects a future patient population to be treated with the treatments under investigation. Similarly, for a clinical interpretation of results of a trial with a GAS endpoint, appropriate criteria for valid goals have to be specified. These criteria can help to assess the generalisability of trial results, similar to classical inclusion/exclusion criteria. The criteria need to be flexible enough to cover all important goals for a heterogeneous patient population, but need to make sure that the goals are relevant and clinically useful. The choice of the goal to reach a certain level of the experimental drug in the blood, for example, will typically not be a valid goal and may bias the analysis.
For the interpretation of outcomes of trials with a GAS endpoint, it is essential that the chosen goals are reported in a similar way as baseline variables are traditionally reported in randomised controlled trials. If a full listing of goals is not feasible, a categorisation of goals based on pre-specified categories and reporting of the corresponding frequencies can be an important tool for the interpretation. Furthermore, such a categorisation can be the basis for stratified randomisation and stratified analysis to ensure a balance across treatment groups and potentially increase the power of respective tests.
Goal attainment scaling can be viewed as a type of combined endpoint, where the components may change from patient to patient such that potentially only one observation per component is available. Therefore, the treatment effects for specific goals cannot be estimated. This limitation in the interpretation is a price for the possibility to choose individualised endpoints such that more heterogeneous patients can be included in the trial.
The interpretation of GAS endpoints shares several limitations with more standard combined endpoints, where the same components are measured for each patient. Especially, positive effects in some components can mask negative effects in other components. 36 For GAS this masking of effects is more difficult to assess because the analysis of individual components is not feasible if the goals are chosen individually. However, based on a categorisation of goals, a separate descriptive analysis for each of the categories could be considered.
If the number of goals per patient or the correlation between goal attainments varies between patients, not all patients provide the same amount of information on the treatment effect. Then, appropriate weighting of goals or patients can increase the power of hypothesis tests. For the considered data generating model, tests based on a GEE model lead to the highest power. The GEE model is based on a more efficient estimator of the average treatment effect across goals and patients compared to methods based on the mean or the standardised mean. Tests based on the standardised means, although widely used, have the disadvantage that they may not control the type I error rate conditional on the number of goals chosen for each patient.
While weighting of goals according to patient preferences can increase the clinical relevance of the GAS endpoints, such weighting reduces the power of hypothesis tests if the weights are not correlated with the treatment effect of the respective goals. This is due to the additional variability introduced by the weighting. On the other hand, power can be gained by choosing weights that correlate with the unknown treatment effects for the different goals. As an alternative to weights that are normalised within each patient, as discussed in Section 2.5, weights on an absolute scale could be used to reflect the relevance of a goal compared to different goals of different patients.
We also considered statistical testing procedures for cross-over designs with a goal attainment endpoint. As in such designs each patient serves as their own control, at most half the number of patients is required. If there is a strong correlation of goal attainments within patients, the required sample size is further reduced. For stable, chronic diseases and symptomatic treatments which are not disease modifying, the same goals can be used in both treatment periods. This not only reduces the variability for within-patient comparisons, but can also limit the potential for bias as the analysis is stratified by goal. Allowing for different goals in the two treatment periods of randomised trials gives additional flexibility to adjust to changes in the patients’ needs, for example, due to a progressive disease. However, if goals or definitions of attainment levels are modified after randomisation of the treatment sequence, there is an additional potential for bias and blinding is paramount.
The power of the considered testing procedures for GAS endpoints depends on a number of factors, such as the number of chosen goals, the correlation of goal attainments within patients, the variability of goal attainments, the choice of the goal attainment levels, and the distribution of effect sizes for the chosen goals. The large number of parameters which are typically unknown at the planning stage poses a substantial challenge for sample size planning. However, a few recommendations can be derived from our simulation study:
Number of goals: The power of a between-group comparison increases with the number of goals per patient, assuming the distribution of treatment effects and the correlation of goal attainments stay constant as the number of goals increases.
Sensitivity of goals: Including goals that are not affected by the treatment can lead to a substantial loss in power and should be avoided.
Hypothesis tests: Overall, the GEE approach has the largest power, with minimal inflation of the type I error rate for moderate sample sizes. T-tests for the per-subject means of goal attainment levels are the most robust testing approach and control the type I error rate in all considered scenarios.
Dependency between goals: For parallel group designs, goals within a patient should be weakly correlated. The higher the correlation between goals, the less pronounced the increase in power from adding goals.
Number of scale levels: A goal attainment scale with five levels appears to be sufficient, as a further increase in the number of levels has little effect on the power.
Definition of GAS attainment levels: The optimal scale (maximising the power) depends on the effect sizes of the individual goals.
Parallel or cross-over trial: If applicable, a cross-over design may allow for a substantial reduction of the required sample size.
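The interplay between the number of goals and their correlation can be illustrated with a small calculation. As a simplifying assumption that is not taken from this paper, consider goal outcomes on a continuous scale with common variance `sigma2` and common pairwise correlation `rho` within a patient. The variance of the per-patient mean of n goals is then σ²(1 + (n − 1)ρ)/n. The function name is hypothetical, and the discretised attainment levels will have different moments than the continuous outcomes used here.

```python
def variance_of_patient_mean(n_goals, rho, sigma2=1.0):
    """Variance of the mean of n_goals equicorrelated goal outcomes.

    Assumes a common variance sigma2 and a common pairwise correlation rho
    between the goals of one patient (illustrative assumptions).
    """
    if n_goals < 1:
        raise ValueError("n_goals must be at least 1")
    return sigma2 * (1.0 + (n_goals - 1) * rho) / n_goals


# Variance of the per-patient mean for 1 to 5 goals at three correlations.
table = {
    rho: [variance_of_patient_mean(n, rho) for n in range(1, 6)]
    for rho in (0.0, 0.5, 1.0)
}
```

With uncorrelated goals the variance falls as 1/n, whereas with perfectly correlated goals it does not decrease at all. Intermediate correlations give diminishing returns from each additional goal, in line with the recommendations above.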
Utilising a GAS endpoint in a clinical trial requires substantial additional effort and time, as well as particular training of the involved study team members, to choose goals with each patient individually and to evaluate the goal attainment. However, in rare diseases with very heterogeneous symptoms, where traditional clinical endpoints are not sufficiently sensitive in the overall population, the possibility to include a broader patient population may outweigh the increased complexity. The proposed statistical model enables the assessment of the operating characteristics of trials with a GAS endpoint. This can be the basis to evaluate whether a GAS endpoint should be chosen, to facilitate the planning of trials with a GAS endpoint and to guide the data analysis and interpretation.
Supplemental Material
Supplemental material for Statistical analysis of Goal Attainment Scaling endpoints in randomised trials by S Urach, CMW Gaasterland, M Posch, B Jilma, K Roes, G Rosenkranz, JH Van der Lee and R Ristl in Statistical Methods in Medical Research
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been funded by the FP7-HEALTH-2013-INNOVATION-1 project Advances in Small Trials Design for Regulatory Innovation and Excellence (ASTERIX) (grant no. 603160).
Supplemental material
Supplemental material for this article is available online.
Appendix 1. Computation of the first two moments of the discretised goals $(X_{ik})_{k=1}^{n_i}$ for model (6)
References
