Abstract
Background
Established analysis methods for composite outcomes can be complex to interpret, clinically and statistically. To counter this, composite outcomes are frequently reduced to a binary outcome. This can simplify interpretation, but there is a danger that deeper understanding of patient experience is lost. Rank-based desirability of outcome ranking (DOOR) methodology is emerging as an attractive option, using all components of a composite outcome. DOOR methodology was applied retrospectively to the TOPPIC trial (ISRCTN89489788) to determine whether making fuller use of the data could enhance the study results.
Methods
TOPPIC patients were assigned a clinical outcome rank (1 = most desirable to 8 = least desirable). DOOR methodology was applied, testing the null hypothesis of a DOOR probability of 50% (no difference). Additionally, smokers and non-smokers were considered separately. Win ratio and ordinal logistic regression provided supportive analyses.
Results
DOOR methodology demonstrated that the distribution of ranks differed between treatment groups: the DOOR probability of a more desirable outcome with active treatment was 54.6% (95% CI 53.8% to 55.4%, p < 0.0001). Considering smokers and non-smokers separately, this difference was amplified in smokers (DOOR probability 66.1%, 95% CI 62.7% to 69.4%, p < 0.0001). Win ratio methodology and ordinal logistic regression showed no statistically significant difference between treatments.
Conclusions
DOOR methodology was applied successfully to a previously published trial and shows potential to discriminate clinical outcomes effectively by deconstructing a binary endpoint of component parts into an ordinal outcome. This approach should be considered when designing trials, either as principal analysis of a rank-based outcome or as a confirmatory sensitivity analysis.
Background
Randomised clinical trials are recognised as the gold standard for clinical research as a means of evaluating the potential positive impact of new interventions while also considering risks. In order to produce findings that are both clinically and statistically meaningful in this complex research setting, composite primary outcomes are often proposed, made up of a number of individual components which are then dichotomised to a binary (yes/no) response. This approach increases the likelihood that each clinical trial participant contributes a measurable outcome (i.e. an event) relevant to the intervention under examination in comparison to a setting where a single primary (i.e. non-composite) outcome had been proposed. However, a disadvantage to the composite approach is that less severe events occurring early in the research pathway will dominate more severe events (e.g. death) that may occur later in the trial. This can lead to confounding issues in interpretation of trial results. Additionally, composite primary outcomes are in danger of over-simplifying the clinical question and may fall short of fully understanding important patient responses. This potential limitation may result in reduced statistical power for detecting important effects, resulting in larger sample sizes than necessary. An alternative methodological approach, developed in recent years, is presented here in order to expand more fully on the potential benefit-risk assessment in clinical trials. It is recognised that this alternative method may have greater pragmatism than the traditional methods of clinical trial reporting by recognising the various nuances of the patient experience.
Desirability of outcome ranking (DOOR)
The Desirability of Outcome Ranking (DOOR) is a process which characterises patients according to an overall clinical outcome constructed on the basis of important individual outcomes. 1 It is based on a longitudinal view of patient experience rather than measuring outcomes at a single point in time. This provides a comprehensive overview of the individual patient where patients in the same category have similar overall clinical outcomes while those in different categories display clinically relevant differences in terms of the overall clinical outcome. DOORs are mutually exclusive hierarchical levels, constructed by assigning higher ranks to patients with better overall clinical outcomes. Furthermore, RADAR (response adjusted for duration of antibiotic risk) is an enhanced version of DOOR which has adapted the DOOR methodology for trials that compare optimisation of antibiotic use, as demonstrated by Evans et al. 2
At the analysis stage, the DOORs are treated as an ordinal outcome with distributions of DOORs compared between strategies or interventions. This ordinal ranking approach addresses some of the limitations of a composite primary outcome which is treated as a binary (yes/no) measure. With a binary outcome, individual components are treated equally, but realistically they may have differing importance to patients and clinicians. Some components may occur more frequently and may dominate overall event rates.
The DOOR approach is gaining traction in current clinical trials as an attractive option for primary endpoint selection with trials building this outcome into the design and methodology from the outset of trial conception and design.3–6 Furthermore, results of previously published trials can be revisited retrospectively to explore alternative analysis approaches.7–13 These retrospective analyses demonstrate the feasibility of the DOOR approach. There is a growing body of evidence that this method is an effective means of assessing the effect of an intervention on multiple outcomes which are frequently over-simplified into dichotomous outcomes in original analyses.
With this in mind, DOOR methodology has been employed retrospectively to the TOPPIC trial 14 to determine if making fuller use of the available data could support and enhance the main study results.
The TOPPIC trial
The TOPPIC trial was a randomised, placebo-controlled, double-blind trial of 240 patients across 29 UK secondary and tertiary hospitals who had a confirmed diagnosis of Crohn’s disease and had undergone intestinal resection.
Patients were randomly assigned (1:1) to oral daily mercaptopurine (6MP) (N = 128) at a dose of 1 mg/kg bodyweight or placebo (N = 112). Patients were followed up for 3 years. The primary outcome was clinical recurrence of Crohn’s disease, captured as a binary outcome (yes/no). This binary outcome was made up of three individual components: i. a post-baseline Crohn’s Disease Activity Index (CDAI) score of >150, together with a 100-point increase in CDAI score from baseline; ii. the need for anti-inflammatory rescue treatment; iii. primary surgical intervention. Component i. was required to be satisfied in conjunction with either (or both) of components ii. and iii. The components of the primary outcome were independently adjudicated by clinicians blinded to treatment allocation to confirm clinical recurrence of Crohn’s disease.
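The logic of the composite primary outcome described above can be written out as a short sketch. The function and argument names are hypothetical, and reading the "100-point increase" as a rise of at least 100 points is an assumption (the threshold could equally be read as strictly greater than 100).

```python
def clinical_recurrence(cdai_post: float, cdai_baseline: float,
                        needed_rescue: bool, had_surgery: bool) -> bool:
    """Binary composite primary outcome as described in the text (illustrative only)."""
    # Component i: post-baseline CDAI above 150 together with a rise from baseline.
    # Assumption: "100-point increase" is read here as a rise of at least 100 points.
    cdai_increase = cdai_post > 150 and (cdai_post - cdai_baseline) >= 100
    # Component i must hold together with component ii and/or component iii.
    return cdai_increase and (needed_rescue or had_surgery)


# A CDAI rise on its own does not count as recurrence; it needs rescue or surgery.
example_with_rescue = clinical_recurrence(260, 120, needed_rescue=True, had_surgery=False)
example_cdai_only = clinical_recurrence(260, 120, needed_rescue=False, had_surgery=False)
```

Under this definition, a patient who needs rescue treatment or surgery without the CDAI criterion is not counted as a recurrence, which is the information the ordinal ranking later recovers.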
In the original trial analysis, the primary outcome was modelled using a Cox proportional hazards model, making use of the timing of clinical recurrence of Crohn’s disease relative to randomisation. The principal analysis adjusted for centre and smoking status (randomisation stratification variables) and baseline values of previous treatment with mercaptopurine and azathioprine. Unadjusted results were also presented. A number of pre-planned subgroup analyses of the primary outcome were undertaken.
The principal analysis showed that 16 (13%) patients in the mercaptopurine (6MP) group versus 26 (23%) patients in the placebo group had a clinical recurrence of Crohn’s disease and needed anti-inflammatory rescue treatment or primary surgical intervention (adjusted hazard ratio [HR] 0.54, 95% CI 0.27–1.06; p = 0.07; unadjusted HR 0.53, 95% CI 0.28–0.99; p = 0.046).
In a subgroup analysis comparing smokers with non-smokers, three (10%) of 29 smokers in the mercaptopurine group and 12 (46%) of 26 in the placebo group had a clinical recurrence that needed treatment (HR 0.13, 95% CI 0.04–0.46), compared with 13 (13%) of 99 non-smokers in the mercaptopurine group and 14 (16%) of 86 in the placebo group (HR 0.90, 95% CI 0.42–1.94) (p for interaction = 0.018). The effect of mercaptopurine did not significantly differ from placebo for any of the other pre-planned subgroup analyses (previous thiopurines, previous infliximab or methotrexate, previous surgery, duration of disease, age at diagnosis).
The trial results concluded that mercaptopurine is effective in preventing postoperative clinical recurrence of Crohn’s disease, but only in patients who are smokers.
Aims
The objective of this retrospective analysis of the TOPPIC trial was to determine the extent of clinical recurrence of Crohn’s disease in greater detail. Given that the primary outcome was a binary measure, this may not have accounted fully for the severity of the recurrence of Crohn’s disease since the constituent components of the outcome were reduced in complexity to a straightforward yes/no response.
Clinical recurrence could potentially occur in patients with varying experiences during the trial – some may have needed surgery and a number of instances of rescue therapy across a long time period, whereas other patient experiences may have only necessitated a single use of rescue therapy which still resulted in clinical recurrence of Crohn’s disease.
By analysing the binary primary outcome according to a different methodology which makes use of its individual components, a deeper understanding of the severity of Crohn’s disease for the patients in the TOPPIC trial could be uncovered. DOOR methodology was the primary approach employed for the re-analysis of the TOPPIC trial, with win ratio methodology and ordinal logistic regression also undertaken for comparison purposes.
Methods
DOOR
Based on primary trial data, 15 the methods recommended by Evans et al. 2 were followed in which a mutually exclusive clinical outcome rank specific to the TOPPIC trial, ranging from 1 to 8, was assigned to each patient (Figure 1).
Figure 1. Clinical outcome DOOR ranks for the TOPPIC trial.
Rank 1 denotes the most favourable rank (no surgery, no rescue medications, no CDAI increase 1 ) while Rank 8 denotes the least desirable rank (surgery, rescue medications, CDAI increase). Ranks 2 to 7 are combinations of the composite primary endpoint in descending order of desirability. These ranks were approved as clinically sensible after discussion with the Chief Investigator of the original TOPPIC trial.
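As an illustration of how three binary components can yield eight mutually exclusive ranks, the sketch below orders the combinations with surgery weighted most heavily, then rescue medication, then CDAI increase. This ordering of the intermediate ranks is an assumption made for illustration; it reproduces the ranks named in the text (Rank 1, Rank 4 and Rank 8), but the full scheme is defined in Figure 1, which is not reproduced here. The function name is hypothetical.

```python
def door_rank(surgery: bool, rescue: bool, cdai_increase: bool) -> int:
    """Map the three components to a clinical outcome rank (1 = most desirable).

    Assumption: components are ordered by importance as surgery > rescue
    medication > CDAI increase, giving a lexicographic ordering of the
    eight possible combinations.
    """
    return 1 + 4 * int(surgery) + 2 * int(rescue) + int(cdai_increase)


# Ranks described in the text
best = door_rank(surgery=False, rescue=False, cdai_increase=False)
middle = door_rank(surgery=False, rescue=True, cdai_increase=True)
worst = door_rank(surgery=True, rescue=True, cdai_increase=True)
```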
In addition to a clinical outcome ranking, the number of rescue medications required by TOPPIC patients was taken into consideration as part of the DOOR methodology. This aligns with the RADAR approach 2 in which the level of antibiotic use augments the clinical rankings under the assumption that less antibiotic use is better but cannot be at the expense of clinical outcomes. In place of antibiotic use in the RADAR example, the same rationale was applied to the number of rescue medications required during the TOPPIC trial. In this setting, TOPPIC patients were assigned a DOOR using a modified version of the 2-step process recommended by Evans et al.: 2
1. Categorise all patients into an overall clinical outcome (Figure 1).
2. Rank patients according to 2 rules:
i. When ranking the outcomes of 2 patients with different overall clinical outcomes, the patient with a better overall clinical outcome receives the higher rank.
ii. When ranking the outcomes of 2 patients with the same overall clinical outcome, the patient requiring fewer instances of rescue medication receives the higher rank.
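The two ranking rules amount to a lexicographic comparison: clinical outcome first, with the number of rescue medication instances used only to break ties. A minimal sketch follows, using hypothetical names and the numeric convention that 1 is the most desirable clinical outcome.

```python
from typing import NamedTuple


class Patient(NamedTuple):
    clinical_rank: int  # 1 = most desirable ... 8 = least desirable
    rescue_count: int   # number of instances of rescue medication


def compare(a: Patient, b: Patient) -> int:
    """Return 1 if a has the more desirable DOOR, -1 if b does, 0 for a tie."""
    # Tuples compare element by element, so the clinical outcome decides first
    # (rule i) and the rescue count matters only when outcomes are equal (rule ii).
    key_a = (a.clinical_rank, a.rescue_count)
    key_b = (b.clinical_rank, b.rescue_count)
    if key_a < key_b:
        return 1
    if key_a > key_b:
        return -1
    return 0


# Rule i: a better clinical outcome wins even with more rescue medication.
rule_one = compare(Patient(2, 5), Patient(4, 1))
# Rule ii: same clinical outcome, fewer rescue instances wins.
rule_two = compare(Patient(4, 1), Patient(4, 3))
```

Because the rescue count never overrides the clinical outcome, less rescue medication is rewarded only when it does not come at the expense of clinical outcome, in line with the RADAR rationale.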
To compare the distribution of DOORs between the mercaptopurine and placebo groups, all possible pairwise comparisons of the DOOR were considered, comparing the 128 patients in the mercaptopurine (6MP) arm against the 112 placebo arm patients. The number of possible pairwise comparisons is the product of the number of participants in each treatment arm (N6MP = 128, Nplacebo = 112, N6MP × Nplacebo = 14,336).
For each mercaptopurine patient, the number of ‘wins’ was determined by counting the number of placebo patients with a lower (worse) ranking than that individual patient. Where the DOOR was the same for a pair of patients, this was classed as a ‘tie’. The total number of ‘wins’ and ‘ties’ across all mercaptopurine patients was determined, with the resultant DOOR probability calculated as:
DOOR probability = (number of wins + 0.5 × number of ties) / (N6MP × Nplacebo)
This is the probability that a randomly selected participant assigned to mercaptopurine (6MP) would be ranked more favourably (have a more desirable outcome) than a randomly selected participant assigned to placebo. A 95% confidence interval around the DOOR probability was also constructed.
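The pairwise counting can be sketched as follows. The data are toy values rather than TOPPIC data, the function name is hypothetical, and the sketch assumes the usual DOOR convention that a tie contributes half a win.

```python
from collections import Counter


def door_probability(treated, control):
    """Count wins and ties over all treated-versus-control pairs.

    Ranks are integers where a lower number is more desirable.
    Assumption: ties are weighted by one half, as in the usual DOOR definition.
    """
    control_counts = Counter(control)
    wins = 0
    ties = 0
    for t in treated:
        for c, n in control_counts.items():
            if c > t:       # control patient has the worse rank: a win for treated
                wins += n
            elif c == t:    # identical rank: a tie
                ties += n
    total_pairs = len(treated) * len(control)
    probability = (wins + 0.5 * ties) / total_pairs
    return probability, wins, ties, total_pairs


# Toy example: 3 treated and 3 control patients give 9 pairwise comparisons.
prob, wins, ties, pairs = door_probability([1, 1, 4], [1, 4, 8])
# Here wins = 5 and ties = 3, so prob = (5 + 1.5) / 9.
```

For the trial itself the same counting runs over 128 × 112 = 14,336 pairs.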
The null hypothesis in this setting is defined as a DOOR probability of 50%, indicating no difference between the groups. This DOOR probability is similar to the nonparametric Wilcoxon rank-sum test probability and can be viewed as an absolute, rather than relative, measure.
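One way to test the null hypothesis of a 50% DOOR probability is the normal approximation to the Wilcoxon rank-sum (Mann-Whitney) statistic with a correction for ties. The source does not state which procedure produced the reported p-values and confidence intervals, so the sketch below is illustrative only and is not presented as the method used for the trial. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def door_rank_sum_test(treated, control):
    """DOOR probability with a tie-corrected normal-approximation test of 50%.

    Ranks are integers where a lower number is more desirable.
    Assumption: this is the standard Mann-Whitney large-sample test,
    used here only to illustrate the link noted in the text.
    """
    n1, n2 = len(treated), len(control)
    n = n1 + n2
    control_counts = Counter(control)
    u = 0.0  # wins plus half the ties, i.e. the Mann-Whitney U statistic
    for t in treated:
        for c, k in control_counts.items():
            if c > t:
                u += k
            elif c == t:
                u += 0.5 * k
    probability = u / (n1 * n2)
    # Variance of U under the null hypothesis, corrected for tied ranks.
    tie_sizes = Counter(list(treated) + list(control)).values()
    tie_term = sum(s ** 3 - s for s in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    if var_u == 0:
        return probability, 0.0, 1.0  # every patient has the same rank
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return probability, z, p_value


# Identical rank distributions give a DOOR probability of exactly 50%.
prob_null, z_null, p_null = door_rank_sum_test([1, 1, 4, 8], [1, 1, 4, 8])
```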
Lastly, the DOOR approach was applied to smokers (N = 55) and non-smokers (N = 185) separately as a means of aligning with the original findings of the TOPPIC trial.
Supportive analyses
Two supportive analyses were undertaken. Firstly, the win ratio matched pairs approach was undertaken in line with methods first proposed by Pocock et al. 16 This approach considers each matched pair (consisting of one intervention and one placebo patient considered to have the same or similar pre-identified risk profile) and labels intervention patients as ‘winners’ or ‘losers’ depending on who in the matched pair reaches the components of the composite outcome first (considering the components independently in order of decreasing severity i.e. the more major event first). The win ratio is the total number of winners divided by the total number of losers. This approach aligns well with DOOR methodology in terms of the concept of wins and losses and has been increasingly applied in clinical trials since its first introduction in 2012.17–19
The second supportive analysis was an ordinal logistic regression. This approach has two main advantages: i. it allows for covariate adjustment and ii. it can include variables as random effects. 20 However, a major assumption of ordinal logistic regression is that of proportional odds, i.e. the effect of an independent variable is constant for each increase in the level of the response. Thus, this model will contain an intercept for each level of the response except one and a single slope for each explanatory variable.
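The proportional odds structure can be illustrated numerically. The sketch below assumes the parameterisation logit P(Y ≤ j | x) = α_j + βx, with seven intercepts for eight ranks and a single treatment slope. The intercepts and slope are arbitrary illustrative values, not estimates from the trial, and the function names are hypothetical.

```python
import math


def category_probs(intercepts, beta, x):
    """Rank probabilities under a proportional odds model.

    Assumed parameterisation: logit P(Y <= j | x) = intercepts[j] + beta * x,
    with one intercept per rank except the last, and a single slope.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in intercepts]
    cumulative.append(1.0)  # P(Y <= last rank) is always 1
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


def cumulative_odds(probs):
    """Odds of Y <= j for every cut-point j except the last rank."""
    odds = []
    running = 0.0
    for p in probs[:-1]:
        running += p
        odds.append(running / (1.0 - running))
    return odds


# Illustrative values only: 7 increasing intercepts for 8 ranks, one slope.
intercepts = [0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8]
beta = 0.5
placebo = category_probs(intercepts, beta, x=0)
active = category_probs(intercepts, beta, x=1)

# Under proportional odds, the odds ratio is the same at every cut-point.
odds_ratios = [a / b for a, b in zip(cumulative_odds(active), cumulative_odds(placebo))]
```

Every entry of `odds_ratios` equals exp(β), which is what a single slope per explanatory variable means. When observed data do not follow this pattern, or some ranks are empty, the assumption becomes difficult to check.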
All analyses were conducted using SAS statistical software, version 9.4 (SAS Institute, Cary, NC, USA).
Results
DOOR analysis
Table 1. Individual components of the TOPPIC primary outcome, split by treatment allocation [n (%)].
Table 2. Clinical outcomes for the TOPPIC population – all patients (N = 240) and primary outcome patients (N = 42) [n (%)].

Assigned clinical outcome DOOR ranks for the TOPPIC population, split by treatment group.
The proportion of patients classed as having clinical outcome 4 was more than twice as high in the placebo group (25 of 112 patients, 22.3%) as in the 6MP group (14 of 128 patients, 10.9%). The same number of patients in both groups were ranked as clinical outcome 8 (n = 3).
Patients experiencing the primary endpoint (clinical recurrence of Crohn’s disease) during the TOPPIC trial were assigned to clinical outcomes 4 and 8 only (Table 2). Six of the 42 patients experiencing the primary outcome in the original trial were appropriately allocated to Rank 8 (surgery, rescue medications, CDAI increase) under DOOR methodology. However, the remaining 36 patients experiencing the primary outcome were allocated mid-way in the ranking scheme to Rank 4 (no surgery, rescue medication, CDAI increase). This apparent mis-match is due to the requirement of experiencing a CDAI increase during the TOPPIC trial for a patient to have experienced the primary outcome. In contrast, under the DOOR ranking scheme, surgery was considered to be the most important factor defining occurrence of Crohn’s disease with a CDAI increase as the least important factor.
Table 3. Results of alternative analyses using the TOPPIC study data.
a Considering the extent of rescue medication in addition to clinical outcome ranking, akin to the RADAR approach.
Furthermore, when considering the extent of rescue medication use in combination with clinical outcomes as the defined DOOR (akin to the RADAR approach 2 where the level of antibiotic use augments the clinical rankings), the results were almost identical (DOOR probability 54.6%, 95% CI 53.7% to 55.4%, p < 0.0001) (Table 3).
Lastly, smokers and non-smokers from the TOPPIC population were assessed separately under DOOR methodology. Similar to the overall population, the mercaptopurine and placebo groups differed for both subgroups, more markedly so for the smokers. In smokers, the DOOR probability of having a more desirable outcome with mercaptopurine was 66.1% (95% confidence interval [CI] 62.7% to 69.4%, p < 0.0001), while for non-smokers the likelihood of a more desirable outcome with mercaptopurine was much lower at 51.1% (95% confidence interval [CI] 50.1% to 52.2%, p < 0.0355) (Table 3).
Win ratio
Pocock’s win ratio approach based on matched pairs was applied to the TOPPIC trial data. A requirement for the formation of pairs is to have equal sized treatment groups. For the TOPPIC trial, there was a slight mis-match in the number of patients in each treatment group with 112 in the placebo group and 128 in the 6MP group. Therefore, a maximum of 112 pairs was possible in the matching stage of this analysis. Where imbalance between groups exists, it is recommended that patients are randomly excluded from the treatment group with the higher number of patients. 19 The 112 TOPPIC placebo patients were matched with the available 128 6MP patients, making sure they were matched for smoking status (stratification variable for randomisation) and previous treatment with 6MP and Azathioprine (baseline covariates). This resulted in a match in 96 cases out of a possible 112.
To identify winners and losers in the matched pairs, the component parts of the TOPPIC primary outcome were considered in turn in the same order of assessment as the DOOR ranking process: i. Surgery ii. The need for rescue medication iii. Increase in CDAI score.
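The hierarchical comparison within matched pairs can be sketched as below. This is a simplified illustration: it compares only whether each component occurred, in the order listed above, and ignores the timing of events that the matched-pairs approach also considers. The pairs are toy data and the function names are hypothetical.

```python
def compare_pair(treated, control):
    """Compare one matched pair component by component.

    Each patient is a tuple of booleans (surgery, rescue, cdai_increase),
    ordered from the most to the least severe component.
    Simplification: event timing is ignored.
    """
    for treated_event, control_event in zip(treated, control):
        if treated_event != control_event:
            # The treated patient wins if only the control patient had the event.
            return "win" if control_event else "loss"
    return "tie"


def win_ratio(pairs):
    """Number of winners divided by number of losers across matched pairs."""
    results = [compare_pair(t, c) for t, c in pairs]
    wins = results.count("win")
    losses = results.count("loss")
    if losses == 0:
        raise ValueError("win ratio is undefined when there are no losers")
    return wins / losses, wins, losses


toy_pairs = [
    ((False, False, False), (False, True, True)),   # decided on rescue medication: win
    ((True, True, True), (False, True, True)),      # decided on surgery: loss
    ((False, False, False), (False, False, False)), # identical: tie
    ((False, False, True), (False, True, False)),   # decided on rescue medication: win
]
ratio, n_wins, n_losses = win_ratio(toy_pairs)
```

Tied pairs contribute to neither the numerator nor the denominator, so the ratio reflects only pairs that a component could separate.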
The win ratio was calculated as 1.69 (95% CI 0.94 to 3.39, p = 0.083) which indicates that patients in the 6MP group are 1.69 times more likely than those in the placebo group to ‘win’ or have a positive outcome, although this result was not statistically significant (Table 3).
Ordinal logistic regression
The results of the ordinal logistic regression analysis showed no statistically significant difference between treatment groups. The adjusted analysis gave an odds ratio of 1.50 (95% CI 0.87 to 2.57, p = 0.144) and the corresponding unadjusted analysis supported these results (odds ratio 1.36, 95% CI 0.81 to 2.29, p = 0.249) (Table 3). An odds ratio greater than one here indicates that patients in the 6MP group have a higher probability of experiencing the best outcome (Rank 1) and a lower probability of the worst outcome (Rank 8) in comparison to patients in the placebo group. This analysis requires the proportional odds assumption to hold, which proved to be more complex than anticipated: the assumption did not hold for all variables in the model. This is likely a result of quasi-complete separation of the data, since two of the eight possible ranks contained no TOPPIC patients (Ranks 5 and 6) and one rank contained only one patient (Rank 7). Given the complexities around the proportional odds assumption, these results are likely to be relatively unreliable.
Conclusion
An alternative DOOR analysis approach was applied to a previously published randomised controlled trial assessing clinical recurrence of Crohn’s disease. Using this approach, it was found that patients randomised to mercaptopurine (6MP) were significantly more likely to have a more desirable outcome than those patients in the placebo group. This effect increased when focussing on patients who were smokers.
Clinical outcome ranks for TOPPIC patients experiencing the primary outcome in the original trial were allocated to Ranks 4 and 8. The clinical outcome ranking approach identified the most severely affected patients in the TOPPIC trial, assigning 6 patients to the least desirable outcome (Rank 8). These 6 patients experienced the primary outcome in the original TOPPIC trial. However, the remaining 36 patients experiencing the primary outcome in the TOPPIC trial were assigned to Rank 4 using this alternative methodology. This apparent mis-match can be explained by the requirement for an increase in CDAI score to have occurred as part of the composite primary outcome in the original trial, with only one of the remaining components (surgery or rescue medication) required to have occurred in tandem with a CDAI increase.
Furthermore, in the TOPPIC population as a whole, the revised ranking approach has captured degrees of severity in patient outcomes by not only taking into account use of rescue medication as a binary outcome, but also considering the overall number of rescue medications taken across the trial duration. There was very little difference in results when considering the DOOR ranking with no consideration for rescue medication use compared to the rankings taking into account incidence of rescue medication use. This is likely due to there being little difference between the treatment groups in rescue medication use (median = 0 in both groups).
For initial re-analysis purposes, it was assumed that both elements of the CDAI score should be fulfilled i.e. a post-baseline score >150 and change from baseline >100. This simplified the ranking process while also aligning with the original TOPPIC trial methodology. If these two elements of the CDAI score were to be considered as separate components going forward, this would increase the number of DOOR ranks for analysis.
There are a number of strengths to this alternative analysis of the TOPPIC trial data. Firstly, the rank-based DOOR approach does not depend on any data distribution, nor is it reliant on the proportional odds assumption, as would be the case for other analysis methods e.g. ordinal logistic regression. The original primary outcome was a composite outcome, reduced to a binary measure for reporting and statistical analysis. Here the composite outcome has been expanded to be ordinal in nature, a means which recognises different states of Crohn’s disease, rather than a straight yes/no outcome. Furthermore, limiting the number of ranks to eight to capture the components of the TOPPIC composite primary outcome ensured a sensible and pragmatic approach to this retrospective analysis. The ranking methodology is an intuitive one and provides results that are straightforward to interpret from a clinical and statistical perspective.
There are limitations to the DOOR methodology which became evident in this retrospective analysis. Most of the 240 TOPPIC patients were assigned to 5 of the 8 designated clinical outcome ranks with the majority (80 (62.5%) and 64 (57.1%) of patients in the 6MP and placebo arms, respectively) assigned to the most desirable rank (Rank 1: no surgery, no rescue medications, no CDAI increase). It may have been advantageous from a purely analytical perspective if the patients in the TOPPIC population were assigned more evenly across the 8 clinical outcome ranks. This lack of even spread across ranks was particularly apparent when considering surgery, the most clinically important component of the DOOR construct. Only 7 of the 240 (3%) TOPPIC patients required surgery during the trial with 6 of the 7 patients being assigned to Rank 8. Furthermore, it is important to acknowledge that time has not been accounted for in the clinical outcome rankings i.e. does the timing of the occurrence of a CDAI increase or an increase in the requirement for rescue medication either earlier or later in a patient’s pathway alter the ranking scheme? This would have added to the complexity of the ranking process and would have lost the essence of the original objective of this alternative methodology. In more general terms, the DOOR approach is worth considering when there is an even spread of ranks across the patient population and where the number of ranks is manageable, both from a logistical assignment perspective, but also at the results stage where the alignment of clinical and statistical interpretation of results is key. A further limitation of this study is that the determination of the most appropriate rankings was based on the judgement of a single clinical specialist. Dependence on one expert opinion may introduce bias; consequently, future research of this nature would be strengthened by incorporating the perspectives of multiple clinicians.
Applying win ratio methodology as an alternative to the DOOR construction demonstrated a similar approach to using the trial data more fully. The win ratio results aligned with those from the DOOR analysis in that patients in the 6MP group were more likely to experience a positive outcome compared with placebo patients (albeit this was not statistically significant using the win ratio approach). The difference in results may be due to the different approach in selecting pairwise comparisons in the TOPPIC population. DOOR methodology makes use of all participant data, whereas the matched pairs approach for win ratio methodology requires the formation of pairs to be based on equal sized treatment groups. The matching process also reduced the number of available valid pairs, resulting in 96 pairs of a possible 112 pairs in the TOPPIC trial (the number of patients in the placebo group). The matched pairs approach is recommended, provided a pre-defined basis for matching exists. 16 This was the case for the TOPPIC data which matched patients for smoking status (stratification variable for randomisation) and previous treatment with 6MP and Azathioprine (baseline covariates).
It should be noted for both of these ranking approaches, results are likely to be sensitive to the assigned ordering hierarchy. Re-ordering of this hierarchy could result in a preferable conclusion if the initial ordering did not yield the desired results. Therefore, it is important to pre-specify the planned hierarchy and outcome definitions prior to any formal statistical analyses (i.e. at the statistical analysis plan stage).
The supportive ordinal logistic regression provided a more conventional approach to utilising the data and had the advantage of including covariate adjustment in the model. However, for the TOPPIC data, satisfying the proportional odds assumption proved more complex than initially anticipated, perhaps because some of the ranks were sparsely populated. The results should therefore be considered less reliable.
Overall, the main learning from taking a DOOR ranking analysis approach is that it has the potential to discriminate clinical outcomes more effectively. By deconstructing a binary endpoint based on component parts into an ordinal outcome, the gradations of the patient experience can be understood more fully. This approach should be given consideration when designing future clinical trials, either as the principal analysis of a rank-based primary outcome or as a sensitivity analysis to support and confirm the principal results of any primary outcome.
Footnotes
Acknowledgements
The author thanks Professor Jack Satsangi, Chief Investigator for the TOPPIC trial, who provided valuable advice regarding the most clinically appropriate ranking order of the individual components of the original primary outcome and reviewed the manuscript. The author thanks Professor Steff Lewis and Mr Richard Parker (both of Edinburgh Clinical Trials Unit) for their review of the manuscript and recommendations for improvement.
Ethical approval
The original TOPPIC trial received ethical approval from Scotland A Research Ethics Committee (ref. 07/MRE00/74) and informed written consent was recorded for all participants.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Trial registration
ISRCTN89489788.
