Introduction
‘Trials of therapy’, in which physicians ‘try out’ treatments and assess patients’ responses, are long-established, common elements of routine medical practice. Because ‘trials of therapy’ are usually informal, they may only be reported if treatments are associated with dramatic changes in a patient’s condition – whether by improvement or deterioration.
Our understanding of bias suggests that informal ‘trials of therapy’ – comparisons of patients’ condition before and after treatment – do not provide a trustworthy basis for inferring treatment effects. More sophisticated comparisons are usually needed: for example, comparing a patient’s responses when treatments are given or withheld (‘crossed over’) and conducting formal assessment of outcomes.
In 1676, Richard Wiseman (a surgeon to King Charles II) reported an unplanned experiment. He had prescribed a pair of laced stockings for a patient suffering from leg oedema. The stockings had reduced the oedema to the extent that the patient ‘was able to walk to his closet, and take the air in his coach, and was well pleased with them’. 1 However, someone suggested to the patient that the stockings might do him harm and persuaded him to remove them. His legs swelled up, he became confined to bed again and developed leg ulcers. Dr Wiseman waited six weeks for the ulcers to heal, restored the laced stockings, with the result that the patient recovered.
A century after Wiseman’s crude crossover trial of laced stockings, Caleb Parry,2,3 a doctor in Bath, England, published a more formal, planned use of between two and six crossover periods of variable duration in 13 patients, to compare the purgative effects of three varieties of rhubarb. Parry was unable to find any advantage of the more costly Turkish rhubarb compared with English rhubarb.
Parry’s ‘trials of therapy’ were important in having used at least two crossovers, but he took no steps to ensure that his and his patients’ assessments of the treatment effects were not influenced by his or the patients’ knowledge of the type of rhubarb being given. Fourteen years later, also in Bath, John Haygarth 4 compared the effects on rheumatism of a metal ‘tractor’ with a matched wooden (placebo) tractor. This demonstrated that the assumed treatment effects of the metal tractor resulted from patients’ imagination. 5
Haygarth’s study made clear that informal ‘trials of therapy’ can be plagued by false positives (due to placebo effects, physicians’ and patients’ desires to please, the pre-existing expectations of both parties and natural history). They can also result in false negatives (for example, when a patient destined to deteriorate remains stable because of the intervention, so that no change is seen). Although more than a century passed after Haygarth before Paul Martini set out principles for designing unbiased crossover trials in his 69-page book,6,7 it appears that it was not until 1953 that serious scientific consideration was given to how controlled trials in individual patients could complement traditional parallel group trials. Hogben and Sim 8 recognised that: ‘The now current recipe for a clinical trial based on group comparison sets out a balance sheet in which individual variability with respect both to nature and to previous nurture does not appear as an explicit item in the final statement of the account; but such variability of response to treatment may be of paramount interest in practice.’
The experiment reported by Hogben and Sim is a methodological landmark (see Appendix 1 for a list of N-of-1 trials completed to date), celebrated more than half a century later by republication and commentaries in the International Journal of Epidemiology.10–12 One of the commentaries 12 summarises the features of the study: ‘Because they used patient’s self-reported symptoms, they put a particular emphasis on careful blinding: the use of a placebo and keeping both clinician and patient unaware of the sequence of treatments. They were also concerned about the non-specific response to prostigmine so they used two comparators: dexamphetamine (a stimulant) and lactose (as an inert placebo). Their weighted analysis, based on concerns about wash-out and wash-in effects, also appears to be novel. Finally, with a minimum of eight periods for each treatment, they seemed to have set a new record for the number of crossovers in any crossover trial in an individual patient.’
Baskerville et al. 15 were the first to apply principles of adaptive design to the N-of-1 model. Instead of fixed treatment periods, length was determined by adverse events, clinical deterioration, and patient preference. Their model was further expanded to account for typical crossover features, including carry-over effects. 16
N-of-1 trials come of age
In 1986, in the New England Journal of Medicine, a group of clinical investigators at McMaster University, Canada, published a paper entitled ‘Determining optimal therapy – randomized trials in individual patients’, in which they labelled such studies ‘N of 1 randomized control trials’. 17 Their interest had been prompted by a poorly controlled asthmatic patient treated with inhaled beta agonists, theophylline and prednisone. The N-of-1 trial they designed addressed the utility of the theophylline the patient was using. After the second paired block of theophylline and placebo, the patient ended the trial early: the results were clear to him, and, from the symptom diary he had been keeping, to the clinician who instituted the trial. When the blind was broken, it was clear that during the periods when the patient had been using theophylline his symptoms were much worse. Improvement was sustained when theophylline was withheld after the trial ended, with much better asthma control despite a reduced dose of steroids. The trial proved spectacularly helpful: improved symptom control, reduced drug burden and decreased costs.
Among the class of single patient/person study designs,18–20 N-of-1 trials are unique as rigorously controlled intervention studies that can provide a basis for inferring cause and effect. Though many variations exist, the work that originated at McMaster University focused on single patient trials with two or more pairs of treatment periods, one for the intervention and one for the comparator, ideally with blinding of both patients and healthcare providers (Figure 1). The outcome measures in such trials are the experiences of the patients, recorded using individualised, patient-reported outcomes.
Figure 1. Depiction of an N-of-1 trial. Modified from Shamseer et al. 21
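The paired-period design described above can be sketched in a few lines of code. The following is only an illustration under stated assumptions: the function names, the labels ‘intervention’ and ‘comparator’, and the use of one numeric symptom score per period are all hypothetical and are not taken from the McMaster reports. It randomises the order of the two treatments within each pair of periods and then computes the within-pair difference in scores.

```python
import random


def paired_block_schedule(n_pairs, treatments=("intervention", "comparator"), seed=None):
    """Return a flat list of treatment periods.

    Each consecutive pair of periods contains both treatments once,
    in an order chosen at random for that pair.
    """
    rng = random.Random(seed)  # local generator, so a seed gives a reproducible schedule
    schedule = []
    for _ in range(n_pairs):
        pair = list(treatments)
        rng.shuffle(pair)
        schedule.extend(pair)
    return schedule


def within_pair_differences(schedule, scores, treatments=("intervention", "comparator")):
    """Return score(intervention) - score(comparator) for each pair of periods.

    `scores` holds one (hypothetical) patient-reported score per period,
    in the same order as `schedule`.
    """
    differences = []
    for i in range(0, len(schedule), 2):
        by_arm = {schedule[i]: scores[i], schedule[i + 1]: scores[i + 1]}
        differences.append(by_arm[treatments[0]] - by_arm[treatments[1]])
    return differences


if __name__ == "__main__":
    schedule = paired_block_schedule(n_pairs=3, seed=1)
    print(schedule)
```

In practice the schedule would be generated and held by someone other than the patient and the treating clinician (a pharmacist, for example), so that both remain blinded until the trial ends.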

Clinicians have now formally reported on hundreds, if not thousands, of N-of-1 trials, exploring their utility in avoiding unnecessary treatment and improving patient outcomes, and also in facilitating drug development (see Appendix 1). Despite these reports, and the enormous potential that the originators saw for use of N-of-1 trials, their uptake has remained limited in the decades since 1986, although there have been recent signs of renewed interest.22–25
The N-of-1 niche
The N-of-1 trial identifies whether an intervention is likely to benefit or cause unwanted effects in an individual patient. The design is most suited to assessing interventions that act and cease to act quickly. It is particularly useful in clinical contexts in which variability in patient responses is large, when the evidence is limited, and/or when the patient differs in important ways from the people who have participated in conventional randomised controlled trials. Examples include conditions with quickly acting symptomatic treatment, in which variability in response is large (e.g. chronic pain, obstructive lung disease); conditions with a prevalence too low for large, parallel group randomised controlled trials; medically complex patients who differ substantially from patients who have participated in existing trials; and patients who have been treated over a long time when there is uncertainty about ongoing need for treatment (e.g. proton pump inhibitors in long-standing dyspepsia). Indeed, the applicability of the results of parallel group randomised clinical trials to individual patients (i.e. external validity) may sometimes be limited by narrow inclusion criteria and the exclusion of patients with co-morbidities and/or concurrent treatment. Reviews of randomised controlled trials have found average exclusion rates of 73% and recruitment of less than 10% of patients with the primary diagnosis. 26 These concerns, however, should be tempered by knowledge that true subgroup effects are very unusual. 27 The real issue of importance to N-of-1 trials is the likelihood, in many instances, of large variability in responses among patients. 28
N-of-1 trial services
The result of their first N-of-1 trial inspired the team at McMaster to develop a full N-of-1 referral service to address patient dilemmas that met the criteria for their N-of-1 designs: therapeutic impact was uncertain, the treatment target was to reduce daily or otherwise frequent symptoms, the intervention (typically a drug) worked quickly, and it quickly ceased acting. Within two years, the group had completed 57 N-of-1 trials. Results had provided a definite therapeutic answer in 88% of the patients studied and these results prompted 39% of physicians to change their prior-to-trial treatment plan. This experience led the McMaster team to offer guides for clinicians wishing to apply the N-of-1 concept in their own practice. 29 Ultimately, however, the clinical community’s interest in conducting N-of-1 trials diminished and the service was terminated.
Eric Larson was in the audience at a presentation of the McMaster work at the American Federation for Clinical Research. 30 Appreciating the utility of the design, Larson developed an N-of-1 clinical service at the University of Washington. Over two years, Larson’s group completed 34 trials, again demonstrating that N-of-1 trials could provide physicians with useful treatment guidance in uncertain cases and improve patient satisfaction. 31 Unfortunately, funding for the service ran dry and it was discontinued.
In 1999, the University of Queensland in Australia created the first national N-of-1 research service, referred to as a ‘single patient trial service’. 32 The service was designed to acquaint general practitioners with research methodology and to introduce research-derived data into clinical decision-making for conditions where treatment effectiveness was uncertain. Physicians could refer their patients to the service, which was centrally located, and so used mail and telephone communication only. The service managed all major components of trial management: randomisation, preparing tablets, sending all materials to patients, following up, and relaying results to clinicians. Among the N-of-1 trials carried out by this service for which data were available, post-trial management decisions were consistent with trial results at 12 months in approximately 70% of attention deficit hyperactivity disorder trials, 33 45% of osteoarthritis trials, 34 and 32% of neuropathic pain trials. 35 This is a successful example of how N-of-1 trials can be implemented at a national level, though, again, only as a temporary research initiative.
Another example of the versatility of N-of-1 trials began when the Complementary and Alternative Research and Education (CARE) programme at the University of Alberta established the first academic paediatric integrative medicine programme in Canada. In 2006, as part of this programme, a paediatric N-of-1 service responded to the increased use of complementary therapies in children with chronic conditions. The goal of this service is to offer an objective, evidence-based approach to assessing whether a given complementary therapy is effective for a specific patient. The service is designed to assist patients, their parents and referring physicians throughout all stages of the N-of-1 trial, including the design and implementation of the N-of-1 evaluation. For example, this service has assessed natural health products (e.g. melatonin, probiotics, micronutrients) and acupuncture for conditions including attention deficit hyperactivity disorder, eczema, sleep disturbances, chemo-induced nausea and vomiting, irritable bowel syndrome and autism.
N-of-1 in drug development
The McMaster group speculated that drug development might also benefit from use of the N-of-1 methodology. The reasoning was that pre-approval drug development costs are high (average $479–936 million USD 36,37 and rising 38). Conducting N-of-1 trials before a costly large-scale randomised controlled trial could (a) help to assess early efficacy, (b) be less expensive than traditional approaches, and (c) identify predictors of response. 39
The idea of applying the N-of-1 approach to early drug development arose from experience with multiple N-of-1 trials in specific conditions. For instance, when what is now termed fibromyalgia was labelled fibrositis and there had been one apparently positive randomised controlled trial of amitriptyline, the condition provided a framework for N-of-1 trials in early drug development. The McMaster team conducted 14 N-of-1 trials which demonstrated substantial benefit from amitriptyline at doses far lower than had been used for the primary indication for the drug, depression. 39 The McMaster team also demonstrated the utility of multiple N-of-1 trials in Alzheimer’s disease 40 and in the use of home oxygen in patients with chronic obstructive pulmonary disease. 41 In each of these situations the process appeared to be efficient, requiring limited cost and time investment. Nevertheless, subsequent attempts to apply the reasoning in drug development have been sporadic and unsuccessful.
Failure to revolutionise clinical practice: were N-of-1 trials ahead of their time?
Early experience was disappointing, shattering the initial optimism that N-of-1 trials would quickly revolutionise clinical practice. There had been some tantalising results, 42 but randomised controlled trials in which patients were randomised to conventional care or to N-of-1 trials generally failed to show dramatically convincing benefits of participation in the N-of-1 trials.43,44
At McMaster University, despite educating local clinicians, playing cheerleader, succeeding in conducting 73 N-of-1 trials over three years, 45 and inspiring other ‘N of 1 services’, interest still faded. An attempt to use venture capital to create an efficient, marketable service went nowhere. Thirty years after our initial publication, few clinicians have even heard of N-of-1 trials.
Sporadic reports of success with N-of-1 continue. For instance, Joy et al. 46 reported findings consistent with ‘the nocebo phenomenon’, in which patients sometimes report side effects to placebo: 47 in seven patients with suspected but uncertain statin-associated myalgia, N-of-1 trials failed to detect statin-related symptoms in any of the patients, allowing them to continue the drugs. Despite such isolated reports of success, clinicians seldom use N-of-1 trials and most remain unaware of the design.
Renewed interest in N-of-1 trials
At the University of Alberta, recent efforts have focused on methodological issues related to N-of-1 trial design and reporting. For example, N-of-1 trials have been criticised for their lack of generalisability. The Alberta group recently partnered with the Journal of Clinical Epidemiology to publish a series dedicated to N-of-1 trials and included papers to address this concern. A comprehensive systematic review of the design, analysis and meta-analysis of N-of-1 trials found that the majority (60%) of published N-of-1 trials are published as a series (i.e. one report publishing N-of-1 trial data about more than one participant for the same condition-intervention pair), suggesting their value beyond assessing individual treatment effects and their potential to provide more generalisable treatment effects. Indeed, the Oxford Centre for Evidence-Based Medicine 48 has classified N-of-1 trials as Level 1 evidence, comparable to systematic reviews of randomised controlled trials.
By virtue of their methods (i.e. use of randomisation, blinding, formal outcome assessment), the meta-analysis of N-of-1 trials may provide a valuable source of population data for conditions that have little to no randomised controlled evidence, and may help to refine the evidence where parallel group randomised controlled trials do exist.
N=100; number of published N-of-1 studies that have assessed treatments for the respective condition category (adapted from Punja et al. 51).
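As a rough illustration of how results from a series of N-of-1 trials might be combined into a population-level estimate, the sketch below applies standard fixed-effect inverse-variance pooling. This is an assumption made for illustration, not the method of any review cited here: the function name and the input format (one effect estimate and one standard error per patient) are hypothetical, and because responses are expected to vary among patients, a random-effects or hierarchical model would often be the more appropriate choice.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of per-patient effect estimates.

    Each trial is weighted by 1 / SE^2, so more precise trials count for more.
    Returns the pooled estimate and its standard error.
    """
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need one standard error per estimate, and at least one trial")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical per-patient treatment differences and their standard errors.
    pooled, pooled_se = pool_fixed_effect([1.2, 0.4, 0.9], [0.5, 0.6, 0.4])
    print(f"pooled effect {pooled:.2f} (SE {pooled_se:.2f})")
```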
Challenges and future directions
Methodological considerations for N-of-1 trials differ from those for standard, parallel group randomised controlled trials. When considering N-of-1 trials as a research endeavour, investigators have proposed solutions to three major limitations among reported N-of-1 trials: incomplete reporting, marked variability in quality, and unacceptably low rates of prospective protocol registration.
First, as is the case with parallel group randomised controlled trials, lack of complete and transparent reporting is a problem in the N-of-1 trial literature. The Alberta group 51 found that authors of N-of-1 trials failed to report on a number of critical design and conduct elements: trial registration (97%), whether individuals with co-morbid conditions (77%) or on concurrent therapies (69%) were included, and whether adverse events were assessed (64%). Another review confirmed that the quality of reporting of published N-of-1 trials was highly variable. 52 The Alberta group led the development of the CONSORT Extension for N of 1 Trials (CENT) in response to the limitations and heterogeneity in reporting,53,54 serving as a minimum checklist for reporting N-of-1 trials.
Second, careful development and reporting of N-of-1 protocols is necessary for researchers, ethics review boards and funders. The Alberta group is currently developing a SPIRIT Extension for N of 1 Trials (SPENT). This will recommend essential elements in N-of-1 trial protocols, in the expectation that this will help to improve the quality of published reports of N-of-1 trials and promote the inclusion of N-of-1 trial protocols in trial registries.
Third, only 3% of published N-of-1 trials are reported as having registered protocols prospectively. It is certain that not all N-of-1 trials are published and readily available (nor, for those conducted as part of optimal routine clinical practice, should they be) – unpublished trials begun as part of the research endeavour may create a risk of bias for future systematic reviews and meta-analyses. One way of capturing these trials would be to establish an electronic repository (as is done for conventional randomised controlled trials with clinicaltrials.gov) and encourage authors to register their N-of-1 trial protocols. This would help reviewers to identify selective outcome reporting and publication biases.
Beyond these challenges, emerging methodologies may facilitate optimal use of N-of-1 principles. Bayesian and adaptive designs have potential applicability to N-of-1 trials. Trials can be designed with preset points, based on adverse effects or patient preferences, at which to cross over, change dose or discontinue. Bayesian methods can be used both to analyse and to meta-analyse N-of-1 trials.55,56 The strength of Bayesian approaches lies in their ability to maximise the use of reliable available information from each participant, as well as the use of reliable prior information for incorporation in the statistical model so that each N-of-1 trial can inform the next. Zucker et al. 56 have demonstrated the use of Bayesian methods to aggregate N-of-1 trials to yield estimates of population treatment effects. Combining Bayesian approaches with adaptive designs may prove to be a useful combination for future N-of-1 trials.
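The idea that each N-of-1 trial can inform the next can be illustrated with the simplest Bayesian model, a normal prior combined with a normally distributed trial estimate. The sketch below is a deliberately minimal example under stated assumptions and is not the model used by Zucker et al.: the function names and inputs are hypothetical, and chaining updates in this way treats every trial as estimating one shared effect, whereas a hierarchical model would also allow for variability between patients.

```python
import math


def update_normal(prior_mean, prior_sd, obs_mean, obs_se):
    """Conjugate normal-normal update.

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the new estimate.
    """
    prior_precision = 1.0 / prior_sd ** 2
    obs_precision = 1.0 / obs_se ** 2
    post_precision = prior_precision + obs_precision
    post_mean = (prior_precision * prior_mean + obs_precision * obs_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


def chain_trials(prior_mean, prior_sd, trial_results):
    """Use the posterior from each trial as the prior for the next.

    `trial_results` is a sequence of (estimate, standard_error) pairs,
    one per completed trial (hypothetical inputs).
    """
    mean, sd = prior_mean, prior_sd
    for obs_mean, obs_se in trial_results:
        mean, sd = update_normal(mean, sd, obs_mean, obs_se)
    return mean, sd


if __name__ == "__main__":
    mean, sd = chain_trials(0.0, 2.0, [(1.1, 0.5), (0.7, 0.6)])
    print(f"posterior mean {mean:.2f}, posterior SD {sd:.2f}")
```

Under this model the posterior standard deviation shrinks with every trial added, which is the formal counterpart of the statement that accumulated trials sharpen the estimate available to the next patient.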
Discussion
What explains the failure to adopt and sustain N-of-1 trials? The obstacles to conducting N-of-1 trials as an element of routine clinical practice have been too great. For many pharmacists, preparing identical drug and placebo combinations proved too labour-intensive. For clinicians, N-of-1 trials take too much time, even with easy-to-use guidance: 29 preparing questionnaires, instructing patients and examining the results all require clinician commitment. By comparison, the simple question, ‘did the treatment help?’, is easy to ask and has considerable face validity, which makes it hard to displace with the more onerous formal N-of-1 trial. The late Professor Charles Bridges-Webb proposed a workaround to the expensive, time-consuming process of arranging placebo. 57 He suggested a simplified N-of-1 design, the Single Patient Open Trial (SPOT), substituting an open trial for the blinded one. This design trades rigour for pragmatism and is particularly useful for independent practitioners without access to N-of-1 services.
Technological advances may help to overcome the operational complexity and costs that have hindered the uptake of the N-of-1 methodology. The emergence of mobile electronic health devices makes it easier than ever for patients to engage in their own healthcare. The creation of an IT-based N-of-1 trial platform would help clinicians and patients to collaborate in designing their own N-of-1 trials, track health outcomes and produce a report of results for patients and clinicians to discuss. Researchers from the University of California, Davis, have developed a mobile application called the ‘Trialist’ specifically to facilitate the conduct of N-of-1 trials in clinical settings. They are testing the feasibility and efficacy of this application in a randomised controlled trial comparing the effects on patient outcomes of participating in a mobile N-of-1 trial versus usual care. 58
This potential for N-of-1 trials as a way of providing clinical care differs from its use as a research endeavour. The distinction comes down to the intent behind conducting an N-of-1 trial. If the objective is to inform treatment decisions for an individual patient, the trial is optimal clinical care and should therefore not require formal ethics approval 59 nor regulatory oversight from agencies monitoring clinical research. When choices from among two or more alternative treatments are being considered, patients should be informed about genuine uncertainties about their relative merits and how treatment should be selected in these circumstances. 60 Random allocation within formal treatment comparisons is one of the options that should be offered to patients.
If the primary purpose of N-of-1 trials is to produce generalisable knowledge to inform treatment decisions for future patients, these N-of-1 trials are more properly regarded as research. In these circumstances, compliance with methodological and ethics standards will be expected. In 2014, the Agency for Healthcare Research and Quality commissioned a user’s guide to N-of-1 trials, which clarifies this distinction. 24
N-of-1 trials may have a future, both as a research endeavour complementing standard trials and as a strategy for improving clinical care outside of the research setting. Unlike conventional parallel group randomised controlled trials, which assess what is best on average for a given population, N-of-1 trials assess what is best for an individual patient. 61 They are thus particularly well suited to emerging interests in patient-centred research and ‘precision’ or ‘personalised’ medicine. N-of-1 trials support the evolution of patient-centred research by offering an evidence-based approach for personalising care. They help to answer, for example, which treatment options are most effective through a process that strengthens the clinician–patient relationship and ultimately empowers the patient to be more engaged with their healthcare. Furthermore, with the advent of ‘big data’, and its hoped-for potential to inform care, N-of-1 trials can provide opportunities to learn how to improve care. The potential exists. The extent to which it will be realised remains uncertain.
