Abstract
The FatiGo trial concluded that multidisciplinary rehabilitation treatment is more effective for chronic fatigue syndrome/myalgic encephalomyelitis in the long term than cognitive behaviour therapy and that multidisciplinary rehabilitation treatment is more cost-effective for fatigue and cognitive behaviour therapy for quality of life. However, FatiGo suffered from a number of serious methodological flaws. Moreover, it ignored the results of the activity monitor, its only objective outcome. This jeopardizes the validity of FatiGo. Analysis of its results shows that there was no statistically significant difference between multidisciplinary rehabilitation treatment and cognitive behaviour therapy and that neither is (cost-)effective. FatiGo’s claims of efficacy of multidisciplinary rehabilitation treatment and cognitive behaviour therapy for chronic fatigue syndrome/myalgic encephalomyelitis are misleading and not justified by its results.
Introduction
While many researchers have reported a range of abnormalities in myalgic encephalomyelitis (ME), which is also called chronic fatigue syndrome (CFS), or ME/CFS (Carruthers et al., 2011), doctors and scientists still struggle to understand the underlying mechanism of this serious multisystem disease, and as a consequence, there is little in the way of effective treatment (Institute of Medicine (IOM), 2015). ME affects millions of patients worldwide, 25 per cent of whom are severely affected and bedridden (Carruthers et al., 2011; IOM, 2015).
However, according to a small but influential group of psychiatrists and psychologists, who view ME as a behavioural problem, cognitive behaviour therapy (CBT) and graded exercise therapy (GET) are effective treatments for this disease (White et al., 2011). Guidelines worldwide have promoted both treatments as the only evidence-based effective treatments. A large part of this evidence was provided by the PACE trial, the biggest CBT and GET trial for CFS so far. It involved 640 patients and cost £5 million, the equivalent of US$7 million according to the current exchange rate. The trial concluded that both treatments were effective and led to recovery in 22 per cent of cases (White et al., 2013).
A review, an editorial, two re-analyses of the released individual participant data and a special edition of the
The British National Institute for Health and Care Excellence (NICE) has recently announced, partly in response to the problems of the PACE trial, that it will be performing a full update of its CFS/ME guidelines (NICE, 2017b), and it has released a document stating that it will review 13 pieces of evidence that were selected from approximately 300 highlighted to NICE. One of these is the FatiGo trial (NICE, 2017a) by Vos-Vromans et al. (2016a), who concluded that their trial showed that multidisciplinary rehabilitation treatment (MRT) is more effective than CBT (Vos-Vromans et al., 2016b).
This re-analysis of the FatiGo trial will examine if MRT is an effective treatment for ME, based on objective evidence.
NICE uses the term CFS/ME. This analysis will do the same to avoid any confusion.
Background information
The FatiGo (Fatigue-Go) trial was a multicentre, randomized controlled trial involving 122 patients with CFS/ME. It compared the efficacy of CBT with that of MRT, which consisted of ‘CBT and, depending on the individual analysis, elements of body awareness therapy, gradual reactivation, pacing, mindfulness, gradual normalization of sleep/wake rhythm and social reintegration’. FatiGo used two subjective primary outcomes (fatigue and health-related quality of life) and a number of secondary outcomes, all of which were subjective except for one objective outcome (the activity monitor, also known as the actometer). Outcomes were assessed prior to treatment and at 26 and 52 weeks after treatment initiation, that is, at the end of treatment and 26 weeks later. FatiGo concluded that ‘MRT is more effective in reducing long-term fatigue severity than CBT in patients with CFS’ and ‘patients showed an improvement in quality of life over time, but between-group differences were not significant’ (Vos-Vromans et al., 2016a).
The protocol
Patients were selected between December 2008 and January 2011. Therapy lasted 6 months. The protocol was submitted on 17 March 2011, accepted on 16 April 2012 and published on 30 May 2012 (Vos-Vromans et al., 2012), even though ‘a fundamental principle in the design of randomized trials involves setting out in advance the endpoints that will be assessed in the trial, as failure to prespecify endpoints can introduce bias into a trial and creates opportunities for manipulation’ (Evans, 2007). Therefore, a protocol should be published before the start of a trial and not, as in FatiGo’s case, when the trial has (almost) finished.
Problems with the design of the study
FatiGo was an unblinded trial with two treatment groups and no ‘placebo’ control group, and it used two subjective primary outcomes. Yet according to a systematic review of the interventions for the treatment and management of CFS by Whiting et al. (2001), one of the problems with subjective outcomes is that patients ‘may feel better able to cope with daily activities because they have reduced their expectations of what they should achieve, rather than because they have made any recovery as a result of the intervention’. Therefore, more objective measures of the effect of any intervention should be used.
Also, unblinded trials should use objective primary outcomes alone or in combination with subjective ones to avoid the erroneous inference of efficacy in its absence (Edwards, 2017; Lilienfeld et al., 2014). FatiGo could have done this very easily by using its objective secondary outcome (the activity monitor) as a primary one. Why it did not do so is unclear. The risk of false-positive results was made even bigger because there was a large difference in treatment hours between the MRT (44.5) and the CBT (16) groups (Vos-Vromans et al., 2016a). This creates a serious bias towards finding a positive effect for the intervention regardless of whether it is effective or not (Coyne, 2016).
Selection criteria
Only 33.5 per cent (122/364) of those screened for the FatiGo trial actually entered it. FatiGo used the Fukuda criteria which require at least 6 months of chronic fatigue in combination with a minimum of four out of eight symptoms (Vos-Vromans et al., 2016a). The problem with these criteria, even though they are the most commonly used criteria for CFS/ME, is that the main characteristic of the disease, an abnormally delayed (muscle) recovery after trivial exertion (Ramsay, 1988), which in this day and age is often called post-exertional malaise, is an optional requirement and not compulsory (Fukuda et al., 1994). The consequence of this is that a group of patients selected by using the Fukuda criteria also includes patients with depression labelled as CFS/ME patients, whereas both the Canadian Consensus Criteria from 2003 and the International Consensus Criteria from 2011 differentiate patients with ME from those who are depressed and identify patients who are more physically and cognitively disabled (Carruthers et al., 2011). Also, at trial entry, patients had to fill in the hospital anxiety and depression scale yet no mention is made by FatiGo of how many of the participants suffered from depression and/or anxiety. In a study by Moss-Morris et al. (2005) that also used the Fukuda criteria, as many as 30–42 per cent of the sample were suffering from depression and anxiety respectively according to the authors themselves. This is of particular importance, as a meta-analysis by Tolin (2010) found that CBT is the most effective treatment for depression. Therefore, including patients with depression might lead to the erroneous inference of efficacy of CBT for CFS/ME in its absence.
The biopsychosocial model
FatiGo was based on the biopsychosocial model, which holds that after a viral infection has been cleared by the body, there is no underlying illness anymore. Instead, patients have developed the belief that they suffer from a physical illness. This leads to the avoidance of exercise and activity and results in deconditioning, which is the cause of their problems/symptoms. CBT for CFS/ME, which is different from ‘ordinary’ CBT, was designed to modify these dysfunctional beliefs and behaviours and usually includes a graded increase in activity (Vos-Vromans et al., 2016a). The biopsychosocial model is an assumption- and opinion-based model for which objective evidence has never been presented and which is at odds with the physiological abnormalities in CFS/ME (Vink, 2017a).
Results
Primary outcomes
The primary outcomes were measured at the end of treatment and 6 months later (26 and 52 weeks after initiation of therapy) (Vos-Vromans et al., 2016a). According to a systematic review by Whiting et al. (2001), it is essential not to rely on the outcome at the end of treatment but to wait at least 6–12 months to remove the naturally-occurring fluctuation of the disease. Therefore, FatiGo’s results 6 months after the end of treatment better reflect the efficacy of its treatment.
As can be seen in Table 1, the mean checklist individual strength (CIS) fatigue scores (scale 8–56; lower scores mean less fatigue) 52 weeks after the start of the trial were 33.8 (MRT) and 40.1 (CBT), and the scores of patients in the MRT group had improved by 5.7 compared to CBT. The entry score for the trial was 40 or more (Vos-Vromans et al., 2016a). This means that after CBT, which according to the authors has been proven to be effective for CFS/ME (Vos-Vromans et al., 2016b), patients were still ill enough to re-enter the FatiGo trial. This confirms the outcome of the PACE trial, the biggest CBT and GET trial for CFS/ME so far, which also found that after ‘effective’ treatment, the mean scores of both subjective primary outcomes (the Chalder fatigue questionnaire and the Short-Form 36 (SF-36) physical functioning questionnaire) showed that patients were still ill enough to re-enter the trial (Vink, 2016; White et al., 2011).
Table 1. CIS fatigue scores.
CIS: checklist individual strength; MRT: multidisciplinary rehabilitation treatment; CBT: cognitive behaviour therapy.
CIS fatigue scores: scale 8–56; lower scores mean less fatigue.
Sources: 1: Korenromp et al. (2011); 2: Vercoulen et al. (1999); 3: Beurskens et al. (2000); 4: Worm-Smeitink et al. (2017); 5: Nijhof et al. (2016); 6: Rongen-van Dartel et al. (2014); 7: Vos-Vromans et al. (2016a); 8: Soetekouw et al. (2000); Servaes et al. (2002); Bleijenberg (2006); Torenbeek et al. (2006); Knoop et al. (2007a); Van Hoogmoed et al. (2010); Voet et al. (2010); Korenromp et al. (2011); Smits et al. (2011); Droogleever Fortuyn et al. (2012); Rongen-van Dartel et al. (2014); Verhaak et al. (2016); Poort et al. (2017); 9: Kalkman et al. (2004); 10: Tieleman et al. (2010).
The CIS fatigue score of 33.8 after MRT is only minimally better than a score of 35 or more, which according to the literature means that patients are severely fatigued (Bleijenberg, 2006; Droogleever Fortuyn et al., 2012; Knoop et al., 2007a; Korenromp et al., 2011; Poort et al., 2017; Rongen-van Dartel et al., 2014; Servaes et al., 2002; Smits et al., 2011; Soetekouw et al., 2000; Torenbeek et al., 2006; Van Hoogmoed et al., 2010; Verhaak et al., 2016; Voet et al., 2010). As can be seen in Table 1, the mean score of 17.3 for healthy controls of a similar age (Vercoulen et al., 1999; Vos-Vromans et al., 2016a) is much better. Also, Korenromp et al. (2011), who explored fatigue in sarcoidosis patients in clinical remission, concluded that the ‘fatigue severity mean score [30.5] … was high’. This makes it even more difficult to understand why FatiGo deemed a mean CIS fatigue score of 33.8 after MRT, which represents fatigue that is a lot higher than the score of 30.5, as proof that MRT is effective.
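The comparison of the 52-week CIS fatigue means with the benchmarks quoted above can be laid out as a small calculation. The following is only an illustrative sketch: the numbers are those given in the text, the variable and function names are hypothetical, and applying individual-level cut-offs to group means is a simplification made for illustration.

```python
# CIS fatigue, scale 8-56; lower scores mean less fatigue.
# All values are taken from the text; names are illustrative only.
SEVERE_FATIGUE_CUTOFF = 35        # "35 or more" = severely fatigued
TRIAL_ENTRY_CUTOFF = 40           # FatiGo entry score "40 or more"
HEALTHY_CONTROL_MEAN = 17.3       # healthy controls of a similar age
SARCOIDOSIS_REMISSION_MEAN = 30.5 # described as "high" by Korenromp et al.

followup_means = {"MRT": 33.8, "CBT": 40.1}  # means at 52 weeks


def describe(score):
    """Place one group mean relative to the quoted benchmarks.

    Assumption: cut-offs defined for individual patients are applied
    to a group mean, which is a simplification.
    """
    return {
        "meets_entry_criterion": score >= TRIAL_ENTRY_CUTOFF,
        "severely_fatigued": score >= SEVERE_FATIGUE_CUTOFF,
        "points_below_severe_cutoff": round(SEVERE_FATIGUE_CUTOFF - score, 1),
        "points_above_healthy_mean": round(score - HEALTHY_CONTROL_MEAN, 1),
        "points_above_sarcoidosis_mean": round(score - SARCOIDOSIS_REMISSION_MEAN, 1),
    }


summary = {arm: describe(score) for arm, score in followup_means.items()}

for arm, result in summary.items():
    print(arm, result)
```

Under these assumptions, the MRT mean lies 1.2 points below the severe-fatigue cut-off and 16.5 points above the healthy control mean, whereas the CBT mean still meets the trial entry criterion.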
The other subjective primary outcome was the health-related quality of life scores. The scores for its physical component summary after 52 weeks (26 weeks after the end of treatment), measured by the SF-36 were 40.2 (MRT) and 36.7 (CBT) (scale 0–100; higher scores indicate a better quality of life). According to the study itself, ‘no significant differences in quality of life were found between the groups’ (Vos-Vromans et al., 2016a). A study by Farivar et al. (2007) of 7093 patients, who received medical care from an independent association of 48 physician groups in the western United States, found that their mean physical health summary score was 62.2. The abovementioned scores therefore indicate that the physical quality of life was still poor.
The activity monitor
The activity monitor, a Sensewear Pro Armband, an armband the size of a watch weighing 45 g, was used to measure the effect of the treatments on patients’ physical activity level objectively. The monitor has two integrated accelerometers that measure the intensity of acceleration and deceleration with higher counts indicating a higher degree of physical activity. The activity monitor was the only objective outcome of the trial (Vos-Vromans et al., 2016a). Its results were published in a table but not discussed in the article. Not publishing or ignoring the results of outcomes measured is a form of reporting bias that jeopardizes the validity of a study (Heneghan et al., 2017). Analysis of the objective activity monitor results from three Dutch studies that did not publish these results in the original publication, by proponents of the biopsychosocial model themselves, showed that CBT did not lead to objective improvement (Wiborg et al., 2010).
Analysis of the activity monitor results shows that patients’ physical activity level had objectively improved by 5.8 per cent (MRT) and 6.5 per cent (CBT) at 52 weeks. The subjective fatigue scores had improved by 33.4 per cent (MRT) and 21.5 per cent (CBT) (Vos-Vromans et al., 2016a). There is an inverse relation between fatigue and activity (Rongen-van Dartel et al., 2014). The more tired you are, the less active you become, and when your tiredness decreases, your activity level will increase. Therefore, the percentage decrease in subjective fatigue should be the same as, or similar to, the percentage increase in activity. The activity monitor results, however, show that this was not the case.
FatiGo also concluded that ‘At 26 weeks, there was no significant difference in fatigue severity between the CBT and MRT groups. After the end of treatment at 26 weeks, the reduced level of fatigue was sustained until 52 weeks of follow-up in patients who received MRT; during this period, the mean fatigue level of the patients in the CBT group increased. The fact that MRT resulted in a larger effect at 52 weeks is especially relevant in patients with CFS, which typically follows a chronic course’ (Vos-Vromans et al., 2016a).
However, the analysis of the activity monitor shows the following. At 52 weeks patients in the CBT group had improved by 2.5 per cent compared to 26 weeks yet patients in the MRT group had deteriorated by 3.5 per cent. Also, at 52 weeks even though patients’ fatigue had subjectively improved by 11.9 per cent in the MRT group compared to the CBT group, in reality there was a minimal objective negative effect (0.7%) of MRT compared to CBT according to the activity monitor results. Moreover the rates of improvement for the MRT group at the end of treatment and 26 weeks later were not significantly higher than those for the CBT group (p-values according to Vos-Vromans et al. (2016a) were 0.10 and 0.85 respectively).
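The gap between the subjective and the objective results can be made explicit with a short calculation. This is a minimal sketch that uses only the 52-week percentage improvements quoted above; the dictionary layout and function names are hypothetical and not part of the FatiGo analysis.

```python
# Percentage improvement at 52 weeks, as quoted in the text.
improvement_pct = {
    "fatigue_subjective": {"MRT": 33.4, "CBT": 21.5},
    "activity_objective": {"MRT": 5.8, "CBT": 6.5},
}


def mrt_minus_cbt(outcome):
    """Between-arm difference in percentage points (positive favours MRT)."""
    arms = improvement_pct[outcome]
    return round(arms["MRT"] - arms["CBT"], 1)


def subjective_minus_objective(arm):
    """How far the subjective improvement exceeds the objective one."""
    return round(
        improvement_pct["fatigue_subjective"][arm]
        - improvement_pct["activity_objective"][arm],
        1,
    )


between_arm = {outcome: mrt_minus_cbt(outcome) for outcome in improvement_pct}
within_arm_gap = {arm: subjective_minus_objective(arm) for arm in ("MRT", "CBT")}

print(between_arm)     # subjective vs objective difference between the arms
print(within_arm_gap)  # subjective improvement not matched by activity
```

The between-arm figures reproduce the 11.9 and 0.7 percentage points mentioned above, with opposite signs: the subjective outcome favours MRT, whereas the objective outcome marginally favours CBT.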
The aforementioned PACE trial, which used an adaptive pacing therapy (APT) and also a specialist medical care (SMC) control group, showed that after treatment there were no clinically significant differences according to the step test and the 6-minute walk test between the four groups in the study (CBT, GET, SMC and APT). According to the 6-minute walk test, patients in all four groups would have still been ill enough to be on the waiting list for a lung transplant (Vink, 2016; White et al., 2011). The number of patients who were able to work had decreased and the number of patients receiving illness and disability benefits had increased. Also, there was a 100 per cent increase in the proportion of participants in receipt of income protection or private pensions in the CBT and GET groups.
Other studies that used objective outcome measures (activity monitor and neuropsychological testing) have shown that CBT does not lead to objective improvement (Knoop et al., 2007b; Wiborg et al., 2010). Stordeur et al. (2008) analysed the efficacy of CBT and GET in the Belgian CFS knowledge centres. Just like the PACE trial, their analysis found that after treatment, fewer people were able to work and more people were receiving illness benefits. It was also found that (sub)maximal exercise testing with VO2max showed that CBT and GET do not lead to objective improvements. This shows that CBT and GET were ineffective and might suggest that they were also harmful.
Dropout rate
Even though only 14.8 per cent (18/122) dropped out, at 52 weeks, activity monitor results were not available for 34.4 per cent (42/122) of participants. Patients who drop out of therapy are not a random sub-sample of all clients. Those who do not improve or suffer adverse reactions are the ones most likely to drop out of treatment. Yet many researchers and studies do not take this into account, and as a result ‘may conclude erroneously that their treatments are effective merely because their remaining clients are those that have improved’ (Lilienfeld et al., 2014).
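The proportions involved can be checked with a few lines of arithmetic. The sketch below uses only counts quoted in this article (the screening figures come from the section on selection criteria). The final subtraction rests on an assumption that the text does not confirm, namely that all 18 dropouts are among the 42 participants without activity monitor results.

```python
# Counts quoted in the article; names are illustrative only.
SCREENED = 364
RANDOMISED = 122
DROPPED_OUT = 18
MISSING_ACTIVITY_52W = 42  # no activity monitor result at 52 weeks


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


entered_pct = pct(RANDOMISED, SCREENED)
not_entered_pct = pct(SCREENED - RANDOMISED, SCREENED)
dropout_pct = pct(DROPPED_OUT, RANDOMISED)
missing_activity_pct = pct(MISSING_ACTIVITY_52W, RANDOMISED)

# ASSUMPTION (not stated in the source): every dropout is also missing
# an activity monitor result, so the remainder completed treatment but
# still has no objective 52-week measurement.
missing_but_not_dropout = MISSING_ACTIVITY_52W - DROPPED_OUT

print(entered_pct, not_entered_pct, dropout_pct, missing_activity_pct)
print(missing_but_not_dropout)
```

If that assumption holds, 24 participants who formally completed the trial have no objective result at 52 weeks, which is more than the number who dropped out.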
The economic evaluation
In its economic evaluation, FatiGo concluded that MRT is more cost-effective for fatigue and CBT for the quality of life, if the EQ-5D-3L quality of life scores of their secondary outcome are used (Vos-Vromans et al., 2017). Yet, as previously discussed, after MRT, patients were only minimally better than the severely fatigued. A study by Olesen et al. (2016), consisting of 20,220 adult patients, found a mean EQ-5D-3L quality of life score of 0.84 for the total population and 0.93 for people without a chronic condition. The mean EQ-5D-3L quality of life scores in FatiGo after CBT (0.61) and MRT (0.69) were still worse than in stroke (0.71), ischaemic heart disease (0.72) or colon cancer (0.74) (higher scores indicating a better quality of life) (Hvidberg et al., 2015). Moreover, a score of 0.69 (MRT) equals that of people with four chronic health conditions and a score of 0.61 (CBT) is almost the same as the score (0.60) for people with five or more chronic health conditions (Olesen et al., 2016). This confirms that neither MRT nor CBT is effective, and ineffective treatments cannot be cost-effective.
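To see where the FatiGo quality of life scores sit among the reference values quoted above, the scores can simply be ranked. This is an illustrative sketch only: the values are those cited in the text, the labels and function name are hypothetical, and the comparison ignores differences between the study populations.

```python
# Mean EQ-5D-3L scores as quoted in the text (higher = better quality of life).
reference_scores = {
    "no chronic condition": 0.93,
    "total population": 0.84,
    "colon cancer": 0.74,
    "ischaemic heart disease": 0.72,
    "stroke": 0.71,
    "four chronic conditions": 0.69,
    "five or more chronic conditions": 0.60,
}
fatigo_scores = {"MRT": 0.69, "CBT": 0.61}


def groups_scoring_higher(score):
    """Reference groups with a strictly better mean score, best first."""
    better = [(name, value) for name, value in reference_scores.items() if value > score]
    return [name for name, _ in sorted(better, key=lambda item: item[1], reverse=True)]


ranking = {arm: groups_scoring_higher(score) for arm, score in fatigo_scores.items()}

for arm, groups in ranking.items():
    print(arm, fatigo_scores[arm], "is below:", ", ".join(groups))
```

On these figures, five reference groups score higher than the MRT arm and six score higher than the CBT arm; only people with five or more chronic conditions score lower than either arm.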
Recommendation for CBT removed
Two American government agencies, the Centers for Disease Control and Prevention (CDC, 2017) and the Agency for Healthcare Research and Quality (AHRQ) (Smith et al., 2016), have recently removed (CDC) and downgraded (AHRQ) their recommendations for CBT and GET because there is insufficient evidence that these treatments are effective. Norwegian oncologists have recently shown that if muscle cells from healthy people are put into contact with serum from CFS/ME patients, their cellular energy production starts to malfunction just as it does in the cells of patients themselves (Fluge et al., 2016). This indicates that something in the serum of patients is directly or indirectly affecting the cellular energy production. Tomas et al. (2017) recently confirmed that there are problems with the cellular energy production. A literature review showed that we have known since the 1990s that there are energy production problems at the cellular level in CFS/ME (Vink, 2015). Therefore, it is no wonder that behavioural interventions, like MRT, CBT and GET, are not effective, as patients have been saying for decades (Action for ME, 2011; Bringsli et al., 2014; De Kimpe et al., 2016; Geraghty et al., 2017; ME Association, 2015).
Discussion
The FatiGo trial concluded that MRT is more effective for CFS/ME in the long term than CBT (Vos-Vromans et al., 2016a). It also concluded that MRT is more cost-effective for fatigue and CBT for the quality of life (Vos-Vromans et al., 2017). However, analysis of the study shows that it suffered from a number of serious methodological flaws. The unblinded trial used two subjective primary outcomes (fatigue and quality of life). This combination is known to lead to the erroneous inference of efficacy in its absence. The likelihood of this was made even bigger because of the large difference in contact hours between the two groups (44.5 vs 16) even though these should be the same. The only way to correct for these problems in unblinded trials is by using well-designed control groups and objective primary outcomes (Edwards, 2017; Lilienfeld et al., 2014).
Another fundamental design flaw of the trial was that it compared CBT against MRT which was CBT plus a number of things. However these were not properly specified, as they were tailored to the individual needs. Yet, in a properly designed trial, patients in a treatment group should all receive the same treatment. Furthermore, the trial did not have a ‘placebo’ control group (for example relaxation therapy, specialist medical care or pacing) to correct for the placebo effect and other confounding factors.
Furthermore, the fatigue scores showed that neither MRT nor CBT was effective, and the mean EQ-5D-3L quality of life scores after CBT and MRT were comparable to those of people with five or more (CBT) or four (MRT) chronic health conditions (Olesen et al., 2016). Moreover, quality of life was still worse than in stroke, ischaemic heart disease, colon cancer (Hvidberg et al., 2015), the total population or in people without a chronic condition (Olesen et al., 2016).
Also, FatiGo ignored the results of the activity monitor, its only objective outcome measure. Analysis of these results showed that CBT and MRT at best only led to a minimal objective improvement of 5–6 per cent. The trial suffered from many methodological problems, as discussed before. For example, post-exertional malaise, the cardinal feature of the disease, was not compulsory for diagnosis. Patients with co-morbid depression or anxiety were not excluded from the study, even though their exclusion was recommended in 2003 by an international group of experts that included the main proponents of the biopsychosocial model. It was recommended by consensus because ‘the presence of a medical or psychiatric condition that may explain the chronic fatigue state excludes the classification as CFS in research studies because overlapping pathophysiology may confound findings specific to CFS’ (Reeves et al., 2003).
Moreover, other trials that used a ‘placebo’ control group and objective outcomes did not show any objective or clinically significant improvement (Vink, 2016; White et al., 2011). Therefore, it is likely that this minimal improvement was caused not by the treatments under investigation but by the natural fluctuation of the disease, the inclusion of patients who do not have the disease, the absence of a properly designed control group, the high percentage of patients who were excluded from the trial (66.5%), the high percentage of participants (34.4%) for whom there were no activity monitor results, as those who do not improve or suffer adverse reactions are the ones most likely to drop out of treatment (Lilienfeld et al., 2014), and/or other confounding factors. But even if this were not the case, no one would classify an operation, an antibiotic or any other treatment as effective if it led to just 5–6 per cent improvement. Even more so as a major criterion for defining CFS/ME is a reduction in physical capacity of at least 50 per cent compared to pre-illness levels (Fukuda et al., 1994; Holmes, 1988).
Conclusion
The FatiGo trial suffered from a number of severe methodological flaws. On top of this, it ignored the results of the activity monitor, its only objective outcome. Analysis of its results shows that MRT and CBT are neither effective nor cost-effective. Re-analysis of FatiGo also shows that one should be extremely careful about accepting claims of efficacy of psychological interventions in the absence of objective proof to support such claims. Even more so when trials use objective outcomes but ignore the results, as was the case in FatiGo, or even worse, when they do not report them at all.
Footnotes
Acknowledgements
The authors would like to thank M.V.’s parents for typing out his speech memos.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
