Abstract
China was the first country in the world to approve mifepristone (RU-486) for abortion. Six years after a report published in the West indicated that mifepristone may also be effective in treating endometriosis, the first paper on the same topic was published in China in 1997. Since then, over 160 studies on this topic have been published in China. We retrieved 104 papers, published in the last 11 years, on clinical trials and trial-like studies conducted in China that evaluated the use of mifepristone to treat endometriosis. We found that the quality of these studies is well below an acceptable level, making it difficult to judge whether mifepristone is truly efficacious. There are intriguing signs that these studies, as a whole, have serious anomalies. The most glaringly deficient areas are informed consent, choice of outcome measures, evaluation of outcome measures, data analysis and randomization. The uniformly low quality is disquieting, given the large number of studies, the enormous resources and energy invested in them and, above all, the weighty issue of treatment efficacy, which concerns each and every patient with endometriosis. Equally disquieting are the repetition of low-quality studies, the absence of a critical, systematic review on the subject, the lack of proposals for multicenter clinical trials and the seemingly unnecessary duplication of clinical trials without due informed consent. In view of this, it may be time to change attitudes and practices, and to reform education and training programs in clinical trial methodology for obstetrics and gynecology research in China.
Endometriosis & the need for medical treatment
Endometriosis, characterized by the ectopic presence of endometrial glands and stroma, is a common and debilitating gynecological disorder with an enigmatic pathogenesis [1]. Its presenting symptoms include dysmenorrhea, dyspareunia, chronic pelvic pain and subfertility. Although surgery is currently the treatment of choice, medical treatment is often needed, either as a first-line therapy or because of the high risk of recurrence after surgery [1].
The use of progestins to treat endometriosis stemmed from a somewhat serendipitous clinical finding in the 1950s that endometriosis-related symptoms apparently resolved during pregnancy [2]. With the advent of oral and intrauterine contraceptives in the 1960s, their use in treating endometriosis was also proposed [3]. In 1973, danazol, an isoxazole derivative of 17α-ethinyl testosterone, began to be used in treating endometriosis [4]; its role was recognized as that of an antigonadotropin, or gonadotropin inhibitor [5,6]. The recognition of estrogen dependency in endometriosis, the elucidation of the negative feedback mechanism in the hypothalamic–pituitary–gonadal axis at approximately the same time and, in particular, the availability of a gonadotrophin releasing hormone agonist (GnRHa), buserelin, in the early 1980s led to the introduction of GnRHa as an alternative treatment for endometriosis [7].
While the three treatment modalities – progestins, androgenic agents and GnRHas – are somewhat effective in relieving endometriosis-associated pains, the relief of pain appears to be relatively short-term [8]. In addition, they have many undesirable, and sometimes severe, side effects [9–11]. Consequently, one strong impetus for endometriosis research is to search for more efficacious medical treatment, preferably with more tolerable side effects and cost profiles.
Mifepristone & its potential in treating endometriosis
In 1980, a synthetic steroid compound known as RU-486 was discovered by scientists at the Roussel-Uclaf company in France during the development of novel glucocorticoid receptor antagonists. Its antiprogestogenic activity was soon recognized, and it was later used as an abortifacient under the name mifepristone [12]. The WHO and the Royal College of Obstetricians and Gynaecologists have concluded that mifepristone followed by misoprostol is the most effective and safe medical method for inducing abortion in the first and second trimesters of pregnancy [13].
Mifepristone was the first drug known to compete with progesterone at its receptor and the first member of the selective progesterone receptor modulators (SPRMs), agents that act on the progesterone receptor with tissue-specific agonist/antagonist profiles. In 1991, Kettel et al. reported that in six women with endometriosis, administration of RU-486 100 mg/day for 3 months improved pelvic pain in all subjects without a significant change in the extent of disease, as evaluated by follow-up laparoscopy [14]. Based on a case series, the same group also reported that lower doses of RU-486 were effective in relieving endometriosis-associated pelvic pain and in causing regression of endometriosis without significant side effects [15–17]. The suppression of ectopic implants was also confirmed in cynomolgus monkeys with induced endometriosis [18].
Despite these promising early reports, no clinical trials on the use of mifepristone to treat endometriosis appear to have been conducted since then in the USA or elsewhere in the world, except in China. The lack of availability of mifepristone in the USA probably contributed to this situation. Roussel-Uclaf did not seek US approval, probably bowing to pressure from pro-life groups, and the importation of mifepristone, often labeled the abortion pill in the media, was banned by the US FDA in 1989, during the first Bush administration. Although the FDA approved it for abortion in 2000, it is not available to the public through pharmacies as other prescription drugs are; its distribution is restricted to specially qualified licensed physicians under the trade name Mifeprex [12].
In China, clinical trials examining mifepristone for induced abortion began as early as 1985, at a time when there was great demand for an abortifacient to rigorously enforce the ‘one family, one child’ policy. Since the domestically produced abortifacient trichosanthin, derived from the herb Trichosanthes kirilowii, did not work very well and was not easy to use, mifepristone, with its excellent abortion-inducing efficacy and ease of use, was a fortuitous and timely development. In 1986, mifepristone was quickly approved in China for termination of pregnancy of up to 49 days' gestation [19], making China the first country in the world to approve mifepristone.
Six years after the first report on the use of mifepristone to treat endometriosis was published by Kettel et al. [14], the first clinical study on the same topic conducted in China was published in 1997 [20], following a brief review paper [21] and an animal study [22], both published in 1995. Since then, over 160 clinical studies have been published in Chinese medical journals; many of them appear to be clinical trials or trial-like studies. These papers are almost always published in Chinese, in journals usually confined to China that are not easily accessible to Western medical professionals. Such a large volume of publications on the efficacy of a single drug is unusual in clinical research.
Motivations for this review
After more than 160 papers published over a period of 15 years, it is perhaps reasonable to expect that some conclusive findings regarding the efficacy of mifepristone have emerged, given the enormous economic burden that endometriosis imposes and the debilitating nature of the disease. Has any verdict been reached from these studies? What is the verdict, if any? Do the benefits of mifepristone outweigh the risks for treating endometriosis?
Before examining these questions in greater detail, it should be noted that clinical studies designed to evaluate the efficacy of potential therapeutics are prone to biases of various kinds. For this reason, there are universally accepted principles and guidelines for the design, execution and analysis of clinical trials. In fact, the harmonization of clinical trial protocols across countries of the EU was shown to be feasible approximately 30 years ago. Coordination between Europe, Japan and the USA led to a joint regulatory–industry initiative on international harmonization, named in 1990 the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) [201]. Currently, most, if not all, clinical trial programs worldwide follow the ICH guidelines, aimed at “ensuring that good quality, safe and effective medicines are developed and registered in the most efficient and cost-effective manner. These activities are pursued in the interest of the consumer and public health, to prevent unnecessary duplication of clinical trials in humans and to minimize the use of animal testing without compromising the regulatory obligations of safety and effectiveness” [201].
Given these guidelines, it seems logical to assess the quality of the published clinical studies on mifepristone for treating endometriosis. In other words, how trustworthy are these published studies?
Objective
The objective was to assess the quality of published clinical trials or trial-like clinical studies, all conducted and published in China, that evaluated the efficacy of mifepristone in relieving the symptoms of endometriosis and in improving the fertility of women with endometriosis. Herein, trial-like studies are clinical studies that have the ‘look and feel’ of a clinical trial aimed at evaluating a drug of interest, yet in many cases lack the methodological rigor and reporting quality normally required of a clinical trial. This review is not a meta-analysis of these published studies.
Methods of the review
We followed the Quality of Reporting of Meta-analyses (QUOROM) guidelines [23] in reporting this review.
Search strategy
We searched the Cochrane Central Register of Controlled Trials, PubMed and the following Chinese-language electronic databases for all publications from 1979 to 1 April 2010: the Chinese Biomedical Literature Database (CBM), Chinese Science & Technology Journals (VIP account) and the China National Knowledge Infrastructure (CNKI). We identified relevant studies using the terms ‘RU-486’, ‘mifepristone’, ‘clinical trial’, ‘endometriosis’, ‘endometrial ovarian cyst’ and ‘chocolate cyst’, alone and in combination. Maohua Liu and Fanghua Shen also independently searched the bibliographies of all relevant trials and trial-like studies by hand. Review articles and commentaries were excluded. There was no language restriction, but studies were restricted by population and region: all had to be conducted within mainland China.
Study selection & validity assessment
We included studies if they met at least one of the following criteria:
They were clinical trials or trial-like clinical studies of mifepristone compared with controls (i.e., placebo, nonmifepristone or conventional medicine [CM]);
The diagnostic criteria for endometriosis included ultrasonography or other imaging techniques (for ovarian endometrial cysts), laparoscopic diagnosis or the criteria established by the Integrated Chinese and Western Medicine Diagnostic Criteria for Endometriosis [24].
No restriction was placed on the formulation, dosage or route of administration. Two investigators (Maohua Liu and Fanghua Shen) independently reviewed the contents of the 165 retrieved citations to determine whether they met the eligibility criteria for inclusion. Discrepancies were resolved by consensus.
Of the 165 studies retrieved for detailed assessment, 104 fulfilled our inclusion criteria [25–128].

The identification, selection and assessment of included clinical studies.
Data extraction
Maohua Liu and Fanghua Shen independently extracted all data from eligible studies using a standard protocol.
Quality assessment of included studies
Well-designed, well-executed and properly powered randomized clinical trials (RCTs) can provide unequivocal evidence for or against the efficacy of a drug. Based on the basic principles of RCT design and execution and on the characteristics of endometriosis, we developed a list of criteria for evaluating the quality of an included study.
Criteria for assessing the quality of a clinical study.
In addition, each study was scored on the Jadad scale [129], which asks three questions:
Was the study described as randomized?
Was the study described as double blind?
Was there a description of withdrawals and dropouts?
Each affirmative answer to the above questions scored 1 point, and each negative answer scored none. One additional point was given for each of the following:
The method of randomization was described in the paper, and that method was appropriate
The method of blinding was described, and it was appropriate
However, one point was deducted for each of the following:
The method of randomization was described, but was inappropriate
The method of blinding was described, but was inappropriate [129]
Thus, a study's Jadad score ranges from 0 to 5.
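The scoring rule above is simple enough to express as a function. The following Python sketch is illustrative only. The class and field names are hypothetical, and the sketch assumes, as the scale is usually applied, that the method adjustments apply only to studies described as randomized or double blind.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JadadItems:
    """A reviewer's answers for one study (field names are hypothetical)."""
    randomized: bool             # Q1: described as randomized?
    double_blind: bool           # Q2: described as double blind?
    withdrawals_described: bool  # Q3: withdrawals and dropouts described?
    # None = method not described; True/False = described and (in)appropriate
    randomization_appropriate: Optional[bool] = None
    blinding_appropriate: Optional[bool] = None


def method_adjustment(appropriate: Optional[bool]) -> int:
    """+1 if the method is described and appropriate, -1 if described but
    inappropriate, 0 if it is not described at all."""
    if appropriate is None:
        return 0
    return 1 if appropriate else -1


def jadad_score(s: JadadItems) -> int:
    """Jadad score; the three questions plus adjustments give a range of 0..5."""
    score = int(s.randomized) + int(s.double_blind) + int(s.withdrawals_described)
    # Assumed: adjustments only apply when randomization/blinding is claimed.
    if s.randomized:
        score += method_adjustment(s.randomization_appropriate)
    if s.double_blind:
        score += method_adjustment(s.blinding_appropriate)
    return score
```

Under these assumptions, a study that merely calls itself randomized, with no blinding and no account of dropouts, scores 1 (`jadad_score(JadadItems(True, False, False))`). That is close to the average Jadad score of the mifepristone studies reported in the Results.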
For reference purposes, we also selected three recently published clinical trials on endometriosis [130–132]. Maohua Liu and Fanghua Shen independently evaluated their quality and Jadad scores as well.
Measures of the therapeutic efficacy
The primary outcome measure used in almost all included studies was the overall effective rate (OER). Following the guidelines set by the Ministry of Health of China [133] and the Integrated Chinese and Western Medicine Diagnostic Criteria for Endometriosis [24], OER was defined as the proportion of treated patients who were later found to be cured, significantly improved or improved. Cure was defined as the complete disappearance of all symptoms, disappearance of pelvic mass or palpable tenderness; for women with infertility, cure was defined as becoming pregnant or giving birth to a child within 3 years after treatment. Significant improvement was defined as at least two of the following:
Near or complete resolution of dysmenorrhea or dyspareunia
Near or complete resolution of lumbosacral pain or lower abdominal pain
Near or complete disappearance of nodules, along with significantly reduced or absent palpable tenderness
Improvement was defined as at least two of the following:
Reduced dysmenorrhea or dyspareunia
Reduced lumbosacral pain or lower abdominal pain
Slightly shrunken nodules, along with reduced or absent palpable tenderness
Ineffective treatment was defined as at least one of the following:
Unabated dysmenorrhea or dyspareunia
Unabated lumbosacral pain or lower abdominal pain
No change in nodular lesions
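To make the OER definition concrete, the following Python sketch classifies patients and computes the rate. It simplifies the definitions under several stated assumptions:

- Each of the three symptom domains is rated on a hypothetical four-level scale (`Change`).
- ‘Cure’ is reduced to complete resolution in all domains, or pregnancy within 3 years for women with infertility.
- The published ‘improved’ and ‘ineffective’ definitions can overlap (e.g., two domains reduced, one unabated). The categories are therefore checked from best to worst, so each patient receives exactly one label.

```python
from collections import Counter
from enum import IntEnum


class Change(IntEnum):
    """Hypothetical ordinal rating of one symptom domain after treatment."""
    UNCHANGED = 0      # unabated pain / no change in nodules
    REDUCED = 1        # reduced pain / slightly shrunken nodules
    NEAR_RESOLVED = 2  # near resolution
    RESOLVED = 3       # complete resolution


def classify(domains: list, pregnant_within_3y: bool = False) -> str:
    """Classify one patient from three domain ratings:
    (1) dysmenorrhea/dyspareunia, (2) lumbosacral/lower abdominal pain,
    (3) nodules with palpable tenderness.
    Categories are checked from best to worst, so each patient gets one label."""
    if pregnant_within_3y or all(d == Change.RESOLVED for d in domains):
        return "cured"
    if sum(d >= Change.NEAR_RESOLVED for d in domains) >= 2:
        return "significantly improved"
    if sum(d >= Change.REDUCED for d in domains) >= 2:
        return "improved"
    return "ineffective"


def overall_effective_rate(patients) -> float:
    """OER = (cured + significantly improved + improved) / all treated patients.
    `patients` is a list of (domain_ratings, pregnant_within_3y) pairs."""
    labels = Counter(classify(d, p) for d, p in patients)
    effective = labels["cured"] + labels["significantly improved"] + labels["improved"]
    return effective / len(patients)
```

Under this reading, a patient whose pain merely lessens in two domains counts as a treatment success. An OER near 100% may therefore say little about how many patients were actually cured.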
Other outcome measures included the dysmenorrhea remission rate, cyst shrinkage rate, pregnancy rate (PR) and recurrence rate (RR). However, more often than not, these rates were not explicitly defined. For example, recurrence was often defined as the return of some symptoms, without specifying the length of follow-up after the start of treatment. For this review, we considered only OER, RR and PR.
Statistical analysis
For descriptive statistics, we used boxplots [134] to depict the distribution of data graphically. The bottom and top of the box represent the lower and upper quartiles, respectively, the band near the middle of the box represents the median, and the ends of the whiskers represent the smallest and largest nonoutlier observations. Distributions of continuous variables were compared between two groups using the Wilcoxon test and among three or more groups using the Kruskal–Wallis test. Pearson's correlation coefficient was used to evaluate the correlation between two continuous variables; when at least one variable was ordinal, Spearman's rank correlation coefficient was used instead.
To evaluate which factors were associated with the difference in OER, RR or PR between the mifepristone and control groups, we used multiple linear regression with stepwise variable selection.
To assess whether publication bias exists in reporting the efficacy of mifepristone, we used funnel plots [135]. A funnel plot places estimates of the risk of being effective, recurrent or pregnant, such as odds ratios (ORs) or log ORs, on the horizontal axis. It plots them against a measure of study size or precision, such as the standard error (SE), on the vertical axis. Three things can distort the plot: selection biases (such as publication bias and bias in inclusion criteria), true heterogeneity (i.e., effect size that differs according to study size) and data irregularities (such as poor methodological design of small studies, inadequate analysis and fraud). In their absence, studies with large and small sample sizes should scatter symmetrically around the true log OR. Hence the plot should have the shape of a funnel, with a wide opening at the top (due to sampling variability) and the tip pointing to the bottom, centered on the true log OR [135]. We chose the log OR rather than the OR because the SE of the OR depends on the OR itself, whereas the SE of the log OR is purely a function of the sample sizes in the different exposure–disease status combinations. The log scale also makes ORs greater and less than 1 symmetric about OR = 1 (log OR = 0). When SEs are used, funnel plots are considered an excellent tool for detecting publication bias [136]. p-values less than 0.05 were considered statistically significant. All computations were performed with R 2.10.1 [137,202].
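The point about standard errors can be checked directly. For a 2 × 2 table with cell counts a, b, c and d, the SE of the log OR (Woolf's formula) is √(1/a + 1/b + 1/c + 1/d). This depends only on the counts, not on the size of the effect. The Python sketch below computes one funnel-plot point per study. The cell labels are hypothetical. The 0.5 continuity correction for zero cells is a common convention assumed here, since the review does not say how zero cells were handled.

```python
import math


def log_odds_ratio(a, b, c, d, correction=0.5):
    """Log odds ratio and its standard error for one study's 2 x 2 table.

    a: mifepristone, event     b: mifepristone, no event
    c: control, event          d: control, no event
    If any cell is zero, `correction` is added to every cell (assumed
    convention) so that the log OR and its SE stay finite.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    # Woolf's formula: the SE is a function of the cell counts only.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Hypothetical study: 20/30 effective with mifepristone vs 10/30 in controls.
print(log_odds_ratio(20, 10, 10, 20))
```

Consider a study that reports 100% OER in the mifepristone group. It has b = 0, so without some correction its log OR would be infinite and it could not appear on a funnel plot at all.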
Role of the funding source
The sponsors of this study had no role in study design, data collection, data analysis, data interpretation or writing of this report.
Results
There were five types of control group in the 104 studies: unmedicated (i.e., surgery alone), GnRHa, danazol, progestins and other medications (including traditional Chinese medicine and norethisterone). No study used a placebo control. Five studies evaluated the efficacy of mifepristone without surgery; one used a danazol control and the other four used other medications. In the remaining 99 studies, surgery was performed to remove endometriotic lesions. It was followed by mifepristone, no medication (unmedicated), GnRHa, danazol, progestins or other medications. The number of studies by year of publication is shown in the figure below.

The number of publications of clinical studies using mifepristone to treat endometriosis, as a function of year of publication.
The 104 studies recruited a total of 10,003 patients. Sample sizes ranged from 16 to 350, with a median of 88.5. A substantial proportion of studies did not report basic information on the study or the recruited patients, such as age, revised American Fertility Society (rAFS) classification stage and length of follow-up. Some important outcome measures, such as recurrence rate and/or pregnancy rate, were also not reported.
The number and the proportion of studies not reporting basic information.
rAFS: Revised American Fertility Society classification.
Characteristics of the 104 retrieved studies.
These numbers, when added up, exceeded 104 because 38 studies had more than two arms. GnRHa: Gonadotrophin releasing hormone agonist; SD: Standard deviation.
Quality of the retrieved mifepristone studies
The quality scores assigned by the two evaluators correlated very closely (r = 0.97, p < 2.2 × 10⁻¹⁶).

Scatter plot of quality scores assessed independently by Maohua Liu and Fanghua Shen.
For the 104 studies, quality scores ranged from 34 to 78, with a median of 59.3 and a mean (standard deviation) of 58.5 (7.8). These scores were significantly lower than those of the reference studies (p = 0.003).

Histogram of the quality scores of mifepristone studies.
The Jadad scores showed the same pattern. The 104 studies had a mean (standard deviation) Jadad score of 1.2 (0.80). Only four (3.8%) studies had a Jadad score of 3, and no study scored higher than 3, which is the minimum standard for a study's results to be included in any systematic review or meta-analysis [138]. Seventy-three (70.2%) studies scored lower than 2. By contrast, all three reference studies had the maximum Jadad score of 5 and, as a group, scored significantly higher than the mifepristone studies (p = 0.002).
The average quality scores, broken down into the 10 component categories, are shown in the figure below.

Average quality scores (all standardized to a maximum of 10) of the 104 studies, broken down into 10 different categories.
The trend of quality scores
In addition, we examined the time trend of the average quality score among all mifepristone studies.

Boxplot of quality scores of mifepristone studies over time.
To identify the categories in which the increase in quality, or lack of it, is most notable, we plotted the time trend of the standardized quality scores in nine categories (all except the consent category).

Boxplots of standardized quality scores, in individual category, of mifepristone studies over time.
Factors associated with the quality score
We examined potential factors that may have influenced the quality of the published mifepristone studies:

- the impact factor (IF) of the journal in which the study was published
- the number of authors
- the sample size
- whether surgery was used
- the type of control (unmedicated, GnRHa, danazol, progestins or other treatment)
- the year of publication

Univariate analysis indicated that the journal IF (r = 0.20, p = 0.038) and the use of surgery (p = 0.01) were both positively associated with the quality score. Multiple linear regression identified the use of surgery (p = 0.007) as the only variable positively associated with the quality score.
Outcome measures
In view of the low quality and Jadad scores of these studies, we next examined their outcome measures for any anomalies or idiosyncrasies. First, we produced scatter plots of the OER and RR in the control and mifepristone groups. The correlation coefficients in OER between the mifepristone group and the unmedicated, GnRHa, danazol, progestin and other treatment groups were 0.17 (p = 0.19), 0.81 (p = 0.01), 0.85 (p = 6.4 × 10⁻⁷), 0.88 (p = 1.4 × 10⁻¹²) and 0.88 (p = 0.0001), respectively. Although the correlation between the mifepristone and unmedicated groups was not significant, the trend appeared to be linear. Further examination revealed that the two studies with the smallest sample sizes (mifepristone and unmedicated groups combined) were apparent outliers. One had 27 patients and the other 35, and both reported an OER of 100% in the mifepristone group. Removing these two studies yielded a significant correlation coefficient of 0.30 (p = 0.02).

Scatter plots of the outcome measures.
The situation for the recurrence rate was similar. Here we focus only on the mifepristone and unmedicated groups, because the other control groups did not have sufficient sample sizes. The correlation coefficient in recurrence rate between the mifepristone and unmedicated groups was 0.23 (p = 0.07), yet one study had only 27 patients and appeared to be an outlier.
Normally, the efficacy of an experimental drug should not depend on that of the standard treatment used in the control group. The positive correlation between the experimental and control groups, consistently seen in OER and RR, therefore signaled something peculiar that was worth investigating further. We thus attempted to determine which factors were associated with the rate in the mifepristone group.
We performed multiple linear regression analyses of OER, recurrence rate and pregnancy rate in the mifepristone group. The candidate predictors were:

- the corresponding rate in the control group
- the quality score of the study
- the year of publication
- the proportion of revised American Fertility Society classification stage III/IV patients
- the number of authors
- the dose of mifepristone
- the IF of the journal in which the study was published
- the length of follow-up
- the square root of the sample sizes of the mifepristone and unmedicated groups

Interestingly, for OER, the difference in rate was associated with the OER in the control group (p = 2.9 × 10⁻⁹), the dose of mifepristone (p = 0.03) and the number of patients in the control group (p = 0.002).
Using the same method, we found that the recurrence rate in the control group was the only variable associated with the difference in recurrence rate between the mifepristone and unmedicated groups (p < 2 × 10⁻¹⁶). Each 1% increase in the recurrence rate in the control group decreased the difference by 0.92%. Strangely, the length of follow-up was not associated with the recurrence rate.
For pregnancy rate, the correlation coefficient between the mifepristone and unmedicated groups was 0.37 but did not reach statistical significance (p = 0.12; n = 19). The pregnancy rate in the mifepristone group did correlate positively with the length of follow-up (r = 0.76, p = 0.01; n = 10).
Relationship among outcome measures
For the mifepristone and unmedicated groups, we plotted the recurrence rate against OER.

Scatter plot of (A) recurrence rate (RR) versus OER for the mifepristone group; (B) RR versus OER for the unmedicated group; (C) RR versus OER for the GnRHa group; and (D) RR versus OER for the progestin group.
The definitions of ‘effectiveness’ and of recurrence both involve pain symptoms, especially when the former was not defined within a specific timeframe. One would therefore expect the two rates to be negatively correlated. We found that the recurrence rate (RR) correlated negatively with OER in the unmedicated group (r = −0.35, p = 0.006) but did not correlate with OER in the mifepristone group (r = −0.08, p = 0.48). No such correlation was found in the GnRHa and progestin groups either (all p > 0.2).
Indications for publication bias
In a funnel plot, all calculated ORs would be approximately 1 (or 0 on the log scale), apart from sampling variability, if there were no difference in OER, RR or PR between the mifepristone and control groups. However, the funnel plot for OER

Funnel plot of odds ratios between mifepristone and unmedicated groups.
For the unmedicated group, publication bias did not seem to be evident.
Discussion
This study provides a systematic evaluation of the design, execution, and reporting of clinical trials and trial-like studies on the use of mifepristone to treat endometriosis published in China during 1999–2010. We found that:
The majority of included studies had a low quality score, effectively in the fail category
Fewer than 4% of studies had acceptable quality scores, and none had an acceptable Jadad score of 4 or above
Overall, there was a significant difference in both quality and Jadad scores between mifepristone and the reference studies
In general, the outcome measures used in most, if not all, studies are questionable and are open to debate
Consistent with the quality assessment, there is indication that the overall effective rates reported in many studies are problematic, as revealed by the positive correlation in OERs between the mifepristone and the control groups
There is indication that publication bias is present in these studies.
Much ado about nothing?
Aside from the low quality and the Jadad scores, the lack of some basic information on the recruited patients found in the majority of these studies
Signs of problems
There are suggestive signs that these studies, as a whole, have serious problems. The areas of glaring deficiency are, of course, informed consent, bias in evaluating outcome measures, data analysis and randomization. There are, however, other less conspicuous signs:

- the correlation in OER and RR between the mifepristone and control groups
- the dependence of the magnitude of the difference in OER on sample size
- the lack of dependence of the recurrence rate on the length of follow-up
It is possible that the OERs in both the mifepristone and unmedicated groups depend on the experience and skills of the physicians in the hospital where the study was conducted, which would produce a correlation in OER between the groups. However, there is no formal documentation that different hospitals have different OERs and RRs when their patient populations are similar. Moreover, OER and RR are negatively correlated in the unmedicated group but not in the mifepristone group, which suggests other explanations. For example, there may be systematic biases in outcome evaluation, perhaps unbeknownst to the authors of the studies. The signs for publication bias
The lack of dependence of RR on the length of follow-up is certainly problematic, since one quintessential measure in quantifying recurrence is the time to recurrence after surgery [139]. This lack of dependence, coupled with the correlation in RR between the mifepristone and unmedicated groups, casts serious doubt on the usefulness of these reported RRs, especially since many studies did not even report the length of the follow-up period.
Informed consent
Very few of the 104 studies mentioned informed consent, so almost no points were awarded in the ‘consent’ category of the quality score. The requirement for consent to participate in human experimentation, such as a clinical trial, is the first principle of the Nuremberg Code and a major component of the Declaration of Helsinki. It protects patients' rights and is an important safeguard against unethical behavior for personal or professional gain.
The concept of informed consent is only approximately 20 years old in China. Ordinary people in China may not feel comfortable participating in a clinical trial and may thus be apprehensive about reading a consent form. Nevertheless, informed consent is absolutely necessary for any clinical trial and can and should be implemented through more education.
Bias
Next to consent, the quality score in the bias category was low.
Randomization
Nearly 95% of the included studies did not adequately describe the randomization procedure, hence the low quality score in the randomization category. Inadequate reporting of research methods, and misleadingly describing a nonrandomized study as an RCT, are worldwide problems [140–142]. The problem appears to be more glaring for RCTs conducted in China. A recent study reported that, of 3137 purported RCTs, only 6.8% adhered to accepted randomization methodology and could be deemed authentic RCTs [143]. This suggests that many investigators did not fully understand the essence of randomization, or simply made no attempt to randomize. Peer review in Chinese journals also needs to be dramatically improved [143].
Causes for concern?
The overall low quality of the 104 studies, spanning more than a decade, is troubling. Other issues are equally disquieting. First, we did find several review papers on mifepristone as a therapeutic for endometriosis. Yet there has been no attempt, at least in published form, to review the status and progress of this area systematically. Most review papers are uncritical and uninspiring, merely cataloging and rehashing published results.
Second, we did not observe any clinical trials or trial-like studies reporting findings from two different institutions. Worse, no one has ever suggested pooling resources to carry out a multicenter RCT to unequivocally address the question of efficacy. Everyone just repeats, more or less, what others have published.
Third, for a single drug, over 100 trials and trial-like clinical studies over a span of 11 years seem somewhat excessive, especially when many of these studies were conducted without any informed consent. Avoiding unnecessary duplication of clinical trials was one of several motivations behind the 2004 announcement by the International Committee of Medical Journal Editors (ICMJE) that its journals would not publish the results of any clinical trial that had not been appropriately registered at ClinicalTrials.gov or another qualified public registry by 13 September 2005 [144]. By repeatedly conducting clinical studies of a drug with known side effects, investigators put patients at unnecessary risk, which is unfair to the unsuspecting patients.
Finally, we note that the quality and Jadad scores used in this review do not reflect some recent changes that raise the standard for the design, execution and reporting of clinical trials. Registration of clinical trials has now been instituted as a mandatory prerequisite for publication by major international biomedical journals [145], and the Consolidated Standards of Reporting Trials (CONSORT) [140] has recently been updated. In addition to the 10 categories that we used, a high-quality study is now also expected to report its allocation concealment mechanism, sample size justification, registration and funding sources. Hence, while the quality scores of the 104 studies remained more or less the same over the last 11 years, the actual gap in clinical trial quality between China and the West may be widening.
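Sample size justification, one of the newer reporting items mentioned above, usually rests on a power calculation. The sketch below uses the common unpooled normal-approximation formula for comparing two proportions, such as the effective rates in a treatment arm and a control arm. The function name and all inputs (effective rates, significance level, power) are hypothetical and are not taken from the reviewed studies.

```python
import math
from statistics import NormalDist


def n_per_group(p_treat, p_control, alpha=0.05, power=0.80):
    """Approximate number of patients per arm for a two-sided two-proportion test.

    Unpooled normal-approximation formula:
        n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    if p_treat == p_control:
        raise ValueError("the two proportions must differ")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    variance = p_treat * (1 - p_treat) + p_control * (1 - p_control)
    n = (z_alpha + z_beta) ** 2 * variance / (p_treat - p_control) ** 2
    return math.ceil(n)  # round up to whole patients
```

With hypothetical effective rates of 80% versus 60%, `n_per_group(0.80, 0.60)` gives 79 patients per arm before any allowance for dropouts; narrowing the assumed difference to 80% versus 70% raises the requirement sharply, which is why the justification should be reported.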
Limitations of study
Our study has several limitations. First, because data extraction and quality scoring were performed by two junior investigators (Maohua Liu and Fanghua Shen), they may have missed some studies or made scoring errors. However, before data extraction the two investigators were given detailed instructions on how the search should be done, and the search was carried out with the utmost care. It is unlikely that they missed any relevant studies, especially higher-quality ones, which tend to be published in journals with higher impact factors and greater visibility and are thus less likely to be overlooked. The scoring was also performed independently and with great care; when in doubt, the two investigators turned to the senior authors for clarification and guidance. The close correlation between their quality scores strongly indicates that the scoring was properly carried out.
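The agreement check described above can be sketched as a correlation between the two raters' independent scores. This section does not state which coefficient was used, so Pearson's r is assumed here, and the function name and the six pairs of scores are invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater's scores are constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical quality scores assigned independently by two raters to six studies.
rater_a = [32, 45, 28, 51, 39, 44]
rater_b = [30, 47, 27, 50, 41, 43]
print(round(pearson_r(rater_a, rater_b), 3))
```

A high correlation shows that the raters rank studies similarly; it does not by itself rule out one rater scoring systematically higher than the other.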
Second, we used a quality score that we developed ourselves. Although we did not formally validate it, the scoring list was based on the core principles of clinical trials and was not biased against any particular study. Consistent with this, our quality score correlated with the Jadad score, indicating that it captures the essence of study quality and is appropriate for our purpose. The informed consent category is only one of its components, accounting for just 5 points of the quality score. Moreover, the list did not include more recent CONSORT requirements, such as the allocation concealment mechanism, trial registration and disclosure of funding sources.
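Because the Jadad score serves as the external benchmark for our quality score, the sketch below encodes its standard five-point rules: points for a report described as randomized and as double-blind, an extra point (or a one-point deduction) depending on whether each method is described and appropriate, and a point for describing withdrawals and dropouts. The function name and the way each item is passed in are our own illustrative choices; our 10-category quality score is not reproduced here because its full item weights are not given in this section.

```python
from typing import Optional


def jadad_score(randomized: bool,
                randomization_appropriate: Optional[bool],
                double_blind: bool,
                blinding_appropriate: Optional[bool],
                withdrawals_described: bool) -> int:
    """Score a trial report on the 0-5 Jadad scale.

    For randomization_appropriate and blinding_appropriate:
        True  -> method described and appropriate (+1)
        False -> method described but inappropriate (-1)
        None  -> method not described (0)
    The method items are only scored if the base item (randomized /
    double-blind) is present, so the total can never fall below 0.
    """
    score = 0
    if randomized:
        score += 1
        if randomization_appropriate is True:
            score += 1
        elif randomization_appropriate is False:
            score -= 1
    if double_blind:
        score += 1
        if blinding_appropriate is True:
            score += 1
        elif blinding_appropriate is False:
            score -= 1
    if withdrawals_described:
        score += 1
    return score
```

For example, a report that calls itself randomized but gives no method, is open-label and does not mention withdrawals, `jadad_score(True, None, False, None, False)`, scores 1, well below the threshold of 4 treated as acceptable in this review.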
Conclusion & future perspective
The quality of design, execution and reporting of clinical trials and trial-like studies conducted in China on the use of mifepristone to treat endometriosis that were published in the last 11 years is well below an acceptable level, making it difficult to judge whether mifepristone is truly efficacious. There are intriguing signs that these studies, as a whole, have serious problems. The areas that are glaringly deficient are informed consent, choice of outcome measures, evaluation of outcome measures, data analysis and randomization.
The uniformly low quality of mifepristone studies is alarming, given the large quantity of these studies, the enormous resources and energy put into them and, above all, the weighty issue of treatment efficacy that concerns each and every patient with endometriosis. The peer review system bears part of the responsibility, but so do the education and research systems. Unless dramatic measures are taken, clinical studies of this kind will not be sustainable and may backfire. Given what is at stake for drug efficacy and safety, the time is ripe for a full evaluation of the problem.
Executive summary
A total of 104 clinical trials and trial-like studies have been conducted in China on the use of mifepristone to treat endometriosis, all published in Chinese journals in the last 11 years and none easily accessible to medical professionals outside China.
The majority of the 104 studies had a low quality score, effectively in the ‘fail’ category.
Fewer than 4% of the studies had acceptable quality scores, and none of the included studies had an acceptable Jadad score of 4 or above. Overall, both quality and Jadad scores differed significantly between the mifepristone studies and similar clinical trials conducted in the West.
In general, the outcome measures used in most, if not all, studies are questionable and open to debate.
Consistent with the quality assessment, there are indications that the overall effective rates reported in many studies are problematic, as revealed by the positive correlation in overall effective rates between the mifepristone and control groups.
There is an indication that publication bias is present within these studies.
Footnotes
This research was supported by grant 30872759 (Sun-Wei Guo) from the National Science Foundation of China, grant 074119517 and a Pujiang Project grant from the Shanghai Science and Technology Commission (Sun-Wei Guo), and grant 09–11 from the State Key Laboratory of Medical Neurobiology of Fudan University (Sun-Wei Guo). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.
