Abstract
This review focuses on how surgical methods should be assessed from a health technology perspective. The use of randomized controlled trials, population-based registries, systematic literature evaluation and the recently published IDEAL method is briefly discussed.
INTRODUCTION
The European organization EUnetHTA defines health technology as “the application of scientific knowledge in health care and prevention” (1). The Swedish Council on Health Technology Assessment (SBU, Statens Beredning för Medicinsk utvärdering) uses the term health technology for the use of systematic knowledge in techniques and methods to prevent, diagnose, treat and monitor diseases, as well as in rehabilitation. Health technology assessment (HTA) is truly multidisciplinary: it summarises the medical (including safety, efficacy and effectiveness), social, economic, organizational and ethical evaluation of the properties, effects and/or impacts of health care technology. The assessment must be made in a systematic, transparent and reproducible manner, an important aim being to support decision makers. Once the process has come this far, political priorities come into play. Since 1985 the field has had its own scientific journal (International Journal of Technology Assessment in Health Care). The technologies assessed fall into several categories: medical (including pharmacological substances), surgical (various types of interventions, sometimes using implants or devices), screening programmes, psychological/behavioural interventions, diagnostic modalities etc.
Full SBU reports assess one disease or one complex of symptoms. The process is extensive, usually takes 2–4 years to complete, and results in a scientific text of several hundred pages. SBU Alert is a system for rapid evaluation of the evidence for new methods and technologies, with a function similar to horizon scanning or an early warning system. The strengths of this system are that one method is evaluated at a time and that it is fast: an evaluation is usually finished in about 6–9 months.
Table 1 shows all SBU projects of surgical interest since the start in 1987. This paper focuses on diagnostic and treatment procedures for surgical interventions.
Table 1. Published SBU reports concerning surgical interventions.
PROBLEMS WITH NON-PHARMACOLOGICAL METHODS
For pharmacological substances the regulatory framework and the requirements of the authorities are well defined, and there are strict rules for how to perform a randomized study. How to evaluate other technologies is less well defined. The rules have been less strict, and it has often been left to individual interventionists to test new surgical procedures and to implant prostheses, grafts etc. A CE mark on a product does not guarantee that it has been tested in properly designed trials. Innovative, enthusiastic and brave colleagues have played an important role, sometimes in spite of poor initial clinical outcomes. Examples are the early development of neurosurgery, open thoracic/cardiac surgery and organ transplantation. Obviously, methods have developed without randomized studies and modern evaluation technology, and it is important to define what cannot be questioned in today's treatment armamentarium. This is the baseline against which new technologies must be evaluated, and every speciality should define this level of basic knowledge. In this process common sense must not be forgotten, although it is difficult to quantify. The lack of RCTs on the benefit of parachutes in air accidents is a drastic example (2). Surgical treatment of femoral neck fractures, for instance, is hardly to be questioned, but when new methods such as new osteosynthetic devices are developed, they must be properly tested. For drainage of abscesses and suture of perforations, experience over time has given the answer. The same can be said of surgical treatment of ruptured abdominal aortic aneurysms: without treatment mortality is 100%, whereas with open surgery it is less than 50% (35% in the recent Swedvasc report (3)). Intuitively, endovascular treatment should be even better because it causes less surgical trauma, but the correct way to test this hypothesis is a randomized trial comparing open and endovascular surgery.
This paper will deal with the use of health technology assessment methodology primarily exemplified by vascular surgery. An important issue, which will not be further discussed, is how ethical perspectives should be integrated (4–5).
RANDOMIZED CONTROLLED TRIAL IN SURGERY
The strength of the randomized controlled trial (RCT) is that it is truly experimental: ideally it studies one factor while all other factors are kept constant and are evenly distributed between groups by the random allocation. Randomization minimizes the risk of systematic bias or errors. The stricter the inclusion and exclusion criteria, however, the poorer the external validity and generalizability. One sign of this problem is that the inclusion rate is often slow, with far fewer patients included than the number of potential patients at risk, especially in large multicenter trials (6). This indicates a risk of selection bias. Surrogate endpoints are often used to keep the sample size small, the problem then being how relevant the surrogate is for the clinical outcome. One requirement, which also applies to the evaluation of pharmacological substances, is that there must be genuine uncertainty about which alternative is best for the patient. When the surgeon is not convinced which method is superior, it should be possible, and also ethical, to randomize: the principle of the “grey area of uncertainty” (7–8). The difficulty of performing blinded studies in surgery makes fully independent monitoring boards vital, for both outcome and safety.
The pathophysiological background and the physiological rationale for a specific treatment must be clear, as illustrated by the study of external to internal carotid artery bypass to prevent stroke (9). In that study the bypass, aimed at increasing blood flow to the brain, opened a free channel for embolization because the thrombotic ulceration in the carotid bifurcation was not removed. As could be expected, the surgical group had a significantly higher rate of stroke and death. We must also be open to totally new etiologic explanations for diseases, as exemplified by the change from large surgical resections to antibiotic eradication of Helicobacter pylori in patients with gastric ulcer.
Table 2 illustrates some problems encountered in surgical RCTs.
Table 2. Problems with RCTs in surgery
One great concern when a new technology is to be evaluated is the learning curve, which involves not only surgical craftsmanship but also the development of devices and instruments and the continuous refinement of patient selection, team performance and communication regarding peri- and postoperative care. Randomization can start too early in the learning process, when the methodology is still under development, so that the results rapidly become obsolete, and it is not clear how to deal with the acquisition of skill and experience during the development phase of a surgical procedure. Table 3 gives an example concerning endovascular aneurysm repair (EVAR). The question is at what point during the technical development an RCT against open repair should optimally start. One study design that aims to take such problems into account is the so-called tracker trial (10). The tracker trial combines advantages of RCTs and registries and is designed to track the progress of the study over time by means of a flexible protocol and sophisticated interim analyses. The sample size is not fixed from the start of the study but is open to modification.
Table 3. Development of endovascular abdominal aortic aneurysm repair (EVAR)
Another type of RCT is one in which patients are randomized to the surgeon or surgical team with the best knowledge of each of the technologies to be compared. This means that patients may have to be transported, perhaps between hospitals at different locations, which raises logistic as well as economic problems that must be solved (11). This study type was suggested more than 30 years ago but is underused.
Through industry sponsorship it is easier to obtain financial support for pharmacological studies, whereas funding for the evaluation of non-pharmacological methods is more difficult to obtain. However, there seems to be a systematic bias that overestimates effects and favours products made by the companies funding the research (12). Having noted the association between industry sponsorship and pro-industry conclusions, it is also fair to remember that the quality of industry-supported studies is often high (13). One problem with industry-sponsored trials is that there may be a company-induced delay in publication, especially if results are negative, as well as selective reporting of data. If negative trials are not published, conclusions in meta-analyses can be distorted (14). Small and underpowered RCTs, which are common in surgery, may contribute to unreliable results of meta-analyses (15). A further problem is multiple publication of the same study, which is not always easily detected (16).
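What "underpowered" means in practice can be made concrete with a rough sample size calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions with equal group sizes. The function name, the 5% significance level and the 80% power are illustrative assumptions, and the 25% mortality for a new technique is purely hypothetical; only the 35% figure for open repair of ruptured aneurysm is taken from the text.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_new, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided comparison of two
    proportions (normal approximation, equal group sizes)."""
    if p_control == p_new:
        raise ValueError("the two event rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    effect = p_control - p_new
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# 35% is the open-repair mortality quoted from the Swedvasc report;
# 25% is a hypothetical rate assumed for a new technique.
n = sample_size_per_group(0.35, 0.25)
print(f"about {n} patients per group")
```

Under these assumptions several hundred patients are needed in each arm, and halving the expected difference roughly quadruples that number. This is one reason why many surgical trials end up too small to give a reliable answer.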
POPULATION BASED REGISTRIES
RCTs in surgery have further problems, however. One is the generalizability of results from RCTs to the population at risk; as already stated, the slow inclusion rate sometimes casts doubt on this. Another is what happens to the results obtained in an RCT when the technique or innovation is disseminated outside the well-defined study situation. A third is the detection of rare complications and side effects, which are not usually seen in an RCT simply because the sample size is too small; the majority of RCTs are not primarily designed and dimensioned to study side effects.
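The sample size argument for rare side effects can be illustrated with elementary probability. If a complication occurs independently in each patient with incidence p, the chance of seeing it at least once among n patients is 1 - (1 - p)^n. The sketch below is a simplified illustration under that independence assumption; the function names and the example incidence of 1 in 1000 are hypothetical.

```python
from math import ceil, log


def prob_at_least_one(incidence, n_patients):
    """Probability of observing at least one event among n_patients,
    assuming independent patients with the same incidence."""
    if not 0 < incidence < 1:
        raise ValueError("incidence must lie strictly between 0 and 1")
    return 1 - (1 - incidence) ** n_patients


def patients_needed(incidence, confidence=0.95):
    """Smallest n giving at least `confidence` probability of observing
    one or more events."""
    if not 0 < incidence < 1:
        raise ValueError("incidence must lie strictly between 0 and 1")
    return ceil(log(1 - confidence) / log(1 - incidence))


# Hypothetical complication occurring in 1 of 1000 patients.
print(prob_at_least_one(0.001, 200))  # chance of seeing it in a 200-patient arm
print(patients_needed(0.001))         # patients needed for a 95% chance
```

With these assumed numbers, an arm of 200 patients has less than a one in five chance of showing even a single case, whereas a registry that includes every consecutive patient reaches the required numbers over time.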
A recent systematic review showed no difference, on average, between risk estimates of the adverse effects of an intervention derived from meta-analyses of RCTs and those derived from observational studies (17). This suggests that systematic reviews of adverse effects should not be restricted to a specific study type.
There is also a substantial risk in conducting meta-analyses of small RCTs in which the study of the specified outcome was not defined in advance, or in which the method or duration of follow-up is not adequately described. This is evident from the case of aprotinin, used to reduce blood loss during coronary artery bypass surgery: a meta-analysis of 52 RCTs in a Cochrane review did not show any excess mortality, although several observational studies did (18–19). Later, a large RCT (20) also showed excess mortality, and aprotinin was therefore withdrawn from the market. The aprotinin saga illustrates overconfidence in small RCTs of inferior quality compared with well-conducted observational studies.
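One reason why pooling many small trials can fail to reveal a real difference in a rare outcome such as death is the imprecision of each contributing estimate. The sketch below shows a basic fixed-effect, inverse-variance pooling of odds ratios. It is a generic textbook method and not the procedure used in the cited Cochrane review. The trial counts are invented for illustration and are not aprotinin data.

```python
from math import exp, log, sqrt


def log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Log odds ratio and its variance from one trial's 2x2 table."""
    a, b = events_trt, n_trt - events_trt
    c, d = events_ctl, n_ctl - events_ctl
    if min(a, b, c, d) <= 0:
        raise ValueError("this sketch needs non-zero cells in every 2x2 table")
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_odds_ratio(trials, z=1.96):
    """Fixed-effect (inverse-variance) pooled odds ratio with an
    approximate 95% confidence interval."""
    estimates = [log_odds_ratio(*trial) for trial in trials]
    weights = [1 / variance for _, variance in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    se = sqrt(1 / sum(weights))
    return exp(pooled), exp(pooled - z * se), exp(pooled + z * se)


# Four hypothetical small trials: 3 deaths of 50 treated vs 2 of 50 controls.
small_trials = [(3, 50, 2, 50)] * 4
odds_ratio, lower, upper = pooled_odds_ratio(small_trials)
print(f"pooled OR {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

In this invented example the pooled odds ratio is above 1, yet the confidence interval still includes 1, so the meta-analysis would report no demonstrable excess mortality. Pooling narrows the interval but cannot compensate for very few events or for outcomes that the trials were not designed to capture.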
Therefore it is important to follow what happens outside the RCT situation, using a prospective population-based registry with a protocol of well-defined variables. This makes it possible to compare regions within one country (as in the Swedish Vascular Registry, Swedvasc) and also to compare countries (as in the Vascunet cooperation, where a recent publication has focused on abdominal aortic aneurysm (21)).
Differences in such registries can of course be due to differences in the prevalence and incidence of diseases, but are more often due to differences in practice and indications for treatment (note that the background concerning risk factors etc. is comparable). Such differences can prompt further in-depth analysis to find explanations and can be used to generate new hypotheses and new studies.
When a method has been introduced on the basis of good evidence from RCTs or systematic literature research, it is important to analyze what happens outside the well-controlled trials, that is, in clinical reality. Here population-based quality registries have an important task; they are increasingly used and also demanded by health care authorities (22). When such registries are established it is of utmost importance that the profession is involved, so that they are developed with the highest scientific scrutiny.
Although registries have been around for more than 25 years, several general problems remain to be solved. Registries should be automatically linked to patient records, which, however, has logistic as well as economic implications. Coordination at the national level is at present sub-optimal concerning IT support, medicolegal problems, data linkage between various registries etc. There is an urgent need to develop quality indicators and common definitions to make comparisons between hospitals, regions etc. possible and meaningful. Existing registries are a rich source for research but are underused. There should be an automatic linkage between RCTs and population-based registries. Both have their advantages and disadvantages, and they should be regarded as complementary research strategies (23).
SYSTEMATIC LITERATURE EVALUATION
With the increasing number of publications, it is almost impossible for individual doctors to synthesize the information and summarize the evidence on new health technologies. The systematic literature search as originally described by Archie Cochrane, and established since 1992 in the International Cochrane Collaboration, has therefore been of great help. In such a critical evaluation it is important to have well-defined and explicit methods for the literature search strategy and for grading the scientific quality of the individual papers, where well-performed RCTs have the highest value and case reports the lowest. The process of evaluating published evidence is quite rigorous. In the next step there must be a system in which the results of the collected literature are used to assign a grade of evidence: the highest grade can normally be translated into a clinical recommendation, and low grades should stimulate further research. Today there are several systems for grading the evidence. The Swedish Council on Health Technology Assessment (SBU) has recently adopted the GRADE system (24).
A depressing finding when performing systematic literature overviews is that the majority of what is published is of low quality; often less than 10% of the publications on a given problem are of acceptable quality.
A systematic review will by necessity identify a number of questions and unresolved problems. As an example, two SBU projects with the same chairman were chosen. The questions identified, as well as statements with low evidence, were listed, and all project members prioritized the listed items. Tables 4 and 5 show the ten items judged to be of highest priority. SBU is at present reviewing finished projects to identify clinically relevant problems where the evidence is low or insufficient. Identifying knowledge gaps will hopefully focus research on questions where answers are lacking and stimulate well-designed studies that bring the evidence to a higher level. When a systematic review has identified gaps in knowledge and uncertainties about effects, controlled trials should be the logical step forward. SBU has started to develop a database of interventions for which knowledge is lacking or uncertainties exist (25). The database is primarily based on a similar database, “DUET”, in the UK (26).
Table 4. Projects on venous thromboembolism (VTE) where well performed studies are needed
Table 5. Projects on lower limb ischaemia where well performed studies are needed
IDEAL
The IDEAL method has recently been suggested for the evaluation of surgery and other invasive therapies (27–29). IDEAL is the acronym for Idea, Development, Exploration, Assessment and Long-term study. It resulted from conferences of the Balliol Collaboration, supported by the Nuffield Department of Surgery in Oxford, England, and the Department of Surgery, McGill University in Montreal, Canada, on the initiative of Jonathan Meakins, professor of surgery in Oxford. The idea phase comprises proof-of-concept studies and reports of small series, often as structured case reports. It is important to report new methods, whether successful or not. If there seems to be a benefit, studies in the development phase focus on technical development; the method may be modified, and safety aspects are also brought into focus. In this phase there may still be a learning curve, with alterations over time as experience with the method increases. During the exploration phase the number of included patients has increased, and ideally an RCT should be performed. One controversial issue is the appropriate time to start an RCT. During the assessment phase the intervention has become stable, and clinical outcome on an intermediate-term basis is important to report. The effectiveness of the new technology should be tested against current standards. During the long-term study phase the method diffuses into the population of interest, and during this surveillance phase all consecutive patients should be prospectively included in a registry, as discussed above. During this phase rare side effects may be identified. It is important that standardized protocols are used and that reporting standards are adhered to, such as those advocated in CONSORT and STROBE (30–32).
CONCLUDING REMARKS
The motivation for proper health technology assessment is to enable decision makers to choose the best options for diagnosis and treatment. In this way HTA is a bridge between evidence-based research and decision-making. The methodology has developed considerably during the 30–40 years that HTA has been around. Its importance lies not only in evaluating technology for use in the health sector but also in stimulating research where there is no evidence that a method works, and in stopping methods where there is evidence that they do not work. One remaining problem is the lack of awareness of how HTA functions, and overcoming it is a major challenge.
