Abstract
The European Society of Gastrointestinal Endoscopy and United European Gastroenterology present a short list of key performance measures for lower gastrointestinal endoscopy. We recommend that endoscopy services across Europe adopt the following seven key performance measures for lower gastrointestinal endoscopy for measurement and evaluation in daily practice at a center and endoscopist level:
Introduction
The European Society of Gastrointestinal Endoscopy (ESGE) and United European Gastroenterology (UEG) have identified quality of endoscopy as a major priority. We described our rationale for this priority in a recent manuscript that also addressed the methodology of the current quality initiative process. 1
Because of the variation in physicians’ performance and the introduction of nationwide colorectal cancer (CRC) screening programs, lower gastrointestinal (LGI) endoscopy was the first area of endoscopy to address quality.2–4 Over more than a decade, several potential measures of quality in LGI endoscopy have been identified. In consequence, many professional societies have published recommendations on performance measures for LGI endoscopy.5–7 These recommendations are, however, numerous (44 different performance measures),5–7 country specific, and not always evidence based, which has limited their wider adoption in Europe.
The aim of the ESGE LGI working group was to identify a short list of key performance measures for LGI endoscopy that were widely applicable to endoscopy services throughout Europe. This list would ideally consist of performance measures meeting the following requirements: proven impact on significant clinical outcomes or quality of life; a well-defined, reliable, and simple method/approach for measurement; susceptibility to improvement; and application to all levels of endoscopy services.
This paper reports the agreed list of key performance measures for LGI endoscopy and describes the methodological process applied in the development of these measures.
Methodology
We previously described the multistep process for producing such performance measures. 1 In brief, at the United European Gastroenterology Week in 2014, we used a modified Delphi consensus process to develop quality measures in the following domains: pre-procedure, completeness of procedure, identification of pathology, management of pathology, complications, procedure numbers, patient experience, and post-procedure.1,8,9 We decided to have one or two key performance measures for each quality domain.
In order to identify key performance measures, we first created a list of all possible performance measures for LGI endoscopy through email correspondence and teleconferences that took place between December 5, 2014 and February 7, 2015. All possible performance measures that were identified by this process were then structured using the PICO framework (where P stands for population/patient; I for intervention/indicator; C for comparator/control, and O for outcome) to inform searches for available evidence to support the performance measures. This process resulted in 38 PICOs. Detailed literature searches were performed by an expert team of methodologists and yielded results for 29 PICOs (see Supporting information, available online). Working group members also identified additional articles relevant for the performance measures in question.
The PICOs and the clinical statements derived from these were adapted or omitted during iterative rounds of comments and suggestions from the working group members during the Delphi process. The evolution and adaptation of the different PICOs and clinical statements during the Delphi process can be reviewed in the Supporting information. The domain addressing the competence of endoscopists (including procedure numbers), along with its associated PICOs and clinical statements, was deferred to future initiatives.
In total, working group members participated in a maximum of three rounds of voting to agree on performance measures in predefined domains and their respective thresholds, as discussed below. Statements were discarded if agreement was not reached over the three voting rounds. The agreement that is given for the different statements refers to the last voting round in the Delphi process. The key performance measures were distinguished from the minor performance measures based on the ISFU criteria (importance, scientific acceptability, feasibility, usability, and comparison with competing measures), and expressed by mean voting scores.
The performance measures are displayed in boxes under the relevant quality domain. Each box describes the performance measure, the level of agreement during the modified Delphi process, the grading of available evidence (the evidence was graded according to the Grading of Recommendations Assessment, Development and Evaluation [GRADE] system), 10 how the performance measure should be measured, and recommendations supporting its adoption. The boxes further list the measurement of agreement (scores), the desired threshold, and suggestions on how to deal with underperformance.
The minimum number needed to assess whether the threshold for a certain performance measure is reached can be calculated by estimating the 95% confidence intervals (CIs) around the predefined threshold for different sample sizes.8,9,11 For the sake of practicality and to simplify implementation and auditing, we suggest that at least 100 consecutive procedures (or all, if <100 performed) should be measured to assess a performance measure. Continuous monitoring should however be the preferred method of measurement.
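The confidence-interval reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not name an interval method, so the Wilson score interval is chosen here for convenience, and the function name and the audit counts are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical audit: 90 of 100 consecutive procedures met the measure.
low, high = wilson_interval(90, 100)
print(f"observed 90.0%, 95% CI {low:.1%} to {high:.1%}")

# The interval narrows as the number of audited procedures grows.
low_500, high_500 = wilson_interval(450, 500)
print(f"with 500 procedures: 95% CI {low_500:.1%} to {high_500:.1%}")
```

With 100 procedures the interval around an observed rate remains fairly wide, which is consistent with the preference for continuous monitoring over a single audit sample.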
Performance measures for LGI endoscopy
The evidence derived by the literature search group and input from the working group members were used to formulate a total of 34 clinical statements addressing 27 potential performance measures grouped into eight quality domains. Over the course of two voting rounds, consensus agreement was reached for 18 statements regarding 14 potential performance measures (agreement in both voting rounds). The remaining 16 statements were again rephrased and subjected to a third and final voting round, with a further four statements being accepted. In total, 22 statements regarding 18 performance measures were accepted after three voting rounds. Over the course of voting, we decided that the quality domain on competence of endoscopists (including three accepted statements and three performance measures) would be discarded from these guidelines and left for future initiatives. Therefore, a final total of 15 performance measures (19 statements) attributed to seven quality domains were accepted for these guidelines (see Figure 1). The entire process of performance measure development can be reviewed in the Supporting information. The statement numbers correspond to those used in the Supporting information.
Figure 1. The domains and performance measures chosen by the working group (N/A = not available).
We used the highest mean voting scores to identify one key performance measure for each of the seven quality domains (Figure 1). The remaining performance measures were considered minor performance measures. In the management of pathology domain, there were two performance measures (“appropriate polypectomy technique” and “tattooing resection sites”) that had similar voting scores. We decided to select “appropriate polypectomy technique” as the key performance measure for this domain, based on its wider usability and better feasibility.
All performance measures were deemed valuable by the working group members and were obtained after a rigorous process, as described above. From a practical viewpoint, it may however be desirable to implement the key performance measures first in units that are not monitoring any performance measures at this time. Once a culture of quality measurement (with the aim of improving practice, outcomes, and patient experience) is accepted and software is available, the minor performance measures may then further aid the monitoring of quality in LGI endoscopy. The use of appropriate endoscopy reporting systems is key to facilitate data retrieval on identified performance measures. 12
The acceptance of this performance measure is based on agreement with the following statements:
In patients undergoing screening or diagnostic colonoscopy, bowel preparation quality should be recorded using a validated scale with high intraobserver reliability. (Statement number N1.1) Agreement: 100%
A service should have a minimum of ≥90% procedures and a target of ≥95% procedures with adequate bowel preparation, assessed using a validated scale with high intraobserver reliability. (N1.2) Agreement: 100%
The quality of bowel preparation is important for the efficacy of colonoscopy. As pointed out in the ESGE guidelines on bowel preparation for colonoscopy, 13 the quality of bowel preparation is associated with two other important performance measures for colonoscopy, namely adenoma detection rate (ADR) and cecal intubation rate. 14 Suboptimal bowel preparation results in further costs and inconvenience because the examination has to be repeated or an alternative examination has to be arranged. 15
To determine the scientific acceptability of measuring bowel preparation quality, we focused on the performance of different bowel preparation scales and the quantification of adequacy of bowel preparation. There were no direct comparisons of performance between the bowel preparation scales (see Supporting information). Three bowel preparation scales have undergone comprehensive validation and have shown sufficient validity and reliability: the Boston Bowel Preparation Scale (BBPS), 16 the Ottawa Scale, 17 and the Aronchick Scale. 18 The BBPS is the most thoroughly validated scale and should be the preferred one. 19 There were no significant differences between intermediate and high quality bowel preparation (regardless of the scale used) in terms of the detection rates for adenomas or advanced adenomas (see Supporting information). 20 Therefore, adequate bowel preparation may be defined as: BBPS ≥6, Ottawa Scale ≤7, or Aronchick Scale excellent, good, or fair. The adoption of validated scales for bowel preparation quality assessment has been proven to be feasible in routine practice. 21
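As a minimal sketch of how the adequacy definition above might be applied in an audit, the following Python snippet classifies individual procedures and computes the rate of adequate bowel preparation. The cut-offs (BBPS ≥6, Ottawa Scale ≤7, Aronchick excellent, good, or fair) and the 90% minimum and 95% target come from the text; the function names, the record layout, and the example data are assumptions made for illustration.

```python
ARONCHICK_ADEQUATE = {"excellent", "good", "fair"}

MINIMUM_STANDARD = 0.90  # minimum standard from statement N1.2
TARGET_STANDARD = 0.95   # target standard from statement N1.2


def is_adequate(scale: str, score) -> bool:
    """Apply the adequacy cut-off for one of the three validated scales."""
    scale = scale.lower()
    if scale == "bbps":
        return score >= 6
    if scale == "ottawa":
        return score <= 7
    if scale == "aronchick":
        return str(score).lower() in ARONCHICK_ADEQUATE
    raise ValueError(f"unsupported scale: {scale}")


def adequate_prep_rate(records) -> float:
    """Return the share of procedures with adequate bowel preparation.

    records is a sequence of (scale, score) pairs (hypothetical layout).
    """
    records = list(records)
    if not records:
        raise ValueError("no procedures to audit")
    return sum(is_adequate(scale, score) for scale, score in records) / len(records)


# Hypothetical audit records.
records = [("BBPS", 7), ("BBPS", 5), ("Ottawa", 4), ("Aronchick", "fair")]
rate = adequate_prep_rate(records)
print(f"adequate preparation: {rate:.0%}, minimum met: {rate >= MINIMUM_STANDARD}")
```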
The acceptance of this performance measure is based on agreement with the following statement:
Colonoscopy needs adequate time allocated for insertion, withdrawal, and therapy. Routine colonoscopy should be allocated a minimum of 30 min. Colonoscopies following positive fecal occult blood testing should be allocated a minimum of 45 min to allow for therapeutic intervention. (N1.3) Agreement: 100%
There is some evidence that productivity pressure may negatively affect the quality of colonoscopy. 27 Although it has been shown that working behind schedule is not associated with lower ADRs, 28 the effect of a very tight schedule on colonoscopy performance is unknown (see Supporting information). The working group members suggested that 30 min and 45 min are minimum times that should be allotted for routine colonoscopy and colonoscopy after positive fecal occult blood testing (longer time to accommodate high prevalence of large polyps), respectively. These values correspond well with mean total procedure times for colonoscopy reported in recent studies.29,30
The acceptance of this performance measure is based on agreement with the following statement:
For audit purposes, the colonoscopy report should include an explicit indication for the procedure, categorized according to existing guidelines on appropriateness of colonoscopy use. (N1.4) Agreement: 93.8%
Appropriate referrals for colonoscopy may help to optimize the use of limited resources and protect patients from the potential harms of unnecessary invasive procedures. Colonoscopies with an appropriate indication are associated with significantly higher diagnostic yields for cancer and other relevant lesions than colonoscopies without an appropriate indication.31–34 The American Society for Gastrointestinal Endoscopy (ASGE) and the European Panel on the Appropriateness of Gastrointestinal Endoscopy (EPAGE) II guidelines, on the appropriateness of colonoscopy use,35,36 consistently show 67%–96% sensitivity and 13%–40% specificity for the detection of relevant findings (see Supporting information).31–34
The acceptance of this performance measure is based on agreement with the following statements:
Complete colonoscopy requires cecal intubation with complete visualization of the whole cecum and its landmarks. (N2.1) Agreement: 100%
A service should have a minimum unadjusted cecal intubation rate of ≥90% and a target rate of ≥95% as a measure of the completeness of colonoscopy examination. (N2.2) Agreement: 93.8%
Complete colonoscopy (cecal intubation) should be documented both in written form and in a photo or video report. (N2.3) Agreement: 100%
Cecal intubation is a prerequisite for complete visualization of the colorectum. Cecal intubation must be confirmed with photo or video documentation. Clear cecal image documentation is associated with a higher polyp detection rate (PDR). 38 For the purpose of colorectal neoplasia detection, terminal ileum intubation is useful only to confirm completion of the colonoscopy when classic cecal landmarks are not confidently seen. 39
The acceptance of this performance measure is based on agreement with the following statement:
Adenoma detection rate should be used as a measure of adequate inspection at screening or diagnostic colonoscopy in patients aged 50 years or more. (N3.1) Agreement: 100%
The detection and removal of adenomas, which are major precursor lesions for CRC, is seen as a key aspect of CRC prevention. However, there is a wide variation between endoscopists in terms of their skills at detecting adenomas, expressed as the ADR.22,43,46–48 ADR has been inversely associated with the risk of interval CRC and CRC death.46,47 A similar relationship with the incidence of distal interval CRC was confirmed for flexible sigmoidoscopy screening. 49 Of note, the detection rate of serrated polyps has been shown to strongly correlate with the ADR. 43 Although ADR is considered a surrogate for meticulous inspection of the colorectal mucosa, the correlation with other important, but non-neoplastic, findings has never been studied.
Several interventions, including education, creating awareness, feedback, and benchmarking on colonoscopy quality, have all helped to improve the ADR.50–53 Recently, it has been shown that an improved ADR translates to risk reductions for interval CRC and death, which closes the quality improvement loop. 54
It has been postulated that ADR has an inherent limitation of not measuring the total number of adenomas detected. 41 A potentially more accurate measure, namely number of adenomas per colonoscopy, has been proposed, but this was proven not to be superior to ADR in a recent study. 55
The acceptance of this performance measure is based on agreement with the following statement:
A mean withdrawal time of at least 6 min should be used as a supportive measure of adequate identification of pathology at negative screening or diagnostic colonoscopy. (N3.6) Agreement: 87.5%
Colonoscope withdrawal time provides information about the time that endoscopists spend identifying pathology. A mean withdrawal time of >6 min has been associated with higher ADRs. 56 Although the association between withdrawal time and ADR was not observed in all studies, 57 a recent large population-based analysis confirmed the positive relation between these two measures, with a 3.6% absolute increase in ADR per minute increase in withdrawal time. 24 Importantly, the latter study also showed an inverse association between mean withdrawal time and the incidence of interval CRC. 24 The observed association was not linear and the risk of interval CRC leveled off at a mean withdrawal time of 8 min (the most significant difference was observed for the 6-minute cut-off). In another study, an increase in mean withdrawal time beyond 10 min had minimal effect on ADR. 58 Therefore, the minimum standard mean withdrawal time of 6 min and the target standard of 10 min are quite well defined.
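A possible way to audit this measure is sketched below in Python. The 6-minute minimum and 10-minute target are taken from the text, as is the restriction to negative examinations (statement N3.6); the data layout, function names, and example values are hypothetical.

```python
from statistics import mean

MIN_MEAN_WITHDRAWAL_MIN = 6.0      # minimum standard from the text
TARGET_MEAN_WITHDRAWAL_MIN = 10.0  # target standard from the text


def mean_negative_withdrawal(procedures) -> float:
    """Mean withdrawal time (minutes) over negative colonoscopies only.

    procedures is a sequence of (withdrawal_minutes, pathology_found) pairs
    (hypothetical layout); examinations with findings are excluded because
    therapy and biopsy prolong withdrawal.
    """
    times = [minutes for minutes, pathology_found in procedures if not pathology_found]
    if not times:
        raise ValueError("no negative procedures to audit")
    return mean(times)


def classify(mean_minutes: float) -> str:
    """Compare a mean withdrawal time with the minimum and target standards."""
    if mean_minutes >= TARGET_MEAN_WITHDRAWAL_MIN:
        return "target met"
    if mean_minutes >= MIN_MEAN_WITHDRAWAL_MIN:
        return "minimum met"
    return "below minimum"


# Hypothetical audit: the 12-minute procedure is excluded (pathology found).
audit = [(7.5, False), (5.0, False), (12.0, True), (8.5, False)]
observed = mean_negative_withdrawal(audit)
print(f"mean withdrawal time {observed:.1f} min: {classify(observed)}")
```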
The acceptance of this performance measure is based on agreement with the following statement:
Polyp detection rate should be used as a measure of adequate inspection at screening or diagnostic colonoscopy in patients aged 50 years or more. (N3.5) Agreement: 84.6%
The acceptance of this performance measure is based on agreement with the following statement:
Adequate resection technique of small and diminutive colorectal polyps includes biopsy forceps removal of polyps ≤3 mm in size and snare polypectomy for larger polyps. (N4.6) Agreement: 93.3%
Incomplete polypectomy is considered the cause for up to 25% of interval CRCs.69,70 Incomplete resection of polyps 5–20 mm in size varies from 6.5% to 22.7% among endoscopists; 71 however, completeness of polyp resection is considered challenging to measure, and statements regarding this topic have not reached agreement in the current Delphi process (see Supporting information).
Biopsy forceps resection of polyps 4–5 mm in size or larger has been shown to be inferior to snare techniques, with regard to completeness of resection.72,73 Therefore, the appropriate resection technique for colorectal polyps includes biopsy forceps removal of polyps ≤3 mm in size, and snare (cold or with diathermy) polypectomy for larger polyps. Despite this, in a recent large cohort study, it was demonstrated that 28.2% of lesions ≥5 mm in size were resected using biopsy forceps instead of a snare technique. 74 Contrary to this, in a large study from the UK, over 90% of polyps larger than 3 mm in size were removed using a snare. 75
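The size rule described above could be checked against reporting-system data along the following lines. This is a sketch, not a validated audit tool: the 3 mm cut-off for biopsy forceps comes from the text, whereas treating snare removal as acceptable at any size, the technique labels, and the example polyps are assumptions.

```python
FORCEPS_MAX_SIZE_MM = 3  # biopsy forceps considered appropriate only up to this size


def technique_is_appropriate(size_mm: float, technique: str) -> bool:
    """Check one resected polyp against the size rule."""
    technique = technique.lower()
    if technique == "snare":
        # Assumption: cold or diathermy snare is accepted for any polyp size.
        return True
    if technique == "forceps":
        return size_mm <= FORCEPS_MAX_SIZE_MM
    raise ValueError(f"unknown technique: {technique}")


def appropriate_technique_rate(polyps) -> float:
    """Share of polyps removed with an appropriate technique.

    polyps is a sequence of (size_mm, technique) pairs (hypothetical layout).
    """
    polyps = list(polyps)
    if not polyps:
        raise ValueError("no polyps to audit")
    return sum(technique_is_appropriate(size, tech) for size, tech in polyps) / len(polyps)


# Hypothetical audit: the 5 mm polyp removed with forceps fails the rule.
polyps = [(2, "forceps"), (5, "forceps"), (8, "snare"), (3, "forceps")]
technique_rate = appropriate_technique_rate(polyps)
print(f"appropriate technique: {technique_rate:.0%}")
```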
The acceptance of this performance measure is based on agreement with the following statement:
In patients undergoing removal of colorectal lesions with a depressed component (0-IIc, according to the Paris classification) or non-granular or mixed-type laterally spreading tumors, located between the ascending and the sigmoid colon, the resection site should be tattooed to improve future re-location of the resection site. (N4.1) Agreement: 93.3%
Colorectal lesions with a depressed component and non-granular or mixed-type laterally spreading tumors (LSTs) harbor an increased risk of malignancy.76–78 Therefore, the site of endoscopic removal of these lesions often needs to be re-located to identify recurrence or to guide surgical management. It has been shown that tattooing significantly shortens the time to re-locate the resection site on endoscopy. 79 There is however no evidence that tattooing the resection site increases the rate of re-location of lesions (see Supporting information). Preoperative tattooing using prepacked kits was proven to be a very effective method of tumor localization in laparoscopic surgery. 80 Moreover, some studies have shown that tattooing improves lymph node yield and facilitates the harvesting of suspicious lymph nodes during colorectal surgery.81,82
The acceptance of this performance measure is based on agreement with the following statement:
The non-diminutive polyp retrieval rate should be monitored. A service should have a polyp retrieval rate of ≥90%. (N4.2) Agreement: 86.7%
The retrieval of polyps after endoscopic resection is a “sine qua non” requirement for histopathology examination. Histopathology examination guides further management including post-polypectomy surveillance. Diminutive polyps (≤5 mm in size) harbor a very low risk of cancer or advanced histology and are considered amenable for a resect-and-discard policy following in vivo optical diagnosis under strictly controlled conditions. 85 Furthermore, diminutive polyps are frequently removed using biopsy forceps, which makes their retrieval quite straightforward.
It has therefore been decided to monitor only the retrieval of polyps larger than 5 mm in size. Their retrieval is not only more important from the clinical perspective but also technically more difficult because it requires the transected polyp to be suctioned into a trap, ensnared, or grasped using a Roth net, so that it can be removed together with the endoscope.86,87 Even though the need for polyp retrieval seems obvious, it is unknown what the effect of substandard retrieval is on repeat colonoscopy rates or the appropriateness of recommended post-polypectomy surveillance.
The acceptance of this performance measure is based on agreement with the following statement:
In patients undergoing removal of colorectal lesions with a depressed component (0-IIc, according to the Paris classification) or non-granular or mixed-type laterally spreading tumors, conventional or virtual chromoendoscopy should be used to improve delineation of lesion margins and predict potential depth of invasion. (N4.4) Agreement: 93.3%
In 2014, the ESGE issued guidelines on advanced endoscopic imaging for the detection and differentiation of colorectal neoplasia in which it suggested the use of advanced endoscopic imaging for margin assessment and prediction of deep submucosal invasion in lesions with a depressed component (0-IIc) or non-granular or mixed-type LSTs. 85 The quality of evidence supporting these recommendations was considered very low and moderate for margin delineation and assessment of depth of submucosal invasion, respectively. Since then no new evidence with clinically relevant endpoints for the patients (incomplete resection, interrupted procedure, cancer detection) has been published to further support its use (see Supporting information).
The acceptance of this performance measure is based on agreement with the following statement:
The Paris classification should be routinely used to describe the morphology of non-polypoid lesions identified at colonoscopy. (N4.5) Agreement: 84.6%
The Paris classification was developed with the aim of standardizing the terminology of superficial colorectal lesion morphology. 76 It divided lesions into two main groups: polypoid and non-polypoid, further defining four subtypes of the latter. Although its use is widely endorsed, it has never been fully validated. Recent studies have shown only moderate interobserver agreement for the Paris classification, even among experts.91,92 More importantly, short training sessions are not sufficient to improve the agreement, suggesting that refinement of the classification is needed. 91 Adoption of the classification in the community setting is unknown. The introduction of the Paris classification did however have two important effects: it raised awareness of subtle colorectal lesions among Western endoscopists, 93 and helped to predict submucosal invasion of colorectal lesions before their removal.78,93
The acceptance of this performance measure is based on agreement with the following statement:
In patients undergoing colonoscopy, a 6-day readmission rate and 30-day mortality rate should be monitored using a reliable system. (N5.1) Agreement: 93.8%
The rates of complications, adverse events, and harms are important outcome measures of colonoscopy performance. Some studies and guidelines have reported rates for specific complications such as perforation, bleeding, or sedation-related cardiopulmonary adverse events.6,45,94–96 These specific outcomes are however difficult to compare across services because they are infrequent, have variable definitions, and depend on case mix. For feasibility reasons, we propose to measure adverse outcomes, as defined in previous studies,97–100 to give an overall rate of complications and to drill down into specific outcomes only if the standard is not met.
The definitions of complications are of paramount importance because the differences between major and minor complications or between minor complications and routine events encountered during the course of the procedure can be vague. The all-cause 30-day mortality rate is certainly well defined and important to measure. In large clinical or administrative databases, the rate of all-cause 30-day mortality has been estimated at 0.07% (1 in 1500),95–97,100–102 and the colonoscopy-specific mortality at more than 10 times lower (1 in 15 000 or lower).95,96,102,103 Although all-cause 30-day mortality rates would be impossible to compare across services, all deaths should be discussed during morbidity and mortality conferences. 104 The LGI working group members decided that, although the accepted statement focused on the 6-day readmission rate, this should be changed to a 7-day readmission rate in order to make it more comparable with the published literature. The 7-day or 30-day hospital admission/readmission rate is a well-defined and objective way to track late complications of colonoscopy.95–97,99,100
Late complications represent over half of all colonoscopy-associated complications. 98 Furthermore, the 6-day readmission rate was shown to predict 30-day all-cause mortality. 99 The reported all-cause 7-day and 30-day hospital admission/readmission rates were 0.5% and 1.1%–3.8%,95,97,99,100 respectively (0.5% for colonoscopy-specific readmission rates). 95 Therefore, the minimum standard of 0.5% seems acceptable for 7-day overall or 30-day colonoscopy-specific readmission rates.
The early complication rate (diagnosed immediately during the procedure or before patient discharge) is relatively easy to measure using appropriate endoscopy reporting systems. 12 The definition of an early complication is however more challenging and, in the view of the working group, should only include complications that result in one of the following: (i) lengthening of the hospital stay; (ii) unscheduled further endoscopic procedure; or (iii) emergency intervention, including blood transfusion or surgery. 6
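The three-part definition of an early complication lends itself to a simple rule in an endoscopy reporting system. The sketch below assumes a hypothetical record structure; only the three qualifying consequences are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    """One event noted during the procedure or before discharge (hypothetical record)."""
    prolonged_hospital_stay: bool = False
    unscheduled_endoscopy: bool = False
    emergency_intervention: bool = False  # includes blood transfusion or surgery

    def counts_as_early_complication(self) -> bool:
        """True if the event led to at least one of the three listed consequences."""
        return (
            self.prolonged_hospital_stay
            or self.unscheduled_endoscopy
            or self.emergency_intervention
        )


def early_complication_rate(events_per_procedure) -> float:
    """Share of procedures with at least one qualifying early complication.

    events_per_procedure holds one list of AdverseEvent objects per procedure.
    """
    procedures = list(events_per_procedure)
    if not procedures:
        raise ValueError("no procedures to audit")
    affected = sum(
        any(event.counts_as_early_complication() for event in events)
        for events in procedures
    )
    return affected / len(procedures)


# Hypothetical audit of four procedures; only the third has a qualifying event.
procedures = [
    [],
    [AdverseEvent()],  # routine event without any of the three consequences
    [AdverseEvent(unscheduled_endoscopy=True)],
    [],
]
complication_rate = early_complication_rate(procedures)
print(f"early complication rate: {complication_rate:.0%}")
```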
The acceptance of this performance measure is based on agreement with the following statements:
Patient experience during and after unsedated or moderately sedated colonoscopy or sigmoidoscopy should be routinely measured. (N7.1) Agreement: 93.8%
Patient experience with colonoscopy or sigmoidoscopy should be self-reported by a patient using a validated scale. (N7.2) Agreement: 93.8%
Colonoscopy may be perceived to be a painful and embarrassing procedure and this perception hampers patient participation in screening programs, adherence to surveillance recommendations, and even diagnostic work-up for large bowel symptoms.105–107 Although sedation may decrease pain during colonoscopy, it does not eliminate it, 108 has little effect on post-procedure pain, 22 and increases the risk of complications. 109 Therefore, monitoring patient experience, including intra- and post-procedure pain levels, is crucial.
The acceptance of this performance measure is based on agreement with the following statement:
Adherence to post-polypectomy surveillance recommendations should be monitored. The reason for deviation from national/European guidelines should always be provided. (N8.1) Agreement: 93.8%
Patients who have had adenomas removed are believed to be at increased risk of developing new adenomas or cancer in the future.122–124 In order to mitigate this risk, professional societies recommend patients undergo colonoscopy surveillance depending on age, comorbidity, and adenoma characteristics.125,126 Surveillance intervals recommended in the guidelines represent the best evidence-based balance between the benefits (protection against CRC) and harms (too frequent invasive examinations) of subsequent colonoscopies.
Adherence to these recommendations is key to the efficacy and efficiency of colonoscopy surveillance. Unfortunately, studies from the Netherlands and Canada have shown that less than 30% of patients who have undergone adenoma removal receive appropriate surveillance.127,128 One of the key reasons for inappropriate surveillance is inappropriate recommendations given by gastroenterologists, surgeons, or primary care physicians.129,130 The adherence of physicians to the post-polypectomy surveillance recommendations could be relatively easily monitored using modern endoscopy reporting systems. 12 Any deviation from guideline recommendations should be clearly stated in the reporting system, with the rationale for this provided.
No minimum standard for this key performance measure was defined because of lack of evidence.
General conclusions, research priorities, and future prospects
This paper describes a short list of key performance measures for LGI endoscopy that have the best evidence-based impact on clinical outcomes, while being feasible to measure and susceptible to improvement.
Table 1. Areas for further research (CRC, colorectal cancer).
The other notable feature of the identified performance measures is that the evidence behind them comes almost exclusively from the field of CRC prevention and early detection. Although performance measures from the pre-procedure and completeness of procedure domains are largely universal, performance measures within the identification of pathology, management of pathology, and post-procedure domains are not applicable outside of the CRC screening/surveillance setting. Further research on these topics is warranted (see Table 1).
The first step now is to implement these key performance measures in endoscopy practice throughout Europe. We encourage individual endoscopists, as well as heads of endoscopy units, to start implementation of the performance measures without delay. Implementing performance measures is important to identify services and individual endoscopists with substandard levels of performance. The aim is not to penalize these endoscopists or services but to have a tool to improve the quality of endoscopy. Feedback and benchmarking of colonoscopy performance measures are usually sufficient to positively influence the overall quality of colonoscopy.54,132 If the provision of such information turns out to be insufficient to promote improvement, the next step is to provide assistance and additional training.50,52
At a service level, the implementation of key performance measures may well require investment in hardware to accommodate a more efficient auditing process. We want to encourage hospital management to support the implementation of these performance measures in their endoscopy services. We think that, in an era where general hospital accreditation has become increasingly important, hospital administrations will be more receptive to supporting such actions. Moreover, we owe it to our patients to overcome individual or financial barriers to ensure that endoscopy services are of the highest quality and to set research priorities to gather data that will inform the next generation of performance measures.
Footnotes
Acknowledgments
The authors gratefully acknowledge the contributions from: Dr. Stuart Gittens, ECD Solutions in the development and running of the web platform; Iwona Escreet and all at Hamilton Services for project administrative support; the Scottish Intercollegiate Guidelines Network for hosting the critical appraisal module; EuropaColon for their support.
Declaration of Conflicting Interests
Funding
Michal F. Kaminski, Marek Bugajski, Michael Bretthauer, Kjetil Garborg, and Geir Hoff are supported by a grant (grant number Pol-Nor/204233/30/2013) from the Polish–Norwegian Research Program. Michael Bretthauer is supported by Top Researcher Grants of the Norwegian Cancer Society and the Norwegian Research Council. UEG supplied co-funding and additional project governance to this endeavor.
