Abstract
Background
Many endoscopists acknowledge that the appearance of the papilla of Vater seems to affect biliary cannulation. To assess the association between the macroscopic appearance of the papilla and biliary cannulation and other related clinical issues, a system is needed to define the appearance of the papilla.
Objective
The purpose of this study was to validate an endoscopic classification of the papilla of Vater by assessing the interobserver and intraobserver agreement among endoscopists with varying experience.
Methods
An endoscopic classification containing four types of papillae, based on pictures captured from 140 different papillae, was proposed. The four types are (a) Type 1: regular papilla, no distinctive features, ‘classic appearance’; (b) Type 2: small papilla, often flat, with a diameter ≤ 3 mm (approximately 9 Fr); (c) Type 3: protruding or pendulous papilla, a papilla that is standing out, protruding or bulging into the duodenal lumen or sometimes hanging down, pendulous with the orifice oriented caudally; and (d) Type 4: creased or ridged papilla, where the ductal mucosa seems to extend distally, rather out of the papillary orifice, either on a ridge or in a crease. To assess the level of interobserver agreement, a web-based survey containing 50 sets of still images of the papilla, distributed between the four different types, was sent out to 18 endoscopists. Three months later, a follow-up survey with images from the first survey was sent to the same endoscopists.
Results
Interobserver agreement was substantial (κ = 0.62, 95% confidence interval (CI) 0.58–0.65) and was similar for experts and non-experts. The intraobserver agreement assessed with the second survey was also substantial (κ = 0.66, 95% CI 0.59–0.72).
Conclusion
The proposed endoscopic classification of the papilla of Vater seems to be easy to use, irrespective of the level of experience of the endoscopist. It shows substantial inter- and intraobserver agreement; the clinical relevance of the four different papilla types remains to be determined.
Introduction
In the early days of fibre-optic endoscopy there were limited options for image documentation and storage. With the digital revolution and computerized image documentation, there are few, if any, limitations in the acquisition and storage of high-quality images during gastrointestinal endoscopy.
The ambition to standardize terminology and reporting of endoscopic procedures was emphasized by the World Organization of Gastrointestinal Endoscopy (OMED) in the development of the Minimal Standard Terminology. 1 The recently updated guidelines 2 emphasized that images, and even video sequences, captured during endoscopic examinations and therapeutic interventions are considered almost mandatory to achieve a high level of quality documentation. With the increased use of advanced imaging technologies such as high-definition white-light endoscopy, conventional or virtual chromoendoscopy and probe-based confocal laser endomicroscopy, 3 the demands for image storage, interpretation and standardization have expanded even further.4,5 Studies on the agreement between investigators in the assessment of endoscopic images are of vital importance not only for clinical practice but even more so in the design of and reporting on research protocols.
In the field of gastrointestinal endoscopy, several previous studies have dealt with endoscopists’ lack of concordance in image interpretation regarding specific pathological conditions. These issues have been addressed in oesophageal varices, 6 Barrett’s oesophagus,7,8 oesophagitis,9,10 eosinophilic oesophagitis,11,12 bleeding peptic ulcers, 13 ulcerative colitis 14 and dysplastic lesions in the colon 15 with varying outcomes.
All endoscopists carrying out endoscopic retrograde cholangio-pancreaticography (ERCP) recognize the obvious differences in the appearance of the papilla of Vater. There are several descriptions of the papilla of Vater, anatomically,16–18 radiologically19,20 and histologically, 21 but none of these contain any structured classification of the macroscopic appearance of the papilla of Vater as perceived during the endoscopic examination. Moreover, there is a widespread opinion that the macroscopic appearance might well co-vary with the cannulation complexity,22,23 complication rates 24 and also with other factors of relevance for the management of individual patients. 25 To comprehensively assess the association between the macroscopic appearance of the papilla and the cannulation complexity and other related clinical issues, a system is required that accurately and reliably describes the papilla. The objectives of this study were therefore twofold: first, to present an endoscopic classification of the papilla of Vater and second, to determine inter- and intraobserver agreement among expert as well as non-expert endoscopists when assessing the different papilla types.
Material and methods
Still images from 140 different patients were captured during clinically indicated ERCP examinations, with standard duodenoscopes (TJF-Q180V, Olympus Medical Systems Co., Tokyo, Japan) connected to standard endoscopy processors (CV-180, Olympus). None of the patients had previously undergone treatment such as endoscopic sphincterotomy, papillary dilatation or biliary stenting. No gross pathology or anatomical variant, such as a juxtapapillary diverticulum or tumour, was present. A set of at least four still images of the same papilla was captured before the clinical procedure started. First, photographs were taken of the papilla and the surrounding duodenal wall at full inflation at two different angles, followed by one close-up view and finally one frontal image with a standard sphincterotome positioned on the side of the papilla, acting as a yardstick (Figure 1).
Figure 1. Endoscopic classification of the macroscopic appearance of the papilla of Vater.
During the work on a previous study in the Scandinavian Association for Digestive Endoscopy (SADE) study group of ERCP concerning difficult cannulation, 26 the possible impact of the macroscopic appearance of the papilla was discussed more closely. With that discussion in mind, all image sets were presented to a group of expert endoscopists at our own institution, whereupon the papilla images were separated and collected into four different types based on their respective endoscopic appearance (Figure 1).
Type 1: Regular papilla, with no distinctive features, i.e. ‘classic appearance’.
Type 2: Small papilla, often flat, with a diameter not bigger than 3 mm (approximately 9 Fr).
Type 3: Protruding or pendulous papilla. A papilla that is standing out, protruding or bulging into the duodenal lumen or sometimes hanging down, pendulous with the orifice oriented caudally.
Type 4: Creased or ridged papilla, where the ductal mucosa seems to extend distally, rather out of the papillary orifice, either on a ridge or in a crease.
To assess the accuracy with which different endoscopists could group the different images into the four types (interobserver agreement), a web-based survey was constructed containing 50 sets of still images of various papillae distributed between the four different types. Eighteen endoscopists were invited to participate in the survey, of whom nine were expert endoscopists from the Nordic countries and nine were non-experts recruited at random from participants in an introductory ERCP training course. The survey was constructed and distributed in a web-based interface (Enalyzer Survey Solution, www.enalyzer.se) accessible only to the invited endoscopists and investigators. No patient data were available, nor were patients’ characteristics or symptoms presented. The survey started with an introduction to the predefined classification system followed by a few training exercises, with direct feedback to the respondents upon each decision. During the actual survey the respondents were asked to select, by multiple choice, the one of the four standard types that they thought each individual set of images portrayed. The endoscopists had unlimited time to view each set of pictures, and were allowed to go back and change a previous decision. The survey presented in total 15 cases with a Type 1, 11 with a Type 2, 13 with a Type 3 and finally 11 cases with a Type 4 papilla. We also collected information on the ERCP experience of each participant as well as the annual ERCP volume per institution. Data were anonymously collected and entered directly into a coded database.
Three months later the same endoscopists were asked to respond to a similar survey containing a stratified random sample of 20 sets (five each of the four different standard types) allowing for an assessment of intraobserver agreement.
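The stratified draw for the follow-up survey can be illustrated with a minimal Python sketch. The case identifiers, the fixed seed and the function name `stratified_sample` are hypothetical; only the case mix (15, 11, 13 and 11 cases) and the target of five sets per type come from the text, and the study's actual sampling procedure is not described in more detail.

```python
import random

def stratified_sample(case_types, per_type=5, seed=0):
    """Draw `per_type` case ids at random from each papilla type.

    case_types maps a (hypothetical) case id to its predefined type.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    by_type = {}
    for case_id, papilla_type in case_types.items():
        by_type.setdefault(papilla_type, []).append(case_id)
    sample = []
    for papilla_type in sorted(by_type):
        # random.sample draws without replacement within one stratum
        sample.extend(rng.sample(sorted(by_type[papilla_type]), per_type))
    return sample

# Case mix of the first survey: 15, 11, 13 and 11 cases of Types 1 to 4.
counts = {1: 15, 2: 11, 3: 13, 4: 11}
case_types = {}
next_id = 1
for papilla_type, n in counts.items():
    for _ in range(n):
        case_types[next_id] = papilla_type
        next_id += 1

followup = stratified_sample(case_types)
```

Sampling within each type keeps the follow-up balanced across the four types even though the first survey was not.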
Statistics and ethics
Interobserver agreement analysis was carried out for the entire group of respondents, after which a sub-analysis was done by dividing the respondents into groups based on their respective level of ERCP experience. Likewise, intraobserver agreement was calculated by comparing each respondent’s answers in the two surveys.
The agreement between the respondents’ decisions and the predefined classification was measured using kappa statistics. The agreement, as reflected by the kappa value, was interpreted as suggested by Landis and Koch 27 as slight (κ ≤ 0.20), fair (0.21 ≤ κ ≤ 0.40), moderate (0.41 ≤ κ ≤ 0.60), substantial (0.61 ≤ κ ≤ 0.80) or almost perfect (0.81 ≤ κ ≤ 1.00).
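As an illustration of the kappa calculation and the Landis and Koch bands, the following Python sketch computes Cohen's kappa between one respondent's answers and the predefined types. It is a sketch under assumptions: the text does not state which kappa variant was used to pool the 18 respondents, and the function names `cohen_kappa` and `landis_koch` are hypothetical.

```python
from collections import Counter

def cohen_kappa(rated, reference):
    """Cohen's kappa between one rater's labels and the reference labels."""
    if len(rated) != len(reference) or not rated:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(rated)
    # Observed agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(rated, reference)) / n
    # Chance agreement: product of the marginal frequencies, summed over types.
    rated_freq = Counter(rated)
    ref_freq = Counter(reference)
    expected = sum(rated_freq[c] * ref_freq[c] for c in ref_freq) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)

def landis_koch(kappa):
    """Map a kappa value to the verbal bands quoted in the text."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Predefined types of the 50 survey cases (15, 11, 13 and 11 cases).
reference = [1] * 15 + [2] * 11 + [3] * 13 + [4] * 11
perfect_rater = list(reference)  # hypothetical respondent who matches every case
kappa_perfect = cohen_kappa(perfect_rater, reference)
label_interobserver = landis_koch(0.62)  # reported interobserver kappa
```

The bands in the text leave small gaps between 0.20 and 0.21 and so on; the sketch closes them by using upper bounds only.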
The 95% confidence interval (CI) was calculated to measure the level of precision of the respective kappa values. The analysis was done using IBM SPSS Statistics, v. 23 (IBM, Chicago, Illinois, USA).
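How a 95% CI relates to a kappa value can be sketched with a simple large-sample approximation. This is an assumption made for illustration: the text does not state which standard error formula SPSS applied, and the function name `kappa_with_ci` and the input values are hypothetical.

```python
import math

def kappa_with_ci(observed, expected, n, z=1.96):
    """Return (kappa, lower, upper) from agreement proportions.

    observed: proportion of items on which the two ratings agree
    expected: proportion of agreement expected by chance
    n:        number of rated items
    Uses the approximate standard error
    sqrt(po * (1 - po) / (n * (1 - pe)**2)); this is an assumption,
    not necessarily the formula used in the study.
    """
    if not 0.0 <= expected < 1.0:
        raise ValueError("expected agreement must lie in [0, 1)")
    kappa = (observed - expected) / (1.0 - expected)
    se = math.sqrt(observed * (1.0 - observed) / (n * (1.0 - expected) ** 2))
    return kappa, kappa - z * se, kappa + z * se

# Hypothetical input: 80% observed agreement, 50% chance agreement, 100 items.
kappa, lower, upper = kappa_with_ci(observed=0.8, expected=0.5, n=100)
# kappa is 0.6 with an interval of roughly 0.44 to 0.76
```

The interval narrows as the number of rated items grows, which is why a narrow CI indicates a more precise kappa estimate.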
The regional ethics committee at Karolinska Institutet, Stockholm approved the study protocol (Dnr 2013/908-31/2).
Results
[Table not reproduced: Details of the endoscopic retrograde cholangio-pancreaticography (ERCP) experience of the expert and the non-expert endoscopists.]
[Table not reproduced: Observed agreement and distribution of alternative responses relative to the four predefined types of papillae.]
[Table not reproduced: Interobserver agreement and intraobserver agreement among experts and non-experts. CI: confidence interval.]
Discussion
Many endoscopists recognize that there may be differences in the macroscopic visual appearance of the major duodenal papilla with potential to influence ERCP cannulation rates. In order to address this and related issues, a classification system has to be available which accurately describes the different types of papillae. Until now there has been no validated classification system to describe the endoscopic appearance of the papilla of Vater. Here we describe four different types of the virgin papilla of Vater, and the validation process suggests that endoscopists can identify these four different gross appearances of the papilla with substantial interobserver as well as intraobserver agreement. Also of importance was that we were unable to detect any differences between the expert and non-expert groups of endoscopists. These results point to the potential clinical usefulness of the classification system and to its potential to become an important research tool.
Previous endoscopic classification systems for the papilla of Vater are scarce. Horiuchi et al. 28 described three different types of papillae, separated from each other depending on the degree of protrusion into the duodenal lumen. Their system was developed to act as a guide when choosing between different precut techniques in cases with so-called difficult cannulation. Based on Horiuchi et al.’s classification, Lee et al. 29 added a fourth ‘distorted type’ in their study regarding precut fistulotomy cannulation. However, it has to be remembered that neither of these classification systems has been subjected to a validation process, and neither study includes a description of a small or creased papilla. These studies do, however, emphasize the clinical relevance of an endoscopic classification of the papilla of Vater pertinent for the prediction of the complexity of the subsequent cannulation.
Currently we have no data to offer on the cannulation success rates in the respective four types of papillae, but intuitively Types 2 and 3 are those in which cannulation difficulties are foreseeable.24,25 Previous publications have determined technical aspects during cannulation to define ‘a papilla difficult to cannulate’ 30 and subsequent studies have defined difficult cannulation, 26 criteria that have now been accepted for use in future cannulation studies. 31 These and related issues are currently being addressed in a forthcoming dedicated study.
Recent technical developments in capturing, storing and distributing digital images through digital media have paved the way for the use of high-quality image documentation, basically during all endoscopic procedures. These utilities will significantly improve both education and research, and allow endoscopic observations to be objectively verifiable and reproducible. Moreover, these digital and technological developments foster the processing of endoscopic classification systems for virtually the entire gastrointestinal tract. Unfortunately, studies such as the present investigation often lag behind, even though the interpretation and reporting of endoscopic images may suffer seriously from a lack of precision and accuracy. 32 Accordingly, validation studies, such as inter- and intraobserver agreement evaluations, represent important tools in the attempt to standardize an otherwise ambiguous clinical and research situation.
The use of still images, instead of video sequences, can be criticized since the latter better mimic the clinical duodenoscopy situation. Previous endoscopic validation studies have, however, used still images of good quality and found these to be of sufficient value, not least from a study logistics perspective.12,15 Video sequences are hampered by the fact that they are time-consuming to watch and evaluate and are often difficult to produce and reproduce with optimal image quality and proper freeze framing.
Furthermore, there exists some controversy regarding the use and relevance of kappa statistics. The calculated values in kappa statistics are influenced by the distribution of the traits in the data sets and are difficult to compare between different studies or study populations. Conclusions drawn from a single kappa value have to be made with caution, bearing in mind that the value is just descriptive and needs to be interpreted. Accordingly, a single value is of limited use in isolation; the entire series of outcomes should be the focus. In the present study these outcomes were consistent throughout the various analytical exercises. Corresponding kappa analyses contain no component of hypothesis testing or comparisons between groups to distinguish if a value is ‘true’ or not. This makes power or sample size calculations inapplicable, but it should be mentioned that the number of participants in the present study is in accord with other studies in the field. 33
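The dependence of kappa on the distribution of traits can be shown with two hypothetical two-category agreement tables that share the same observed agreement (80%) but differ in how evenly the categories are distributed. The counts and the function name `kappa_from_table` are invented for illustration and are not study data.

```python
def kappa_from_table(table):
    """Cohen's kappa from a square agreement table (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of counts on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Both tables contain 100 cases with 80 agreements on the diagonal.
balanced = [[40, 10],
            [10, 40]]   # the two categories are equally common
skewed = [[75, 10],
          [10, 5]]      # one category dominates

kappa_balanced = kappa_from_table(balanced)  # 0.6
kappa_skewed = kappa_from_table(skewed)      # about 0.22
```

With identical observed agreement, the skewed table yields a much lower kappa because chance agreement is higher when one category dominates, which is why kappa values are hard to compare across study populations with different case mixes.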
In conclusion, the proposed endoscopic classification of the papilla of Vater seems to be easy to use, irrespective of the level of experience of the endoscopist. It shows a substantial level of inter- and intraobserver agreement, reflecting its potential to be clinically useful; the clinical relevance of the four different papilla types remains to be determined.
Footnotes
Acknowledgements
The authors wish to express their thanks to Salmir Nasic for valuable statistical support. The SADE study group of ERCP members are: Lars Aabakken, Department of Cancer, Surgery and Transplant Surgery, Oslo University Hospital, Norway; Juha Grönroos, Division of Digestive Surgery and Urology, Turku University Hospital, Finland; Jorma Halttunen, Department of Gastrointestinal and General Surgery, Helsinki University Central Hospital, Finland; Truls Hauge, Department of Gastroenterology, Oslo University Hospital, Norway; Björn Lindkvist, Sahlgrenska Academy, University of Gothenburg, Sweden; Marja-Leena Kylänpää, Department of Gastrointestinal and General Surgery, Helsinki University Central Hospital, Finland; Palle N Schmidt, Department of Gastroenterology and Gastrointestinal Surgery, Hvidovre Hospital, Denmark; Arto Saarela, Gastrointestinal Surgery Division, Oulu University Hospital, Finland; Ervin Toth, Department of Clinical Sciences, Department of Gastroenterology and Nutrition, Skåne University Hospital, Lund University, Sweden.
Declaration of conflict of interests
The authors declare that there is no conflict of interest.
Funding
The study was supported by grants from the Research Fund at Skaraborg Hospital, Skövde (VGSKAS 308021) and from the Stockholm County Council, the Karolinska Institutet (SLL: ALF 20130512 to UA).
