Abstract
Background and aims:
Whereas the value of clinical endoscopic retrograde cholangiopancreatography (ERCP) training is well known, its impact on stress markers and performance in a virtual reality (VR) simulator is not. The primary aim of the study was to determine how the number of clinical ERCPs performed during a 1-year period influenced VR-ERCP performance. A secondary aim was to compare salivary stress marker levels between the first and final simulator attempts.
Methods:
Thirty-one endoscopists completed three VR-ERCP procedures of increasing difficulty, and the times taken to complete the different steps of each procedure were recorded. Salivary chromogranin A, cortisol, and α-amylase were measured before and after each phase of the cystic leakage procedure. Participants then underwent 1 year of clinical ERCP training at their respective centers. The remaining cohort (26/31) was divided into two subgroups according to their level of clinical training and then repeated the same VR-ERCP procedures. Differences between the first and final attempts were assessed for the time to completion of each phase and for stress marker levels during the cystic leakage procedure.
Results:
Those with >100 ERCPs of clinical training improved times to completion of all 15 phases of the VR-ERCP procedures (p < 0.05). In contrast, the group with 20–50 ERCPs improved in only 11/15. The analysis was adjusted for the number of ERCPs. After this adjustment, the increase in salivary chromogranin A from before to after each phase of the cystic leakage procedure was significantly reduced in four of the five phases measured.
Conclusion:
Clinical ERCP training enhances subsequent performance, in terms of time to completion, in a VR-ERCP simulator. Simulators could additionally be used as a benchmark for clinical progress. Salivary markers may be a feasible means of measuring stress reactions in a training setting.
Context and Relevance
Since the early 2000s, it has been known that resident physicians who have trained on advanced medical, surgical, and endoscopic simulators show significantly improved outcomes in procedures such as laparoscopic cholecystectomy and endoscopic retrograde cholangiopancreatography (ERCP). However, the reciprocal relationship, that is, the impact of clinical experience on outcomes in repeated simulator training, is less well understood. The same is true of the extent to which increased clinical training affects measured stress levels. This study indicates an additional use of ERCP simulators as a benchmark for monitoring how clinical practice is progressing. Furthermore, altered levels of biochemical stress markers in saliva may be used to monitor stress reactions during ERCP simulation procedures.
Introduction
Simulation-based training is widely acknowledged for its effectiveness in providing structured training and homogenizing learner skills. It ultimately improves patient outcomes through transfer of the acquired skills to clinical practice.1–3 However, the reciprocal relationship, that is, the impact of clinical experience on simulator performance, is less well understood.
It has been suggested that clinical surgical training enhances performance in simulated procedures. Clinical experience provides a deeper understanding of surgical procedures, which in turn makes simulator performance more focused and effective. One study showed that trainees could apply their clinical surgical experience to improve their simulator performance, particularly in preoperative preparedness and intraoperative focus. 4 Another study suggested that clinical experience helped residents perform better in the simulator by enhancing targeted feedback and improvement in the controlled simulator setting. 5 Furthermore, a randomized controlled trial found that junior surgeons with prior clinical surgical experience performed better in simulation-based assessment. 6 Finally, a systematic review and meta-analysis found that surgeons with greater clinical experience scored better on simulator performance measures such as the Global Rating Scale (GRS) and operation time. This suggests that clinical surgical experience improves simulator performance. 7
Little is known regarding the impact of clinical experience on ERCP performance in a VR simulator. A narrative review concluded that trainees show significant improvements in VR-ERCP simulation performance after clinical experience. This suggests that practical experience enhances the effectiveness of simulator-based training. 8 Furthermore, a study found that clinical experience combined with immersive virtual reality (IVR) training can enhance the learning process and improve performance benchmarks in the simulator. 9 However, both studies used subjective assessment, which may correlate poorly, if at all, with objective measures. 10
Three salivary stress biomarkers, chromogranin A (sCgA), cortisol (sC), and α-amylase (sAA), are considered accurate and relevant markers for stress assessment. Because saliva is easy to collect, they have emerged as valuable tools for assessing stress. They provide insight into acute and chronic stress responses by reflecting the physiological changes associated with stress. It has been shown that familiarization with the ERCP simulator greatly reduces stress, as reflected in smaller increases in stress marker levels. 11
Chromogranin A (CgA): A protein released from the adrenal medulla and other neuroendocrine tissues during stress. It is considered to represent activation of the sympathetic-adrenal-medullary (SAM) system and is therefore used as a marker of acute stress. It has been used to measure stress responses in various settings, including reproducible VR simulator procedures, 11 although results are sometimes inconsistent. 12
Cortisol (sC): A glucocorticoid hormone released in response to stress and a primary marker of hypothalamic–pituitary–adrenal (HPA) axis activity. Levels follow a circadian rhythm, peaking in the morning and declining thereafter. 13 The concentration of cortisol in saliva is proportional to the plasma level and thus reflects activity of the HPA axis. 14 Studies have consistently shown that sC levels accurately reflect the body’s stress response, making sC a valuable tool for measuring acute stress. 15 Significant increases in salivary cortisol levels have been observed in stress-inducing situations. 11
Salivary α-amylase (sAA): An enzyme that increases in response to stress, reflecting activity of the SAM axis. In surgical practice, sAA could serve as a practical, non-invasive means of assessing acute stress in real time. 16 In recent years, sAA activity has emerged as a valid and reliable marker of sympathetic activation in stress research. 17
No agreement has been reached on which of the three salivary stress biomarkers is the most appropriate for acute stress detection. sAA has been suggested as a valid and reliable marker of autonomic nervous system activity in stress research, emphasizing its importance in behavioral medicine. 18 Furthermore, Deneva et al. 12 stated that sC is frequently used as a “gold standard marker of stress,” suggesting its widespread acceptance and reliability in stress evaluation. For the purposes of this study, we used all three biomarkers.
The primary aim of the study was to determine how the number of clinical ERCPs performed during the year of the study influenced the times taken in the final simulator attempt compared with the first.
A secondary aim was to compare salivary stress marker levels between the first and final simulator attempts. We also examined whether any improvements correlated with the number of ERCPs performed during the intervening year.
Methods
This is a single-center longitudinal study performed at University Hospital Pulmed in Plovdiv, Bulgaria. Its initial phase was performed in week 17 of 2022 and was repeated 1 year later. Written informed consent was obtained from all participants. The study group consisted of 31 endoscopists in training, 16 females and 15 males. All participants were residents in gastroenterology or general surgery without any previous experience in ERCP. They began by performing three ERCP procedures of increasing technical difficulty in the simulator (GI Mentor II; Surgical Science Sweden AB, Gothenburg, Sweden). The procedures required a combination of several distinct and demanding technical skills. Procedure 3, for example, required performing a sphincterotomy, diagnosing a cystic leakage, and achieving downstream control by introducing and correctly placing an endoscopic stent:
ERCP Procedure 1: Bile duct stone removal (ERCP Module 1, Case Study 4).
Deep cannulation of the bile duct (BD) with a sphincterotomy catheter, contrast injection to diagnose the common bile duct stone (CBDS), then sphincterotomy followed by stone extraction using an extraction balloon.
ERCP Procedure 2: Diagnosis of hilar stenosis and brush cytology (ERCP Module 1, Case Study 2).
Cannulation of the BD and insertion of a guidewire. Contrast injection to reveal hilar stenosis. After sphincterotomy, brush cytology of the stenosis is performed.
ERCP Procedure 3: Diagnosis of cystic bile duct leakage and treatment with placement of a bile duct stent (ERCP Module 2, Case Study 4).
BD cannulation using a sphincterotomy catheter followed by contrast injection to reveal cystic leakage. After sphincterotomy, a plastic stent is introduced into the common bile duct to prevent the outflow of bile from the common bile duct into the leaking cystic duct as well as to facilitate the downstream flow of bile into the duodenum.
Each procedure was divided into five phases, and the time taken to complete each phase was used as a proxy for performance. No mentor intervention was allowed during the procedures.
For each simulator procedure, the times in seconds (s) taken to complete the following five phases were recorded for analysis:
Removal of bile duct stone
Time to view the papilla correctly (s)
Time to deep cannulation of the BD (s)
Time to stone diagnosis (s)
Time to sphincterotomy (s)
Time to remove bile duct stone (s)
Brush cytology of hilar stenosis
Time to view the papilla correctly (s)
Time to deep cannulation of the BD (s)
Time to hilar stenosis diagnosis (s)
Time to sphincterotomy (s)
Time to brush cytology (s)
Diagnosis of cystic leakage and management with stent placement
Time to view the papilla correctly (s)
Time to deep cannulation of the BD (s)
Time to cystic leakage diagnosis (s)
Time to sphincterotomy (s)
Time to stent placement (s)
All procedures were video-recorded. One expert assessor (L.E.), blinded to the name and seniority of the endoscopist, reviewed the videos to determine the time taken to complete each of the phases listed above.
Procedure 3 started after a 30-min rest, during which a baseline (BL) saliva sample was collected using the unstimulated passive drool technique. A “completion” saliva specimen was collected 10 min after the procedure had finished. The samples were stored at −20 °C within 4 h of collection pending chromogranin A (sCgA), cortisol (sC), and α-amylase (sAA) analyses.
A year later, the 26 of the 31 residents who had continued clinical ERCP training (11 females and 15 males) repeated the same procedures exactly as in the initial phase. The number of clinical ERCP procedures performed by each participant during the year was recorded, and participants were divided into two subgroups as follows:
Subgroup A: Those who performed between 20 and 50 ERCP procedures (N = 8; 6 females, 2 males).
Subgroup B: Those who performed >100 ERCP procedures (N = 18; 5 females, 13 males).
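The subgrouping rule above can be expressed as a small function. This is a minimal sketch that assumes both bounds of Subgroup A are inclusive. The function name and the example counts are hypothetical.

As defined, counts below 20 or between 51 and 100 fall into neither subgroup, so the sketch returns None for them. Because 8 + 18 = 26 participants were assigned, no participant in the reported cohort appears to have fallen outside the two ranges.

```python
from typing import Optional


def assign_subgroup(n_clinical_ercps: int) -> Optional[str]:
    """Map a one-year clinical ERCP count to the study subgroups.

    Hypothetical helper. Subgroup A: 20-50 ERCPs (bounds assumed inclusive);
    Subgroup B: more than 100 ERCPs. Counts outside both ranges return None,
    because the grouping described in the Methods does not define them.
    """
    if 20 <= n_clinical_ercps <= 50:
        return "A"
    if n_clinical_ercps > 100:
        return "B"
    return None


# Hypothetical counts for illustration only (not study data).
counts = [24, 48, 101, 150, 75]
print([assign_subgroup(c) for c in counts])  # ['A', 'A', 'B', 'B', None]
```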
Saliva samples were assessed using commercially available kits. Salivary cortisol was measured with the SME-1-3002 Salivary Cortisol Research ELISA kit. α-Amylase was measured with the SME-1-1902 Alpha-amylase Kinetic Reaction Kit Research. Both kits are from Salimetrics, Carlsbad, CA, USA (www.salimetrics.com). Human chromogranin A (sCgA) was measured with an EIA kit (Cat. No.: RSCYK070R, BioVendor GmbH, Germany). Concentrations of the saliva biomarkers were determined following the manufacturer’s instructions. For each stress marker, a percentage difference (%Diffafter) was calculated for each phase. It compares the level before the phase (Valuebefore) with the level at completion (Valueafter): %Diffafter = (Valueafter − Valuebefore) / Valuebefore × 100.
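A minimal sketch of the per-phase percentage difference for one biomarker follows. It assumes the change is expressed relative to the pre-phase value. The function name and the example values are hypothetical and are not data from the study.

```python
def percent_diff_after(value_before: float, value_after: float) -> float:
    """Percentage change of a salivary biomarker from before to after a phase.

    Hypothetical helper; assumes the change is taken relative to the
    'before' value: 100 * (after - before) / before.
    """
    if value_before == 0:
        raise ValueError("the 'before' value must be non-zero")
    return 100.0 * (value_after - value_before) / value_before


# Illustrative values only (arbitrary units), not study data.
print(percent_diff_after(2.0, 2.5))  # 25.0
print(percent_diff_after(4.0, 3.0))  # -25.0
```

Each change is expressed relative to the participant's own pre-phase value. This puts individuals with different baseline levels on the same scale.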
Statistical analysis
Statistical analyses were performed using JMP Pro V.16.0.0 (SAS, Cary, NC, USA). Student’s t-test was used to compare differences between the training subgroups, with operative time as the outcome. Univariable linear regression analysis was used to relate salivary biomarker levels to the different phases of the ERCP procedures. Multivariable linear regression analysis was used to adjust for the volume of training. A p-value < 0.05 was considered statistically significant.
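To illustrate the subgroup comparison, the pooled-variance (Student's) two-sample t statistic can be computed with Python's standard library. This is a sketch, not the JMP procedure used in the study, and it rests on several assumptions:

- The groups are independent and have equal variances. The text does not state whether paired tests were used for the first-versus-final comparisons.
- The function returns only the statistic and its degrees of freedom, not a p-value.
- The function name and the example times are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance.

    Hypothetical helper. Returns (t, degrees_of_freedom). Assumes independent
    groups with equal variances; a p-value would be read from the
    t distribution with df degrees of freedom (e.g., in a statistics package).
    """
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # statistics.variance uses the sample (n - 1) denominator.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / se, df


# Hypothetical phase-completion times in seconds, not study data.
subgroup_a_times = [120.0, 135.0, 150.0]
subgroup_b_times = [90.0, 100.0, 110.0]
t, df = student_t(subgroup_a_times, subgroup_b_times)
print(round(t, 3), df)
```

A positive t here means the first group took longer on average. The sign depends only on the argument order.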
Ethical considerations
All participants read and signed an informed consent form. The study involved only simulator trials and no patients. The Medical Ethics Committee at University Hospital Pulmed OOD therefore determined that an ethics application was not necessary. The study was performed in accordance with the Declaration of Helsinki.
Results
Data from 26 participants were analyzed. Table 1 shows the mean, SD, and statistical significance of the differences between the initial and second (final) attempts. These are differences in the time to completion of each phase of each simulator ERCP procedure. Times to completion decreased significantly a year later for all three procedures (p < 0.0001).
Table 1. Mean, SD, and p-values of the differences in time to completion of the various phases of each ERCP procedure performed at the initial attempt and at the second attempt a year later.
As can be seen in Table 2, both subgroups (20–50 and >100 ERCPs) improved performance with clinical training. Subgroup B (>100 ERCPs) improved times to completion of all five phases of all three procedures at the final simulator attempt. In Subgroup A (20–50 ERCPs), by contrast, only two of the eight participants successfully completed all five phases of the bile duct stone removal procedure. Only five of the eight completed all five phases of the hilar stenosis procedure. However, all participants successfully completed the final cystic leakage procedure (ranked as the hardest). It is also of interest that all times taken by Subgroup A at the initial attempt were significantly shorter than those of Subgroup B. These differences did not persist after clinical training (Table 2).
Table 2. Mean, SD, and p-values of the differences in time to completion of the various phases of each ERCP procedure performed at the initial and second attempts in the two subgroups (20–50 ERCPs, >100 ERCPs), with comparisons between the subgroups.
Unadjusted analysis of the salivary biomarker differences between the initial and second attempts one year later in the total group of participants revealed no statistically significant differences (Fig. 1).

Fig. 1. Differences in increases in salivary stress markers for Subgroups A and B (20–50 and >100 ERCPs) between the first and final attempts during Procedure 3 (simulated cystic leakage). Box plots with grand means shown.
Table 3 shows the differences in salivary biomarker levels in relation to the phases of the VR-ERCP procedure for diagnosing cystic leakage and treating it with stent placement. The differences between the first and second attempts are presented both unadjusted and adjusted for level of training. As shown in Table 3A, the only significant correlations were between sCgA and three of the five measured variables in the cystic leakage procedure after training. These were time to deep cannulation of the BD, time to cystic leakage diagnosis, and time to sphincterotomy. After adjustment for level of training (Table 3B), sCgA levels also showed a significant decrease in relation to time to stent placement (p = 0.0431).
Table 3. Percentage differences in salivary biomarker levels before and at completion of the cystic leakage simulator procedure in relation to its five phases, (A) unadjusted and (B) adjusted for level of training.
Discussion
It is generally accepted that ERCP training improves clinical performance and that a higher ERCP case volume is associated with safer ERCP and successful outcomes. 19 However, little is known about the opposite, that is, whether clinical experience enhances simulator performance. The present study shows that clinical training does have a positive impact on performance in a VR simulator. This suggests that simulator training is valid and that the skills acquired match the surgical competence needed in clinical practice.
Endoscopic retrograde cholangiopancreatography (ERCP) is a complex procedure used in diagnosing and treating conditions related to the biliary tree and pancreatic duct. The ERCP VR simulator is one of many strategies in ERCP training, allowing trainees to develop their skills without the risk of causing harm to the patient. 20 This is particularly important given the high risk of complications associated with ERCP such as post-ERCP pancreatitis. 21
Trainees are generally required to complete a substantial number of ERCP procedures to be regarded as competent. The Joint Advisory Group (JAG) recommends a minimum of 300 hands-on procedures, 22 whereas the American Society for Gastrointestinal Endoscopy (ASGE) suggests at least 200 supervised ERCPs. 23 Procedure volume is not strictly a performance indicator. Even so, a case volume of 75–100 procedures per year is often recommended for endoscopists to maintain their competence.22,24 Individuals differ in their learning curves and in the case volumes needed to maintain ERCP skills. The simulator is an excellent tool for measuring the actual level of competence acquired by an individual, and the present study supports its validity for this purpose. The cohort in this study was small. Nevertheless, all participants who did not complete the study were female, which is consistent with the well-documented, and somewhat unfortunate, predominance of male endoscopists in this field.25,26
GI Mentor II is a high-fidelity simulator for ERCP training. It provides realistic visual and tactile feedback and includes modules for various ERCP procedures, such as bile duct stone removal, brush cytology, and stent placement.17,20
There are several key performance indicators (KPIs) commonly used to assess ERCP competency:
Selective Duct Cannulation Rate: This is the most consistently recommended KPI across guidelines. The target success rate ranges from 80% to 90% for selective cannulation of the desired duct (usually the common bile duct).20,26
CBD Stone Clearance Rate: Successful clearance of common bile duct stones in 75%–80% of cases at the first ERCP attempt.22,24
Stent Placement Success Rate: Successful stent placement in 80%–85% of patients with extrahepatic strictures, where appropriate.22,24
In our study, we observed five dropouts, all of whom were female endoscopists. Female gastroenterologists and surgeons tend to drop out of research studies at higher rates than their male counterparts. This is attributed to an interplay of occupational stressors, work–life imbalance, and personal, institutional, and cultural factors. One major contributor is family responsibilities, particularly pregnancy and motherhood.
Studies have documented that female surgeons invest significantly more time in parenting duties than male surgeons and consequently miss critical work activities and research commitments. For instance, research shows that female orthopedic surgeons are more involved in family responsibilities. They contribute over 12 h per week to childcare compared with less than 3 h for males, leading to a higher risk of burnout among females.27,28
Institutional and cultural barriers further exacerbate research dropout rates among female surgeons. Multiple studies highlight a lack of institutional support and mentorship and the presence of gender biases in academic surgical departments. For example, a qualitative study found that new mothers in surgical training often encounter fewer opportunities for career development and receive less mentorship than their male colleagues, undermining their engagement in research activities. 29
A further critical element is the heightened prevalence of burnout among gastroenterologists, which has been consistently documented in recent reviews. In these studies, female gender is identified as a significant risk factor for burnout. The associated emotional exhaustion and depersonalization adversely affect the capacity to engage in sustained research activities. 30
This study shows that after a year of clinical training, the times required to accomplish ERCP benchmarks decreased substantially. Notably, at the final attempt, participants completed the five phases of the three ERCP simulator procedures approximately three times as fast (Table 1). These procedures were BD stone removal, hilar stenosis, and cystic leakage diagnosis and management. Completion times for Subgroup A were shorter than those of Subgroup B at the first simulator attempt, but these differences did not persist after training (Table 2). Research has shown that endoscopists with greater clinical ERCP experience perform better on simulators than those with less experience. 31
In summary, our findings suggest that clinical ERCP experience shortens the time required to complete ERCP phases in the simulator.
We used the percentage difference (Diffafter) in the level of each salivary biomarker from before to after each phase, rather than absolute values. This reduces bias caused by external factors such as lack of sleep, 32 because the percentage difference better reflects the stress reaction to the exercise itself. However, no statistically significant differences in any of the salivary biomarkers were seen between the initial and the second attempt a year later (Fig. 1). This contrasts with the well-documented stress reduction in endoscopists after simulator training reported in the literature. Furthermore, a recent study showed that familiarization with the ERCP simulator greatly reduced stress as measured by the three salivary stress biomarkers we used, with sAA being the most sensitive stress predictor. 11 Our findings indicate that clinical ERCP experience, even of more than 100 cases in a year, did not significantly reduce stress levels in the simulator. A larger sample followed over a longer period with higher ERCP volumes is required to show whether stress reduction occurs after clinical ERCP training.
We compared the differences in the three salivary stress biomarkers at each phase of Procedure 3 (cystic leakage) between the first and final attempts. No statistically significant differences in sC or sAA were seen for the total cohort or for either subgroup (Table 3). The only significant differences were decreases in sCgA Diffafter at the second attempt in relation to three of the five phases of the procedure. After adjustment for level of training, sCgA Diffafter also decreased significantly in relation to time to stent placement (Table 3).
In this study, sCgA Diffafter responded significantly, whereas cortisol and sAA did not. This does not agree with the findings of Tammayan et al. They found a greater response of sC and sAA to stress than of sCgA. They also found that the response pattern of sCgA did not coincide with that of sAA, even though both reflect sympathetic activity. 33 Nevertheless, although sCgA seemed to be the more accurate stress biomarker in the present study, sC and sAA are more frequently employed and acknowledged in stress evaluation studies. 17 These conflicting results could depend on factors such as the mode of stress (psychological or physical) or the timing of saliva collection. Moreover, sCgA is released only from the submandibular glands, 34 and thus the collection method may affect the level detected in saliva. Therefore, sC and sAA may be more reliable than sCgA for evaluating acute stress in surgery unless the sampling technique is standardized. More studies are needed to elucidate the patterns of sCgA in saliva during stress.
However, mainly because of the small sample size, the findings for our salivary stress biomarkers are exploratory and should be interpreted with caution. Further investigation into the impact of simulator practice on stress biomarkers in experienced ERCP endoscopists could provide valuable insights into the stress-reducing effects of simulator training in this group. Future research should also focus on how clinical experience influences proficiency and decision-making in a simulator.
There are several limitations of our study. First, a plethora of quality indices and measures have been proposed for ERCP, 35 but our study focused only on selected intraprocedural parameters. In addition, it was a single-center longitudinal study with a limited number of participants. The study did not have sufficient power to detect minor increases in stress marker levels. Had the study group been larger, greater responses of sC and sAA to stress, like those of sCgA, might have been detected. Second, we measured stress markers only shortly after the procedure, which does not allow for any delayed surge such as that reported for cortisol. 36 Third, the study was not extended in duration or ERCP volume to detect changes in stress reduction. The number of ERCPs performed during the year between the first and final attempts was not regulated by the study protocol. It depended instead on the circumstances under which the participants worked. A more homogeneous study group with standardized routines for ERCP training could have provided more reliable information. Furthermore, using time as the only measure of competence may be considered suboptimal. However, time is a continuous variable. It could be determined precisely for each phase from the video recordings of both the first and the second attempts a year later and compared statistically. We therefore chose it as a proxy for progress. Finally, caseloads in other diagnostic and therapeutic endoscopic procedures (EUS, ESD, EMR) were not taken into account. These procedures could act as confounding variables affecting ERCP performance.
Conclusion
The findings of this study suggest that clinical ERCP training improves subsequent performance in a VR-ERCP simulator, possibly by improving dexterity and decision-making. Simulators could therefore additionally serve as a benchmark to ensure that clinical development progresses in a desirable manner. On the other hand, this study showed no conclusive reduction in stress as measured by sC and sAA after 1 year of clinical ERCP practice. This contrasts with sCgA, whose reduction was even more pronounced in Subgroup B (>100 ERCPs). If saliva sampling becomes standardized, we anticipate that sCgA could be a stress marker to consider in studies and in clinical training to provide feedback.
Footnotes
Acknowledgements
Dr Peter Cox provided careful language editing.
Author contributions
NB: Conception and design, experiments, drafting of the article.
KG: Conception and design, drafting of the article, final approval of the article.
TD: Stress hormone analyses and interpretation of data.
KS, KM: Data acquisition and analysis.
GS: Critical revision of the article regarding its intellectual content.
LE: Analysis and interpretation of the data, critical revision of the article; final approval of the article.
Data availability
Anonymized data are available upon request.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Open access funding provided by Umeå University.
Clinical trial registration
N/A.
