Abstract
Background and objective:
Postoperative pancreatic fistula (POPF) is the leading cause of morbidity and early mortality in patients undergoing pancreatic resection. In addition, recent studies have identified postoperative acute pancreatitis (POAP) as an independent contributor to morbidity. Most perioperative mitigation strategies tested against POPF have proven ineffective, and there is no consensus on the best perioperative management. Clinical prediction models have been developed to identify patients at high risk of POPF, with the main idea of finding subpopulations that might benefit from existing or novel mitigation strategies. The aim of this review was to map the existing prediction modeling studies to better understand the current state of POPF prediction modeling and the methodology behind it.
Methods:
A narrative review of the existing POPF prediction model studies was performed. Studies published before September 2022 were included.
Results:
While the number of POPF prediction models for pancreatoduodenectomy has increased, none of the currently existing models stands out from the crowd. For distal pancreatectomy, two unique POPF prediction models exist, but because they are so recent, no further external validation or adoption in clinics or research has been reported. Most of the studies seem to lack adherence to correct methodology or reporting guidelines, which has rendered external validity (if assessed) low. A few of the most recent studies have demonstrated that preoperative assessment of pancreatic aspects from computed tomography (CT) scans provides relatively strong predictors of POPF.
Conclusions:
The main goal for the future would be to reach a consensus on the most important POPF predictors and prediction model. In their current state, few models have demonstrated adequate transportability and generalizability to be up to the task. A better understanding of POPF pathophysiology and of the possible driving role of acute inflammation and POAP might be required before such a prediction model can be achieved.
Keywords
Summary for Twitter
While POPF prediction models are being published in increasing numbers, external validity and calibration are chronically overlooked. Better understanding of the interplay between POPF and postoperative pancreatitis might provide new avenues for predictions.
Context and relevance
Numerous clinical prediction models for pancreatic fistula after pancreatic surgery exist. This review summarizes the state of these prediction models. A shift from perioperative to fully preoperative predictions is observed. In particular, predictors from preoperative radiological imaging have received more attention during the last few years. The majority of published prediction models, however, seem to suffer from suboptimal methodology, high instability of predictions, and lack of external validity. Still, a well-performing pancreatic fistula risk model could have clinical and academic implications. Unfortunately, in their current state, all of the models warrant further exploration and validation before their use in clinics becomes relevant. While no prediction models for postoperative pancreatitis exist, recent studies have highlighted its clinical relevance and potential coupling with pancreatic fistula, and its further examination could provide avenues for more accurate predictions.
Introduction
Pancreatic surgery is regarded as a highly challenging branch of abdominal surgery. Its centralization to high-volume centers has led to better short-term patient prognosis. 1 Still, postoperative morbidity remains high.2,3 Regardless of the type of resection, pancreatoduodenectomy (PD) or distal pancreatectomy (DP), most of the severe complications are related to leakage and failure of the pancreatic stump, ultimately leading to postoperative pancreatic fistula (POPF).4–6 The 2016 POPF definition by the International Study Group for Pancreatic Surgery (ISGPS) is based on elevated drain fluid amylase on the third postoperative day. Severity is graded according to the interventions required, with grades B (antibiotics, drainage) and C (reoperation, organ failure, death) regarded as clinically relevant. 6 An isolated elevation of drain fluid amylase without clinical repercussions is not regarded as clinically relevant. It is known as a biochemical leak in the 2016 ISGPS definition and as grade A in the original 2007 ISGPS definition. In this review, the abbreviation POPF denotes clinically relevant fistula (grades B and C). According to a recent systematic review, median POPF incidence is approximately 15% after either PD or DP. 7 The significant variability in reported POPF incidence is probably due to heterogeneous interpretation of the current POPF definition and differing protocols between centers.
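The grading logic described above can be made concrete with a deliberately simplified sketch. It assumes that a leak requires drain fluid amylase above three times the institutional upper limit of normal for serum amylase, which is the threshold used in the 2016 definition. It also collapses the grade B and C criteria into two hypothetical yes/no fields. The full ISGPS criteria are more detailed, and this is an illustration, not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class PostoperativeCourse:
    """Hypothetical, simplified record of one patient's postoperative course."""
    drain_amylase_pod3: float  # drain fluid amylase on postoperative day 3 (U/L)
    serum_amylase_uln: float  # institutional upper limit of normal, serum amylase (U/L)
    antibiotics_or_drainage: bool  # simplified grade B criterion
    reoperation_organ_failure_or_death: bool  # simplified grade C criterion


def isgps_2016_grade(course: PostoperativeCourse) -> str:
    """Return a simplified 2016 ISGPS category (illustration only)."""
    if course.drain_amylase_pod3 <= 3 * course.serum_amylase_uln:
        return "no leak"
    if course.reoperation_organ_failure_or_death:
        return "grade C"
    if course.antibiotics_or_drainage:
        return "grade B"
    # Elevated drain amylase without clinical repercussions.
    return "biochemical leak"


def is_clinically_relevant(grade: str) -> bool:
    """In this review, POPF means a clinically relevant fistula (grade B or C)."""
    return grade in ("grade B", "grade C")


print(isgps_2016_grade(PostoperativeCourse(1200.0, 100.0, True, False)))  # grade B
```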
Recent studies have noted that postoperative acute pancreatitis (POAP) may have clinical relevance after pancreatic resections.8–11 POAP was first proposed by Connor in 2016. 12 It was defined as an elevation in early serum or plasma amylase levels, with a subsequent rise in C-reactive protein denoting a more severe form of POAP. POAP has been speculated to drive the formation of POPF, but studies have also demonstrated it to be independently associated with postoperative morbidity. 9 There has been considerable debate regarding the definition and relevance of POAP. 13 An ISGPS consensus statement defining postpancreatectomy acute pancreatitis (PPAP) was recently published. 14 In addition to a sustained elevation in early systemic amylase (POAP), it requires radiological alterations consistent with pancreatic inflammation.
Postoperative mortality after pancreatectomy has been reported to range from 2% to 9%, with the most favorable outcomes reported from high-volume centers.15,16 Studies analyzing the root causes of postoperative mortality after pancreatectomy have identified POPF as contributing to roughly half of postoperative deaths. 4 A systematic review reported that POPF-related postoperative mortality has remained constant at 1% over the last 25 years. 2 The consequences of POPF can be so devastating that even prophylactic total pancreatectomy has been proposed as an alternative to pancreaticojejunostomy for certain high-risk patients. 17 POPF has been shown to be more prevalent after DP than after PD, but less severe.18,19 In addition, POPF has been shown to be associated with worse overall survival and tumor recurrence in pancreatic ductal adenocarcinoma. 20
Because POPF clearly harms short- and long-term outcomes, there has been considerable interest in its prediction, prevention, and mitigation.3,21 Mitigation strategies explored include different modalities of pancreatoenteric anastomosis in PD, stump closure mechanisms in DP, internal and external pancreatic stents, and perioperative somatostatin analogues.3,22 Of these, pasireotide (a somatostatin analogue) has shown benefit in randomized controlled trials.21,23 Individualized POPF risk estimation could provide clinicians and researchers with a useful tool for controlling case-mix and assessing the effect of different mitigation strategies in a risk-stratified setting. Numerous pre- and intraoperative clinical prediction models have been developed for POPF (Tables 1 and 2).
Table 1. Prediction models for pancreatic fistula after pancreatoduodenectomy.
Table 2. Prediction models for pancreatic fistula after distal pancreatectomy.
D/V: development/validation; MPD: main pancreatic duct; PT: pancreatic thickness; BMI: body mass index.
Risk factors for POPF
A systematic analysis of the pre-, peri-, and postoperative risk factors identified for POPF is beyond the scope of this review and has been performed previously.40–43 Fig. 1 presents a collection of risk factors that are, or could be, of interest in POPF prediction modeling.

Fig. 1. Collection of risk factors for pancreatic fistula with focus on prediction.
Soft pancreatic texture and narrow pancreatic duct are the most frequently reported and utilized risk factors for POPF after PD.7,25,28,32,33,41,42 Their coexistence is typical of benign or extrapancreatic disease and a non-atrophic gland. 44 Their association with POPF is less well defined after DP, with conflicting evidence on the direction of association.45,46 Pancreatic texture is not ideal in terms of prediction horizon or objectivity. It cannot be assessed preoperatively, and there is no consensus on how to define a “soft” pancreas. In addition, because DPs are mostly performed minimally invasively, assessing the texture of the pancreatic remnant is difficult. As a result, information on pancreatic texture is often missing for DP patients in national registries. 47
Acinar cell count at the resection margin could provide a more accurate and objective measure of POPF risk than pancreatic texture. A few studies have investigated acinar cell count and found it to be easily measured and strongly associated with POPF.48,49 One study also demonstrated that the higher rate of POPF after DP compared with PD was attributable to a higher median acinar cell count at the resection margin. 50
Especially for DPs, preoperative computed tomography (CT) measurements could provide an avenue for objective preoperative POPF risk assessment. A few small-scale studies have reported pancreatic thickness to be associated with POPF after DP.51–53 It has been hypothesized that the amount of pancreatic tissue transected increases the risk of stump leakage, possibly leading to POPF. 39 Other CT measurements have also been assessed, such as psoas muscle thickness, visceral fat proxies, pancreatic density (Hounsfield units, HU), and duct diameter. However, their association with POPF has not been as consistently strong as that of pancreatic thickness.38,39,43,54–57 A study by our group also found that some of these variables, for example duct diameter, have significant interobserver variability when measured. 39
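The text does not specify how interobserver variability was quantified. One illustration is Bland-Altman limits of agreement for two observers measuring the same continuous CT variable, sketched below. The function name and the example duct diameters are hypothetical.

```python
import statistics


def bland_altman_limits(observer_a, observer_b):
    """Return (mean difference, lower limit, upper limit) of agreement.

    Limits of agreement are the mean paired difference +/- 1.96 times the
    sample standard deviation of the differences.
    """
    if len(observer_a) != len(observer_b) or len(observer_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(observer_a, observer_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical duct diameters (mm) measured by two observers on the same five scans.
bias, lower, upper = bland_altman_limits([3.0, 2.5, 4.0, 1.5, 5.0], [2.0, 3.0, 3.0, 2.5, 4.0])
print(f"bias {bias:.2f} mm, limits of agreement {lower:.2f} to {upper:.2f} mm")
```

Limits that are wide relative to the clinically meaningful range of a variable indicate poor agreement. An intraclass correlation coefficient is a common alternative summary.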
Other, non-pancreas-specific risk factors, such as male sex, body mass index (BMI), hypoalbuminemia, and age, have also been shown to be associated with POPF.28,32,47,58,59 Pancreatic fistula is in essence a process of the pancreatic remnant. These non-pancreas-specific factors therefore probably represent important confounders of the relationship between pancreas-specific variables (such as texture, thickness, and histology) and POPF.
Prediction modeling
The aim of a clinical prediction model is to provide estimates of the probability of a certain event occurring in the future. The process of developing and validating a prediction model has been thoroughly explained in the existing literature and is not described exhaustively in this review.60,61 In a nutshell, a prediction model should be developed in an adequately sized patient cohort. It should then be validated in a geographically or temporally distinct patient cohort to ensure its transportability and generalizability. Most prediction models are constructed with logistic regression, although artificial intelligence (AI)-based solutions are applied at an increasing rate. The TRIPOD statement (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) is a reporting guideline with a 22-item checklist for prediction model studies. It should be followed when conducting a prediction model study to ensure complete reporting and to permit reproducibility in validation studies. 62
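Most of the models discussed here are logistic regression models. In these, a weighted sum of predictors (the linear predictor) is mapped to a probability. The following is a minimal sketch; the predictor names and coefficients are hypothetical and are not taken from any published POPF model.

```python
import math

# Hypothetical coefficients for illustration only (not from any published model).
INTERCEPT = -4.0
COEFFICIENTS = {"pancreatic_thickness_mm": 0.12, "diabetes": -0.8}


def predicted_risk(patient: dict) -> float:
    """Logistic model: risk = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = INTERCEPT + sum(
        coef * patient[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# A hypothetical patient with a 25 mm thick pancreas and no diabetes.
risk = predicted_risk({"pancreatic_thickness_mm": 25.0, "diabetes": 0})
print(f"Predicted risk: {risk:.2f}")  # Predicted risk: 0.27
```

In this framing, development means estimating the coefficients in one cohort. Validation means applying the unchanged formula to a distinct cohort and checking discrimination and calibration there.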
The performance of a prediction model is usually assessed in terms of discrimination and calibration, although systematic reviews have noted that calibration is unfortunately underreported. Discrimination is the model’s ability to distinguish patients who experience the event from those who do not. It is measured with the area under the receiver operating characteristic curve (AUROC). In practice, the AUROC takes values between 0.5 (no better than chance) and 1.0 (perfect discrimination). It is generally interpreted as 0.60–0.69 poor, 0.70–0.79 adequate, 0.80–0.89 strong, and over 0.90 excellent. Calibration is the agreement between predicted and observed probabilities. It is usually illustrated with a calibration plot (predicted probability on the x-axis, observed probability on the y-axis) and measured with the calibration intercept (optimal value 0) and slope (optimal value 1). For further reading on calibration, the authors recommend the article by Van Calster et al. 63
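Discrimination can be computed directly from its definition. The AUROC equals the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it, with ties counting as one half. The sketch below implements this pairwise definition together with the interpretation bands quoted above. The label for values under 0.60 is an assumption, since the text does not classify them, and the risks in the example are made up.

```python
def auroc(risks, outcomes):
    """Concordance probability; ties in predicted risk count as 0.5."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need patients both with and without the event")
    score = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                score += 1.0
            elif case_risk == control_risk:
                score += 0.5
    return score / (len(cases) * len(controls))


def interpret_auroc(value):
    """Bands as quoted in the text; the label below 0.60 is an assumption."""
    if value >= 0.90:
        return "excellent"
    if value >= 0.80:
        return "strong"
    if value >= 0.70:
        return "adequate"
    if value >= 0.60:
        return "poor"
    return "below the quoted bands"


value = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(value, interpret_auroc(value))  # 0.75 adequate
```

The pairwise loop is quadratic in cohort size. Rank-based (Mann-Whitney) formulas give the same value faster for large cohorts.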
It has been noted that the majority of published prediction models, regardless of subject, suffer from insufficient reporting, suboptimal methodology, and a high risk of bias toward overly optimistic performance estimates. 64 For example, a systematic review found that essential information for model use was incomplete in over 80% of published prediction model studies. 64
Prediction models for POPF after PD
Table 1 presents prediction models for POPF after PD from the last decade. The Fistula Risk Score (FRS), alternative-FRS (a-FRS), and updated-alternative-FRS (ua-FRS) are the most widely used to date.25,28,32 They are intraoperative models with pancreatic texture and duct diameter as the model backbone. The original FRS included intraoperative blood loss, but evidence for its relationship with POPF has been conflicting. The subsequent models therefore excluded it. 65
Yamamoto et al. were the first to develop a prediction model for POPF, in 2011, but their model was not widely adopted. This was probably due to two factors. First, some of its variables were difficult to assess, such as the duct diameter/pancreatic thickness index and the obscure “away from portal vein” variable. Second, it performed suboptimally in external validation studies (pooled AUROC 0.61). 66
In a systematic review by Pande et al., 66 all prediction models for POPF after PD published before 2020 were analyzed for their performance in external validation studies. The AUROC values reported in the original development studies of some models were very high, for example 0.94 for the FRS. However, the pooled AUROC values from all external validation studies were only 0.71 for the FRS and 0.72 for the ua-FRS (the best-performing model).
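The pooling method is not described here. One common approach is DerSimonian-Laird random-effects meta-analysis on the logit scale, shown below as an assumption. It lets the pooled estimate and its interval reflect between-study heterogeneity. The standard errors in the example are hypothetical.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def pool_auroc_random_effects(studies):
    """DerSimonian-Laird pooling of (auroc, standard_error) pairs on the logit scale.

    Returns (pooled AUROC, lower 95% limit, upper 95% limit).
    """
    y = [_logit(a) for a, _ in studies]
    # Delta method: SE(logit(A)) is approximately SE(A) / (A * (1 - A)).
    v = [(se / (a * (1.0 - a))) ** 2 for a, se in studies]
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return _expit(y_re), _expit(y_re - 1.96 * se_re), _expit(y_re + 1.96 * se_re)


# Hypothetical external validation results for one model: (AUROC, standard error).
pooled, lower, upper = pool_auroc_random_effects([(0.61, 0.04), (0.80, 0.03), (0.71, 0.02)])
print(f"Pooled AUROC {pooled:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```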
One problem with the older models has been their intraoperative setting, or even a postoperative setting when histology or postoperative laboratory values are used as predictors. Intraoperative risk stratification does not allow for preoperative planning, informing the patient, randomization, and so on. Recently, many prediction models focusing on preoperative prediction of POPF after PD have been published.30,33–36 Only the models by Perri et al. and Shi et al. were externally validated, with AUROC values upon validation of 0.65 and 0.81, respectively. The three others lacked external validation, which makes their performance harder to assess reliably. Perri et al. 33 developed a risk-tree model that depends on duct diameter and BMI. The problem with risk trees is the inherent dichotomization of predictors. This is not advisable from a methodological standpoint and often translates into suboptimal performance in external validation. 67 Shi et al. describe the development and validation of the CT-FRS, which relies heavily on preoperative assessment of CT scans with external software. 30
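The information lost by dichotomization can be illustrated with a small simulation. The setup is an assumption chosen for clarity, not data from any cited study. A single continuous predictor follows N(0, 1) in patients without POPF and N(1, 1) in patients with POPF, and a risk-tree style split keeps only whether a value lies above the cut-off. The population AUROCs in this setup are about 0.76 for the continuous predictor and about 0.69 after splitting at 0.5.

```python
import random


def auroc(scores, outcomes):
    """Mann-Whitney AUROC using average ranks, so tied scores count as 0.5."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    n_cases = sum(outcomes)
    n_controls = len(outcomes) - n_cases
    rank_sum = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


def compare_continuous_vs_dichotomized(n_per_group=1000, shift=1.0, cutoff=0.5, seed=2022):
    """Simulated predictor: N(0, 1) without the event, N(shift, 1) with the event."""
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    scores += [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
    outcomes = [0] * n_per_group + [1] * n_per_group
    # A risk-tree style split keeps only "above or below the cut-off".
    dichotomized = [1 if s > cutoff else 0 for s in scores]
    return auroc(scores, outcomes), auroc(dichotomized, outcomes)


continuous, split = compare_continuous_vs_dichotomized()
print(f"continuous AUROC {continuous:.2f}, dichotomized AUROC {split:.2f}")
```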
A few AI-based prediction models for POPF after PD have been developed.34,31,37 While their authors often claim performance superior to that of more conventional regression models, their reproducibility and transportability are low. Often, all available variables are fed into an external machine learning algorithm, resulting in severe overfitting (e.g. 18 variables for 37 POPF cases in the study by Lin et al. 34 ). No external validation is reported, as the models are in practice non-transportable black boxes. None of the studies reported calibration or external validity, which limits the interpretability of their results.
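The scale of the problem can be seen in the events-per-variable (EPV) ratio, the number of outcome events divided by the number of candidate predictors. The threshold of 10 EPV used below is a frequently cited but debated rule of thumb, included only for orientation. More formal sample size criteria for prediction models exist.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Outcome events per candidate predictor (EPV)."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors


# Figures quoted above for Lin et al.: 37 POPF cases, 18 variables.
epv = events_per_variable(37, 18)
# Events needed to reach the rule-of-thumb threshold of 10 EPV with 18 variables.
events_needed = 10 * 18
# Prints: EPV = 2.1; about 180 events needed for 10 EPV
print(f"EPV = {epv:.1f}; about {events_needed} events needed for 10 EPV")
```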
Prediction models for POPF after DP
No prediction model for POPF after DP existed until 2022 (Table 2). In 2019, Ecker et al. 47 analyzed over 2000 patients with the aim of developing a prediction model for POPF after DP. However, they could not identify sufficiently strong predictors, and their multivariable model showed an AUROC of 0.65 during development. Pancreatic texture was missing in nearly half of the patients and thus could not be used as a predictor. It was speculated that, as a result, their model simply lacked pancreas-specific variables.
In 2022, two distinct preoperative prediction models were published: the D-FRS by De Pastena et al. 38 and the DISPAIR by our group. 39 Both models utilize preoperative assessment of pancreatic aspects from CT scans. The D-FRS utilizes pancreatic thickness and pancreatic duct diameter at the pancreatic neck and showed an AUROC of 0.73 upon pooled internal-external validation. 38 The DISPAIR utilizes pancreatic thickness at the transection site, transection at the pancreatic neck, and diabetes as predictors and showed an AUROC of 0.80 upon external validation. 39 Regarding calibration, the D-FRS is claimed to be perfectly calibrated, with a calibration slope of 1 and intercept of 0, which is very rarely seen in prediction model studies. The DISPAIR retained adequate calibration in external validation (intercept 0.19, slope 0.72) and also showed good performance in different subgroup analyses. 39 The D-FRS also has an intraoperative version with additional predictors, but it has not been externally validated.
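The two calibration measures can be estimated by logistic recalibration, shown below as a sketch. The procedures used in the D-FRS and DISPAIR studies are not described here and may differ. In this sketch, the slope comes from regressing the outcome on the logit of the predicted risk. The intercept (calibration-in-the-large) is estimated with the logit of the predicted risk as a fixed offset. A positive intercept means that observed risk is, on average, higher than predicted. A slope below 1 means that predictions are too extreme. The example data are hypothetical.

```python
import math


def _expit(z):
    # Numerically stable inverse logit.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _logit(p):
    return math.log(p / (1.0 - p))


def calibration_slope(risks, outcomes, iterations=50):
    """Fit logit P(y = 1) = a + b * logit(risk) by Newton-Raphson; returns b."""
    lp = [_logit(p) for p in risks]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(lp, outcomes):
            mu = _expit(a + b * x)
            w = mu * (1.0 - mu)
            g_a += y - mu          # score for a
            g_b += (y - mu) * x    # score for b
            h_aa += w              # information matrix entries
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return b


def calibration_intercept(risks, outcomes, iterations=50):
    """Fit logit P(y = 1) = a + logit(risk), with logit(risk) as an offset; returns a."""
    lp = [_logit(p) for p in risks]
    a = 0.0
    for _ in range(iterations):
        mus = [_expit(a + x) for x in lp]
        a += sum(y - mu for y, mu in zip(outcomes, mus)) / sum(mu * (1.0 - mu) for mu in mus)
    return a


# Hypothetical overconfident model: 10 patients at predicted 10% risk (3 events)
# and 10 patients at predicted 90% risk (7 events).
risks = [0.1] * 10 + [0.9] * 10
outcomes = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
print(calibration_intercept(risks, outcomes), calibration_slope(risks, outcomes))
```

In this example, the intercept is essentially 0 while the slope is about 0.39. A correct intercept alone therefore does not guarantee good calibration.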
Prediction of postoperative pancreatitis
Because the PPAP definition is so recent, only one study assessing risk factors for ISGPS-defined PPAP could be identified. It found soft pancreatic texture and postoperative C-reactive protein over 180 mg/l to be independent risk factors. 68 However, there is a growing body of evidence on factors associated with POAP.9,69 POAP and POPF seem to share the majority of identified risk factors. Soft pancreatic texture, high BMI, narrow duct, operative complexity, non-malignant histology, and male sex all predispose to POAP.8,9
No prediction models for POAP or PPAP exist to date, but one study demonstrated that POPF prediction models could be used to predict clinically relevant POAP (elevated amylase and C-reactive protein over 180 mg/l on second postoperative day), with ua-FRS performing the best with an AUROC of 0.83. 9
Pitfalls in previous studies
As noted above, a majority of studies describing the development and validation (if performed) of a prediction model have serious methodological weaknesses, and unfortunately most prediction models for POPF are no exception. A systematic review of POPF prediction models for PD found that most models either lacked external validation or, if it was conducted, had their optimistic performance debunked in external validation. 66 This overfitting (overly optimistic measures of performance) arises from associations between the variables and the outcome that are specific to a particular data set. In other words, a prediction model always works best in the population in which it was developed, but this “extra” performance (or optimism) does not translate into external (or actual) model performance. Overfitting is especially strong when the sample size is inadequate, as is the case in the majority of POPF prediction model studies. Some studies have also mistakenly compared other prediction models with their newly developed model in the same data set that was used for development, leading to false conclusions of superiority. 35 The two recent models for DP have not yet undergone any additional external validation, and their transportability remains to be seen. Of the seventeen prediction models included in Tables 1 and 2, only four reported true external validity in the original study: the risk-tree model, the a-FRS, the CT-FRS, and the DISPAIR. As a tool with clinical implications, a non-validated prediction model might be dangerous, since its use could lead to false predictions. Furthermore, of the 14 studies published after 2015, only five adhered to the TRIPOD statement (published in 2015). 62
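Optimism of this kind can be reproduced with pure noise. In the hypothetical simulation below, 100 candidate predictors unrelated to the outcome are screened in a small development cohort, and the one with the highest apparent AUROC is selected. That predictor is then re-evaluated in a large independent cohort. Its apparent AUROC is typically well above 0.5, while its AUROC in the independent cohort hovers around 0.5. Selecting the "best" model in the development data is exactly what produces false conclusions of superiority. All cohort sizes and the event rate are assumptions.

```python
import random


def auroc(scores, outcomes):
    """Mann-Whitney AUROC using average ranks, so tied scores count as 0.5."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    n_cases = sum(outcomes)
    n_controls = len(outcomes) - n_cases
    rank_sum = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


def optimism_from_selection(n_candidates=100, n_dev=200, n_val=2000, event_rate=0.2, seed=7):
    """Select the best of many pure-noise predictors in a small development cohort,
    then re-evaluate that same predictor in a large independent cohort."""
    rng = random.Random(seed)

    def cohort(n):
        outcomes = [1 if rng.random() < event_rate else 0 for _ in range(n)]
        # Every candidate predictor is random noise, unrelated to the outcome.
        predictors = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_candidates)]
        return predictors, outcomes

    dev_x, dev_y = cohort(n_dev)
    val_x, val_y = cohort(n_val)
    apparent = [auroc(x, dev_y) for x in dev_x]
    best = max(range(n_candidates), key=lambda k: apparent[k])
    return apparent[best], auroc(val_x[best], val_y)


apparent, validated = optimism_from_selection()
print(f"apparent AUROC {apparent:.2f}, independent-cohort AUROC {validated:.2f}")
```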
Discussion
The number of POPF prediction models being published seems to increase almost exponentially. However, suboptimal development methodology, lack of external validation, or unreliable external performance results in unusable prediction models. This is true for the more conventional regression models as well as for novel AI-based models. It is unclear to the authors why new but mediocre prediction models are being developed with inadequate sample sizes and published at a seemingly accelerating rate, instead of existing models being further updated and validated. In terms of methodology, only a few of the currently existing prediction models could be considered for further exploration in a clinical context. However, existing models have identified promising new preoperative risk factors.
For PD, the FRS, a-FRS, and ua-FRS are the most extensively validated models. However, their external AUROC values range from 0.61 to 0.80, 0.62 to 0.79, and 0.67 to 0.76, respectively. Their pooled external AUROC values of 0.71, 0.70, and 0.72 indicate adequate but not particularly good performance. 66 While they have been used in studies assessing risk-adjusted outcomes and mitigation strategies, their external performance seems subpar. Also, these models require intraoperative variables for their predictions, which limits their utility. On the other hand, most of the newer prediction models cannot be reliably assessed due to a lack of external validation. Statistically speaking, the CT-FRS by Shi et al. shows promise with an externally validated AUROC of 0.81, 30 but calibration was not assessed. Moreover, the predictors used in the model seem challenging to assess because they require complex external software, which might render the model’s transportability low. As for prediction models for POPF after DP, it seems too early to judge whether the D-FRS or the DISPAIR is superior, and further comparative external validation is warranted.
As others have also stated, no consensus on the best prediction model for POPF after PD exists, and it seems that no model published to date holds up in external validation. 66 We can think of two explanations: accurate prediction of POPF might be impossible, or existing models lack the correct predictors. Perhaps a better understanding of POPF pathophysiology is required before an accurate model can be developed. Postoperative pancreatitis (POAP and PPAP) has been proposed as a driver of POPF, so a better understanding of it could provide avenues for POPF prediction. One question that arises is whether we should switch the focus from trying to predict POPF to trying to predict postoperative pancreatitis.
One appealing source of possible predictors is preoperative imaging. Most of the recent models have utilized some form of measurement from CT scans. The complexity of the included predictors varies widely. At one end are very sophisticated measures requiring external software or AI applications (such as remnant pancreatic volume). At the other end are simple approximations of pancreatic anthropometric measures (such as pancreatic thickness). Since the eventual objective is to use the model successfully in clinics, the measures used need to balance reproducibility and specificity. An overly complex model with numerous measurements is hard to adopt, while an overly simple model risks producing unreliable predictions.
The benefits of a well-performing prediction model could be manifold. Identifying the patient-specific baseline risk of POPF would allow for better comparison of outcomes, benchmarking, and surgeon-specific training. In addition, preoperative risk stratification could be used to broaden the criteria for operation and to allow for more informed clinical decision-making. A recent review demonstrated that almost all randomized controlled trials (RCTs) evaluating perioperative intervention strategies for POPF after PD were underpowered. 3 Perhaps risk stratification could allow for better evaluation of POPF mitigation strategies, with future interventions directed at the risk groups most likely to benefit from them. At any rate, studies examining the utility of prediction models are warranted.
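A standard two-proportion sample size calculation shows why enriching a trial with high-risk patients could help with power. The sketch below uses the common normal-approximation formula for a two-sided test. Its scenario is hypothetical: an intervention that halves POPF risk, tested either in unselected patients with the roughly 15% baseline incidence quoted earlier or in an assumed high-risk stratum with 30% incidence. In this scenario, the high-risk trial needs less than half as many patients per arm.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Patients per arm to compare two proportions (normal approximation, two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


# Hypothetical scenario: an intervention that halves the risk of POPF.
unselected = n_per_arm(0.15, 0.075)  # baseline incidence of about 15%
high_risk = n_per_arm(0.30, 0.15)    # assumed high-risk stratum
print(f"per arm: unselected {unselected}, high-risk {high_risk}")
```

The trade-off is that findings from an enriched trial apply primarily to the risk group in which it was run.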
Conclusion
POPF continues to vex pancreatologists and burden patients. Tremendous efforts have produced numerous POPF prediction models, and future efforts in POPF prediction should focus on quality rather than quantity. Above all, international multicenter studies investigating the role of novel preoperative risk factors should be initiated. More studies focusing on the actual clinical utility of POPF predictions are also warranted. One future prospect should be to find consensus on the important POPF predictors, eventually leading to “the” prediction model.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Helsinki University Hospital Research Funds.
