Abstract
Background:
Accurate survival estimation is essential for decision-making in advanced cancer care. However, clinical judgment alone often tends to overestimate survival. The performance of prognostic tools has scarcely been explored in Latin American populations.
Objectives:
To prospectively evaluate and compare the predictive performance of four prognostic scales: the Palliative Prognostic Index, the Performance Status–Palliative Prognostic Index, the Palliative Prognostic Score, and the Delirium-Palliative Prognostic Score, against the clinical judgment of specialists in a Colombian cohort.
Design:
An observational, analytical, prospective cohort study was conducted.
Methods:
The study included 166 patients with advanced cancer admitted to a specialized Palliative Care Unit in Colombia. Participants were followed for up to 90 days or until death. We compared the discrimination of the scales using Harrell’s Concordance Index (C-index) and area under the receiver operating characteristic curve at 7, 30, and 90 days. Calibration was assessed using calibration plots.
Results:
Specialist clinical judgment, the Palliative Prognostic Score, and the Delirium-Palliative Prognostic Score demonstrated excellent discriminatory capacity for short-term survival prediction, with a concordance index greater than 0.8. Clinical judgment tended to underestimate 7-day survival, while all tools showed a tendency to overestimate 90-day survival. A high short-term mortality rate was observed, with nearly 50% of patients dying within 30 days of admission.
Conclusion:
In specialized palliative care settings in Latin America, combining expert clinical judgment with the Palliative Prognostic Score or its delirium variant is recommended for prognostication. The Palliative Prognostic Index and its performance status-based variant are useful alternatives in nonspecialized settings. The high short-term mortality observed highlights a systemic issue of late referral to palliative care services in the region.
Plain language summary
For patients with advanced cancer, knowing approximately how much time they have left can be very important. This information helps patients, their families, and doctors make crucial decisions about medical care and personal matters. Doctors often make an estimate based on their experience (this is called “clinical judgment”), but these estimates can sometimes be overly optimistic. There are also special tools (or “scales”) designed to help make these predictions more accurate, but we didn’t know how well they worked for patients in Latin America. In this study, we worked with 166 adult patients with advanced cancer who were in a palliative care unit in Colombia. Palliative care is a type of medical care focused on providing comfort and improving quality of life when an illness can no longer be cured. We compared the survival predictions made by palliative care specialist doctors against the predictions from four different scoring tools. We found that both the judgment of the specialist doctors and two of the tools (the “Palliative Prognostic Score” and its version that includes “delirium”) were very accurate at predicting who might pass away in the near future (short-term). However, when trying to predict survival over a longer term (like 90 days), both the doctors and all the tools tended to be too optimistic. A key finding was that half of the patients in our study died within the first 30 days of being admitted to the palliative care unit. This study shows that, in this setting, it is best to combine the specialist doctor’s experience with one of the tools that performed well. It also highlights a significant problem: many patients are being referred to palliative care very late in their illness.
Introduction
Cancer is one of the leading causes of death worldwide, with 9.7 million global deaths in 2022. 1 Of these, 14.4% were in the Americas, where 45% of deaths occurred in individuals under 70 years of age. 1
In the management of advanced oncological disease, a precise estimation of survival constitutes a cornerstone for clinical decision-making and advanced care planning.2,3 Accurate prognostication is essential to respect patient autonomy, align the therapeutic plan with their values, and facilitate an informed decision-making process.4,5 Furthermore, effective prognostic communication contributes to mitigating anxiety within the family and structuring end-of-life preparations.4,5
Nevertheless, the clinical estimation of survival is subject to a documented tendency toward overestimation, a bias that can lead to the application of futile interventions and a deterioration in quality of life.6–9 A strategy to mitigate this bias is the use of validated prognostic scales. Despite their proven utility, the lack of comparative studies evaluating the performance of these tools in Latin America represents a significant barrier to their evidence-based implementation.4,5,10
The main objective of this study was to evaluate and compare the performance of four prognostic scales, namely the Palliative Prognostic Index (PPI), the Performance Status-Based Palliative Prognostic Index (PS-PPI), the Palliative Prognostic Score (PaP), and the Palliative Prognostic Score with Delirium (D-PaP), against the specialist’s clinical judgment in estimating survival for cancer patients treated at a palliative care center in Colombia. The specific objectives were to characterize sociodemographic and clinical variables, to determine the observed survival, to describe the distribution of the scores and clinical judgment, to compare the median survival between risk groups, and to evaluate the predictive performance of each instrument.
Methodology
A prospective, analytical, observational cohort study was conducted on patients with a primary diagnosis of oncological disease who attended the Palliative Care Unit of the Hospital Internacional de Colombia. This institution is a high-complexity (quaternary level), academic medical center, and a regional reference hub for oncology, serving a catchment area of approximately three million inhabitants. As part of the Fundación Cardiovascular de Colombia, it holds Joint Commission International accreditation, ensuring rigorous compliance with international standards for pain management. The unit manages an estimated monthly volume of 1000 patients. The study was conducted from November 1st, 2024 to February 28th, 2025. During this time, each patient was identified, enrolled after providing informed consent, and followed from study admission until death or the end of the follow-up period.
Inclusion criteria were patients over 18 years of age with advanced oncological disease under entirely palliative management who attended the Palliative Care Unit. Patients who resumed curative management during the follow-up were excluded.
For participant enrollment, once eligible individuals were identified, the study’s objectives and scope, as well as related ethical information, were explained. Those who agreed to participate provided their authorization via informed consent. Subsequently, sociodemographic, clinical, and prognostic scale variables were collected from the clinical history. The online calculator predictsurvival.com was used to ensure uniformity in the calculation of composite index scores. It is an open-access, nonprofit web-based tool developed by Dr. David Hui (Professor and Director of Palliative and Supportive Care Research at the University of Texas) and his colleagues. The tool draws on associations identified between multiple clinical signs in patients admitted to acute palliative care units and on multiple validated prognostic algorithms of survival derived from the published literature, with a focus on patients with advanced cancer. It is intended to support palliative interventions and referrals to acute palliative care units, and to provide improved guidance for patients and their families. The participating physicians used the tool only after completing the Clinical Prediction of Survival, to avoid influencing the clinical judgment results. In addition, all information was recorded in a secure electronic form designed for this purpose on the REDCap platform (Vanderbilt University, Nashville, TN, USA) hosted at Fundación Cardiovascular de Colombia. Each participant was followed for at least 90 days through periodic review of their electronic medical records and vital statistics to establish their actual survival time.
The primary outcome variable (dependent) was survival, defined as the time in days from the date of the prognostic scale application to the date of the participant’s death or until the end of participant follow-up. The exposure variables (independent) were the prognostic scale scores recorded at the participant’s entry into the study (PPI, PS-PPI, PaP, and D-PaP), as well as the palliative care specialist’s clinical prediction of survival. These scales were selected for their feasibility in the Latin American context, given their international validity, their applicability across cancer types, their open-access availability, and their reliance on variables routinely collected in our clinical setting. The characterization variables encompassed sociodemographic aspects such as age, sex, place of birth, religion, and civil status, and clinical variables such as primary oncological diagnosis, edema, oral intake, delirium, anorexia, dyspnea, leukocytes, lymphocyte percentage, and comorbidities. All data were obtained from the information recorded in the clinical history.
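As an illustration of the outcome definition above, the following minimal Python sketch derives a survival time and an event indicator from dates. It is not the study’s actual code: the function name `survival_record`, the 90-day cap `FOLLOW_UP_DAYS`, the rule of censoring deaths that occur after the window, and the example dates are all assumptions introduced here.

```python
from datetime import date

FOLLOW_UP_DAYS = 90  # assumed cap, based on the 90-day follow-up described in the text


def survival_record(assessed, died=None, last_seen=None):
    """Return (time_in_days, event) for one patient.

    assessed  -- date the prognostic scales were applied
    died      -- date of death, or None if no death was recorded
    last_seen -- last date the patient was known to be alive (for losses)
    event is 1 for an observed death and 0 for a censored observation.
    """
    if died is not None:
        days = (died - assessed).days
        if days <= FOLLOW_UP_DAYS:
            return days, 1
        return FOLLOW_UP_DAYS, 0  # death after the window: censored at the cap
    if last_seen is not None:
        return min((last_seen - assessed).days, FOLLOW_UP_DAYS), 0
    return FOLLOW_UP_DAYS, 0  # alive with complete follow-up


# Hypothetical patients (dates invented for illustration)
records = [
    survival_record(date(2024, 11, 4), died=date(2024, 11, 9)),
    survival_record(date(2024, 11, 4), last_seen=date(2024, 12, 1)),
    survival_record(date(2024, 11, 4)),
]
print(records)
```

Each tuple can then feed a survival analysis directly: the first element is the follow-up time and the second flags whether death was observed or the observation was censored.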
The information for each patient was recorded by the principal investigator to guarantee data collection uniformity and mitigate information bias. Nonprobabilistic convenience sampling was employed; to reduce selection bias, all eligible and consenting patients were included consecutively throughout the study period. The clinical estimation of survival was performed by one of two specialists working at the hospital, who were not authors of this article but participated in data collection. The data analysis was carried out on an anonymized database. A sample size estimation was performed for the population, with a 95% confidence level and a 5% margin of error. The result of this calculation indicated a minimum required sample size of 166 participants, consistent with the standards for the validation of prognostic models, which recommend a minimum of 10–15 events per predictor variable evaluated to ensure model stability and the accuracy of the estimations.
Missing information for survival analyses was handled by data censoring, considering the low percentage of losses. The data were tabulated in an electronic database using the REDCap instrument and analyzed using the specialized software STATA® version 17.0 (StataCorp LLC, College Station, TX, USA) and the Python programming language (Python Software Foundation, Wilmington, DE, USA).
Statistical analysis
For the descriptive analysis, categorical variables were described as absolute and relative frequencies with 95% confidence intervals. Continuous variables, such as prognostic scale scores and the specialist’s clinical prediction of survival, were assessed according to their distribution; those with a normal distribution were presented as means and standard deviations, while quantitative variables with a nonnormal distribution were reported as medians and interquartile ranges (IQR).
For the survival analysis, overall survival and subgroup survival were analyzed using Kaplan-Meier curves to visualize the differences between the risk groups defined by each prognostic scale. The statistical comparison between these curves was performed using the Log-Rank test.
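The Kaplan-Meier estimate described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the STATA or Python code used in the study; the function name `kaplan_meier` and the toy data are assumptions, and the Log-Rank comparison between curves is not shown.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up in days for each patient
    events -- 1 if death was observed, 0 if the observation was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # deaths plus censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort of 10 patients
times = [3, 5, 5, 8, 12, 20, 30, 90, 90, 90]
events = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"day {t}: S = {s:.3f}")
```

Running the same function separately on each risk group of a scale yields the stratified curves that a Log-Rank test would then compare.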
For the performance evaluation, receiver operating characteristic (ROC) curves were generated to assess the predictive capacity at 7, 30, and 90 days, and this was quantified by calculating the area under the curve (AUC) for each scale. Subsequently, the global discriminatory ability of the tools was assessed with Harrell’s Concordance Index (C-index).
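To make the two discrimination measures concrete, the sketch below computes Harrell’s C-index and a fixed-horizon AUC in plain Python. It is a simplified illustration under stated assumptions: higher scores are taken to mean higher risk, pairs with tied survival times are ignored, and patients censored before the horizon are excluded from the AUC. The function names and toy data are hypothetical and do not reproduce the study’s analysis.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of comparable patient pairs ranked correctly by the risk score."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is a death
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def auc_at_horizon(times, events, risk_scores, horizon):
    """AUC for death by `horizon` days versus known survival to that day."""
    rows = list(zip(times, events, risk_scores))
    cases = [s for t, e, s in rows if e and t <= horizon]
    controls = [s for t, e, s in rows if t > horizon or (t == horizon and not e)]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: five patients, higher score = worse prognosis
times = [2, 5, 9, 30, 90]
events = [1, 1, 1, 0, 0]
scores = [9, 7, 8, 3, 1]

c_index = harrell_c_index(times, events, scores)
auc_7 = auc_at_horizon(times, events, scores, horizon=7)
auc_30 = auc_at_horizon(times, events, scores, horizon=30)
print(round(c_index, 3), round(auc_7, 3), round(auc_30, 3))
```

The C-index summarizes ranking over the whole follow-up, whereas the AUC answers a yes/no question at a single time point, which is why the two are reported separately.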
Finally, the accuracy of each scale was evaluated using calibration plots. The cohort was divided into risk deciles, and the mean predicted probability was plotted against the observed survival probability for each decile. Calibration performance was assessed by visual inspection of the agreement between the plotted points and the ideal 45° diagonal, identifying patterns of overestimation or underestimation. For all analyses, a p value <0.05 was considered statistically significant.
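The grouping step behind such a calibration plot can be sketched as follows. This is a simplified illustration, not the study’s procedure: it assumes each patient’s survival status at the horizon is known (no censoring before that day), uses the raw proportion of survivors as the observed probability, and the function name `calibration_table` and the toy values are hypothetical.

```python
def calibration_table(predicted, survived, n_groups=10):
    """Group patients by predicted survival probability and compare with outcomes.

    predicted -- predicted probability of surviving to the horizon, per patient
    survived  -- 1 if the patient was alive at the horizon, else 0
    Returns one (mean_predicted, observed, difference) row per non-empty group;
    a positive difference means survival was overestimated in that group.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # fewer patients than groups
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = sum(survived[i] for i in idx) / len(idx)
        rows.append((mean_pred, observed, mean_pred - observed))
    return rows


# Hypothetical toy data, split into two groups instead of deciles
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
survived = [0, 0, 0, 1, 0, 1, 0, 1]
rows = calibration_table(predicted, survived, n_groups=2)
for mean_pred, observed, diff in rows:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  diff {diff:+.2f}")
```

Plotting the first column against the second gives the calibration plot: points on the 45° diagonal indicate agreement, and points below it (predicted above observed) indicate overestimated survival.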
Results
A total of 166 patients who met the selection criteria were included. Among these, 8 participants had insufficient paraclinical data, which precluded the calculation of the PaP and D-PaP scales for analysis. During the follow-up, one participant was lost due to lack of contact, resulting in a final cohort of 165 patients for survival analysis. However, the descriptive analysis of the baseline characteristics was conducted on the entire sample (n = 166).
The sociodemographic and clinical characteristics of the cohort are summarized in Table 1. The median age was 65 years; most participants were female (57.8%), Catholic (78.3%), and single or widowed (52.4%). The median number of comorbidities was one, with arterial hypertension (34.9%) and diabetes mellitus (16.9%) being the most prevalent. The most frequent oncological diagnosis was gastrointestinal (34.9%). The majority of patients were recruited from inpatient services (57.8%). Clinically, dyspnea (35.5%) and reduced oral intake (31.9%) were the most prevalent symptoms, and inflammatory markers such as leukocytosis (42.2%) were also common. All other cohort characteristics are detailed in Table 1.
Table 1. Sociodemographic profile and clinical characteristics of the cohort (n = 166).
IQR: interquartile range.
Concerning the multidimensional prognostic indices at baseline, a median PPI of 3.75 and a median PS-PPI of 6 were identified. This finding is consistent with the median PaP and D-PaP scores of 6.5, as well as with the median clinical survival estimation of 60 days.
Within the first 7 days, 29.7% of participants had died; by 30 days, 47.9% had died; and by the end of follow-up, 70.9% had died. All tools evaluated demonstrated the capacity to stratify patients into prognostic risk groups with differing survival outcomes. A consistent prognostic gradient was observed for every tool: as the score indicated a worse state, the observed median survival decreased in a statistically significant manner, as shown in Table 2.
Table 2. Survival based on risk groups of multidimensional survival scales (n = 165 for PPI and PS-PPI, and n = 157 for PaP and D-PaP).
PPI: Palliative Prognostic Index; PS-PPI: Performance Status-Based Palliative Prognostic Index; PaP: Palliative Prognostic Score; D-PaP: Palliative Prognostic Score with Delirium.
The stratification of the risk groups is illustrated in the Kaplan-Meier survival curves in Figure 1. The curves for each stratum of the scales show statistically significant differences in survival (Log-Rank test: p < 0.05) according to distinct prognostic risk groups.

Figure 1. Kaplan-Meier curves for prognostic tools.
An AUC analysis was performed using ROC curves, as depicted in Figures 2 to 4. A comparative evaluation of the different prognostic tools at specific time points revealed that all exhibited a predictive capacity ranging from acceptable to good and performed significantly better than chance at every time point assessed. A general trend was observed in which the performance of all scales peaked at 7 days and progressively decreased at 30 and 90 days.

Figure 2. ROC curves for the predictive capacity of prognostic tools at 7 days.

Figure 3. ROC curves for the predictive capacity of prognostic tools at 30 days.

Figure 4. ROC curves for the predictive capacity of prognostic tools at 90 days.
The specialist’s clinical judgment, the PaP score, and its Delirium variant (D-PaP) were the tools with the highest predictive capacity, with no significant differences among them. These tools proved particularly robust for short-term prediction, showing the highest AUC values for predicting survival at 7 and 30 days. The remaining tools also exhibited clinically good performance, albeit inferior to the other tools.
An analysis of the overall discriminatory performance of the prognostic scales was conducted using the C-index, as illustrated in Figure 5. Considering values above 0.80 as indicative of optimal performance, the specialist’s clinical prediction consistently demonstrated the highest discriminatory ability, followed by the PaP and D-PaP scales. In contrast, the PS-PPI and PPI scales exhibited good discriminatory performance (C-index >0.75), although their performance was inferior to that of the other scales and to the specialist’s prediction.

Figure 5. Bar graph of the global discriminatory performance of prognostic tools.
The accuracy of the prognostic instruments, assessed via calibration plots as shown in Figure 6, indicates that, in general, the agreement between predicted and observed survival was stronger in short- and medium-term prognoses (7 and 30 days).

Figure 6. Calibration plots for prognostic tools at 7, 30, and 90 days.
Regarding 90-day predictions, a general decline in calibration was observed across all tools, with a tendency to overestimate the probability of survival. Conversely, the clinician’s prediction showed a distinct pattern of underestimating 7-day survival. A synthesis of the calibration analysis is provided in Table 3.
Table 3. Synthesis of calibration plots (excellent, good, scattered, overestimation, or underestimation of survival).
PPI: Palliative Prognostic Index; PS-PPI: Performance Status-Based Palliative Prognostic Index; PaP: Palliative Prognostic Score; D-PaP: Palliative Prognostic Score with Delirium.
A univariate analysis was executed to identify variables associated with the outcome of interest. These variables were then included in a multivariate Cox regression model to control for possible confounding variables. In this adjusted model, none of the sociodemographic, diagnostic group, or comorbidity variables were found to be statistically significant independent predictors of decreased survival, meaning they were not associated with the primary outcome, and no further adjustments were performed to the results.
Discussion
Main findings of the study
This prospective and comparative study evaluated the prognostic performance of four scales (PaP, D-PaP, PPI, and PS-PPI) and the specialist’s clinical judgment in a cohort of patients with advanced oncological disease in Colombia. The cohort, with a median age of 65, was characterized by a marked functional deterioration and a substantial symptom burden. A high short-term mortality rate was noted, with nearly 50% of patients having died by 30 days, suggesting a delayed referral to palliative care services.
What this study adds
All the prognostic instruments evaluated successfully stratified patients into risk groups with statistically significant differences in survival. With respect to global discriminatory capacity, the specialist’s clinical judgment, the PaP, and the D-PaP exhibited a C-index above 0.8 (ideal), whereas the PPI and PS-PPI performed at a lower level, yet one considered good (C-index >0.75). The calibration analysis suggested greater precision for short- and medium-term predictions (7 and 30 days) and a widespread inclination to overestimate 90-day survival. Furthermore, the specialist’s clinical judgment notably underestimated survival at 7 days. Lastly, in the multivariate analysis, none of the sociodemographic, diagnostic group, or comorbidity variables emerged as independent predictors of mortality.
The specialist’s clinical judgment demonstrated an excellent discrimination capacity (C-index = 0.811), validating its pivotal role in prognostic assessment. Despite this, imperfect calibration was observed, with a tendency to overestimate 90-day survival, a bias consistent with the published literature, and, as a novel finding, a short-term underestimation, a phenomenon hypothesized to stem from a conservative clinical approach in severely deteriorated patients. 8 Notably, the participating specialists had less than 5 years of experience. According to the literature, early-career professionals are particularly susceptible to prognostic variability and thus derive the greatest benefit from integrating objective scales. 9 These results reaffirm the imperative to supplement clinical experience with objective instruments to optimize their calibration. The performance of the multidimensional scales was consistent with existing evidence: the PaP (C-index = 0.808) and the D-PaP (C-index = 0.807) exhibited high accuracy.2,3,11,12 A unique contribution of this study is the formal analysis of their calibration, which was found to be good for up to 30 days. On the other hand, the PPI and PS-PPI were instruments with good performance, although with a lower discriminatory capacity compared to the other scales, a finding that aligns with the literature.2,3,12,13 However, the value of these latter tools resides in their independence from paraclinical data and expert judgment, thereby facilitating their implementation in nonspecialized environments. 12
Our findings align with the European Society for Medical Oncology (ESMO) Clinical Practice Guidelines, which recommend integrating multivariable prognostic models, such as the PaP, to complement clinical judgment in patients with an expected survival of weeks to months. 14 Although ESMO warns of a general tendency toward survival overestimation, our study identified a distinct pattern of short-term underestimation (7 days) among specialists, likely reflecting a conservative clinical stance in cases of severe functional deterioration. 14 Furthermore, the excellent discriminatory capacity (C-index >0.8) of the PaP and D-PaP scales in this Colombian cohort validates the applicability of international standards within the local clinical context. Finally, the high 30-day mortality rate (47.9%) underscores the systemic issue of late referral emphasized by the guidelines, suggesting that validated scales should serve not only as tools for individual prognostication but also as vital metrics to drive policy changes and improve early palliative care integration in the region. 14
The external validity of the study must be interpreted in the context of a literature developed mainly in Asian populations. 3 By providing evidence on a Colombian cohort, this research has high applicability for similar sociodemographic profiles in Latin America. However, the extrapolation of these results to regions with different characteristics requires caution. Another fundamental contribution of this research is that it addresses the scarcity of comparative evidence and the reported heterogeneity in international studies, which highlights the need to study local populations.6,15 Clinically, for specialized palliative care units, it is confirmed that the combination of expert judgment with the PaP and D-PaP constitutes the gold standard. Even so, it is crucial to consider nonspecialized environments, where scales like the PPI and PS-PPI offer highly applicable alternatives for centers with limited access to specialists or paraclinical resources. Finally, the brief observed survival period is an indicator of delayed referral to palliative care, which represents a public health implication of paramount importance for healthcare systems in Latin America.
Strengths and limitations of the study
The study’s strengths include its prospective cohort design, which minimizes information and recall biases; its status as the first comparative evaluation of these instruments within a Latin American context; and a low loss-to-follow-up rate (<10%), which reinforces internal validity. Nonetheless, the interpretation of the findings must consider the methodological limitations intrinsic to the design. These include the use of nonprobabilistic sampling, whose limited representativeness was partly offset by the consecutive inclusion of patients; the evaluation by a limited number of specialists, which, while improving consistency, restricted the analysis of inter-observer variability; and finally, an inherent risk of observation bias, as clinicians were aware their performance was being monitored.
Regarding clinical variables, although pain prevalence is highly relevant, it was excluded from the final analysis. This decision was made because a standardized assessment tool could not be uniformly administered to all patients due to neurological deterioration, delirium, or terminal-stage conditions in a subset of the cohort. Therefore, to ensure methodological rigor, we prioritized data reliability and restricted the analysis to the variables strictly required for the validated prognostic scales.
Conclusion
In the context of specialized palliative care in Latin America, this study supports the adoption of a synergistic approach to prognostication. While the specialist’s clinical judgment remains the cornerstone of assessment, its susceptibility to calibration bias, specifically short-term underestimation and long-term overestimation, underscores the necessity of complementing it with objective instruments, such as the PaP or D-PaP scores, to achieve maximum accuracy. Conversely, in resource-limited or nonspecialized settings, the PPI and PS-PPI emerge as valuable, efficient alternatives that function independently of paraclinical data. Ultimately, the high prevalence of short-term mortality observed in this cohort highlights a critical systemic delay in access to palliative care. Consequently, validated prognostic scales should be implemented not merely as tools for individual accuracy, but as objective criteria to guide health policy and trigger earlier referral to supportive care services across the region.
Supplemental Material
sj-xlsx-1-pcr-10.1177_26323524261423214 – Supplemental material for Performance of clinical judgment and prognostic scales for survival prediction in palliative care: A prospective cohort study
Supplemental material, sj-xlsx-1-pcr-10.1177_26323524261423214 for Performance of clinical judgment and prognostic scales for survival prediction in palliative care: A prospective cohort study by Andres Felipe Mantilla Santamaria, Angela Julieth Sandoval Anaya, Jhon Faberth Tellez Camargo, Linnel Estefania Padilla Guerrero, Hector Julio Melendez Florez and Omar Fernando Gomezese Ribero in Palliative Care and Social Practice
Footnotes
Acknowledgements
The authors gratefully acknowledge Universidad Industrial de Santander, Hospital Internacional de Colombia, and Painfree SAS for providing the institutional support and protected time necessary to conduct this study. We also extend our sincere appreciation to the palliative care specialists for their contribution in providing the clinical prognostic estimates essential for this study, and to the general practitioners and nursing staff of the Palliative Care Unit at Hospital Internacional de Colombia for their valuable assistance in the identification of eligible participants and their support throughout the data collection process.
Ethical considerations
This study was conducted in accordance with the ethical principles for medical research involving human subjects established in the Declaration of Helsinki. The research protocol was reviewed and approved by the Ethical Institutional Review Boards of the Universidad Industrial de Santander (Code: 4110, Act 21, August 9, 2024) and the Hospital Internacional de Colombia (Code: CEI-2024-08050, Act 633, June 27, 2024).
Consent to participate
All participants were provided with a detailed explanation of the study’s objectives, procedures, and ethical considerations. Written informed consent was obtained from all participants or their legal guardians before their inclusion in the study. The confidentiality of patient information was strictly protected. All data were anonymized by assigning a unique code to each participant in the final database to ensure privacy.
Consent for publication
Written informed consent was obtained from all participants or their legal guardians regarding the publication of the data and results derived from this study. The consent document explicitly included authorization for the publication of anonymized data and analysis resulting from their participation.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Supplemental material
Supplemental material for this article is available online.
References
