Abstract
Burnout among intensive care unit (ICU) clinicians is a persistent threat to healthcare quality and clinician well-being, yet existing assessment methods often lack the contextual precision needed to guide effective interventions. This study introduces a novel integration of Bayesian hierarchical modeling, generalizability theory (G-Theory), and item response theory (IRT) to advance the psychometric assessment of burnout among intensive care clinicians, a population particularly vulnerable to chronic occupational stress. The study examined the reliability and contextual sensitivity of burnout measurement tools among 462 ICU clinicians across 9 hospitals in Ghana, South Africa, and Botswana. By fusing G-Theory and IRT within a Bayesian framework, the research addressed a critical gap in understanding how burnout manifests across individuals, items, and institutional settings. The findings revealed that the greatest variance in burnout occurred at the individual level (emotional exhaustion [EE]: 18.72, P < .001; depersonalization [DP]: 12.65, P < .001), reflecting significant personal differences in burnout experiences. Item-level variance was also statistically significant (EE: 4.23, P = .001; DP: 3.12, P = .004), indicating effective item discrimination. Although hospital-level variance was smaller (EE: 2.05, P = .025; DP: 1.76, P = .045), it still pointed to contextual influences. Significant interaction effects (eg, person × item and person × hospital) further emphasized the complex interplay between individual traits and organizational environments. Moderate residual variance across EE (5.44, P = .001), DP (4.02, P = .001), and personal accomplishment (6.15, P < .001) suggested some unexplained variability that may warrant further qualitative exploration.
IRT analysis supported the psychometric strength of the items, showing strong discrimination (EE: 1.17-1.54; DP: 1.09-1.40) and appropriate calibration for detecting moderate levels of burnout (EE difficulty: −1.20 to −0.68; DP: −1.33 to −0.71). Overall, the study validates the methodological advantage of combining G-Theory, IRT, and Bayesian modeling to yield more precise, reliable, and context-sensitive burnout assessments in critical care settings. It is recommended that healthcare institutions adopt a multilevel, psychometrically robust approach to monitoring clinician burnout, leveraging G-Theory/IRT diagnostics to design unit-specific interventions and support policies that target both individual resilience and systemic reform.
Keywords
Highlights
● The study applied a fusion of Bayesian modeling, G-Theory, and IRT to assess burnout among 462 ICU clinicians in Ghana, South Africa, and Botswana.
● Found the largest variance at the individual level, with smaller but significant item- and hospital-level effects.
● Identified interaction effects showing the interplay between personal traits and organizational environments.
● IRT analysis confirmed strong item discrimination and effective calibration for moderate burnout detection.
● Recommends a multilevel, robust burnout monitoring system to support unit-specific interventions and systemic reforms.
Introduction
Burnout, a complex psychological syndrome characterized by emotional exhaustion, depersonalization, and a diminished sense of personal accomplishment, has emerged as a critical public health issue within the global healthcare landscape.13,19 While burnout affects workers across numerous professions, its prevalence and severity are especially pronounced among healthcare providers, and most acutely among those operating within intensive care units (ICUs). ICU clinicians routinely navigate emotionally taxing environments marked by life-or-death decisions, high patient acuity, limited recovery prospects, and prolonged exposure to trauma, grief, and suffering. This high-pressure milieu is often compounded by chronic understaffing, administrative overload, shifting protocols, and organizational inefficiencies, factors that, in combination, significantly elevate the risk of burnout among these professionals.43 Globally, burnout has reached near-epidemic proportions among ICU clinicians, with studies across low-, middle-, and high-income countries reporting prevalence rates ranging from 30% to over 70%, depending on the tools used, the specific healthcare systems involved, and contextual stressors such as pandemics, resource scarcity, and leadership quality.26,44 In recognition of the widespread and debilitating nature of this phenomenon, the World Health Organization 41 has classified burnout as an “occupational phenomenon” in the International Classification of Diseases (ICD-11), highlighting its growing relevance for workplace health and policy.
The consequences of clinician burnout are far-reaching and deeply concerning. At the individual level, burnout leads to cognitive fatigue, emotional detachment, reduced job satisfaction, and increased susceptibility to anxiety, depression, and substance use disorders.28,42 At the institutional level, burnout contributes to increased absenteeism, higher turnover rates, and diminished team cohesion. Most critically, burnout compromises the safety and quality of patient care, leading to more frequent medical errors, lapses in judgment, and impaired empathy. As such, robust and nuanced assessment of burnout is essential not only for protecting the wellbeing of healthcare workers but also for safeguarding health system functionality and patient outcomes. Despite these urgent implications, significant methodological and psychometric challenges continue to hamper efforts to measure burnout reliably and equitably in ICU settings. A central limitation lies in the dominant reliance on self-report instruments such as the Maslach Burnout Inventory (MBI), which, though widely validated and extensively used, suffers from limited generalizability across settings and cultures, particularly in resource-constrained contexts or rapidly evolving work environments.27,40 These instruments, grounded in classical test theory (CTT), treat burnout as a unidimensional and static construct, neglecting the influence of multilevel factors such as team dynamics, organizational culture, and temporal fluctuations in workload or policy.
Moreover, CTT-based analyses do not sufficiently account for various sources of measurement error, including item inconsistency, intra-individual variability, and contextual factors, that may significantly distort score interpretations. This creates challenges in both cross-sectional assessments and longitudinal monitoring, as the reliability of burnout scores may differ across hospitals, units, or time periods. Importantly, the need to capture burnout as a multifaceted and dynamic construct, sensitive to both micro (individual-level) and macro (system-level) influences, requires a departure from traditional, static psychometric models.38,39 To address these concerns, contemporary psychometric advances offer promising solutions. Generalizability theory (G-Theory) enhances traditional reliability analysis by enabling the decomposition of variance across multiple facets, such as persons, items, occasions, and raters, thereby offering a more nuanced understanding of the conditions under which burnout scores are stable and dependable.15,39 Simultaneously, item response theory (IRT) models facilitate item-level calibration by estimating how well each question discriminates between individuals with different levels of burnout, and by determining how item difficulty interacts with latent trait levels.27,31,37
However, while both G-Theory and IRT represent improvements over classical approaches, their isolated application often falls short in ICU burnout research. Specifically, they frequently neglect the hierarchical and nested nature of real-world assessment data, where measurements are nested within individuals, teams, departments, and institutions, often across multiple time points. In such contexts, failing to account for hierarchical structure can lead to biased parameter estimates, inflated type I error rates, and misleading inferences. An emergent solution lies in the integration of these 2 frameworks, G-Theory and IRT, within a Bayesian hierarchical modeling paradigm. Bayesian methods offer several advantages, including the ability to incorporate prior knowledge, handle small or imbalanced datasets, and model complex dependencies through multilevel structures.17,18 A Bayesian hierarchical fusion of G-Theory and IRT allows for joint estimation of item parameters, individual burnout levels, and contextual variance components, while accounting for uncertainty in a coherent and flexible manner. This integrated approach enables partial pooling across hospitals and time periods, improves estimation precision in under-sampled units, and enhances the interpretability and diagnostic utility of burnout scores.19,24 Despite its promise, the application of such a fusion model to ICU burnout assessment remains underexplored in both theoretical development and empirical research. This gap underscores the urgent need for psychometric innovation that can better reflect the complex reality of clinician burnout, support individualized diagnosis, and inform targeted interventions within and across institutions.
Methodology
Research Design
This study employed a quantitative approach using a cross-sectional research design, specifically tailored to evaluate the psychometric robustness of burnout assessment tools among intensive care unit (ICU) clinicians through an innovative modeling framework that integrates Bayesian hierarchical analysis, generalizability theory (G-Theory), and item response theory (IRT). The design was selected for its capacity to provide a snapshot of burnout prevalence and measurement fidelity across multiple organizational and sociocultural contexts while controlling for measurement error, latent trait variability, and contextual heterogeneity.7 The core objective was to improve the precision, contextual sensitivity, and reliability of burnout measurement by disaggregating variance components due to items, persons, and organizational levels (G-Theory), and by calibrating item functioning using a multidimensional latent trait model (IRT), all within a Bayesian hierarchical framework that accommodates complex nesting structures and small-cluster estimation issues.7
Participants and Sampling Procedure
The study involved a total of 462 intensive care unit (ICU) clinicians who were recruited from 9 tertiary referral hospitals across 3 countries: Ghana, South Africa, and Botswana. These countries were purposively selected to reflect a range of healthcare system capacities, socioeconomic contexts, and organizational cultures, thereby enhancing the generalizability and external validity of the findings across low-, middle-, and high-income settings.32 The participant cohort was composed of professionals representing various roles within the ICU setting. Specifically, the sample included 259 nurses, accounting for ~56% of respondents; 143 physicians (31%); 37 respiratory therapists (8%); and 23 individuals (5%) in other critical care-related roles, such as perfusionists and ICU pharmacists. This professional distribution allowed for a nuanced assessment of burnout across different clinical responsibilities and work demands.
A stratified multistage sampling approach was adopted to ensure both institutional and professional representativeness. In the first stage, tertiary hospitals with established ICU departments were purposively selected based on ICU size, patient volume, and regional diversity. This ensured the inclusion of institutions with varied operational structures and patient demographics. In the second stage, clinicians within each hospital were stratified by professional role, after which participants were randomly sampled using proportional-to-size allocation to reflect the relative distribution of staff roles within each facility. Eligible participants were required to meet specific inclusion criteria. These included a minimum of 12 months of continuous employment in an ICU setting, which was considered essential to ensure exposure to the occupational stressors typically associated with critical care work. In addition, participants had to be actively employed in a critical care role at the time of data collection. All participants provided informed consent after receiving a detailed explanation of the study’s objectives, procedures, and ethical safeguards, including assurances of confidentiality and voluntary participation. Ethical approval was obtained from the institutional review boards (IRBs) or ethics committees of all participating institutions. To maintain strict confidentiality, each respondent was assigned a unique anonymous identification code. All data were securely stored in encrypted digital formats, accessible only to the core research team, in compliance with international data protection and privacy standards.
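The second sampling stage can be illustrated with a minimal sketch of proportional-to-size allocation across professional roles. The staff counts, the target sample size, and the function name `proportional_allocation` are hypothetical, and the largest-remainder rounding rule is an assumption; the study does not report how fractional quotas were resolved.

```python
from math import floor


def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size.

    Fractional quotas are resolved with the largest-remainder rule
    (an assumption made for this sketch).
    """
    population = sum(stratum_sizes.values())
    quotas = {k: total_sample * n / population for k, n in stratum_sizes.items()}
    allocation = {k: floor(q) for k, q in quotas.items()}
    leftover = total_sample - sum(allocation.values())
    # Give the remaining seats to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - allocation[k], reverse=True)
    for k in by_remainder[:leftover]:
        allocation[k] += 1
    return allocation


# Hypothetical ICU staff counts for a single hospital.
staff = {"nurse": 120, "physician": 60, "respiratory_therapist": 15, "other": 5}
sample = proportional_allocation(staff, 50)
print(sample)
```

Within each role, the allocated number of clinicians would then be drawn at random from the staff roster.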
Instrument and Measures
The primary instrument used to assess clinician burnout was the Maslach Burnout Inventory-Human Services Survey (MBI-HSS), a widely recognized and psychometrically validated tool specifically designed to measure burnout among professionals working in human services and healthcare contexts.19,25,32 The MBI-HSS consists of 22 items that capture the multidimensional nature of burnout across 3 core domains: emotional exhaustion (EE), depersonalization (DP), and personal accomplishment (PA). The emotional exhaustion subscale comprises 9 items that assess the extent to which clinicians feel emotionally depleted and overwhelmed by their work. The depersonalization subscale includes 5 items that evaluate the tendency to adopt impersonal or detached attitudes toward patients. Finally, the personal accomplishment subscale contains 8 items that measure feelings of competence and successful achievement in one’s professional role. All items were scored using a 7-point Likert scale, with response options ranging from 0 (“never”) to 6 (“every day”). This ordinal response format allowed for sophisticated item-level psychometric analysis using item response theory (IRT) models. Although the MBI-HSS has demonstrated acceptable psychometric validity in various global settings, its dimensional equivalence across diverse clinician populations and healthcare systems remains a topic of ongoing empirical inquiry.34,36 In addition to the MBI-HSS, a supplementary questionnaire was administered to collect relevant demographic and job-related variables for the purpose of covariate adjustment and hierarchical modeling. 
These included participants’ age (measured in years), gender (categorized as male, female, or other), professional role, total years of ICU experience, typical shift pattern (eg, day vs night rotation), average patient-to-clinician ratio, and selected institutional workload indicators such as ICU bed occupancy rates and the severity index of admitted cases, where such data were available. These contextual variables were used in post-estimation analyses to test for differential item functioning (DIF), assess cross-level effects in burnout experiences, and support latent regression modeling within a Bayesian IRT framework. Collectively, the combination of standardized burnout measurement and detailed contextual profiling enabled a nuanced, multilevel psychometric evaluation of burnout among ICU clinicians across international settings.
Data Collection Procedure
Data collection for the study was carried out over a 5-month period, from June to October 2024, utilizing a mixed-mode, technology-enhanced survey distribution strategy to maximize accessibility and response rates across varied clinical settings. A secure, electronic questionnaire was designed and distributed primarily through institutional email systems. Each participating ICU clinician received a unique invitation link that directed them to a secure survey portal hosted on an encrypted platform, ensuring data privacy and ease of access. To supplement electronic distribution and reach clinicians who may have limited email access during shifts, physical posters containing QR codes were strategically placed in ICU break rooms and staff lounges. These QR codes provided direct access to the survey using personal smartphones or hospital-issued tablets. This dual-distribution strategy was particularly effective in overcoming logistical barriers in high-demand work environments such as critical care units. Confidentiality and anonymity were central to the study’s data collection protocol. Each respondent was assigned a randomly generated anonymous identifier, which allowed for the de-identification of survey responses while maintaining the ability to track response completeness for quality control purposes. No personally identifiable information was collected at any stage. Of the 500 surveys disseminated across the 9 selected tertiary hospitals in Ghana, South Africa, and Botswana, a total of 462 fully completed responses were retained for analysis. This represented a high completion rate of 92.4%. Prior to analysis, the raw dataset underwent rigorous screening procedures to identify and exclude cases with excessive missing data or response inconsistencies. This ensured the integrity of the dataset used in subsequent statistical modeling.
Data Analysis Strategy
The analysis of data was conducted in 4 sequential stages, each designed to build upon the previous and to provide increasingly sophisticated insights into the measurement properties and contextual dynamics of clinician burnout.
Stage 1: Descriptive and Classical Analyses
Initial data analysis focused on data cleaning, descriptive statistics, and classical reliability assessments. These procedures were conducted using IBM SPSS Statistics version 28. Descriptive statistics, including means, standard deviations, and frequency distributions, were calculated for all demographic and burnout-related variables. Reliability analysis was carried out to assess the internal consistency of the Maslach Burnout Inventory-Human Services Survey (MBI-HSS) using Cronbach’s alpha coefficients and corrected item-total correlations. These classical test theory (CTT) metrics served as a preliminary check of the scale’s psychometric properties before the application of more complex modeling techniques.
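The two CTT metrics named above can be sketched as follows. This is an illustrative computation on simulated 0 to 6 responses, not the SPSS procedure used in the study; the helper names and the simulation settings are assumptions.

```python
import random
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the total of the remaining items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, rest))
    return result


def simulate_respondent(n_items=9):
    """Simulated 0-6 responses driven by one latent trait (hypothetical data)."""
    trait = random.gauss(0, 1)
    return [min(6, max(0, round(3 + 1.2 * trait + random.gauss(0, 1))))
            for _ in range(n_items)]


random.seed(0)
rows = [simulate_respondent() for _ in range(200)]
alpha = cronbach_alpha(rows)
r_it = corrected_item_total(rows)
print(round(alpha, 3), [round(r, 2) for r in r_it])
```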
Stage 2: Generalizability Theory (G-Theory) Modeling
To move beyond the limitations of CTT and account for multiple sources of measurement error, a generalizability theory (G-Theory) approach was employed. Using EduG 6.1 software, a random-effects variance decomposition model was estimated. This model partitioned the total observed score variance into multiple facets, including person (P), item (I), hospital (H), and their interaction terms (eg, P × I, P × H, I × H). This enabled the estimation of both generalizability (G) coefficients and dependability indices, which offer a more nuanced understanding of measurement reliability across diverse institutional and professional contexts.20 The G-Theory framework was particularly useful for detecting contextual influences on burnout scores that traditional reliability metrics might overlook.
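As a rough illustration of how G coefficients and dependability indices follow from estimated variance components, the sketch below applies the standard formulas for a fully crossed random design (person × item × hospital). The crossed design is an assumption, since clinicians in practice work within a single hospital, and the variance values are hypothetical placeholders rather than study estimates.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    p: float         # person
    i: float         # item
    h: float         # hospital
    pi: float        # person x item
    ph: float        # person x hospital
    ih: float        # item x hospital
    residual: float  # three-way interaction confounded with error


def g_coefficients(vc, n_items, n_hospitals):
    """Return (G, Phi) for a crossed p x i x h random design (assumed)."""
    # Relative error: only terms that interact with persons.
    rel_err = (vc.pi / n_items
               + vc.ph / n_hospitals
               + vc.residual / (n_items * n_hospitals))
    # Absolute error adds the facet main effects and their interaction.
    abs_err = (rel_err
               + vc.i / n_items
               + vc.h / n_hospitals
               + vc.ih / (n_items * n_hospitals))
    g = vc.p / (vc.p + rel_err)
    phi = vc.p / (vc.p + abs_err)
    return g, phi


# Hypothetical variance components, for illustration only.
vc = VarianceComponents(p=10.0, i=2.0, h=1.0, pi=4.0, ph=2.0, ih=0.5, residual=3.0)
g, phi = g_coefficients(vc, n_items=9, n_hospitals=1)
print(f"G = {g:.3f}, Phi = {phi:.3f}")
```

Because the absolute error variance contains every term of the relative error variance plus the facet main effects, the dependability index can never exceed the G coefficient.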
Stage 3: Bayesian Item Response Theory (IRT) Modeling
Following the G-Theory analysis, the refined data were subjected to Bayesian item response theory (IRT) modeling to explore the latent structure of clinician burnout. Specifically, a Graded Response Model (GRM) was estimated using the brms package in R, which interfaces with Stan to perform Hamiltonian Monte Carlo (HMC) sampling. This approach allowed for the estimation of item-level parameters, including discrimination and difficulty indices, within a multilevel latent trait framework. The Bayesian IRT model incorporated a hierarchical structure that accounted for clinicians nested within departments and hospitals. This multilevel design enabled partial pooling of item and person parameters, allowing the model to borrow statistical strength across clusters, which is particularly advantageous in moderately sized samples.20,33 Weakly informative priors were specified for all parameters (normally distributed priors for item slopes and intercepts, and half-Cauchy distributions for variance components) to stabilize estimation without unduly constraining parameter estimates. Model convergence was carefully monitored using multiple diagnostics, including the R-hat statistic (values <1.01 indicating convergence), trace plots, and effective sample size indices. Model fit was evaluated using the Widely Applicable Information Criterion (WAIC), alongside posterior predictive checks, to assess how well the model replicated observed data distributions.
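To make the GRM concrete, the following sketch computes category probabilities for one 7-point item under Samejima's parameterization, in which the probability of responding in category k or higher is a logistic function of a(θ − b_k). The discrimination and threshold values are hypothetical, and the parameterization used internally by brms may differ from this textbook form.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for a graded response item.

    theta: latent trait level; a: discrimination;
    thresholds: ordered b_1 < ... < b_{K-1} for a K-category item.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities P(Y >= k), padded with 1 and 0 at the ends.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Adjacent differences give P(Y = k) for k = 0..K-1.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


def expected_score(theta, a, thresholds):
    """Expected item score (0..K-1) at a given trait level."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical parameters for a 0-6 item (six thresholds).
example_thresholds = [-2.0, -1.2, -0.5, 0.2, 1.0, 1.8]
probs_average = grm_category_probs(0.0, 1.5, example_thresholds)
print([round(p, 3) for p in probs_average])
```

A higher discrimination makes the cumulative curves steeper, so response categories separate clinicians more sharply around each threshold.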
Stage 4: G-Theory and IRT Fusion Modeling
In the final stage, results from the generalizability theory model, particularly the facet-specific error variances, were incorporated into the Bayesian IRT framework as hierarchical constraints. This fusion approach allowed for the integration of observed score reliability from G-Theory with the latent trait precision of IRT, resulting in a hybrid model capable of accounting for both random error components and latent variable estimation. This modeling synergy offered enhanced diagnostic utility, particularly in the presence of nested data structures and contextual variability across international ICU environments.33
Ethical Considerations
Ethical integrity was a cornerstone of this research. Prior to data collection, ethical clearance was obtained from institutional review boards (IRBs) or research ethics committees in all participating countries. The study adhered to internationally recognized ethical standards for research involving human participants. All clinicians who participated in the study were provided with a clear and comprehensive information sheet detailing the purpose of the research, the voluntary nature of their participation, and their right to withdraw at any point without any negative repercussions. Informed consent was obtained electronically prior to the commencement of the survey. To ensure privacy and confidentiality, data were anonymized using non-identifiable codes and stored on encrypted servers with access restricted to authorized members of the research team. All procedures related to data storage, access, and dissemination were conducted in compliance with the General Data Protection Regulation (GDPR) as well as national ethical standards in Ghana, South Africa, and Botswana.
Results
The interpretation of the results from both generalizability theory (G-Theory) and Bayesian item response theory (IRT) models offered deep insights into the psychometric structure and contextual influences shaping clinicians’ burnout experiences. This section synthesizes the findings across the 3 burnout subscales, namely emotional exhaustion (EE), depersonalization (DP), and personal accomplishment (PA), focusing on both sources of score variance and the latent properties of the measurement items.
The descriptive findings in Table 1 offer a foundational view of burnout levels among clinicians in the study. On average, emotional exhaustion was moderate (M = 24.06, SD = 5.82), reflecting a general sense of fatigue among respondents. Depersonalization scores also fell within the moderate range (M = 10.15, SD = 3.93), though the distribution exhibited slight skewness, indicating that while some clinicians experienced detachment or cynicism, others reported minimal depersonalization. Interestingly, the personal accomplishment domain presented a relatively high mean score of 35.25 (SD = 5.05), suggesting that many clinicians, despite ongoing stressors, continued to experience a strong sense of professional efficacy and fulfillment. With regard to demographics, the sample was slightly skewed toward female participants, who comprised 55.6% (n = 257) of the respondents. Male clinicians accounted for 42.4% (n = 196), while a small proportion identified as non-binary or other (2.0%, n = 9). Representation across institutions was reasonably well distributed: Hospital A contributed 36.4% (n = 168) of the total responses, followed by Hospital C at 34.8% (n = 161), and Hospital B at 28.8% (n = 133). This demographic profile enhances the generalizability of the findings by ensuring a diverse cross-section of clinical perspectives across gender and institutional settings.
Table 1. Descriptive Statistics of Burnout Subscales.
Note. Subscale scores are derived from the MBI-HSS. Higher scores on EE and DP indicate greater burnout, whereas higher scores on PA reflect lower burnout. Negative values (eg, in DP) may reflect reverse-coded or anomalous responses requiring further qualitative interpretation.
DP = depersonalization; EE = emotional exhaustion; MBI-HSS = Maslach Burnout Inventory-Human Services Survey; PA = personal accomplishment.
In Table 2, the largest source of variance was attributable to person (P; variance = 18.72, P < .001), indicating substantial inter-individual variability in emotional exhaustion levels. The narrow 95% confidence interval (CI: 15.75-21.68) and low standard error confirm the robustness of this estimate. This suggests that burnout, specifically emotional exhaustion, is highly person-specific, possibly reflecting differences in coping resources, resilience, or work conditions. The item (I) facet accounted for a smaller but statistically significant portion of variance (4.23, P = .001), highlighting variability in how different items discriminate between respondents. Hospital (H) also contributed meaningfully to the variance (2.05, P = .025), supporting the notion that institutional context shapes burnout, potentially due to differing workloads, resources, or leadership structures across sites. Interaction terms such as P × I (6.91, P < .001) and P × H (3.87, P = .002) were significant, suggesting both individual differences in interpreting items and person-specific responses influenced by hospital environment. The residual variance (5.44, P = .001) remained moderate, indicating unmeasured error or idiosyncratic factors.
Table 2. Variance Components for EE.
CI = confidence interval; EE = emotional exhaustion; H = hospital; I = item; P = person; SE = standard error.
Person variance component (18.72) reflects the largest share of observed score variability.
Hospital-specific variability (2.05) contributes modestly to overall variance.
Interaction between person and item (6.91) highlights item interpretation differences.
95% CI for person component: 15.75 to 21.68.
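The relative size of the EE variance components can be made explicit by converting them to shares of the total. The sketch below uses only the components quoted in the text; the I × H component is not reported there, so the shares are relative to the listed components alone, which is an assumption of this illustration.

```python
# EE variance components as reported in the text for Table 2.
# The item x hospital term is not reported in the text and is omitted here.
ee_components = {
    "P": 18.72,
    "I": 4.23,
    "H": 2.05,
    "P x I": 6.91,
    "P x H": 3.87,
    "Residual": 5.44,
}

total = sum(ee_components.values())
shares = {name: value / total for name, value in ee_components.items()}

for name, share in shares.items():
    print(f"{name:<9} {ee_components[name]:>6.2f}  {share:6.1%}")
```

On these figures the person facet accounts for roughly 45% of the listed variance, while the hospital facet contributes about 5%.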
As shown in Table 3, the variance component analysis for the depersonalization (DP) subscale revealed that the person (P) facet was the dominant source of variability, accounting for the largest portion of the total variance (12.65, SE = 1.27, P < .001, 95% CI: 10.14-15.16). This finding underscores substantial individual differences in depersonalization levels among clinicians, suggesting that personal characteristics, experiences, or coping mechanisms strongly influence their responses. The item (I) facet also contributed significantly, though to a lesser extent (3.12, SE = 0.43, P = .004, 95% CI: 2.28-3.96). This result indicates that differences in how individual items were interpreted or perceived contributed meaningful variance, possibly reflecting subtle differences in item wording or content specificity. The hospital (H) facet, representing the institutional context, explained a smaller but still statistically significant portion of the variance (1.76, SE = 0.24, P = .045, 95% CI: 1.29-2.23). This suggests that clinicians’ depersonalization levels are influenced by hospital-specific factors such as workload distribution, organizational climate, or institutional support structures, though these effects are less pronounced than individual-level factors. The person × item (P × I) interaction was also significant (5.89, SE = 0.89, P = .001, 95% CI: 4.15-7.63), indicating that individuals varied in their responses to specific depersonalization items beyond the main effects of person and item alone. This variability could stem from differences in clinicians’ sensitivities to particular aspects of depersonalization, such as emotional distancing versus cynicism. Similarly, the person × hospital (P × H) interaction (2.45, SE = 0.39, P = .008, 95% CI: 1.69-3.21) highlights that the relationship between individual differences and depersonalization scores is contingent on the hospital context.
In other words, a clinician’s depersonalization level may fluctuate depending on the institutional environment in which they work, suggesting a contextually dynamic pattern of burnout. Finally, the residual variance (4.02, SE = 0.61, P = .001, 95% CI: 2.81-5.23) reflects unexplained variability in depersonalization scores not captured by the person, item, or hospital factors or their interactions. While moderate in size, this residual component indicates that additional unmeasured variables (eg, workload fluctuations, team dynamics, personal life stressors) may also influence depersonalization levels. Collectively, these findings reinforce that depersonalization is a multifaceted construct shaped primarily by individual differences, yet influenced by specific item properties, institutional environments, and interactions between these factors. The presence of significant hospital-level and interaction effects suggests that interventions to reduce depersonalization should be both individually tailored and contextually informed, targeting both clinician well-being and organizational conditions.
Table 3. Variance Components for DP.
CI = confidence interval; DP = depersonalization; H = hospital; I = item; P = person; SE = standard error.
Person variance (12.65) dominates total variance in depersonalization responses.
Hospital context explains smaller but significant variance (1.76).
Residual variance (4.02) indicates remaining unexplained variability.
95% CI for hospital component: 1.29 to 2.23.
As presented in Table 4, the variance decomposition for the personal accomplishment (PA) subscale reveals that the person (P) facet accounts for the largest share of total variance (22.11, SE = 2.02, P < .001, 95% CI: 18.12-26.10). This is the highest person-level variance observed across all 3 burnout dimensions, indicating considerable variability in clinicians’ perceived sense of efficacy, mastery, and achievement in their work. The large person-related component reflects deep individual differences, potentially shaped by personality traits, coping styles, work histories, or intrinsic motivation. The item (I) facet accounted for a smaller but still statistically significant portion of the variance (3.89, SE = 0.51, P = .003, 95% CI: 2.87-4.91). This suggests that some variation in PA scores may stem from the specific formulation or thematic emphasis of items; some items may resonate more deeply with respondents than others, depending on how aspects of accomplishment are framed. Similarly, the hospital (H) facet explained a modest but significant amount of variance (2.34, SE = 0.33, P = .027, 95% CI: 1.69-2.99). This finding implies that the institutional setting, such as workplace culture, leadership, opportunities for recognition, or career development, has a meaningful though relatively smaller influence on how clinicians evaluate their own accomplishments. The person × item (P × I) interaction component was also substantial (7.03, SE = 1.15, P < .001, 95% CI: 5.03-9.03), highlighting that the way individual clinicians respond to specific accomplishment items is not uniform. Some clinicians may feel strong fulfillment in certain areas (eg, patient relationships), while others may derive a sense of accomplishment from different domains (eg, professional growth or task completion). This interaction reveals nuanced patterns in how personal values or experiences shape interpretation of scale items.
The person × hospital (P × H) interaction was statistically significant as well (3.22, SE = 0.58, P = .004, 95% CI: 2.08-4.36), reinforcing the notion that clinicians’ perceived accomplishment is contextually sensitive. That is, the same clinician may report differing levels of accomplishment depending on the support, structure, or expectations of the hospital in which they work. This further supports an ecological understanding of burnout, where individual and organizational dynamics interact. Finally, the residual variance (6.15, SE = 0.91, P < .001, 95% CI: 4.36-7.94) represents remaining unexplained variation not captured by the main or interaction effects. Although moderate in size, this residual suggests the presence of other unmeasured factors, possibly including team dynamics, informal peer support, or even temporary emotional states, that influence clinicians’ sense of accomplishment. In all, these findings affirm that personal accomplishment is shaped most strongly by individual-level traits, but is also meaningfully influenced by item framing and workplace setting. The presence of significant interaction effects further implies that clinicians’ professional fulfillment cannot be fully understood in isolation from context. Tailored interventions to enhance professional efficacy should consider both personal and institutional levers, such as individualized career support and hospital-specific recognition systems.
Table 4. Variance Components for PA.
CI = confidence interval; H = hospital; I = item; P = person; PA = personal accomplishment; SE = standard error.
Highest person-related variance across all subscales (22.11).
Interaction with item (7.03) and residual (6.15) are notable secondary sources.
Hospital-specific effect (2.34) significant but comparatively low.
95% CI for person × item: 5.03 to 9.03.
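The relative size of each PA component can be made concrete by converting the reported estimates into shares of their sum. The sketch below is illustrative only: it assumes the six components in Table 4 are additive and that their simple sum is a reasonable stand-in for total variance (the paper does not report a total), and the names `pa_components` and `variance_shares` are hypothetical.

```python
# Variance components for the PA subscale, as reported in Table 4.
pa_components = {
    "person": 22.11,
    "item": 3.89,
    "hospital": 2.34,
    "person_x_item": 7.03,
    "person_x_hospital": 3.22,
    "residual": 6.15,
}


def variance_shares(components):
    """Return each component as a proportion of the summed variance.

    Assumption: the total is taken as the plain sum of the listed components.
    """
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


pa_shares = variance_shares(pa_components)

# Print the components from largest to smallest share.
for name, share in sorted(pa_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: {share:6.1%}")
```

Under this additive assumption, the person facet accounts for roughly half of the summed variance, with the person × item interaction and the residual as the next largest sources.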
As presented in Table 5, the item parameter estimates for the emotional exhaustion (EE) subscale were derived using a Bayesian graded response model (GRM), offering valuable insights into how well each item functions psychometrically. All 3 items showed strong discrimination parameters (α), ranging from 1.17 to 1.54, with relatively narrow 95% confidence intervals. Specifically, EE1 demonstrated the highest discrimination (α = 1.54, 95% CI: 1.03-2.05, P = .001), indicating it is highly sensitive in distinguishing between clinicians with lower versus higher levels of emotional exhaustion. This suggests that small differences in clinicians’ burnout levels are captured efficiently by this item, making it a diagnostically robust indicator within the scale. EE2 also showed high discrimination (α = 1.32, 95% CI: 0.92-1.73, P = .002), followed closely by EE3 (α = 1.17, 95% CI: 0.87-1.47, P = .004). These values suggest that all items are effective at identifying gradations in emotional exhaustion, a key requirement for reliable measurement across the latent trait continuum. The close clustering of discrimination parameters is consistent with the internal coherence of the subscale and supports its construct validity. In terms of difficulty parameters (β), the items showed relatively low threshold locations, ranging from −1.20 to −0.68, indicating that they are most informative for clinicians at slightly below-average to moderate levels of emotional exhaustion. The lowest difficulty was recorded for EE2 (β = −1.20, 95% CI: −1.57 to −0.83), meaning this item is likely to be endorsed even by individuals with below-average levels of exhaustion. By contrast, EE1, despite its high discrimination, had a slightly higher threshold (β = −0.86, 95% CI: −1.30 to −0.42), and EE3 had the highest threshold of the 3 items (β = −0.68, 95% CI: −1.02 to −0.34).
The statistical significance of all parameter estimates (P < .005) reinforces the reliability of the items and the precision of the model estimates. Additionally, the credible intervals for both α and β parameters did not cross 0, further supporting the robustness of the inferences. Taken together, the results indicate that the EE subscale is composed of psychometrically strong items that are well-targeted to assess moderate emotional exhaustion levels, typical of a professional clinical population. The combination of high discrimination and appropriately calibrated difficulty makes these items particularly suitable for use in clinical screening, occupational health assessments, or large-scale burnout surveillance.
Table 5. Item Parameters for EE.
EE = emotional exhaustion.
Discrimination values (α: 1.17-1.54) show strong item sensitivity.
Difficulty parameters (β: −1.20 to −0.68) calibrated for moderate trait levels.
Highest discrimination: EE1 (α = 1.54); lowest difficulty: EE2 (β = −1.20).
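To illustrate how the discrimination (α) and difficulty (β) estimates translate into response behavior, the following sketch evaluates a logistic boundary curve for each EE item. It rests on a simplifying assumption: a graded response model normally has several ordered thresholds per item, whereas Table 5 reports a single β, so each item is treated here as having one boundary located at that β. The function names are hypothetical, and this is not the authors' estimation code.

```python
import math

# Point estimates reported in Table 5 (alpha = discrimination, beta = difficulty).
ee_items = {
    "EE1": {"alpha": 1.54, "beta": -0.86},
    "EE2": {"alpha": 1.32, "beta": -1.20},
    "EE3": {"alpha": 1.17, "beta": -0.68},
}


def boundary_probability(theta, alpha, beta):
    """P(response at or above the boundary | theta) for a logistic boundary curve."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def boundary_information(theta, alpha, beta):
    """Fisher information of a single logistic boundary (2PL form) at theta."""
    p = boundary_probability(theta, alpha, beta)
    return alpha ** 2 * p * (1.0 - p)


for name, par in ee_items.items():
    # Probability of endorsement for a clinician at the average trait level.
    p_avg = boundary_probability(0.0, par["alpha"], par["beta"])
    # Information peaks where theta equals beta and equals alpha^2 / 4 there.
    info_peak = boundary_information(par["beta"], par["alpha"], par["beta"])
    print(f"{name}: P(theta = 0) = {p_avg:.2f}, peak information = {info_peak:.2f}")
```

Because every β is negative, each boundary is crossed with probability above 0.5 at the average trait level, and the item with the largest α (EE1) contributes the highest peak information under this simplification.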
As shown in Table 6, the item parameter estimates for the depersonalization (DP) subscale, derived using a Bayesian graded response model (GRM), reveal that all 3 items demonstrate moderate to high levels of discrimination (α), reflecting their capacity to effectively differentiate among clinicians based on their levels of depersonalization. The highest discrimination was observed for DP2 (α = 1.40, 95% CI: 1.10-1.70, P = .002), indicating that this item is particularly sensitive to variations in the latent trait of depersonalization. It excels at detecting even subtle differences in how clinicians psychologically distance themselves from patients or emotionally detach from their professional roles. DP1 (α = 1.25, 95% CI: 0.94-1.56, P = .001) and DP3 (α = 1.09, 95% CI: 0.80-1.39, P = .004) also displayed strong discrimination, suggesting they are psychometrically sound and contribute meaningfully to the overall scale’s precision. The fact that all 3 items fall within a narrow and relatively high range of discrimination values (1.09-1.40) points to good internal consistency and construct alignment. The difficulty parameters (β) ranged from −1.02 to −0.78, with DP1 showing the lowest difficulty (β = −1.02, 95% CI: −1.33 to −0.71), followed closely by DP2 (β = −0.95, 95% CI: −1.25 to −0.65) and DP3, which had the highest threshold (β = −0.78, 95% CI: −1.05 to −0.51). These values suggest that the items are generally calibrated to measure slightly below-average to moderate levels of depersonalization, making them particularly suitable for detecting early or mild symptoms of emotional detachment in clinicians. Importantly, the relatively narrow confidence intervals and statistically significant P values for both α and β parameters (all P < .01) indicate high precision in parameter estimation and support the items’ psychometric robustness within the GRM framework. These metrics provide strong evidence for the stability of the item parameter estimates.
When compared to the emotional exhaustion (EE) items, the DP items follow a similar psychometric pattern: high discrimination coupled with low-to-moderate difficulty thresholds. This means that the scale is well-equipped to detect clinically relevant yet non-severe manifestations of depersonalization, aligning well with the distribution expected in healthcare professionals who are often under pressure but not yet in the most severe burnout stages. Overall, the findings affirm that the DP items are psychometrically sound and most sensitive in the low-to-moderate range of the depersonalization continuum. The balance of discrimination strength and difficulty level makes them well-suited for early detection, especially in clinical or organizational assessments aiming to identify at-risk individuals before symptoms escalate.
Table 6. Item Parameters for DP.
DP = depersonalization.
Discrimination values (α: 1.09-1.40) confirm item effectiveness.
Difficulty parameters (β: −1.02 to −0.78) sit slightly below average depersonalization levels.
DP2 exhibits highest discrimination (α = 1.40).
The stacked bar chart in Figure 1 illustrates the generalizability theory variance decomposition across the 3 burnout domains: emotional exhaustion, depersonalization, and personal accomplishment. Each bar represents the total score variance decomposed into its contributing facets:
Person (P): Reflects variability attributable to differences among clinicians.
Item (I): Captures item-specific difficulty.
Hospital (H): Reflects institutional effects.
Person × Item (P × I): Interaction between clinicians and items.
Person × Hospital (P × H): Interaction between clinicians and hospitals.
Residual: Unexplained or random error variance.
This visual supports a key insight from G-Theory: clinician-level variation is the largest component, confirming the instrument’s strong person-level discriminative power. Interaction and residual components remain significant, suggesting context-dependent variability and potential unmeasured influences.

Figure 1. Generalizability theory variance decomposition.
The bar graph in Figure 2 visualizes the variance components associated with the depersonalization (DP) subscale, based on generalizability theory analysis. Each bar represents a distinct source of variability: person (P), item (I), hospital (H), the person × item interaction (P × I), the person × hospital interaction (P × H), and residual error. The height of each bar reflects the magnitude of the corresponding variance component, while the horizontal error bars indicate 95% confidence intervals, providing insight into the precision of these estimates. Notably, the person component accounts for the largest share of variance, suggesting that individual differences predominantly shape DP scores. Moderate variance is attributed to items and the P × I interaction, implying that both item characteristics and how individuals respond to specific items contribute meaningfully to score differences. Although smaller in size, the hospital component is statistically significant, indicating that institutional context has a measurable effect on depersonalization. The residual variance captures the unexplained portion of variability, which may arise from measurement error or situational fluctuations. This graphical representation facilitates a clearer understanding of the relative impact and certainty of each facet, making it a valuable tool for evaluating both the robustness and contextual sensitivity of the DP measurement.

Figure 2. Variance components associated with the DP.
Figure 3 presents a visual representation of the variance components for the emotional exhaustion (EE) subscale, derived from a generalizability theory framework. Each bar corresponds to a distinct facet of variance, namely person (P), item (I), hospital (H), the person × item interaction (P × I), the person × hospital interaction (P × H), and residual error, illustrating their respective contributions to variability in EE scores. The height of each bar indicates the magnitude of the variance component, while the horizontal error bars depict the 95% confidence intervals, offering an indication of the statistical precision associated with each estimate. As with depersonalization, the person component accounts for the largest portion of variance, underscoring the strong influence of individual differences on reported emotional exhaustion. The item and P × I components also contribute moderately, suggesting variability in how different items perform and how individuals interact with specific items. The hospital-level variance, though smaller, remains statistically meaningful, reflecting the role of institutional context in shaping emotional exhaustion experiences. The residual component captures the unexplained variability, pointing to potential measurement error or idiosyncratic influences not accounted for by the measured facets. Overall, the graph enhances interpretability by offering a clear comparative view of where variance originates and how reliably each source contributes to the EE construct.

Figure 3. Visual representation of the variance components for the EE.
Table 7 summarizes the results of the integrative psychometric approach, which combined Bayesian hierarchical modeling, generalizability theory (G-Theory), and item response theory (IRT) to examine burnout among ICU clinicians. The analysis revealed that the greatest proportion of variance in both emotional exhaustion (EE) and depersonalization (DP) stemmed from the individual clinician level. Specifically, clinician-level variance components for EE and DP were 18.72 and 12.65, respectively, with credible intervals indicating high precision and consistency. This underscores that burnout is significantly individualized, shaped by unique internal and external stressors affecting each clinician. Such findings warrant a shift in intervention strategies away from solely organizational fixes toward personalized, clinician-centered support systems that target individual coping mechanisms, self-efficacy, and psychological resilience. At the item level, variance components of 4.23 for EE and 3.12 for DP suggest that items differed in their average endorsement levels and contributed meaningfully to overall score variability. This complements the IRT discrimination (α) parameters, which ranged from 1.17 to 1.54 for EE and 1.09 to 1.40 for DP and confirm that items effectively differentiate between low, moderate, and high burnout levels. These values indicate robust item construction and internal validity, as well as the capacity of the scale to detect subtle changes in symptom severity. Additionally, the difficulty (β) point estimates for EE and DP ranged from −1.20 to −0.68 (Tables 5 and 6), suggesting that most items were appropriately calibrated to capture moderate levels of burnout. This calibration is ideal for early detection, as items are most informative where burnout is emerging or mild, providing actionable diagnostic leverage before symptoms escalate to clinical thresholds.
Table 7. Bayesian G-Theory/IRT Fusion Analysis for Burnout Assessment Among ICU Clinicians.
DP = depersonalization; EE = emotional exhaustion.
All metrics were derived using a multilevel Bayesian estimation procedure with Hamiltonian Monte Carlo sampling (4 chains, 2000 iterations each), implemented in R using the brms and mirt packages. Generalizability coefficients follow the Brennan framework, while IRT parameters were estimated under a graded response model.
Institution-level influences also emerged as nontrivial sources of variance. Hospital-level variance was 2.05 for EE and 1.76 for DP, indicating that contextual workplace factors such as workload intensity, team cohesion, resource availability, and management climate play a secondary but still significant role in burnout. This aligns with existing evidence that institutional policies, organizational support systems, and workload design all modulate burnout risk. The interaction between clinician and hospital (person × hospital variance) was also noteworthy, with values of 3.28 for EE and 2.79 for DP. This interaction indicates that clinicians’ burnout levels shift depending on the institutional setting, further validating the need for interventions that are context-sensitive and tailored to the demands of specific clinical environments. A critical element of the analysis was the person × item interaction, where variance values of 7.36 (EE) and 5.41 (DP) revealed considerable differences in how clinicians interpreted and responded to specific items. This interaction supports the theoretical proposition that burnout manifests cognitively and affectively in nuanced ways, depending on the individual’s stress appraisal style, professional identity, and emotional resilience. These findings underscore the utility of adaptive assessment systems capable of capturing idiosyncratic response patterns through computerized testing or branching logic. The residual (unexplained) variance, estimated at 5.44 for EE and 4.02 for DP, implies that while the current model explains a large portion of variance, some influential factors remain unmeasured. These may include latent cultural norms, family-work interface stressors, spiritual resources, or long-term trauma exposure, all areas ripe for future research and expanded modeling.
Notably, the generalizability coefficients (G) for both burnout dimensions were high (0.87 for EE and 0.84 for DP), signifying excellent reliability and stability of the burnout scores across clinicians, items, and institutional contexts. These high G coefficients validate the robustness of the instrument used and support its applicability across varied ICU environments. In summary, the advanced modeling framework reveals that burnout among ICU clinicians is primarily person-driven, modulated by institutional context, and shaped by person-specific patterns of item response. The scale used demonstrates strong internal consistency and sensitivity to both latent and observed burnout factors. These findings support the development of hybrid intervention strategies that combine person-focused psychological support with context-specific organizational reforms. Such precision-oriented approaches are essential for addressing the multidimensional and deeply personal experience of burnout in high-intensity clinical settings.
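For readers who want to see how variance components feed into a generalizability coefficient, the sketch below computes a relative G coefficient for person scores. It is a hedged illustration rather than the authors' procedure: it assumes a fully crossed person × item × hospital random-effects design, 3 items per subscale (the number shown in Tables 5 and 6), and 9 hospitals. The paper does not state the design or the facet sizes used for its coefficients, and the function name `g_coefficient` is hypothetical.

```python
def g_coefficient(var, n_items, n_hospitals):
    """Relative G coefficient for persons in a crossed p x i x h design.

    Relative error variance is the sum of the interactions involving persons,
    each divided by the number of conditions averaged over.
    """
    relative_error = (
        var["pi"] / n_items
        + var["ph"] / n_hospitals
        + var["residual"] / (n_items * n_hospitals)
    )
    return var["p"] / (var["p"] + relative_error)


# Variance components reported in the text for EE and DP.
ee = {"p": 18.72, "pi": 7.36, "ph": 3.28, "residual": 5.44}
dp = {"p": 12.65, "pi": 5.41, "ph": 2.79, "residual": 4.02}

# Assumed facet sizes (not stated in the paper): 3 items, 9 hospitals.
g_ee = g_coefficient(ee, n_items=3, n_hospitals=9)
g_dp = g_coefficient(dp, n_items=3, n_hospitals=9)

print(f"G (EE) = {g_ee:.3f}")
print(f"G (DP) = {g_dp:.3f}")
```

Under these assumed facet sizes the sketch yields values close to, though not identical with, the reported 0.87 and 0.84; an exact match is not expected because the actual design and estimation details may differ.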
Tables 8 and 9 provide a comprehensive understanding of the multilevel contributors to burnout among ICU clinicians. The final table encapsulates individual-level, organizational-level, and cross-level interaction effects on the 2 core dimensions of burnout: emotional exhaustion (EE) and depersonalization (DP). At the individual level, key personal characteristics such as neuroticism, insecure attachment styles, and poor emotion regulation were found to be strong predictors of increased burnout. For instance, neuroticism showed a significant positive association with both EE (β = .43, SE = 0.07, P < .001) and DP (β = .36, SE = 0.06, P < .001), with a high Bayes factor (BF10 = 47.2), indicating very strong evidence in favor of the alternative hypothesis. Similarly, anxious attachment contributed to EE (β = .39, P = .002), while avoidant attachment significantly impacted DP (β = .34, P = .003). These findings confirm that psychological predispositions and interpersonal schemas play a critical role in shaping clinician burnout profiles. At the organizational level, burnout was significantly influenced by factors such as workload intensity, staffing ratio, perceived management support, and autonomy. High workload intensity was the strongest predictor of EE (β = .52, SE = 0.09, ICC = 0.28, P < .001), with an associated R² value of 0.31 at the hospital level. Meanwhile, lack of autonomy and weak leadership support predicted DP (β = .44 and .38, respectively), reflecting the impact of institutional climate. Variance partition coefficients (VPCs) revealed that 21.4% of the variance in EE and 18.2% in DP could be attributed to organizational factors alone. Importantly, cross-level interaction effects revealed how personal vulnerabilities and organizational environments interact.
For example, individuals high in neuroticism were more susceptible to EE under poor supervisory conditions (interaction β = .29, P = .004), while insecure attachment styles intensified DP when team cohesion was low (β = .27, P = .006). Model fit indices such as the Deviance Information Criterion (DIC = 1731.6 for EE, 1584.3 for DP) and conditional R² (.48 for EE and .43 for DP) indicate that the Bayesian hierarchical models provided strong explanatory power. Furthermore, Markov Chain Monte Carlo (MCMC) diagnostics showed adequate convergence across chains (R-hat <1.02), ensuring reliability of parameter estimation. Posterior predictive checks confirmed that the model distributions aligned well with observed data. The credible intervals for most fixed effects did not include 0, reinforcing their significance. Overall, the hierarchical modeling approach reveals a nuanced, multi-layered portrait of burnout, where individual psychological traits are both directly impactful and conditioned by institutional context. These findings underscore the importance of interventions that address both personal coping resources and systemic workplace reforms in ICU settings.
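The convergence criterion cited above (R-hat below 1.02) can be illustrated with the classic Gelman-Rubin potential scale reduction factor. The sketch below uses the simple non-split formula on simulated draws; it is not the rank-normalized split R-hat that current Stan-based tools report, and the simulated chains, the seed, and the choice of 1000 retained draws per chain are assumptions made for illustration only.

```python
import random
import statistics


def gelman_rubin_rhat(chains):
    """Classic (non-split) potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(2024)

# Four simulated chains that all sample the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Same draws, but one chain is shifted to mimic a chain stuck in another region.
stuck = [chain[:] for chain in mixed]
stuck[3] = [x + 2.0 for x in stuck[3]]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)

print(f"well-mixed chains: R-hat = {rhat_mixed:.3f}")
print(f"one shifted chain: R-hat = {rhat_stuck:.3f}")
```

Chains that explore the same distribution give a value close to 1, whereas a chain that settles in a different region inflates the between-chain variance and pushes the statistic well above a threshold such as 1.02.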
Table 8. Hierarchical Regression: Predictors of Burnout Among ICU Clinicians.
β (95% CI) = standardized regression coefficient with 95% confidence interval; VIF = variance inflation factor; ICC = intraclass correlation coefficient; DP = depersonalization; EE = emotional exhaustion; PA = personal accomplishment.
Table 9. Model Summary Statistics.
Note. Cohen’s effect size (f²): 0.02 = small, 0.15 = medium, 0.35 = large. All VIF values <2, suggesting no multicollinearity; ICC shows variance explained at each level. SRMR values <0.08 = good fit. RMSEA values <0.06 = good fit. AIC/BIC: lower values indicate better model fit. All models statistically significant at P < .001.
SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation.
Discussion
This study presents a nuanced exploration of burnout among clinicians, leveraging generalizability theory (G-Theory) and Bayesian item response theory (IRT) to disentangle the sources of variation in burnout experiences. The findings not only reaffirm established knowledge but also contribute novel insights into the multilevel and psychometric dynamics that underpin emotional exhaustion, depersonalization, and personal accomplishment. The strong person-level variance observed in emotional exhaustion aligns with Maslach and Jackson’s model, which positions this dimension as the central component of burnout. Consistent with prior findings,17 emotional exhaustion among clinicians is shaped by cumulative personal and professional stressors including workload, work-life conflict, and emotional labor. The significant interaction effects (person × hospital and person × item) indicate that individual experiences of exhaustion are also context-dependent, echoing prior studies11,23 that emphasized the influence of organizational culture and departmental climate. Moreover, the hospital-level variance supports literature showing that institutional policies, leadership quality, and staffing adequacy significantly influence burnout.40 In hospitals where clinicians experience less autonomy, limited support, and a culture of overwork, emotional exhaustion tends to be more prevalent. These patterns underscore the need for organizational interventions alongside personal coping strategies.
Depersonalization demonstrated moderate variation across both individual and institutional levels. This supports prior empirical work,21 which found that depersonalization often develops as a defensive response to persistent emotional exhaustion. The significant item- and interaction-level variances indicate variability in how clinicians interpret and endorse items related to interpersonal disengagement, which aligns with studies suggesting that role-specific stressors such as exposure to traumatic events or emotionally demanding patient care heighten the risk for depersonalization.16,17,19 In addition, empirical findings17,30 indicate that depersonalization is more pronounced in high-intensity settings such as emergency departments or intensive care units, supporting the observed hospital-level effects. Thus, the degree to which clinicians distance themselves from patients may not solely reflect individual traits but also institutional norms and work demands. Findings on personal accomplishment point to high individual-level variation, suggesting that clinicians draw on internal resources and self-efficacy to maintain a sense of competence despite systemic pressures. This resonates with self-efficacy theory5 and with empirical studies6 that highlight the protective role of perceived competence against job-related stress. The presence of hospital-level variance suggests that institutional recognition, opportunities for professional growth, and collegial respect can either foster or inhibit clinicians’ sense of achievement. Notably, despite high levels of emotional exhaustion and moderate depersonalization, many clinicians retained a strong sense of personal accomplishment. This aligns with a growing body of work14,17,29 showing that personal resources can moderate the negative impacts of burnout. Organizational strategies that reinforce autonomy, mastery, and recognition are essential for sustaining this protective factor.
A key finding was that the greatest proportion of variance in both emotional exhaustion (EE) and depersonalization (DP) was attributable to individual clinicians. This reinforces the notion that burnout is deeply personal and varies significantly among individuals, even within similar work environments. Factors such as coping strategies, psychological resilience, personality traits, and prior trauma may influence how clinicians internalize and express work-related stress.14,22 This highlights the importance of individualized interventions that target emotional regulation, stress management, and personal well-being. Programs that incorporate mindfulness, cognitive-behavioral strategies, or peer support could be especially beneficial in addressing these person-specific burnout profiles.8-10,14,17,35 The study also found notable item-level variance and strong item discrimination, indicating that some items in the burnout assessment tools were more effective at distinguishing between individuals with different levels of burnout. From an IRT perspective, this demonstrates that the items were appropriately calibrated to detect symptoms of moderate burnout, allowing for early identification before symptoms become severe. This strengthens the case for using refined, psychometrically sound tools for routine monitoring and early intervention, particularly in high-pressure clinical environments where burnout often goes undetected until it is advanced.1-4,23,24
Although hospital-level variance was smaller in magnitude, its statistical significance points to the role of institutional context in shaping burnout experiences. Variations in workload, administrative burden, leadership quality, and resource availability can all contribute to how burnout is experienced by clinicians in different hospitals.8-10,14,15,24 This suggests that institutional reforms such as improving staffing ratios, streamlining administrative tasks, and fostering supportive work environments could meaningfully reduce systemic contributors to burnout. Tailoring interventions to specific hospital contexts rather than applying one-size-fits-all models may improve their effectiveness and sustainability. The presence of significant interaction effects, including those between person and item, and person and hospital, further underscores the complex and contextually embedded nature of burnout. These interactions suggest that how a clinician responds to specific burnout items is not only influenced by their internal state but also by the unique conditions of their work setting.1,4,14,35 This validates the use of multilevel modeling techniques and supports the adoption of more context-sensitive assessment strategies that reflect the diversity of clinician experiences across different institutional and national settings. Lastly, the identification of moderate residual variance implies that there are other unmeasured factors contributing to burnout, beyond those captured in the current model. These could include informal peer relationships, organizational culture, political instability, or even societal expectations placed on healthcare professionals. Future research should aim to explore these latent variables to gain a more comprehensive understanding of burnout’s root causes and potential points of intervention. The application of Bayesian IRT modeling provided a refined look at item functioning.
Items with high discrimination values were better able to distinguish between clinicians at different levels of burnout, suggesting their potential for use in screening and early intervention. Items with low difficulty levels, widely endorsed even at mild burnout levels, indicate early warning signs that should not be overlooked. This aligns with prior work12,13,24,42 advocating tools that can detect subtle and early manifestations of burnout to enable timely support. These psychometric insights reinforce the importance of using adaptive and psychometrically sound tools in clinical settings. The variability in item performance also suggests the need for culturally and contextually responsive assessment instruments, especially in diverse hospital environments.
Conclusion
This study offers a comprehensive examination of clinician burnout by integrating generalizability theory and Bayesian item response theory (IRT) to uncover the multifaceted nature of emotional exhaustion, depersonalization, and personal accomplishment within hospital settings. The results reveal that burnout among clinicians is shaped by a combination of individual-level characteristics, institutional factors, and item-specific measurement properties. Emotional exhaustion emerged as a prominent and highly variable component, highlighting the need for both individual resilience-building and organizational transformation to address chronic stress and workload imbalances. Depersonalization, while moderate, showed meaningful variations influenced by both person and hospital contexts, underscoring the emotional toll of clinical work and the importance of supportive work environments. Personal accomplishment, although relatively high, varied significantly between individuals and across hospital environments, pointing to the protective role of self-efficacy and workplace recognition in mitigating burnout. The use of Bayesian IRT provided deeper insight into the psychometric functioning of burnout items, ensuring that instruments used for measurement are both discriminating and sensitive to the lived experiences of clinicians. This methodological rigor strengthens the validity of burnout assessments and enhances their utility in both research and practice. In effect, clinician burnout cannot be fully understood or effectively addressed without considering the complex interactions between personal, contextual, and measurement dimensions. The findings advocate for multilevel interventions that not only empower clinicians but also transform the organizational cultures and systems in which they work.
Additionally, the study provides a strong case for refining burnout assessment tools using advanced psychometric models to capture the nuanced realities of clinical practice more accurately. Future research should focus on longitudinal tracking of burnout trajectories, cross-cultural validations of burnout tools, and the evaluation of targeted interventions across diverse healthcare systems. Only through such comprehensive and systemic efforts can the healthcare sector create sustainable and supportive environments where clinicians can thrive both personally and professionally.
Recommendations
Healthcare institutions should implement structured, multilevel burnout prevention and intervention programs that address both individual and systemic contributors to clinician burnout. These programs should include mindfulness training, peer support groups, and resilience-building workshops for clinicians, alongside organizational changes such as improved staffing ratios, workload redistribution, and flexible scheduling. Such interventions must be informed by empirical evidence and regularly evaluated for effectiveness to ensure they address the core stressors identified in this study. Hospitals and health authorities should adopt burnout assessment tools that are psychometrically robust, culturally sensitive, and contextually relevant, such as those refined through item response theory (IRT) and generalizability theory (G-Theory). Routine use of these validated tools will enable more precise identification of burnout symptoms at both the individual and departmental levels, facilitating timely and targeted support. The use of scientifically grounded instruments also ensures greater fairness and reliability in institutional wellness monitoring practices. There is a pressing need for healthcare institutions to invest in leadership development programs that cultivate empathetic, inclusive, and responsive leaders who prioritize clinician well-being. Hospital administrators and department heads should be held accountable for fostering a positive work climate, minimizing role conflict, and encouraging open communication. Leadership behavior has been empirically shown to directly influence emotional exhaustion and depersonalization; thus, empowering leaders with the tools and training to support their teams is essential for mitigating burnout across healthcare systems.
Implications for Policy and Practice
The implications of the variance findings for policy and practice are both profound and multifaceted, offering actionable insights for healthcare administrators, policymakers, and mental health practitioners. First, the data underscore the necessity for tailored interventions that operate at both the individual and institutional levels. While individual-focused approaches such as cognitive-behavioral therapy (CBT), mindfulness-based stress reduction (MBSR), resilience training, and peer-support programs are crucial for enhancing clinicians’ coping mechanisms, these must be strategically paired with structural reforms. Organizational interventions should include reducing administrative burdens, increasing autonomy, improving staff-to-patient ratios, and promoting a more supportive and communicative work culture. Without such systemic changes, individual-level efforts may have limited and short-lived impact. Second, the study points to the importance of targeted and ongoing measurement of burnout using psychometrically validated tools. Rather than treating burnout as a static or monolithic phenomenon, institutions must assess it dynamically and contextually, monitoring changes over time and across units, roles, and hospital settings. Measurement should go beyond global burnout scores to include item-level diagnostics and variance decomposition, which reveal the specific dimensions (eg, emotional exhaustion, depersonalization) and contextual factors (eg, hospital culture, workload distribution) contributing most to burnout risk. This precision allows for data-driven, customized interventions that are more likely to be effective and sustainable. Third, the findings have major implications for healthcare policy reform. The significant variance attributed to institutional and contextual factors indicates that burnout is not merely a reflection of individual psychological weakness but a symptom of broader organizational dysfunction.
As such, health system leaders and policymakers must reframe burnout as a systemic issue requiring structural solutions. Incorporating clinician well-being into hospital accreditation standards, patient safety protocols, and national quality benchmarks similar to could institutionalize the importance of mental health in healthcare settings and foster accountability at the leadership level. Lastly, equity and resource allocation must be central considerations in any burnout mitigation strategy. The observed differences in variance across hospitals highlight disparities in institutional capacity to support staff well-being. Under-resourced facilities, often serving high-need populations, may be at greater risk of fostering burnout-prone environments due to inadequate staffing, outdated infrastructure, or limited access to professional development resources. Policymakers and donors must adopt an equity-informed lens when distributing resources, ensuring that hospitals facing greater contextual burdens receive the support necessary to foster healthier and more resilient workforces. In sum, addressing burnout effectively requires an integrated, multi-level strategy that combines individual empowerment with systemic change. The variance analysis not only clarifies where and how burnout manifests most severely but also provides a roadmap for prioritizing interventions and reforming healthcare delivery in a way that centers clinician well-being alongside patient care.
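As a rough illustration of how a variance decomposition might be turned into a prioritization aid, the sketch below expresses each emotional exhaustion (EE) variance component reported in the abstract as a share of their sum. It is a minimal sketch under stated assumptions, not the analysis performed in this study: the interaction terms (person × item, person × hospital) are left out because their values are not reported here, so the shares are illustrative only, and the function names `variance_shares` and `rank_sources` are hypothetical.

```python
# EE variance components as reported in the abstract.
# Assumption: interaction terms are omitted because their values are not
# reported, so these shares do not reproduce the full model decomposition.
ee_components = {
    "person": 18.72,
    "item": 4.23,
    "hospital": 2.05,
    "residual": 5.44,
}


def variance_shares(components):
    """Return each component's share of the summed variance."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


def rank_sources(components):
    """Order variance sources from largest to smallest share."""
    shares = variance_shares(components)
    return sorted(shares.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Print the sources in descending order of their share.
    for name, share in rank_sources(ee_components):
        print(f"{name:>8}: {share:.1%}")
```

Under these assumptions, the person-level component accounts for the largest share, which is consistent with pairing individual-level support with the structural reforms described above. A complete decomposition would need every estimated component, including the interactions.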
Limitations of the Study
Despite the valuable insights generated, this study has several limitations. First, the cross-sectional design restricts the ability to draw causal inferences between burnout components and contextual factors such as hospital environment or role-specific stressors. Longitudinal data would better capture fluctuations in burnout over time and allow the effectiveness of targeted interventions to be assessed. Second, while the Maslach Burnout Inventory-Human Services Survey (MBI-HSS) is widely validated, the reliance on self-report measures may introduce bias, including social desirability or underreporting due to stigma. Third, the generalizability of the findings may be limited because the sample was drawn from 9 hospitals in Ghana, South Africa, and Botswana, which may not fully represent other regions, healthcare systems, or non-clinical staff populations. Fourth, while variance component analysis offers a nuanced understanding of burnout sources, it does not fully account for intersectional influences such as gender, socioeconomic background, or professional hierarchy, which may moderate burnout experiences. Future research incorporating mixed methods and broader demographic representation would help deepen understanding and inform more inclusive policy responses.
Footnotes
Acknowledgements
The authors would like to express their sincere gratitude to the ICU clinicians who participated in this study across the 9 tertiary hospitals in Ghana, South Africa, and Botswana. Appreciation is also extended to the local institutional coordinators who facilitated ethical approval and data access. Special thanks to the Department of Educational Foundations at the University of Education, Winneba, and the Botswana University of Agriculture and Natural Resources for their administrative and academic support.
List of Abbreviations
Ethical Considerations
This study received ethical clearance from institutional review boards and ethics committees at participating universities and hospitals in Ghana, South Africa, and Botswana. This empirical study involved direct participation from clinicians and was conducted in compliance with international ethical research standards.
Consent to Participate
Confidentiality and data protection measures were rigorously applied, and all participants provided informed consent. The study upheld the dignity, rights, and well-being of all individuals involved.
Author Contributions
S.N. led the conceptualization and design of the study. He coordinated data collection across the 3 countries and performed the statistical analyses using generalizability theory, Bayesian hierarchical modeling, and item response theory. He also drafted the manuscript, integrated the theoretical framework, and managed correspondence with journal editors and peer reviewers. T.B. contributed to the development of the research instruments, facilitated access to participating hospitals in Botswana, and supported data interpretation with a focus on institutional and cultural contexts. He critically reviewed and revised the manuscript, ensuring clarity, coherence, and alignment with empirical literature. Both authors approved the final version of the manuscript and are jointly responsible for its content.
Funding
The authors received no external financial support for the research, authorship, and/or publication of this article. The study was self-financed by the authors as part of their academic research commitments.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated and analyzed during this study are not publicly available due to institutional confidentiality agreements and ethical restrictions concerning participant data from hospitals. However, anonymized data or summaries may be made available upon reasonable request to the corresponding author, subject to institutional approval and adherence to ethical data sharing protocols.
