Abstract
The current study presents a reliability generalization of the Problem Solving Inventory (PSI), following the Reliability Generalization Meta-Analysis (REGEMA) checklist to ensure a thorough and methodical approach. The PSI, a tool designed to assess individuals’ perceptions of their problem-solving abilities, consists of a total scale and three subscales: problem-solving confidence (PSC), approach-avoidance style (AAS), and personal control (PC). Each subscale evaluates a distinct facet of problem-solving appraisal. From an initial pool of 2,196 articles, 123 met the inclusion criteria and were analyzed using a varying-coefficient model to account for variation in reliability coefficients across studies. The meta-analysis revealed that the PSI total scores consistently demonstrated excellent reliability, as did the PSC and AAS subscales. Key predictors of reliability for the PSI and PSC included the standard deviation of scores, mean age of the sample, and sample type, whereas mean age and the language of inventory administration were key predictors for the PC subscale. The reliability of the AAS subscale was notably influenced by the standard deviation of the scores, sample size, and proportion of women in the sample. These insights underscore the critical role of demographic and methodological variables in evaluating an instrument’s reliability across varied contexts. The findings reinforce the importance of a nuanced approach to psychological measurement, with an awareness of how demographic, sample, and cultural factors influence the reliability of psychometric tools.
Plain Language Summary
The current study presents a statistical synthesis of the reliability of the Problem Solving Inventory (PSI) across 123 studies. The PSI assesses individuals’ perceptions of their problem-solving abilities and consists of a total scale and three subscales: problem-solving confidence (PSC), approach-avoidance style (AAS), and personal control (PC). Each subscale evaluates a different aspect of problem-solving appraisal. The statistical integration of the reliabilities across studies revealed that the PSI total scores and the PSC and AAS subscales consistently demonstrated excellent reliability. It was further found that the reliability of the PSI and the PSC was influenced by the standard deviation of scores, mean age of the sample, and sample type, whereas mean age and the language of inventory administration were key predictors for the PC subscale. The reliability of the AAS subscale was influenced by the standard deviation of the scores, sample size, and proportion of women in the sample.
Introduction
The current study conducted a reliability generalization of the Problem Solving Inventory (Heppner, 1988) using the Reliability Generalization Meta-Analysis Checklist (Sánchez-Meca et al., 2021). Problem solving is defined as the complex interaction between cognitive, affective, and behavioral processes for the purpose of adapting to internal or external stressors or life demands (Heppner, 1988; Heppner et al., 1985). It is a self-directed process aimed at identifying potential solutions to challenges or stressors encountered in daily life (D’Zurilla & Maydeu-Olivares, 1995). A range of measures have been developed to assess different facets of problem-solving. For example, the Social Problem-Solving Inventory (SPSI) was developed to assess how individuals manage everyday social challenges. The scale has been validated in a range of settings and has demonstrated adequate internal consistency reliability (Gál et al., 2022; Schepers et al., 2023). Similarly, the Interpersonal Problem-Solving Scale assesses the individual’s ability to navigate complex interpersonal interactions through the use of different coping strategies (Do et al., 2022). The scale has been used among different age groups and in varying settings and has been found to be valid and reliable (Mercan & Uysal, 2023; Nguyen et al., 2021).
The current study focuses on the Problem Solving Inventory, or PSI (Heppner, 1988), which was designed to assess individuals’ appraisals or perceptions of their problem-solving ability. It is a 32-item self-report inventory that consists of a total scale and three subscales: problem-solving confidence (PSC), approach-avoidance style (AAS), and personal control (PC). The PSC subscale reflects the individual’s belief and trust in their capacity to solve problems and is indicative of problem-solving self-efficacy, while the AAS subscale is a motivational construct that reflects the individual’s general inclination to either approach or avoid problems. The latter influences problem-solving behaviors and the effective use of available resources in negotiating demanding life situations. The PC subscale reflects an individual’s belief that they are in control of their emotions and behavior while engaging in problem solving (Heppner, 1988; Heppner et al., 1983). More than two decades after the development of the PSI, Heppner et al. (2004) reviewed over 120 empirical studies utilizing the tool and concluded that it served as a reliable assessment of problem-solving appraisal and was associated with indicators of mental health and well-being.
The PSI continues to be a popular instrument. It has recently been administered in the context of the COVID-19 pandemic to assess problem-solving appraisals among healthcare workers and university students and their association with various mental health and well-being indices (Korkmaz et al., 2020; Padmanabhanunni & Pretorius, 2024; Saeedyan et al., 2022). The PSI has also been used to assess the role of problem solving in blood glucose management among children with diabetes mellitus (Mutluer et al., 2023) and to examine the effectiveness of a customized cognitive-behavioral therapy intervention among mothers of children with autism spectrum disorder (Abdelaziz et al., 2024). Furthermore, the scale has been used to assess the effect of a crossword puzzle activity on nursing students’ problem-solving and clinical decision-making skills (Kaynak et al., 2023).
The psychometric properties of the PSI, including its factor structure, have been examined across various studies involving a wide array of population groups. For example, Teo et al. (2021) assessed the psychometric properties of the PSI with a Southeast Asian sample and found that a three-factor solution had the best fit. Micoogullari and Ekmekci (2018) examined the reliability and validity of the PSI among Turkish athletes and reported a three-factor solution as having the best fit. Pretorius et al. (2023) investigated the feasibility of reducing the items in the PSI through the application of Rasch and Mokken analyses and classical test theory. Their findings validated the three-dimensional structure within the abbreviated version of the subscales among a sample of South African university students.
Reliability is a central attribute in evaluating the psychometric properties of a psychological measuring instrument. Sánchez-Meca et al. (2021) expressed the view that there is a widespread but incorrect assumption that reliability is a fixed characteristic of a measurement tool, as evidenced by researchers merely referencing the previously reported reliability of a scale and thereby implying that the scale is equally reliable in their own study. Pretorius (2021) underscores that reliability is not an innate quality of the test itself; rather, it pertains to the scores derived from applying a test to a particular group of participants under certain conditions. Hence, reliability can be influenced by contextual, linguistic, educational, and cultural factors. Culture, for example, can influence the understanding and interpretation of items and the manner in which individuals respond to an instrument. This is especially pertinent in the case of instruments like the PSI, which has been translated and used across diverse cultural settings. Translation alone may not ensure conceptual equivalence, as idiomatic expressions, culturally specific norms, and problem-solving strategies may differ across contexts. For example, certain items in the PSI may assume an individualistic orientation toward problem solving, emphasizing personal control and independent decision-making, which may not align with collectivist values that prioritize relational interdependence and group harmony. If translated versions of the PSI fail to capture these nuances, respondents from collectivist cultures may interpret items differently or find them less applicable, thereby introducing measurement bias and reducing reliability.
Molnár et al. (2022) argue that members of individualistic cultures may perform more effectively on rule-based problems, whereas those with collectivistic cultural orientations may be more adept at context-related problems. This is ascribed to the prioritization of group-related goals and consultative decision making in collectivistic cultures. Furthermore, these authors highlight the possibility of country-level differences in the teaching of problem solving within the schooling environment, which may in turn influence responses to an instrument. In essence, constructs that hold meaning in one context may not resonate or may be interpreted differently in another, which can lead to bias, reduced reliability, and lack of validity (Molnár et al., 2022; Sánchez-Meca et al., 2021).
Since reliability can fluctuate between different applications of a test in varying population groups, reliability generalization (RG) meta-analysis has been suggested as an important avenue to statistically integrate reliability estimates (Sánchez-Meca et al., 2021; Vacha-Haase, 1998). Through an RG meta-analysis, it is possible to aggregate individual studies that have utilized a particular test and provided a reliability estimate based on the data collected, calculate the mean reliability of test scores, and explore variations in the measurement error of test scores across various contexts, samples, and target groups (López-Ibáñez et al., 2024). Additionally, RG meta-analysis assists in identifying potential moderating factors that could influence the reliability of a test, such as the characteristics of the sample population (e.g., age, gender, cultural background), the domains being assessed (e.g., cognitive skills, emotional traits), and the time intervals between test administrations (e.g., test-retest reliability over short or long periods; Lindsey et al., 2017; López-Ibáñez et al., 2024).
To date, numerous RG meta-analyses have been conducted in the field of psychology, focusing on various measurement tools such as the Internet Gaming Disorder Scale (Gisbert-Pérez et al., 2022), Impact of Events Scale (Hale et al., 2011), Revised Child and Adolescent Depression Scale (Piqueras et al., 2017), Obsessive-Compulsive Disorder Inventories (Sandoval-Lentisco et al., 2023), and Insomnia Severity Index (Cerri et al., 2023). These analyses underscore the utility of RG studies to provide comprehensive evaluations of measurement instruments across diverse psychological constructs, highlighting their crucial role in enhancing the accuracy and reliability of psychological assessments.
In the last few decades, there has been a proliferation of guidelines on the reporting of meta-analyses, with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) being among the most popular (Moher et al., 2009). Other commonly used checklists include the Meta-Analysis Reporting Standards (MARS), the Measurement Tool to Assess Systematic Reviews (AMSTAR), and the Consensus-Based Standards for the Selection of Health Status Measurement Instruments (COSMIN; Cooper, 2008; Shea et al., 2007; Terwee et al., 2012). Sánchez-Meca et al. (2021) developed the Reliability Generalization Meta-Analysis Checklist (REGEMA) to address shortcomings in existing guidelines for the reporting of meta-analytic studies, specifically those related to the phenomenon of reliability induction. The term “reliability induction” refers to the tendency among researchers to reference reliability coefficients from previous studies that have used the same instrument and infer that their data are equally reliable, rather than provide a reliability estimate derived from the data collected in their study.
Shields and Caruso (2004) distinguish between two primary mechanisms of reliability induction: by report and by omission. Reliability induction by report refers to the process of inferring reliability indirectly using statistical techniques or by drawing comparisons with studies of similar methodological design and subject matter. Reliability induction by omission consists of neglecting to report reliability despite the use of a psychometric instrument in the study. By addressing both types of reliability induction, the REGEMA checklist ensures that meta-analysts are equipped to handle the diverse ways in which reliability information can be presented or omitted in the literature. The checklist also emphasizes the importance of assessing the quality of the included studies, identifying potential moderating variables that may explain variability in reliability estimates, and conducting sensitivity analyses to evaluate the robustness of the findings (Sánchez-Meca et al., 2021). Since its development, the REGEMA checklist has been used in a range of RG meta-analytic studies of measuring instruments (Esparza-Reig et al., 2021; Reig-Aleixandre et al., 2023; Yüksel Doğan et al., 2023). This study aims to contribute to the existing literature by providing a comprehensive overview of the reliability of the PSI across different demographic, cultural, and contextual backgrounds.
Method
We conducted the reliability generalization in accordance with the Reliability Generalization Meta-Analysis (REGEMA) guidelines. This comprehensive approach entailed a systematic review and synthesis of existing research to evaluate the overall reliability of the measurement tool across various studies.
Search Strategy
We used the following search terms: “problem solving inventory” OR “problem solving appraisal.” Including the abbreviation “PSI” and the name of the PSI’s author, “Heppner,” made the search too broad and returned ambiguous results, so these terms were discarded. We conducted the literature search using the following databases: PubMed, PsycArticles, Science Direct, Scopus, Taylor & Francis, and Web of Science. The original search period was from 1982 (the publication of the development study of the PSI) to October 2022. However, given the number of articles retrieved and the length of time it took to screen, review, and code these articles, we conducted a second search for the period from October 2022 to November 2023.
Eligibility Criteria and Screening Process
Only articles published in English from January 1982 to November 2023 were considered. Another inclusion criterion was that the PSI and/or its subscales were used in the study. The two authors independently screened the abstracts of all retrieved articles to determine whether each article appeared to meet the inclusion criteria. Full-text versions were retrieved for all articles whose abstracts appeared to meet the inclusion criteria. Where the two authors disagreed about the relevance of an abstract, the full-text version of the disputed article was also retrieved.
Coding of Study Characteristics
Following recommendations from previous reliability generalization studies, several study characteristics were coded to statistically examine their potential influence on Cronbach’s alpha. Sample size, mean age, and proportion of female respondents in the study were recorded as continuous variables. The language in which the PSI was administered was coded as 0 = translated, 1 = English. The sample type was coded as 0 = not students, 1 = students. Each author independently coded the final selected articles. The two columns for each variable from the two authors were placed side-by-side in an Excel spreadsheet and subtracted from each other. A non-zero result indicated a difference in coding between the two authors. In these cases, the article was read jointly by the two authors to resolve the difference in coding.
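The logic of this agreement check can be expressed in a few lines of R (the actual check was performed in Excel, as described above); the coding vectors below are hypothetical and purely illustrative.

```r
# Hypothetical coding vectors for one variable (e.g., sample type),
# as entered independently by the two authors
coder1 <- c(0, 1, 1, 0, 1, 0)
coder2 <- c(0, 1, 0, 0, 1, 1)

# Mirror of the Excel procedure: subtract the two columns; any
# non-zero difference flags an article to be re-read jointly
diffs <- coder1 - coder2
which(diffs != 0)
#> [1] 3 6
```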
Statistical Analysis
We used descriptive statistics to summarize the characteristics of the studies and a frequency distribution to indicate whether Cronbach’s alpha was reported in the studies or induced for the PSI and its three subscales. We also presented the distribution of reliabilities of the PSI and subscales visually using stem-and-leaf plots. All these analyses were conducted using IBM SPSS Statistics version 28 for Windows (IBM Corp., Armonk, NY, USA).
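Although the descriptive plots were generated in SPSS, an equivalent stem-and-leaf display can be reproduced with base R’s stem() function; the alpha values below are invented for illustration only.

```r
# Invented reliability coefficients for illustration
alphas <- c(.84, .87, .79, .86, .91, .88, .83, .85, .72, .89)

# Base R prints a text stem-and-leaf plot comparable to the
# SPSS output used for Figure 2
stem(alphas)
```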
For the meta-analysis, we used a varying-coefficient model. Traditionally, two models have dominated the field of meta-analysis: the fixed-effect model, which assumes effect-size homogeneity across all included studies, and the random-effects model, which assumes that the studies in the meta-analysis are randomly sampled from a well-defined population (Bonett, 2010). In contrast, the varying-coefficient model does not require these often unrealistic assumptions (Bonett & Calin-Jageman, 2024; Krizan, 2010) and has been shown to perform well under heterogeneity of reliability coefficients and nonrandom sampling of studies. In addition, using Monte Carlo simulations, Bonett (2010) demonstrated that confidence intervals based on fixed-effect models may be too narrow and confidence intervals based on random-effects models may be too wide, compared to the varying-coefficient model. Since internal consistency coefficients are in an r²-type metric, Henson and Thompson (2002) posit that transformation of the reliability coefficients (e.g., Fisher’s r-to-z) is not required; thus, the untransformed alpha coefficients were used in the meta-analysis. To conduct the meta-analysis of the reliability coefficients of the PSI and its subscales, we used the “vcmeta” package (Bonett & Calin-Jageman, 2024) in R (R Development Core Team, 2020).
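A minimal sketch of this step is shown below, using the meta.ave.cronbach() function from the vcmeta package; the sample sizes and alpha coefficients are invented for demonstration.

```r
# install.packages("vcmeta")
library(vcmeta)

n   <- c(120, 340, 85, 210)   # hypothetical per-study sample sizes
rel <- c(.84, .87, .79, .86)  # hypothetical reported Cronbach's alphas

# Varying-coefficient confidence interval for the average alpha;
# r is the number of measurements (32 items for the full PSI)
meta.ave.cronbach(alpha = .05, n = n, rel = rel, r = 32, bystudy = TRUE)
```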
We used regression analyses with Cronbach’s alpha as the criterion variable and certain study characteristics as predictors to determine the extent to which these study characteristics predicted the reliability coefficient. To support the interpretation of the contribution of each predictor, we calculated structure coefficients and squared structure coefficients and conducted a commonality analysis. Structure coefficients (rs) are the correlations between a predictor and the predicted criterion scores, while a squared structure coefficient (rs²) describes the proportion of variance in the R² effect explained by the predictor (Kraha et al., 2012). Commonality analysis enables partitioning of the explained variance of the R² into variance that is uniquely explained by a single predictor and variance that is explained by two or more predictors. All of these analyses were conducted using the “yhat” package in R (Nimon et al., 2023).
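The following sketch illustrates this workflow, assuming the regr() function in the yhat package, which accepts a fitted lm model and returns beta weights, structure coefficients, and the commonality partition; the data frame is invented for demonstration.

```r
# install.packages("yhat")
library(yhat)

# Invented study-level data: reported alphas and three predictors
dat <- data.frame(
  alpha    = c(.85, .88, .79, .90, .83, .86, .81, .89),
  sd       = c(14.2, 16.1, 12.8, 17.0, 13.5, 15.2, 13.1, 16.4),
  mean_age = c(21.3, 34.6, 19.8, 40.2, 22.5, 27.1, 20.4, 31.9),
  students = c(1, 0, 1, 0, 1, 1, 1, 0)  # 1 = student sample
)

# Ordinary least-squares regression with alpha as the criterion
fit <- lm(alpha ~ sd + mean_age + students, data = dat)

# regr() reports beta weights, structure coefficients, and the
# partition of R-squared into unique and common variance
regr(fit)
```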
Results
The two searches and mining of references resulted in 2,196 articles available for screening. The contribution of the various databases to the 2,196 articles is reflected in the REGEMA flowchart in Figure 1. Of these 2,196 articles, 735 were duplicates, which were excluded from the analysis. Based on screening of the abstracts, an additional 760 articles were excluded for various reasons (e.g., no use of PSI, systematic review or meta-analysis, etc.). Thus, after abstract screening, we extracted the full text of 701 articles. Of these 701, an additional 335 were excluded for various reasons (e.g., short or different version of PSI, not English, etc.). In total, we found 366 articles that used the PSI total score and/or subscale scores. Of these 366 studies, only 123 reported Cronbach’s alpha; thus, these 123 studies were included in the meta-analysis. The studies that reported Cronbach’s alpha for the PSI and its subscales are referenced in Supplemental Appendix 2.

Figure 1. REGEMA flowchart for selection of studies for the total PSI scale.
The REGEMA flowcharts for the PSC, PC, and AAS subscales are shown in Supplemental Appendix 1. In total, 199 studies did not use the PSC subscale; thus, 167 records were available for analysis. However, one study reported the reliability of the subscale only as a range. Only 61 studies reported Cronbach’s alpha, and these studies were used in the meta-analysis. A total of 206 studies did not use the PC subscale, and one study reported the reliabilities only as a range. Thus, 159 studies were available to examine whether reliability was reported for the PC subscale, and only 54 reported a reliability coefficient. A total of 207 studies did not use the AAS subscale, and one study reported reliability only as a range. Thus, 158 studies were examined to determine whether reliability was reported for the AAS subscale, and only 53 reported a reliability coefficient.
Description of the Characteristics of the Studies
There were 50,970 participants included in studies that used the PSI (M = 414.4, SD = 1,404.0), 37,325 in studies that used the PSC subscale (M = 611.9, SD = 1,967.9), 36,017 in studies that used the PC subscale (M = 667.0, SD = 2,086.9), and 35,563 in studies that used the AAS subscale (M = 671.0, SD = 2,106.7). The mean age for the PSI was 26.1 years (SD = 11.5), and the mean ages for the PSC, PC, and AAS subscales were 26.7 (SD = 11.5), 26.4 (SD = 11.0), and 27.0 years (SD = 11.7), respectively. For the total scale and all subscales, the proportion of women in the samples ranged from 0.61 to 0.64.
More than half (56%) of the studies that used the total scale administered a translated version of the PSI, whereas fewer than half of studies used a translated version of the PSC (29.8%), PC (34.0%), and AAS (32.7%) subscales. In the majority of the studies, the samples predominantly consisted of students (PSI = 64.2%, PSC = 66.7%, PC = 67.9%, AAS = 65.4%). Translated versions of the PSI included Turkish (n = 121), French (n = 7), Italian (n = 7), Chinese (n = 6), Korean (n = 4), Norwegian (n = 3), Farsi (n = 2), Spanish (n = 1), German (n = 1), Greek (n = 1), Japanese (n = 1), Arabic (n = 1), Malay (n = 1), Dutch (n = 1), and Afrikaans (n = 1).
Reliability Reporting
The frequency of studies reporting Cronbach’s alpha and reliability induction for the PSI and its subscales is reported in Table 1.
Table 1. Frequency of Reliability Reporting and Reliability Induction for the PSI and Its Subscales.
Note. PSI = Problem Solving Inventory.
Table 1 reflects that referencing previously reported reliability (reliability induction by report) is the most common practice for the PSI and its subscales. Only 33.6% of studies that used the PSI total score reported reliability for the scores obtained in their own study, and 36.9% of studies did so in the case of the PSI subscales.
The reported Cronbach’s alpha values in the various studies for the PSI and its subscales are presented as stem-and-leaf plots in Figure 2. Figure 2 reflects that the majority of studies reported reliability coefficients greater than .70 (n = 118) for total scale scores, with a substantial concentration of alphas above .80 (n = 104, 85%). Five studies were exceptions, reporting reliability coefficients below .70 (range: .50–.69). Similarly, the majority of studies reported reliabilities for the PSC subscale scores greater than .70 (n = 58), with 42 of the 61 reliabilities (68.8%) greater than .80. The reliabilities of the AAS subscale scores demonstrated a similar pattern, with only one study reporting a reliability coefficient less than .70, 52 of 53 studies reporting coefficients greater than .70, and 60% of studies reporting coefficients greater than .80. The reviewed studies generally reported lower reliability scores for the PC subscale, with 20 of 54 studies reporting reliability coefficients less than .70, 34 reporting coefficients greater than .70, and only 5 studies reporting a coefficient greater than .80.

Figure 2. Stem-and-leaf plots of reliability coefficients for the PSI and its subscales. (A) Total scale; (B) problem-solving confidence; (C) personal control; (D) approach-avoidance style.
With respect to other forms of reliability, four studies reported test-retest reliabilities for the total scale: Ji et al. (2018; .56), Tumkaya et al. (2009; .70), Dixon et al. (1993; .70), and Heppner and Petersen (1982; .89), yielding a mean test-retest reliability of 0.71 (SD = 0.14). Two studies reported construct reliability (CR): Abdollahi et al. (2018; CR = .86) and Abdollahi et al. (2014; CR = .74). Only one study reported a split-half reliability coefficient (.68; Cheng & Lam, 1997).
Meta-Analysis of Cronbach’s Alpha
The results of a meta-analysis (varying-coefficient model) of the reliability scores of the PSI and its subscales are reported in Table 2.
Table 2. Results of Meta-Analysis of the Cronbach’s Alpha Scores of the PSI and Its Subscales.
Note. PSI = Problem Solving Inventory; PSC = Problem-Solving Confidence; PC = Personal Control; AAS = Approach-Avoidance Style.
Table 2 reflects that, to some extent, the combined effect sizes mirror the patterns of reliabilities found in the stem-and-leaf plots, in that the PSI, PSC, and AAS scores had combined effect sizes greater than .80, while the combined effect for the PC scores was .70. The confidence intervals were very narrow for all scales, indicating relatively precise estimates of the combined effect sizes. The forest plots for the PSI and the three subscales are reported in Supplemental Appendix 3.
Predicting Variability in Reported Cronbach’s Alpha for the PSI and Its Subscales
The results of the regression analysis predicting the reliability of the PSI, PSC, PC, and AAS scores are reported in Table 3. These results include structure coefficients, proportions of variance uniquely explained by a single predictor, and proportions of variance explained by the predictor and one or more other predictors (common). The full commonality analyses are reported in Supplemental Appendix 4.
Table 3. Reliability Generalization Regression Results for the PSI and Its Subscales.
Note. rs = structure coefficient; rs² = squared structure coefficient; unique = proportion of criterion variance explained uniquely by the predictor; common = proportion of criterion variance explained by the predictor and one or more other predictors.
With respect to the total scale (PSI), Table 3 indicates that the standard deviation of the PSI, with a squared structure coefficient of 0.98, explained 98% of the variance in R² (R² = .44; .98 × .44 ≈ .43). Of this 44%, 31.9% was uniquely explained by the standard deviation, and 11.77% was explained by the standard deviation together with one or more other predictors. The mean age of the sample and sample type had the second and third highest squared structure coefficients, explaining 6% and 7% of the variance in R², respectively. However, the commonality analysis indicated that these factors uniquely explained only 0.35% and 0.01% of the variance in R², respectively, while they explained 6.86% and 5.93% of the variance together with one or more other predictors.
For the PSC, the mean age of the sample, sample type, and standard deviation had the strongest squared structure coefficients, explaining 50%, 38%, and 24% of the variance in R², respectively. Slightly more than half of the variance associated with the standard deviation was unique (4.77% of the variance in R²), while 3.20% was explained by the standard deviation together with one or more other predictors. For sample type and mean age, the squared structure coefficients indicated that these two predictors accounted for 38% and 50% of the variance in R² (i.e., .13 and .17, respectively). However, of these explained variances, 12.56% and 11.83% were accounted for by sample type and mean age, respectively, together with one or more other predictors, while the variance uniquely explained by sample type and mean age was 0.43% and 4.94%, respectively.
For PC, the mean age of the sample and language had the strongest squared structure coefficients, explaining 66% and 21% of the variance in R², respectively. For both factors, the majority of the explained variance was unique: 25.66% for mean age and 13.86% for language. For AAS, the standard deviation, sample size, and proportion of women were the strongest predictors, explaining 28%, 17%, and 12% of the variance in R², respectively. Both the standard deviation and the proportion of women uniquely explained the majority of their associated variance in R² (14.55% and 13.48%, respectively), while for sample size, the majority of the variance in R² (13.44%) was explained by sample size together with one or more other predictors.
Discussion
Problem-solving ability is a critical resource in navigating daily stressors and life challenges. The PSI is the most popular instrument for assessing perceived problem-solving ability. It has been used in a wide array of studies and the psychometric properties of the instrument have been well-established (Heppner et al., 2004; Pretorius et al., 2023). However, to date, no studies have undertaken a reliability generalization analysis of the PSI. The current study addressed this gap in the literature and employed the REGEMA checklist as a guiding framework.
There were several significant findings. The PSI total scores, as well as the PSC and AAS subscale scores, demonstrated excellent reliability across studies. The majority of studies reported reliability coefficients greater than .70 (n = 118) for total scale scores, with a substantial concentration of alphas greater than .80 (n = 104, 85%). Similarly, the majority of studies reported reliabilities for the PSC subscale scores greater than .70 (n = 58), with 42 of the 61 reliabilities greater than .80 (68.8%). The reliabilities of the AAS subscale scores demonstrated a similar pattern, with only one study reporting a reliability coefficient less than .70, 52 of 53 coefficients greater than .70, and 60% of coefficients greater than .80.
The reviewed studies generally reported lower reliability for PC subscale scores, with 20 of 54 studies reporting reliability coefficients less than .70, 34 studies reporting coefficients greater than .70, and 5 studies reporting coefficients greater than .80. The variability in alpha coefficients underscores the reality that reliability is a characteristic of a set of scores and not of an instrument. The combined Cronbach’s alphas observed were .85 for the PSI, .82 for the PSC, .70 for the PC, and .81 for the AAS, indicating a generally high level of reliability across these scales, with the PC subscale showing noticeably lower reliability than the total scale and the other subscales.
Upon further examination, we identified the strongest predictors of reliability coefficients for each component, offering an understanding of the factors that influence measurement consistency. For the PSI, standard deviation, mean age of the sample, and sample type emerged as the strongest predictors of reliability coefficients. This finding suggests that the variability within the sample, the average age of participants, and the specific characteristics of the group being studied (e.g., students vs. non-students) significantly impact the reliability of the PSI scores. Similarly, for the PSC, mean age and sample type were identified as strong predictors, along with standard deviation. This consistency with the PSI findings highlights the importance of demographic factors and sample diversity in determining the reliability of problem-solving confidence measures.
The reliability of the PC scores was strongly predicted by mean age and the language in which the inventory was administered. This finding indicates that perceived personal control of problem solving varies significantly with age and may be influenced by linguistic factors, possibly due to translation issues or cultural differences in interpreting the scale items. For the AAS, standard deviation, sample size, and the proportion of women in the sample were the strongest predictors of reliability coefficients. This finding suggests that the consistency of the AAS is influenced by the variability within the sample, the overall size of the sample, and gender composition, indicating that these factors may affect individuals’ responses to items assessing their approach or avoidance tendencies in problem solving.
These findings underscore the high reliability of the PSI and its subscales, as well as the complex interaction of demographic, methodological, and sample-related factors in influencing the consistency of these psychological measures. Understanding these predictors is crucial for future research and application of the PSI, as this knowledge can be used to guide the selection of samples and interpretation of results in diverse settings.
It is noteworthy that similar variables have been found to impact reliability and predictability in other RG meta-analytical studies concerning different instruments. In a reliability generalization of the Dimensional Obsessive-Compulsive Scale, Lopez-Nicolas et al. (2021) found that standard deviation, mean score, and sample characteristics impacted the consistency of the scale. Similarly, in an RG meta-analysis of the Adult Prosocial Behavior Scale, Badenes-Ribera et al. (2023) concluded that a variety of factors, including the language of the study, the target population, the proportion of male participants in the sample, and the geographical location, were associated with variations in the internal consistency of the scale. Furthermore, Vicent et al. (2019) reported that study characteristics—including ethnicity, language, standard deviation, and mean age—had a statistically significant association with reliability coefficients in their RG meta-analysis of the Child and Adolescent Perfectionism Scale. It is crucial to acknowledge that the influence of such variables on reliability has not been universally reported across all studies (Demir et al., 2024). This variation underscores the complexity of psychological measurement and suggests that the impact of these factors may vary depending on the specific characteristics of the instrument, the context in which it is used, and the populations being studied. These varied findings also highlight the need for caution in generalizing about predictors of reliability.
In the current study, the identification of the strongest predictors of reliability coefficients for each scale has implications for both research and practice. For the PSI and PSC, standard deviation, mean age, and sample type emerged as significant predictors, underscoring the importance of demographic characteristics and sample variability in influencing the reliability of these measures. The findings suggest that age and specific characteristics of the sample (e.g., students vs. non-students) can significantly impact the consistency of responses. For PC, mean age and language were the strongest predictors, indicating that cultural and linguistic factors, along with age, play crucial roles in how individuals perceive and report their sense of personal control in problem-solving situations. The AAS scale’s reliability was most strongly predicted by standard deviation, sample size, and the proportion of women in the sample. This finding highlights the influence of gender and the distribution of responses within a sample on the measurement’s reliability, pointing to the nuanced ways in which individuals’ approach or avoidance of problem solving may be assessed and interpreted. Existing studies have cautioned against overly simplistic or reductionistic interpretations of gender differences in problem-solving abilities, given the role of culture, socialization, and differential access to resources. Subscription to traditional gender roles, for example, may result in women being socialized to adopt less assertive approaches (Dou et al., 2021; Golshiri et al., 2023; Ramírez-Uclés & Ramírez-Uclés, 2020). This could potentially influence how consistently the AAS scale captures approach-avoidance tendencies across genders.
The results of this reliability generalization analysis offer important contributions to the existing literature on the PSI and its subscales. The use of the REGEMA checklist and the large sample size, which included data from 123 studies drawn from an initial pool of 2,196 articles, strengthens the reliability of the findings and the conclusions that were derived. The high reliability coefficients associated with the PSI and its subscales confirm their utility in psychological assessment and highlight areas for further investigation and refinement. In addition, this study goes beyond the simple reporting of reliability estimates. Instead, by identifying key predictors (e.g., standard deviation, mean age, language of administration) that influence the reliability of the PSI and its subscales, the study expands the existing knowledge base on the robustness of the instrument. It is recommended that researchers and practitioners consider these predictors when designing studies, interpreting results, and implementing the PSI in various contexts. In sum, these findings underscore the importance of a nuanced approach to psychological measurement that acknowledges the interaction of demographic factors, sample characteristics, and cultural factors in influencing the reliability of psychometric instruments.
The study has important practical implications for practitioners and researchers. It is essential that the reliability of the PSI be empirically assessed within each new population or setting, rather than assuming equivalence based on prior studies. When using translated versions of the PSI, researchers should ensure that both linguistic and cultural equivalence are established through adaptation processes, including back-translation, expert review, and interviews with members of the target population. Practitioners should interpret PSI scores with caution when working with culturally diverse groups, recognizing that certain items may reflect culturally specific assumptions about problem-solving that do not hold uniformly across groups. Finally, future scale development and adaptation efforts should incorporate qualitative research to explore how problem-solving is conceptualized in various cultural contexts, thereby guiding the refinement or development of culturally sensitive measurement tools.
Despite the strengths of this study, several limitations need to be noted. The meta-analysis relied on the reliability coefficients reported by the original studies, and methodological inconsistencies across studies could have influenced these estimates. Future research should strive for methodological consistency in the reporting of reliabilities, which would facilitate more accurate cross-cultural comparisons in future reliability generalization studies. The inclusion of studies using different samples can introduce heterogeneity, limiting the ability to derive more specific conclusions about certain subpopulations; future research should therefore examine the reliability of the PSI within specific populations. Finally, not all included studies reported comprehensive demographic data, which limited our ability to explore the predictors of generalizability. Future studies should consistently report key variables, including sample characteristics and methods of administration, to enable greater accuracy in RG meta-analyses.
Conclusion
The results of this reliability generalization analysis offer valuable contributions to the body of knowledge surrounding the PSI and its subscales. This reliability generalization meta-analysis confirms the strong internal consistency of the PSI and its subscales, particularly the total scale, PSC, and AAS, with most studies reporting alpha coefficients above .80. In contrast, the PC subscale showed more variability, with a notable proportion of coefficients falling below .70. Key predictors of reliability included sample characteristics (e.g., standard deviation, mean age, and sample type for PSI and PSC; language and age for PC; and standard deviation, gender composition, and sample size for AAS). These findings underscore the importance of contextual and demographic factors in shaping score reliability and highlight the need for researchers to empirically assess reliability in each new application. The study offers a comprehensive understanding of measurement consistency across diverse settings and provides a roadmap for future use, adaptation, and interpretation of the PSI.
Footnotes
Ethical Considerations
Ethical approval does not apply.
Consent to Participate
Informed consent does not apply.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
All data generated or analyzed during this study are included in this published article and its Supplemental Information Files.
Supplemental Material
Supplemental material for this article is available online.
References