
Abstract

The current study sought to advance the Social Self-Efficacy and Social Outcome Expectations scale using multiple approaches to scale development. Data from 583 undergraduate students were used in two scale development approaches: Classic Test Theory (CTT) and Item Response Theory (IRT). Confirmatory factor analysis suggested a 2-factor structure that aligns with the theoretically based domains for SEOES items and supports previously proposed models of this scale from CTT and psychometric analyses. The IRT analysis indicated that the SEOES items measure lower levels of the latent constructs with greater precision. Future research directions are provided and practice implications are discussed.

Keywords

social self-efficacy, outcome expectations, scale development, item response theory, classic test theory

Further Validation of the Social Efficacy and Social Outcome Expectations Scale

Bandura’s (1977, 1997) self-efficacy theory is widely researched across various domains, such as social relationships (Smith & Betz, 2000; Wright, Wright et al., 2013), career decision-making (Betz & Taylor, 2001), coping (Chesney et al., 2006), and academics (Robbins et al., 2004), to name just a few. Bandura’s concept of self-efficacy has also been instrumental in guiding theory development (Lent et al., 1994). Bandura (1997) articulated self-efficacy as individuals’ cognitive judgments related to their belief about their skills and abilities in a specific domain. Utilizing this definition, researchers seek ways to determine how self-efficacy in a domain influences various aspects of individuals’ lives, such as their overall life satisfaction (Wright & Perrone, 2010), academic success and persistence (Robbins et al., 2004; Wright, Jenkins-Guarnieri et al., 2013), career success (Spurk & Abele, 2014), and many other areas (Brown & Lent, 2016).

An area of self-efficacy that is of growing interest related to scale development is social self-efficacy (Wright, Wright et al., 2013). Sherer et al. (1982) discussed the importance of measuring self-efficacy and developed the Self-Efficacy Scale (SES) that consisted of two subscales: general self-efficacy and social self-efficacy. Subsequently, Smith and Betz (2000) continued to explore the concept of social self-efficacy and developed the Perceived Social Self-Efficacy scale (PSSE). Both the SES (Sherer et al., 1982) and PSSE (Smith & Betz, 2000) are limited to examining only one of the two theorized domains of self-efficacy, which include self-efficacy expectations and outcome expectations according to Bandura (1977, 1997). Others have published scales examining both self-efficacy and outcome expectations, but in the context of motivation. For example, Pintrich et al. (1991, 1993) developed the Motivated Strategies for Learning Questionnaire (MSLQ) that examines 15 domains, including a motivational subscale that focuses on self-efficacy and performance expectations. Although this comprehensive multi-dimensional scale is related to academic learning and performance for college students, it does not measure aspects of an individual’s cognitive beliefs in their abilities in relationships or their outcome expectations from engaging in relationships.

Responding to this need for a scale based on both areas of Bandura’s Self-Efficacy Theory (Bandura, 1977, 1997) that would allow for a more accurate representation of the constructs (DeVellis, 2012), Wright, Wright et al. (2013) developed the Social Self-Efficacy and Outcome Expectations scale (SEOES) that included both facets of self-efficacy theory. This original scale examines individuals’ perceived confidence in their abilities in social relationships (i.e., social self-efficacy) and the expected outcomes resulting from their behavior in relationships (i.e., outcome expectations; Wright, Wright et al., 2013). The scale has been previously translated and used internationally (e.g., Turkish; Akin & Akkaya, 2015), and it is our hope that a more precise measure of the construct will be beneficial for international scholars and practitioners as they examine social self-efficacy and outcome expectations. The measurement precision of the original scale, as well as that of the aforementioned social self-efficacy scales, was limited because these scales were developed using Classic Test Theory (CTT) methods (i.e., exploratory and confirmatory factor analysis) and may be improved using more current scale development techniques, such as Item Response Theory (IRT). Therefore, we decided to further validate the SEOES by implementing both confirmatory factor analysis and IRT techniques. We also followed the recommended practices and approach outlined by Worthington and Whittaker (2006) to ensure the methodological rigor of our factor analyses. We utilized IRT analyses to determine how effectively the SEOES items measure the full spectrum of the latent variables of social self-efficacy and social outcome expectations, and thus how well the scale can identify individuals at higher as well as lower levels of the constructs.

Measurement Approaches

Both exploratory and confirmatory factor analyses are guided by the principles of Classic Test Theory. CTT is based on the premise that a group of items can be combined to represent an overall latent construct (DeVellis, 2012) that is specific to the measure (Reise et al., 2005), and focuses on the properties of an entire scale (Mallinckrodt et al., 2016). Individuals’ observed scores are theorized to represent their true score plus random error across all items (DeVellis, 2012). This classic test method has dominated the measurement of social and psychological constructs (DeVellis, 2012; Reise et al., 2005). However, there is currently a strong movement to develop scales using methods beyond CTT, such as approaches that implement Item Response Theory (Harvey, 2016).

According to the IRT method, each individual item represents the characteristics of the latent variable, whereas the classic test method assumes that the latent variable affects all items equally (DeVellis, 2012). Using IRT, the relationship between response on a scale item and the underlying latent variable along its continuum is represented by the item-characteristic curve (ICC) (DeVellis, 2012). The slope of the ICC and its location along the continuum reflect how well the item differentiates individuals at different levels of the construct assessed (Harvey, 2016). Specifically, the x-axis indicates the continuum of the latent construct based on a mean of 0 and standard deviation of 1 (θ, Theta), and the y-axis indicates the probability of endorsement (Thomas, 2011). For example, an ICC with robust slope for an item at the lower range of θ (e.g., below 0) would suggest that the item is able to better assess the construct at the lower end of the construct’s continuum. Each individual item’s ICC can be examined in this way. The Category Response Curve (CRC) can also be used when examining the multiple response categories (e.g., Likert responses) for an item by determining the probability of endorsing each response category of that item, and ideally the item’s CRC will have distinct peaks (Nguyen et al., 2014).
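To make the ICC concrete, the following minimal sketch evaluates a two-parameter logistic curve for a single dichotomously scored item. The 2PL form, the function name `icc_2pl`, and the α and β values are illustrative assumptions; they are not estimates from this study, which fits a graded response model.

```python
import math

def icc_2pl(theta, alpha, beta):
    """Probability of endorsing an item under a two-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

# Hypothetical item located at the lower end of theta (beta = -1.5)
# with a steep slope (alpha = 2.5).
alpha, beta = 2.5, -1.5
for theta in (-3.0, -1.5, 0.0):
    print(f"theta={theta:+.1f}  P(endorse)={icc_2pl(theta, alpha, beta):.3f}")
```

The endorsement probability equals .5 where θ = β, and the curve rises most steeply around that point, which is why an item located below 0 separates respondents best at lower trait levels.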

The ability to analyze item-level parameters and each item’s functioning along the spectrum of a latent construct being assessed provides additional information beyond the classic test theory approaches, such as confirmatory factor analysis. Another useful concept in IRT is the item information function (IIF). The IIF indicates the degree of measurement precision or amount of information provided across the trait spectrum of the construct (Mallinckrodt et al., 2016). Increases in IIF are similar to higher degrees of reliability and lower levels of standard error of measurement from a CTT perspective (Thomas, 2011). Therefore, a collection of items measuring a similar area of the latent construct will help increase precision of this aspect of the construct (Mallinckrodt et al., 2016). Similarly, items that have higher discrimination values are able to better differentiate between people at different levels of the construct (Thomas, 2011). Thus, a strength of the IRT approach allows researchers to establish if the measure’s items are more sensitive at determining individuals that score at the low or high end of the latent continuum (Mallinckrodt et al., 2016); a more detailed discussion of the specific IRT principles used in the current study are further described in the method section.
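The link between information and precision can be sketched with the standard expressions for a dichotomous two-parameter logistic item. This is an illustrative simplification (polytomous models such as the graded response model use a more general form), where P_i(θ) denotes the probability of endorsing item i and α_i its discrimination:

```latex
I_i(\theta) = \alpha_i^{2}\, P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr],
\qquad
I(\theta) = \sum_{i} I_i(\theta),
\qquad
SE(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}
```

Because item information scales with the squared discrimination and test information is the sum over items, several highly discriminating items located in the same region of θ reduce the standard error in that region.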

Purpose of the Study

The present study sought to understand how the SEOES’s items function along the continuum of social self-efficacy and social outcome expectations among individuals. As recommended by Mallinckrodt et al. (2016), we first used CFA procedures and then implemented IRT. This approach allowed us to first confirm the overall factor structure and then examine how well each individual item represented the overall latent variables. Accordingly, the purpose of our study was to provide a more refined understanding of the SEOES (Wright, Wright et al., 2013) that may allow future researchers, educators, and practitioners to more accurately identify individuals’ levels of social self-efficacy and outcome expectations along the latent continuum beyond the current scales that are used (e.g., Sherer et al., 1982; Smith & Betz, 2000). It is our hope that further examining the SEOES will also provide researchers with a well-validated measure of the constructs of social self-efficacy and social outcome expectations based on Bandura’s (1977, 1997) self-efficacy theory that can be readily translated and used internationally. This may help to inform research directions and the design of psychological and educational interventions.

Methods

Participants and Procedures

The study’s sample comprised 389 female and 194 male undergraduate college students from a medium-sized university in the Rocky Mountain region of the United States. The university offers more than 200 undergraduate and graduate programs. The mean age of the sample was 19, and a large portion of the participants were freshmen (67%), followed by sophomores (19%), juniors (11%), and seniors (4%). The majority of participants were White (74%), and approximately half of the sample were not in a dating relationship. Once the first author’s institution granted IRB approval, participants were recruited from undergraduate psychology courses. They were part of a larger study that focused on career development and thus completed multiple instruments; the SEOES (Wright, Wright et al., 2013) data used in the current study have not been previously analyzed and were used only in the current study. If needed for class, research credit for participation was awarded to the participants. Data were collected using the online survey collection tool of the first author’s university. A link to the study’s survey was provided online to all undergraduate psychology students.

Instrument

The Social Efficacy and Social Outcome Expectations scale (SEOES; Wright, Wright et al., 2013) was used in the present study. The original SEOES version consists of 18 total items in two subscales: Social Efficacy (SE; 12 items) and Outcome Expectations (OE; 6 items). Participants respond to items using a 5-point Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly agree), and higher scores indicate a greater degree of endorsement of the construct. Using a sample of undergraduate students, Wright, Wright et al. (2013) determined Cronbach’s alpha coefficients based on the subscales’ scores were strong for both the Social Efficacy subscale (α = .965) and the Outcome Expectations subscale (α = .913). Cronbach’s alpha reliability estimates for the Social Efficacy and the Outcome Expectations subscales calculated from our full sample’s data were .936 and .882, respectively (total score α = .942). The Pearson correlation coefficient between the subscales was .641. The items on the SE subscale had a mean score of 4.1155 and a standard deviation of .62179 (standard error of mean = .02575); the combined total mean score was 49.3859 (SD = 7.46149; standard error of mean = .30902). The items on the OE subscale had a mean score of 4.2461 and a standard deviation of .60785 (standard error of mean = .02517); the combined total mean score was 25.4768 (SD = 3.64711; standard error of mean = .30902). Convergent validity of the SEOES was established based on the high correlations between the SEOES and the Perceived Social Self-Efficacy scale (Smith & Betz, 2000), which is a scale that also measures social self-efficacy.
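The relationship between the item-level and summed-score descriptives for the SE subscale can be sketched as follows. The sketch assumes that the item-level mean and standard deviation describe each participant's average across the 12 SE items, so summed-score values are 12 times larger, and that the standard error of the mean is SD divided by the square root of N; the variable names are illustrative.

```python
import math

N = 583                 # sample size reported in the text
n_items_se = 12         # items on the Social Efficacy subscale
item_mean_se = 4.1155   # reported mean item score for the SE subscale
item_sd_se = 0.62179    # reported SD of the item-level SE score

# Summed-score statistics follow from multiplying by the number of items.
total_mean_se = n_items_se * item_mean_se
total_sd_se = n_items_se * item_sd_se

# Standard error of the mean = SD / sqrt(N).
sem_item = item_sd_se / math.sqrt(N)
sem_total = total_sd_se / math.sqrt(N)

print(round(total_mean_se, 2), round(total_sd_se, 2))
print(round(sem_item, 5), round(sem_total, 5))
```

Under these assumptions the computed values agree with the SE subscale figures reported above to within rounding.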

Data Analyses

All analyses were conducted using R software (Release 3.0.2; R Core Team, 2013). Because R is open-source software, we confirmed our analyses using Stata (version 14.1; StataCorp, 2015). We began by calculating descriptive and psychometric statistics, including internal consistency reliability estimates, using the Psychometric package (Fletcher, 2010). We then specified a confirmatory factor analysis (CFA) of the two-factor SEOES model from Wright, Wright et al.’s (2013a) original scale development research and analyzed the CFA model using the lavaan package (Version .5–16; Rosseel, 2012), which was programmed to mimic Mplus statistical algorithms. Item response data were evaluated for normality using guidelines for skew (<|2|) and kurtosis (<|7|) statistics (Kline, 2011). Given the relative normality in our data, we decided to treat the Likert-type data as continuous and utilized Maximum Likelihood (ML) estimation along with a Satorra–Bentler χ² (Satorra & Bentler, 1988) as a robust method (DiStefano, 2002). We evaluated multiple global indicators of fit and misfit for models following recommendations (Beauducel & Wittmann, 2005; Hu & Bentler, 1999; Kline, 2011), utilizing the following cutoff guidelines: CFI and NNFI close to .95, SRMR close to .08, and the RMSEA close to .06 (Beauducel & Wittmann, 2005; Hu & Bentler, 1999). We also evaluated component fit by inspecting unstandardized parameter estimates for statistical significance as well as plausibility (e.g., direction and strength of relationship) given our hypothesized model and the original scale development research (Kline, 2011). Our sample size of 583 was considered sufficient for the CFA based on recommended minimum guidelines of 10:1 case to estimated parameter ratios (Jackson, 2003; Kline, 2011). We analyzed the two-factor model with 18 items loading onto two co-varying factors of Social Efficacy (SE; 12 items) and Outcome Expectations (OE; 6 items) (see Table 1). 
All residuals were treated as uncorrelated and estimated as part of the model, and we set the metric by restricting the first indicator of each factor to unity (Kline, 2011). We also planned to utilize modification indices (MI) along with item R² statistics for any respecifications that appeared warranted; any subsequent nested models were evaluated using modified chi-squared difference tests (Kline, 2011).
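The case-to-parameter ratio and the model degrees of freedom implied by this specification can be counted directly. The sketch below assumes the parameterization described above (first loading of each factor fixed to 1, uncorrelated residuals, freely estimated factor variances and covariance); the variable names are illustrative.

```python
n_items = 18
n_factors = 2
N = 583

free_loadings = n_items - n_factors     # first loading per factor fixed to 1
residual_variances = n_items            # one uncorrelated residual per item
factor_variances = n_factors
factor_covariances = n_factors * (n_factors - 1) // 2

free_params = (free_loadings + residual_variances
               + factor_variances + factor_covariances)
observed_moments = n_items * (n_items + 1) // 2   # unique variances and covariances
df = observed_moments - free_params

print(free_params, df, round(N / free_params, 1))
```

With 37 free parameters, 583 cases give a ratio of roughly 15.8:1, above the 10:1 guideline, and the 134 degrees of freedom match the value reported with the model χ² in the Results.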

Table 1.

SEOES Scale Items, Descriptive Statistics, and IRT Parameters.

Item	Mean	SD	ITC	α (std. err.)	β_i1	β_i2	β_i3	β_i4
1^a	4.20	.81	.83	3.90 (.30)	−2.92	−1.78	−1.33	.21
2^a	4.16	.81	.84	4.19 (.32)	−2.88	−1.79	−1.22	.27
3^a	4.05	.90	.75	2.04 (.14)	−3.45	−1.93	−1.13	.45
4^a	4.16	.81	.81	3.29 (.24)	−3.25	−1.92	−1.22	.28
5^a	3.86	.98	.67	1.61 (.11)	−3.15	−1.83	−1.03	.85
6^a	4.18	.73	.84	4.09 (.33)	−2.76	−2.01	−1.42	.38
7^a	4.10	.83	.74	2.31 (.16)	−3.11	−1.99	−1.32	.49
8^a	4.14	.67	.67	1.80 (.14)	−4.31	−2.95	−1.75	.75
9^a	4.18	.76	.78	2.84 (.20)	−3.69	−2.23	−1.28	.31
10^a	4.15	.78	.72	2.18 (.15)	−3.70	−2.38	−1.35	.38
11^a	4.10	.84	.71	1.77 (.13)	−3.45	−2.37	−1.39	.52
12^a	4.09	.79	.83	3.24 (.24)	−2.91	−2.00	−1.27	.50
13^b	4.25	.78	.76	2.11 (.17)	−3.16	−2.52	−1.52	.21
14^b	4.16	.86	.83	2.83 (.23)	−2.75	−2.01	−1.13	.21
15^b	4.36	.70	.81	3.17 (.27)	−3.11	−2.44	−1.57	−.01
16^b	4.36	.72	.80	3.28 (.29)	−3.10	−2.31	−1.54	−.02
17^b	4.34	.68	.81	3.26 (.28)	−3.09	−2.49	−1.63	.07
18^b	4.00	.84	.74	1.89 (.14)	−3.35	−2.35	−1.06	.67

Note. N = 583. See Appendix for item text. SD = standard deviation; ITC = item-total correlation; α = item discrimination parameter, with standard errors reported in parentheses; β = estimates of the location/difficulty parameters for each item/Likert scale response threshold.

^aSocial efficacy expectations subscale (items 1–12).

^bSocial outcome expectations subscale (items 13–18).

We then employed IRT analyses to further evaluate item properties and functioning in more detail using another approach to scale development that has been shown to complement Classical Test Theory’s (CTT) psychometric and CFA procedures; this practice has also been supported by others (e.g., Mallinckrodt et al., 2016). IRT conceptualizes the latent trait being measured by scale items as a continuum spanning the full range of the construct under examination, for example, ranging from very low levels of social efficacy to very high levels of this trait. This continuum is commonly labeled Theta, represented by the θ symbol, and is set up like Z scores with a mean of 0, SD of 1, and a theoretically normal distribution (DeMars, 2010). The level of this trait that a specific person manifests can be thought of as a person’s location on the θ spectrum and will determine the likelihood of that person selecting a measurement scale item or an item’s specific Likert scale response category, and IRT analyzes these relationships between item response patterns and latent trait levels (Ostini & Nering, 2006). Thus, items in IRT are positioned on the θ continuum as well, as each item can provide information about different ranges of the latent trait spectrum and is therefore assigned a location parameter on θ, symbolized as β (DeMars, 2010). For example, a person with very low social efficacy will have a very low θ level, and therefore will be very unlikely to respond affirmatively (e.g., “True” response vs. “False”) to a social efficacy scale item with a β location in the middle to upper θ range that best captures moderately high to higher levels of social efficacy (e.g., an item reading “I am confident in my abilities to communicate effectively in conversations”). 
Scale items are also assigned another parameter, often labeled the discrimination parameter α, that indicates “how well an item discriminates among people along the trait continuum” or an item’s ability to “tell people apart with respect to the amount of a trait that they have” (Ostini & Nering, 2006, p. 4). In stark contrast to CTT, IRT parameters are conceptualized theoretically as invariant, in that they are not sample-specific and remain the same even in different samples of people because they are properties of the items (DeMars, 2010). However, in practice, there are often estimation errors due to differences in the samples used to estimate item parameters (DeMars, 2010).

Given the ordinal nature of our Likert-type scale data, we decided to apply the Graded Response Model (GRM) (Samejima, 1969) for IRT modeling of each SEOES subscale using the ltm package (Rizopoulos, 2006), which employs a marginal maximum likelihood estimation (MMLE) method (21 integration points; non-adaptive Gauss–Hermite quadrature). The GRM treats ordered responses as a series of dichotomous choices separated by boundaries or thresholds (Ostini & Nering, 2006), for example, 4 thresholds (labeled j) between the 5 choices in a 5-point Likert-type response scale. A threshold represents the θ continuum location where respondents have an equal probability of choosing the two adjacent response options (DeMars, 2010). In this way, these ordered categories reflect a person’s decision-making involving “a cumulative process of successively accepting and then rejecting categories, where rejecting a category is defined as being more attracted to the next category, until a category is reached where the probability of attraction is greater than the probability of rejection” (Ostini & Nering, 2006, p. 63). Thus, location parameters are modeled separately for each of these thresholds (β_ij) based on the response categories for each item, along with the item’s α parameter. Items with a larger range of thresholds cover a greater amount of item difficulty because they are able to more accurately predict if the construct level is correctly identified by the item response. Items and their thresholds can be evaluated graphically by creating Category Response Curves (CRCs), also known as Operating Characteristic Curves, that reflect the probability of choosing a Likert scale response category as a function of θ level with location and shape determined by item parameters (α_i, β_ij) (Embretson & Reise, 2000; see Figure 1). 
Items and subscales can also be evaluated graphically through information functions, which reflect measurement precision and how it varies across the θ spectrum; information is inversely related to the standard error of measurement, which can vary as a function of θ as well (DeMars, 2010; see Figure 2). We considered the preceding CFA analyses as sufficient tests of the unidimensionality assumption for each set of subscale items underlying IRT (DeMars, 2010), and our sample size exceeded the minimum of 500 recommended by Reise and Yu (1990). We also examined indicators of overall IRT model fit by analyzing residuals based on two-way and three-way combinations of responses with a general cutoff guideline of 4 (Bartholomew & Tzamourani, 1999).
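The category response curves described above can be sketched numerically. The example below assumes the common GRM parameterization in which the cumulative probability of responding above threshold j is a logistic function of α(θ − β_ij), so each β_ij is the point where that cumulative probability equals .5; the function name `grm_category_probs` is illustrative, and the parameter values are the Item 1 estimates from Table 1.

```python
import math

def grm_category_probs(theta, alpha, betas):
    """Category probabilities for one item under a graded response model."""
    # Cumulative probabilities of responding above each threshold.
    cum = [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]
    bounds = [1.0] + cum + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [bounds[k] - bounds[k + 1] for k in range(len(betas) + 1)]

# Item 1 estimates from Table 1 (alpha and four thresholds).
alpha, betas = 3.90, [-2.92, -1.78, -1.33, 0.21]
for theta in (-3.0, -1.5, 0.0, 1.0):
    probs = grm_category_probs(theta, alpha, betas)
    print(theta, [round(p, 3) for p in probs])
```

Plotting these five probabilities over a grid of θ values yields one panel of the kind shown in Figure 1; a response option whose curve never rises above its neighbors is the pattern later described as a category being subsumed.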

Figure 1.

Outcome expectations subscale items category response curves.

Figure 2.

Item and test information functions for the social efficacy and outcome expectations subscales.

Results

For the CFA analysis, global indicators suggested adequate overall fit to the data, χ²_S-B (134) = 430.33, CFI_S-B = .93, NNFI_S-B = .92, SRMR = .049, RMSEA_S-B = .062 (90% CI .057–.067), and component fit also appeared reasonable with statistically significant model parameters that fell within expectations for direction and strength of relationship (see Table 2). The model χ² was statistically significant, indicating misfit; however, this indicator has been shown to be highly sensitive to multiple factors in models, so we also utilized Iacobucci’s (2010) guideline of χ²/df < approximately 3, and our results were close to this value (430.33/134 = 3.21).
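Two of these indices can be recomputed from the reported χ², degrees of freedom, and sample size. The sketch below uses the textbook RMSEA point estimate with N − 1 in the denominator, which is an assumption; robust software output may apply additional scaling corrections, so agreement is expected only to rounding.

```python
import math

chi_sq = 430.33   # Satorra-Bentler scaled chi-square reported in the text
df = 134
N = 583

ratio = chi_sq / df
# Textbook point estimate; robust output may apply further corrections.
rmsea = math.sqrt(max(chi_sq - df, 0.0) / (df * (N - 1)))

print(round(ratio, 2), round(rmsea, 3))
```

Both values round to the figures reported above (3.21 and .062).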

Table 2.

Unstandardized and Standardized Coefficients for the Final 18-Item Model.

Item	Latent construct; Subscale	B	SE	β	R²	Standardized Error Variances
1 1SE	1	1	—	.840	.706	.294
2 1SE_A	1	1.019*	.039	.856	.732	.268
3 1SE_B	1	.932*	.048	.702	.493	.507
4 1SE_C	1	.970*	.040	.813	.661	.339
5 1SE_D	1	.862*	.055	.597	.356	.644
6 1SE_E	1	.913*	.035	.845	.714	.286
7 1SE_F	1	.871*	.044	.716	.513	.487
8 1SE_G	1	.614*	.037	.626	.392	.608
9 1SE_H	1	.857*	.039	.769	.591	.409
10 1SE_I	1	.804*	.041	.699	.489	.511
11 1SE_J	1	.815*	.046	.658	.434	.566
12 1SE_K	1	.961*	.039	.826	.683	.317
13 2OE	2	1	—	.716	.513	.487
14 2OE_A	2	1.191*	.068	.771	.594	.406
15 2OE_B	2	1.014*	.056	.800	.641	.359
16 2OE_C	2	1.009*	.056	.783	.612	.388
17 2OE_D	2	.968*	.054	.786	.617	.383
18 2OE_E	2	.976*	.066	.647	.418	.582
F1-F2	—	.261*	.024	.692	—	—

Note. * indicates p < .01. Latent construct 1 = “Social Efficacy Expectations”; Latent construct 2 = “Social Outcome Expectations”.

For IRT analyses utilizing the GRM, the unconstrained models demonstrated improved fit compared to constrained models for both the self-efficacy (SE) item subset, χ² (11) = 234.06, p < .001, and the outcome expectations (OE) item subset, χ² (5) = 50.73, p < .001, and so unconstrained models were selected. Our analysis of model fit for the SE model indicated multiple residuals (over 10) exceeding the recommended cutoff for pairs and triplets of items, while the OE model had only 3 two-way margin residuals exceeding 4 but 12 three-way margin residuals exceeding this cutoff. It is important to consider that the χ² statistic in model fit applications can be especially sensitive to sample size (LaHuis et al., 2011), the SE model contained a relatively high number of items, and our sample size can be deemed relatively small by IRT standards. Despite these caveats, our results suggested that the two models may not demonstrate adequate fit, with the SE model exhibiting poorer fit than the OE model. Thus, we interpreted these models with caution, but we are able to utilize the IRT results for descriptive purposes related to the aspects of the latent continuum most represented and for comparison with future research.

The results from our IRT analyses suggested a wide spread for the β parameters along θ for all items in both subscales (see Table 1), with lowest category thresholds from −4.31 (β_8,1) to −2.75 (β_14,1) and the highest category thresholds reaching from −.02 (β_16,4) up to .85 (β_5,4). The Category Response Curves (CRC, Figure 1) for items and Test Information Functions (TIF, Figure 2) for the two subscales graphically reflect that β values were primarily located in the negative end of the latent trait spectrum, with very few positive β values (see Table 1) and category response curves (CRC) on the positive end of the latent continuum (see Figure 1). These results suggest that scale items best capture lower levels of social efficacy and outcome expectations with diminishing precision and information provided at increasing θ levels and thus higher levels of these traits. The larger range of estimates suggests the items have good ability to discriminate the differences between individuals at these lower ranges of the traits’ spectrums. The information functions shown in Figure 2 demonstrate how the information captured by items varies by θ, as items functioned better with lower levels of these latent traits and thus captured more information in this area of the trait spectrums. Furthermore, the information functions for some items and the subscales also show a prominent drop in information provided centered just below 0 on the latent trait spectrum, and reflect increased error and decreased accuracy of information in measuring this range of the spectrum for these latent traits. For example, the OE TIF (see Figure 2) reflects good item functioning from about −3 to −1.5 and from −.5 to .5 along θ. Additionally, item information curves which create the Test Information Function (TIF) demonstrated no significant overlap with other items across the latent trait spectrum, and suggested no problematic redundancy in information provided by subscale items. 
Analyzing the CRCs closely for the outcome expectations subscale (see Figure 1) suggested generally well-functioning response options; however, item 13 shows a response curve for option 2 that is almost completely subsumed under the 1 and 3 category response curves and suggested that response choice 2 may not have functioned effectively. A similar analysis of the self-efficacy items suggested the 1 category for items 8, 9, and 10 functioned ineffectively, as it was subsumed under the 2 category and thus did not operate distinctly from its adjacent category. Item slope parameters (α) could be considered generally very high at above 1.70 with one value in the high range (1.35–1.69) at 1.61 (Item 5) (Baker, 2001). Based on the discrimination parameters, β values, category response curves, and item information curves, our results suggested that most items were relatively strong in their ability to discriminate between people of different social efficacy and outcome expectation levels, with improved precision at measuring lower levels of the latent traits.
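The classification of slope parameters can be reproduced from Table 1. The sketch below uses only the two cutoffs quoted above (high: 1.35 to 1.69; very high: above 1.70); treating a value of exactly 1.70 as very high and lumping all lower values into a single label are simplifying assumptions, and the function name is illustrative.

```python
# Discrimination estimates (alpha) for items 1-18, copied from Table 1.
alphas = [3.90, 4.19, 2.04, 3.29, 1.61, 4.09, 2.31, 1.80, 2.84,
          2.18, 1.77, 3.24, 2.11, 2.83, 3.17, 3.28, 3.26, 1.89]

def discrimination_label(alpha):
    """Label a slope using only the two cutoffs quoted in the text."""
    if alpha >= 1.70:
        return "very high"
    if alpha >= 1.35:
        return "high"
    return "below high"

labels = {item: discrimination_label(a) for item, a in enumerate(alphas, start=1)}
high_items = [item for item, label in labels.items() if label == "high"]
n_very_high = sum(label == "very high" for label in labels.values())
print(high_items, n_very_high)
```

Applied to the Table 1 estimates, this labels Item 5 as high and the remaining 17 items as very high, consistent with the description above.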

Discussion

The results from our study suggest that the SEOES continues to maintain the same two-factor structure as originally proposed and previously demonstrated (Wright, Wright et al., 2013), which has also been replicated in a translated version of the scale internationally (Turkish; Bakioglu & Turkum, 2017). Based on the confirmatory factor analysis results, both subscales were structurally represented by adequate model fit (Kline, 2011). These Classical Test Theory methods were supplemented by Item Response Theory (IRT) analyses, which represent a more current test theory approach for scale development (Reise et al., 2005). According to IRT, our results suggested that most items were relatively strong in their ability to discriminate between people of different social efficacy and outcome expectation levels. Specifically, the larger range of estimates we found (see β values in Table 1) indicates the items have a better ability to discriminate the differences between any two individuals by correctly predicting the probability of endorsing the different levels of the construct. Based on test information functions (TIF, see Figure 2), the scale’s items have greater precision and provide more information at lower levels of perceived social self-efficacy and social outcome expectations, with lower performance at higher levels of these constructs.

These findings suggest that the current scale can be most effective at assessing individuals with lower levels of social self-efficacy and social outcome expectations and is less precise at measuring the more positive aspects of the latent variable (i.e., higher self-efficacy and outcome expectations). Researchers may want to consider our findings when using the scale, depending on the type of research questions they are investigating. As with our scale, other scales developed using IRT have difficulty in measuring positive aspects of the latent variables and more accurately measure lower ends of the continuum. For example, the Experience in Close Relationships – Revised scale (Fraley et al., 2000) has more difficulty assessing higher levels of the latent variable secure attachment and is more precise at assessing lower aspects of insecure attachment. Future research could explore the possible reasons related to the items having more difficulty at assessing positive aspects of social self-efficacy and outcome expectations (e.g., item wording).

Practice Implications

The items from the SEOES can be very useful at accurately measuring lower aspects of social self-efficacy and outcome expectations. Although there is less precision with this scale at identifying higher levels of social self-efficacy, scale results can be used to help identify individuals who may benefit from increasing their social self-efficacy. Given the broad range of interventions that can be designed to increase self-efficacy, as well as Bandura’s (1997) four areas that influence self-efficacy (i.e., performance attainments, vicarious learning, physiological effects, and social persuasion), we believe educators and psychologists can utilize the SEOES to identify individuals who have lower social self-efficacy and social outcome expectations and design interventions accordingly. For example, Thompson and Dahling (2012) found that learning experiences (based on Bandura’s four sources of self-efficacy) directly influence self-efficacy and outcome expectations. Therefore, psychologists and educators could help individuals engage in positive learning experiences by identifying others who demonstrate strong social self-efficacy skills and encouraging them to observe and recognize (i.e., vicarious learning) the characteristics of others with strong social self-efficacy. According to Bandura’s theory (1997), engaging in this positive learning experience of vicarious learning will increase their social self-efficacy. In turn, this may be very beneficial for educators to consider, given the strong associations between social support and prosocial behaviors, as well as positive educational outcomes (Heerde & Hemphill, 2018).

Practitioners may also decide to utilize the SEOES to assist in their understanding related to the effectiveness of their work with clients. Based on the change of standard deviation scores, practitioners can reference classification criteria that have been identified (i.e., recovered, remitted, improved, deteriorated; Wise, 2004). Clinicians can utilize the reliable change index (RCI) formula to determine if there is a reliable change in their clients’ scores based on the pre-test and post-test scores. This information can be used to consider if the change is clinically significant, and these methods could be applied to outcome measurement using the SEOES. Wise (2004) provides a thorough overview and formulas related to effective methods to analyze psychotherapy outcomes (i.e., determining clinical significance and calculating RCI).
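A reliable change calculation for a single client might be sketched as follows. The sketch assumes the widely used Jacobson and Truax formulation of the RCI (the formulas are not reproduced in this article, which refers readers to Wise, 2004), uses the SE subscale's summed-score SD and Cronbach's alpha from the present sample as the reliability estimate, and uses hypothetical pre-test and post-test scores.

```python
import math

sd_total = 7.46149    # SE subscale summed-score SD reported in the Instrument section
reliability = 0.936   # Cronbach's alpha for the SE subscale in this sample

# Hypothetical client scores on the 12-item SE subscale (possible range 12-60).
pre_score, post_score = 38, 46

# Standard error of measurement and standard error of the difference score.
se_measurement = sd_total * math.sqrt(1.0 - reliability)
s_diff = math.sqrt(2.0 * se_measurement ** 2)

rci = (post_score - pre_score) / s_diff
reliable_change = abs(rci) > 1.96   # conventional two-tailed .05 criterion

print(round(s_diff, 2), round(rci, 2), reliable_change)
```

An absolute RCI above 1.96 suggests the change is unlikely to reflect measurement error alone; whether it is also clinically significant depends on the classification criteria the practitioner adopts.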

Limitations and Future Directions for Research

The generalizability of our results is a limitation of our study. The sample consisted largely of White female undergraduate participants. Using this sample, our confirmatory factor analysis results demonstrated adequate model fit. However, model fit in the original development sample was slightly better (Wright, Wright et al., 2013). Although measurement non-invariance is unlikely among individuals with similar demographics, our study was limited by the current sample, and future researchers could examine the scale with a more diverse sample to test for possible measurement non-invariance (Millsap & Olivera-Aguilar, 2012). Our study had an adequate number of participants to perform CFA (Kline, 2011), but the sample was relatively small for scale development procedures based on IRT using a graded model (Reise & Yu, 1990), and researchers should consider this limitation.

Another limitation of our study emerged from the results of the IRT analysis. According to our IRT findings, the SEOES items do not appear to be as precise at measuring higher levels of social self-efficacy as they are at measuring lower levels of the construct. It is not uncommon for measurement precision to be unevenly distributed across levels of the latent variable (Fraley et al., 2000). Accordingly, future researchers may want to explore scale development procedures that modify the SEOES items in ways that may help assess higher levels of social self-efficacy.
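To illustrate why precision can differ across levels of the latent trait, the sketch below computes category probabilities and item information for a single item under Samejima's graded response model. The discrimination and threshold values are hypothetical and were chosen only so that the thresholds sit mostly below the trait mean; they are not SEOES estimates, and the five-category response format is likewise an assumption.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def _cumulative(theta, a, thresholds):
    # P*(X >= k) for each boundary, padded with 1 (lowest) and 0 (highest).
    return [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Probability of each ordered response category at trait level theta.

    thresholds must be in ascending order.
    """
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    # Derivative of each cumulative curve; zero for the padded 1 and 0.
    slope = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        dp_k = slope[k] - slope[k + 1]
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info


# Hypothetical item: five categories, thresholds mostly below the mean.
a, thresholds = 1.8, [-2.5, -1.5, -0.5, 0.5]
for theta in (-1.5, 0.0, 1.5):
    print(theta, round(grm_item_information(theta, a, thresholds), 3))
```

With thresholds located mostly below zero, information is higher at low trait levels than at high ones, which is the pattern described above. Items with thresholds placed higher on the trait continuum would shift information upward.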

We used a graded response model (GRM) because we were examining categorical Likert-type data (StataCorp, 2015), and the GRM is an extension of the two-parameter logistic model (2PLM; StataCorp, 2015; Fraley et al., 2000) to ordered response categories. However, other types of IRT models could be used depending on the purpose of the scale being developed, and future researchers may want to use these models when appropriate. For example, a 2PL model estimates item discrimination and difficulty (Thomas, 2011), whereas a 3PL model adds a parameter to account for guessing (Thomas, 2011) and a 4PL model can address careless responding (Mallinckrodt et al., 2016).
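The relationship among these logistic models can be sketched with a single function, shown here in Python. This is an illustrative sketch of the standard dichotomous-item formulation, not the model fitted in this study, which was a polytomous GRM. The parameter names follow common IRT notation (a for discrimination, b for difficulty, c for the lower asymptote, d for the upper asymptote), and the values in the usage lines are hypothetical.

```python
import math


def four_pl(theta, a, b, c=0.0, d=1.0):
    """Probability of endorsing a dichotomous item at trait level theta.

    a -- discrimination (slope)
    b -- difficulty (location)
    c -- lower asymptote; c > 0 models guessing (3PL)
    d -- upper asymptote; d < 1 models careless responding (4PL)
    With c = 0 and d = 1 this reduces to the 2PL model.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item evaluated at theta equal to its difficulty.
print(four_pl(0.0, a=1.2, b=0.0))                 # 2PL
print(four_pl(0.0, a=1.2, b=0.0, c=0.2))          # 3PL
print(four_pl(0.0, a=1.2, b=0.0, c=0.2, d=0.95))  # 4PL
```

Each added parameter relaxes one assumption of the simpler model, at the cost of requiring more data to estimate it stably.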

Conclusions

The current study sought to further validate the SEOES (Wright, Wright et al., 2013) using both classic test theory and item response theory approaches. The results support the two-factor structure of the SEOES in a sample of emerging adults, with the two theoretically based subscales assessing social self-efficacy and social outcome expectations. However, the IRT analysis found that the SEOES has better measurement precision at lower levels of these latent constructs. Therefore, the SEOES may be most appropriate when researchers, educators, and practitioners are interested in assessing individuals with concerns related to lower levels of social self-efficacy.

Footnotes

Acknowledgements

We would like to thank Dorothy Wright for her contribution to the original scale development and for her thoughts on this manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Stephen L. Wright

Michael A. Jenkins-Guarnieri

References

Akin & Akkaya (2015). The validity and reliability study for the Turkish version of the social efficacy and social outcome expectations scale. Journal of Faculty of Education, 4(1), 204–213. https://doi.org/10.14686/BUEFAD.2015111025

Baker, F. B. (2001). The basics of item response theory. In ERIC clearinghouse on assessment and evaluation (2nd ed.). http://edres.org/irt/baker/

Bakioglu & Sibel (2017). Psychometric properties of adaptation of the social efficacy and outcome expectations scale to Turkish. European Journal of Educational Research, 6(2), 213–223. https://doi.org/10.12973/eu-jer.6.2.213

Bandura (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215. https://doi.org/10.1037//0033-295x.84.2.191

Bandura (1997). Self-efficacy: The exercise of control. Freeman.

Bartholomew, D. J., & Tzamourani (1999). The goodness of fit of latent trait models in attitude measurement. Sociological Methods and Research, 27(4), 525–546. https://doi.org/10.1177/0049124199027004003

Beauducel & Wittmann, W. W. (2005). Simulation study on fit indexes in CFA based on data with slightly distorted simple structure. Structural Equation Modeling: A Multidisciplinary Journal, 12(1), 41–75. https://doi.org/10.1207/s15328007sem1201_3

Betz, N. E., & Taylor, K. M. (2001). Manual for the Career Decision Self-Efficacy Scale and CDMSE–Short Form. Ohio State University, Department of Psychology.

Brown, S. D., & Lent, R. W. (2016). Vocational psychology: Agency, equity, and well-being. Annual Review of Psychology, 67, 541–565. https://doi.org/10.1146/annurev-psych-122414-033237

Chesney, M. A., Neilands, T. B., Chambers, D. B., Taylor, J. M., & Folkman (2006). A validity and reliability study of the Coping Self-Efficacy scale. British Journal of Health Psychology, 11(Pt 3), 421–437. https://doi.org/10.1348/135910705X53155

DeMars (2010). Item response theory. Oxford University Press.

DeVellis (2012). Scale development: Theory and applications (3rd ed.). Sage.

DiStefano (2002). The impact of categorization with confirmatory factor analysis. Structural Equation Modeling: A Multidisciplinary Journal, 9(3), 327–346. https://doi.org/10.1207/s15328007sem0903_2

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.

Fletcher, T. D. (2010). psychometric: Applied psychometric theory. R package version 2.2. http://CRAN.R-project.org/package=psychometric

Fraley, R. C., Waller, N. G., & Brennan, K. A. (2000). An item response theory analysis of self-report measures of adult attachment. Journal of Personality and Social Psychology, 78, 350–365. https://doi.org/10.1037//0022-3514.78.2.350

Harvey, R. J. (2016). Improving measurement via item response theory: Great idea, but hold the Rasch. The Counseling Psychologist, 44(2), 195–204. https://doi.org/10.1177/0011000015615427

Heerde & Hemphill, S. A. (2018). Examination of associations between informal help-seeking behavior, social support, and adolescent psychosocial outcomes: A meta-analysis. Developmental Review, 47, 44–62. https://doi.org/10.1016/j.dr.2017.10.001

Hu & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6, 1–55. https://doi.org/10.1080/10705519909540118

Iacobucci (2010). Structural equations modeling: Fit indices, sample size, and advanced topics. Journal of Consumer Psychology, 20(1), 90–98. https://doi.org/10.1016/j.jcps.2009.09.003

Jackson, D. L. (2003). Revisiting sample size and number of parameter estimates: Some support for the N:Q hypothesis. Structural Equation Modeling: A Multidisciplinary Journal, 10(1), 128–141. https://doi.org/10.1207/s15328007sem1001_6

Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). Guilford Press. www.csa.com

LaHuis, D. M., Clark, & O’Brien (2011). An examination of item response theory item fit indices for the graded response model. Organizational Research Methods, 14(1), 10–23. https://doi.org/10.1177/1094428109350930

Lent, R. W., Brown, S. D., & Hackett (1994). Toward a unifying social cognitive theory of career and academic interest, choice and performance [Monograph]. Journal of Vocational Behavior, 45, 79–122.

Mallinckrodt, Miles, J. R., & Recabarren, D. A. (2016). Using focus groups and Rasch item response theory to improve instrument development. The Counseling Psychologist, 44(2), 146–194. https://doi.org/10.1177/0011000015596437

Millsap, R. E., & Olivera-Aguilar (2012). Investigating measurement invariance using confirmatory factor analysis. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 380–392). Guilford Press.

Nguyen, T. H., Han, H. R., Kim, M. T., & Chan, K. S. (2014). An introduction to item response theory for patient-reported outcome measurement. The Patient, 7(1), 23–35. https://doi.org/10.1007/s40271-013-0041-0

Ostini & Nering, M. L. (2006). Polytomous item response theory models. Sage Publications.

Pintrich, P. R., Smith, D. A. F., Garcia, & McKeachie, W. J. (1991). A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). Ann Arbor, MI: National Center for Research to Improve Postsecondary Teaching and Learning.

Pintrich, Smith, Garcia, & McKeachie (1993). Reliability and predictive validity of the Motivated Strategies for Learning Questionnaire (MSLQ). Educational and Psychological Measurement, 53, 167–199.

R Core Team. (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing.

Reise, Ainsworth, & Haviland (2005). Item response theory: Fundamentals, applications, and promise in psychological research. Current Directions in Psychological Science, 14(2), 95–101. https://doi.org/10.1111/j.0963-7214.2005.00342.x

Reise, S. P., & Yu (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27(2), 133–144. https://doi.org/10.1111/j.1745-3984.1990.tb00738.x

Rizopoulos (2006). ltm: An R package for latent variable modeling and item response theory analysis. Journal of Statistical Software, 17(5), 1–25.

Robbins, S. B., Lauver, Le, Davis, Langley, & Carlstrom (2004). Do psychosocial and study skill factors predict college outcomes? A meta-analysis. Psychological Bulletin, 130(2), 261–288. https://doi.org/10.1037/0033-2909.130.2.261

Rosseel (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02

Samejima (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, No. 17.

Satorra & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. American Statistical Association, Proceedings of the Economics and Statistics Section, 308–313.

Sherer, Maddux, J. E., Mercandante, Prentice-Dunn, Jacobs, & Rogers (1982). The self-efficacy scale: Construction and validation. Psychological Reports, 51(2), 663–671. https://doi.org/10.2466/pr0.1982.51.2.663

Smith, H. M., & Betz, N. E. (2000). Development and validation of a scale of perceived social self-efficacy. Journal of Career Assessment, 8(3), 283–301. https://doi.org/10.1177/106907270000800306

Spurk & Abele, A. E. (2014). Synchronous and time-lagged effects between occupational self-efficacy and objective and subjective career success: Findings from a four-wave and 9-year longitudinal study. Journal of Vocational Behavior, 84(2), 119–132. https://doi.org/10.1016/j.jvb.2013.12.002

StataCorp. (2015). Stata: Release 14. Statistical software. College Station, TX: StataCorp LP.

Thomas (2011). The value of item response theory in clinical assessment: A review. Assessment, 18(3), 291–307. https://doi.org/10.1177/1073191110374797

Thompson & Dahling (2012). Perceived social status and learning experiences in social cognitive career theory. Journal of Vocational Behavior, 80(2), 351–361. https://doi.org/10.1016/j.jvb.2011.10.001

Wise, E. A. (2004). Methods for analyzing psychotherapy outcomes: A review of clinical significance, reliable change, and recommendations for future directions. Journal of Personality Assessment, 82(1), 50–59. https://doi.org/10.1207/s15327752jpa8201_10

Worthington, R. L., & Whittaker, T. A. (2006). Scale development research: A content analysis and recommendations for best practices. The Counseling Psychologist, 34(6), 806–838. https://doi.org/10.1177/0011000006288127

Wright, S. L., Jenkins-Guarnieri, M. A., & Murdock, J. L. (2013b). Career development among first year college students: College self-efficacy, student persistence, and academic success. Journal of Career Development, 40(4), 292–310. https://doi.org/10.1177/0894845312455509

Wright, S. L., & Perrone, K. M. (2010). An examination of the role of attachment and efficacy in life satisfaction. The Counseling Psychologist, 38(6), 796–823. https://doi.org/10.1177/0011000009359204

Wright, S. L., Wright, D. A., & Jenkins-Guarnieri, M. A. (2013a). Development of the social efficacy and social outcome expectations scale. Measurement and Evaluation in Counseling and Development, 46(3), 218–231. https://doi.org/10.1177/0748175613484042