Sage Journals: Discover world-class research

Abstract

Efficient and intuitive interpretive frameworks for social-emotional learning (SEL) measures are necessary for identifying student needs and informing programming decisions across multitiered systems of support in schools. Though familiar to educators and often used with standardized tests of academic achievement, criterion-referenced frameworks are less common in SEL assessment. Accordingly, the current study examined psychometric evidence for scores from one such framework, the Competency-Referenced Performance Framework (CRPF), which was developed to inform universal screening decisions based on the SSIS SEL Brief Scales (Elliott et al., 2020). Specifically, we evaluated the stability, test-criterion relationships with academic outcomes, and treatment sensitivity of CRPF levels using data from an efficacy trial of a universal SEL program. Results provided preliminary evidence supporting the CRPF.

Keywords

social-emotional skills; SSIS SEL Brief Scales; criterion-referenced interpretation; reliability and validity evidence

Social-emotional learning (SEL), the process of developing healthy self-identities, managing emotions, achieving goals, feeling empathy, maintaining relationships, and making caring decisions, is essential to children’s school success (Collaborative for Academic, Social, and Emotional Learning [CASEL], 2022). Social-emotional learning involves knowledge, skills, and attitudes relative to five core competencies (self-awareness, self-management, social awareness, relationship skills, and responsible decision-making), which can be taught via both formal and informal instruction in schools (CASEL, 2022). Intervention programs promoting SEL at the universal level (e.g., all students in a classroom or school) have been shown to foster positive student skills, behaviors, and attitudes and reduce emotional distress in both the short- and long-term (Durlak et al., 2011; Taylor et al., 2017).

In response to growing student mental health needs exacerbated by the COVID-19 global pandemic, the U.S. Department of Education has recommended multi-tiered systems of support (MTSS) as an organizing framework to guide the provision of evidence-based services in schools (U.S. Department of Education, 2021). Such systems emphasize prevention for all students (Tier 1) as well as early intervention for students who need additional targeted and individualized support (Tiers 2 and 3; Hughes et al., 2020). Within such frameworks, students move along a continuum of SEL programming (i.e., universal/primary, secondary, and intensive/tertiary) based on skill strengths and needs, responsiveness to intervention, and consideration of risk and protective factors (Hines et al., 2022). To use tiered systems successfully to allocate SEL programming, schools require psychometrically sound measures to identify students in need of intervention and monitor their progress (McKown, 2019). Although several SEL measures are currently available, few are fully aligned with CASEL’s SEL framework or brief enough for universal screening (Anthony et al., 2021).

Unfortunately, there are considerable challenges to implementing effective MTSS in real-world school settings, including the use and interpretation of assessment data to make valid decisions (Hagermoser Sanetti & Collier-Meek, 2019). In practice, school teams often have difficulty knowing how to employ data to screen for student needs, monitor student progress, and assess intervention effectiveness (Burns et al., 2008). Interpreting the results of screening and progress monitoring instruments and then using the information to align students with services can be a daunting and complex task, and errors in decision making can result in financial and time losses for schools and opportunity costs for students (VanDerHeyden et al., 2018).

Norm-referenced score interpretation (e.g., percentile ranks), which makes sense of performance by comparing a score with the scores of a reference group (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014), is often used in screening and summative assessment of SEL (DiPerna et al., 2020). In contrast, criterion-referenced scores, commonly used in state and other standards-based achievement tests, are interpreted relative to well-defined, previously established standards (Clifford, 2016; Neil et al., 1999). Proficiency categories (e.g., basic, proficient, and advanced), defined by performance of specific skills and/or behaviors, can help educators understand the current skill level of an individual student or a group of students, determine the need for intervention, and evaluate movement across categories after exposure to evidence-based instruction (Gross et al., 2019). For these reasons, criterion-referenced performance levels tend to be more intuitive to stakeholders and are often preferred when communicating assessment results (e.g., Hart et al., 2020). Because student scores are compared with a clearly defined set of expectations rather than with the performance of other students, criterion-referenced assessment can reveal that all, some, or no students meet predetermined standards, informing subsequent instructional decisions (Clifford, 2016). Criterion-referenced scores can therefore provide useful feedback and summative information to students, families, and educators who want to know clearly whether students are meeting desired learning outcomes, and they can communicate the potential need for intervention (Lok et al., 2016).
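The contrast between the two interpretive approaches can be illustrated with a minimal Python sketch. The scores and the cut value of 40 are hypothetical, not SSIS norms or CRPF cut scores, and the percentile-rank definition used here (percentage of reference scores below a score, counting ties as half) is one common convention among several.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, reference: list[float]) -> float:
    """Norm-referenced: percent of reference scores below `score`, ties counted as half."""
    ref = sorted(reference)
    below = bisect_left(ref, score)          # scores strictly below
    ties = bisect_right(ref, score) - below  # scores equal to `score`
    return 100.0 * (below + 0.5 * ties) / len(ref)


def meets_standard(score: float, cut: float) -> bool:
    """Criterion-referenced: compare the score with a fixed, predefined standard."""
    return score >= cut


# Hypothetical class in which every student scores above a hypothetical cut of 40.
scores = [42, 45, 48, 51, 55]
CUT = 40  # illustrative value only, not a published cut score

ranks = [percentile_rank(s, scores) for s in scores]
status = [meets_standard(s, CUT) for s in scores]
print(ranks)   # [10.0, 30.0, 50.0, 70.0, 90.0]
print(status)  # [True, True, True, True, True]
```

Even though every student clears the hypothetical cut, the percentile ranks still spread from 10 to 90, because a norm-referenced score only locates a student relative to the other students.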

In response to the growing demand for efficient and psychometrically sound measures of SEL skills, several brief SEL scales have been published during the past decade. For example, the Social Skills Improvement System–Social Emotional Learning Brief Scales (SSIS SEL Brief Scales; Elliott et al., 2020) are a family of multi-informant brief scales developed to assess skills aligned with the domains of CASEL’s SEL framework (i.e., self-awareness, self-management, social awareness, relationship skills, and responsible decision-making). In addition to norm-referenced score interpretation, the SSIS SEL Brief Scales offer a criterion-referenced interpretive framework, the Competency-Referenced Performance Framework (CRPF; Elliott et al., 2020). The CRPF categorizes SSIS SEL Brief Scales total scores into four overall SEL competency levels (Emerging, Developing, Competent, and Advanced) based on the frequency of skill performance (never, seldom, often, and almost always) relative to developmentally appropriate expectations from a learning progression perspective. Specifically, children at the Emerging level receive consistently low frequency ratings (never or seldom) on strength-focused items, whereas children at the Advanced level receive consistently high ratings (almost always). Cut scores corresponding to these proficiency levels were established using data from a large, nationally representative sample of K–12 students as well as reviews by school professionals and SEL experts (Elliott et al., 2020). However, no studies to date have examined psychometric evidence for the CRPF levels in an independent sample.
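Mapping a total score to an ordered CRPF level amounts to a lookup against ascending cut scores. The sketch below assumes a 0–60 total-score metric (20 items rated never, seldom, often, or almost always and coded 0–3, an assumption on our part), and the values in `HYPOTHETICAL_CUTS` are illustrative placeholders only; the actual cut scores are published in the SSIS SELb-T manual (Elliott et al., 2020) and are not reproduced here.

```python
from bisect import bisect_right

LEVELS = ("Emerging", "Developing", "Competent", "Advanced")

# HYPOTHETICAL lower bounds for Developing, Competent, and Advanced on an
# assumed 0-60 total-score metric. Not the published CRPF cut scores.
HYPOTHETICAL_CUTS = (25, 36, 49)


def crpf_level(total: float, cuts=HYPOTHETICAL_CUTS) -> str:
    """Map a total score to an ordered competency level given ascending cut scores."""
    if list(cuts) != sorted(cuts):
        raise ValueError("cut scores must be in ascending order")
    # bisect_right counts how many lower bounds the score meets or exceeds,
    # so a score exactly at a cut is placed in the higher level.
    return LEVELS[bisect_right(cuts, total)]


for score in (19, 25, 41, 55):
    print(score, crpf_level(score))
```

Passing the published cut scores as the `cuts` argument would leave the lookup logic unchanged.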

The Standards for Educational and Psychological Testing (Standards; AERA, APA, & NCME, 2014) specify that evidence of the reliability and validity of test scores be provided to support their proposed use. For SEL assessment, the psychometric evidence for a measure should be evaluated with respect to whether its scores can guide intended uses (e.g., screening) and support conclusions about students’ SEL competency levels (Buros Center for Testing, 2020). Screening requires examining broad outcomes to assess levels of risk and identify student needs (Kettler et al., 2014). McKown (2019) suggested that criteria for evaluating SEL assessments include temporal stability as evidence of score reliability, correlations with other relevant variables, and evidence that students exposed to high-quality instruction improve more than a control group. Similarly, Gross et al. (2019) described sound screening instruments as yielding scores with evidence of reliability, responsiveness to change, and generalized utility. Maintaining technical rigor while maximizing efficiency and utility is an important consideration in developing SEL screening assessments for use in research and practice (Kim et al., 2022).

For criterion-referenced score interpretation, reliability evidence can be established using decision consistency indexes, such as the percentage of correct decisions and Cohen’s kappa, across replications of the same testing procedure (Standards, p. 40). One way to replicate the testing procedure is to administer the same test again to the same group of examinees (i.e., test-retest). Although test-retest stability of total scores on the teacher form of the SSIS SEL Brief Scales (SSIS SELb-T) has been found to be sufficient in previous studies (.78 in Elliott et al., 2020, and .85 in Anthony et al., 2021), the stability of CRPF levels has not been evaluated to date.
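To show how the two decision consistency indexes named above are computed, the following Python sketch derives the proportion of exact agreement and Cohen’s kappa from a square cross-tabulation of levels at two administrations. As input it uses the baseline-by-posttest counts for the BAU group reported in Table 3; the function name and data layout are our own.

```python
def decision_consistency(table: list[list[int]]) -> tuple[float, float]:
    """Return (proportion of exact agreement, Cohen's kappa) for a square table.

    Rows are levels at the first administration and columns are levels at the
    second administration, both listed in the same order.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n                 # observed agreement
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2  # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, kappa


# Baseline (rows) by posttest (columns) counts from Table 3, in the order
# Emerging, Developing, Competent, Advanced.
table3 = [
    [12, 8, 1, 0],
    [8, 18, 20, 1],
    [3, 12, 99, 29],
    [1, 2, 15, 54],
]
agree, kappa = decision_consistency(table3)
print(f"agreement = {agree:.1%}, kappa = {kappa:.2f}")  # agreement = 64.7%, kappa = 0.46
```

Kappa discounts the agreement expected by chance from the marginal totals, which is why it is noticeably lower than raw agreement.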

Validity refers to “the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed use” (Standards, p. 14). One relevant and common source of evidence supporting criterion-referenced interpretations is test-criterion relationships, which can be established using correlational and experimental methods to examine concurrent and predictive evidence (Hambleton et al., 1978). A body of literature supports the association between SEL competence and academic outcomes in both elementary and secondary grades (Panayiotou et al., 2019), and the relationships between prosocial behaviors and academic achievement found in previous research have been characterized as moderately positive (see DiPerna et al., 2016).

Given that academic achievement is one of the primary intended outcomes of formal schooling, if not the primary one, whether CRPF levels are related concurrently and predictively to academic outcomes is an important consideration for MTSS decision-making in schools. As one example, Panayiotou et al. (2019) found that the association between student-reported SEL competence and standardized academic test scores was mediated by mental health difficulties, suggesting important relationships among SEL, academic achievement, and student mental health needs. Currently, no published studies have examined test-criterion relationships between the CRPF and academic outcomes, and this important form of validity evidence warrants examination.

Lastly, because the SSIS SEL Brief Scales are intended for universal screening and progress monitoring of SEL skills within MTSS, evidence of treatment sensitivity (i.e., evidence from the experimental method described by Hambleton et al., 1978) is vital to support the use of scores for assigning students to intervention tiers, monitoring responsiveness to intervention, and evaluating intervention efficacy. For example, the SSIS-Classwide Intervention Program (SSIS-CIP; Elliott & Gresham, 2007) is a universal SEL program used in schools to promote social skill development and positive behavior. Through 10 core units comprising 30 lessons delivered by the classroom teacher, the program provides instruction in foundational social skills such as listening to others, paying attention to your work, and asking for help. Although previous evaluations have demonstrated that teacher ratings on the SSIS Rating Scale–Teacher form (SSIS RS-T; Gresham & Elliott, 2008) are responsive to the SSIS-CIP (DiPerna et al., 2015; 2018), the sensitivity of the CRPF to skill changes resulting from such universal interventions has not yet been studied.

Thus, the current study is guided by three research questions:

(1) How stable are students’ CRPF levels over time (i.e., stability)?

(2) Are students’ CRPF levels related to their academic outcomes (i.e., test-criterion relationships with academic outcomes)?

(3) Are students’ CRPF levels sensitive to intervention (i.e., treatment sensitivity)?

Although initial reliability and validity evidence support the use of SSIS SELb-T norm-referenced total scores for lower-stakes decisions (Anthony et al., 2021), stability, test-criterion relationships with academic outcomes, and treatment sensitivity of the CRPF levels have not been thoroughly examined to date. Based on previous findings with total scores from the SSIS SELb-T, the CRPF levels are expected to be moderately stable in the absence of intervention (Anthony et al., 2021), positively related to academic outcomes (e.g., Panayiotou et al., 2019), and relatively sensitive to skill change resulting from intervention (e.g., DiPerna et al., 2016; 2018).

Method

To generate psychometric evidence for evaluating the CRPF, we conducted a secondary analysis of data from a previous efficacy trial of a school-based SEL intervention. Specifically, we examined evidence of reliability (test-retest stability) and validity (test-criterion relationships with external criteria and treatment sensitivity) appropriate for criterion-referenced score levels.

Sample

Data for this study were drawn from a multi-year efficacy trial of the SSIS-Classwide Intervention Program (SSIS-CIP; Elliott & Gresham, 2007) in the Mid-Atlantic region of the United States (see DiPerna et al., 2015; 2016, for more information). The original study used a multisite cluster randomized controlled trial design in which second-grade classrooms (clusters) were randomly assigned to treatment conditions (intervention or business-as-usual [BAU; the standard practices a school would employ if no research study were being conducted]) within seven schools (sites). It included four cohorts of students enrolled in participating Grade 2 classrooms across four successive years. Follow-up data were collected for two additional years in the same schools, during which schools were free to use the SSIS-CIP on a voluntary basis.

A total of 641 students (354 treatment, 287 BAU) from 54 classes (31 treatment, 23 BAU) constituted the total analytic sample for the current study. A small urban district (5 schools; nearly 100% of students eligible for free or reduced-price lunch) and a small rural district (2 schools; 42% of students eligible for free or reduced-price lunch) participated. Table 1 presents the demographic details of the total analytic sample. Within this student sample, about 17% were Black, 5% Hispanic, 19% received at least one form of supplemental services (Title I, instructional support, tutoring, Response to Intervention, or other) at school, and 8% received one or more special education services (mostly speech and learning disabilities). A little over half of the student sample (53%) were girls. To eliminate potential confounds resulting from exposure to SEL intervention, only data from participants in the BAU condition were used to address the first two research questions about stability (based on pretest and posttest with an approximate 3-month interval) and test-criterion relationships (including concurrent relationship using baseline data and predictive relationship using 1-year follow-up data on academic outcomes) for the CRPF levels. The treatment sensitivity question was addressed using data from the whole sample.

Table 1.

Demographic Characteristics (%) of the Analytic Sample.

Variables	Treatment Group (n = 354)	BAU Comparison Group (n = 287)
Race
Black	19.49	14.29
Asian	2.26	1.74
White	57.06	70.73
Hispanic	6.50	3.83
Other	2.54	1.05
Multi-racial	0.85	1.05
Supplemental services (≥1)	16.67	22.30
Title I	13.56	16.38
Instructional	2.82	4.53
Tutoring	5.08	5.57
Response to intervention	2.26	5.92
Other	5.65	8.01
Special education (≥1)	9.32	5.92
Speech/Language impairment	5.65	2.09
Language learning disability	2.82	1.05
Reading learning disability	5.93	3.48
Writing learning disability	3.95	3.14
Mathematics learning disability	3.95	2.79
Hearing impairment	0.28	0.70
Visual impairment	0.28	0.00
Orthopedic impairment	0.56	0.00
Unknown	0.00	0.35
Gender
Female	53.11	52.96
Male	43.50	45.64

Note. Table entries are percentages of the column total. Percentages may not sum to 100 within variables because of missing data or multiple category endorsements.

Measures

Three measures were used in this study, one for SEL skills and two for academic outcomes (reading and math).

SEL skills were measured by items on the SSIS SELb-T (Elliott et al., 2020). The SSIS SELb-T was developed by applying Item Response Theory to the standardization sample of the longer, full-length SSIS RS-T and SSIS SEL Rating Forms (Gresham & Elliott, 2017). A maximally efficient set of items was selected that maintained content coverage and psychometric properties similar to those of the original forms (see Anthony et al., 2021, for more information about the development of these brief forms). The published SSIS SELb-T (Elliott et al., 2020) includes 20 items, four for each of the five CASEL domains of self-awareness, self-management, social awareness, relationship skills, and responsible decision-making. Anthony et al. (2021) reported evidence of internal consistency (Cronbach’s α = .93 for composites and .79–.87 for subscales), test-retest reliability (.84 for composites and .75–.83 for subscales), and interrater reliability (.65 for composites and .47–.65 for subscales). Validity evidence was demonstrated by the scales’ relationships with the Social Skills Rating System–Teacher Rating Scale (SSRS-T; Gresham & Elliott, 1990), the Behavior Assessment System for Children, Second Edition (BASC-2; Reynolds & Kamphaus, 2004), and the Vineland Adaptive Behavior Scales, Second Edition (Sparrow et al., 2005), as reported in Anthony et al. (2021). For this study, total scores were computed from the SSIS SELb-T items that also appear on the SSIS RS-T, which was administered during the efficacy trial of a universal SEL program. These total scores were then converted to the four CRPF levels using the cut scores reported in the SSIS SELb-T manual (Elliott et al., 2020).¹
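Because only the SELb-T items that also appear on the SSIS RS-T were available, totals had to be placed on the full 20-item metric (Table 2 refers to them as prorated scores). A common way to prorate, assumed here rather than taken from the article, is to multiply the mean of the available item ratings by the number of items on the full scale. The sketch also assumes a 0–3 coding of the four frequency ratings, and the ratings themselves are hypothetical.

```python
from typing import Optional, Sequence


def prorated_total(ratings: Sequence[Optional[int]], n_items: int = 20) -> float:
    """Scale the available item ratings up to a full-length total score.

    ASSUMPTIONS: every item uses the same 0-3 coding, and items that were not
    administered are passed as None.
    """
    observed = [r for r in ratings if r is not None]
    if not observed:
        raise ValueError("no rated items")
    # Mean observed item rating multiplied by the number of items on the full scale.
    return sum(observed) / len(observed) * n_items


# Hypothetical teacher ratings: 16 of 20 items available, 4 not administered.
ratings = [2, 3, 2, 2, 3, 1, 2, 2, 3, 2, 2, 1, 2, 3, 2, 2] + [None] * 4
print(prorated_total(ratings))  # 42.5
```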

Star Reading (Renaissance Learning, 2010) is a brief, computer-adaptive test that assesses students’ reading comprehension and overall reading achievement. It includes 25 items requiring students to construct sentence meaning based on vocabulary knowledge and contextual information. Split-half reliability estimates reported by the publisher were .89 in both Grades 2 and 3. Publisher-reported validity evidence included high correlations between Star Reading scores and scores from other standardized reading assessments (e.g., American College Testing, Scholastic Aptitude Test, and Iowa Tests of Basic Skills). Star Reading produces scaled scores ranging from 0 to 1400, which are transformations of the Rasch ability estimates resulting from the test. The scores represent a student’s performance on a vertical developmental scale that spans Grades K–12 (Renaissance Learning, 2010). For example, the first to third quartiles (Q1, Q2, and Q3) benchmarked for Grade 2 students in the fall (around the baseline assessment for our sample) were 126, 224, and 322, respectively, and the corresponding quartiles for Grade 3 in the fall (around the 1-year follow-up assessment for our sample) were 259, 357, and 461 (Renaissance Learning, 2015).

Star Math (Renaissance Learning, 2009) uses multiple-choice questions to assess students’ number concepts, computation, algebraic thinking, and other fundamental math skills. Split-half reliability estimates provided by the publisher were .78 in both Grades 2 and 3. High positive correlations with scores from other standardized math assessments support the validity of Star Math scores. Star Math scaled scores range from 0 to 1400. The first to third quartiles benchmarked for Grade 2 students in the fall (around the baseline assessment for our sample) were 357, 414, and 467, respectively, and the corresponding quartiles for Grade 3 in the fall (around the 1-year follow-up assessment for our sample) were 443, 500, and 552 (Renaissance Learning, 2015).
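Later statements about where group means fall relative to these benchmarks (e.g., below Q2 or above Q3) amount to locating a scaled score among the Q1–Q3 values listed above. A minimal Python sketch, using only the benchmark values reported in this section and, as example inputs, group means from Table 4; the function and band labels are our own, and a score exactly at a benchmark is placed in the higher band.

```python
from bisect import bisect_right

# Fall quartile benchmarks (Q1, Q2, Q3) for Star scaled scores
# (Renaissance Learning, 2015), as reported in the Measures section.
BENCHMARKS = {
    ("math", 2): (357, 414, 467),
    ("math", 3): (443, 500, 552),
    ("reading", 2): (126, 224, 322),
    ("reading", 3): (259, 357, 461),
}
BANDS = ("below Q1", "Q1 to Q2", "Q2 to Q3", "above Q3")


def benchmark_band(score: float, subject: str, grade: int) -> str:
    """Locate a scaled score relative to the fall quartile benchmarks."""
    return BANDS[bisect_right(BENCHMARKS[(subject, grade)], score)]


print(benchmark_band(386.20, "math", 2))  # Emerging baseline math mean -> Q1 to Q2
print(benchmark_band(471.90, "math", 2))  # Advanced baseline math mean -> above Q3
```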

Procedures

Active parent/guardian consent and student assent were obtained for all students participating in data collection for the larger efficacy trial; approximately 52% of students in participating classrooms had parental consent to participate. Classrooms in both the intervention and BAU conditions followed the same data collection procedures. Teachers completed an online questionnaire for each participating student that included the SSIS SELb-T items as well as other items from the full-length SSIS teacher form. Teachers also provided student demographic information, such as gender, race/ethnicity, and receipt of special education or supplemental services. Trained research assistants administered the Star Reading and Star Math assessments. Baseline data were collected within a 4-week period (e.g., October–November) before implementation of the SSIS-CIP (Elliott & Gresham, 2007) in classrooms randomly assigned to the treatment condition. Posttest data were collected after SSIS-CIP implementation (e.g., March–April). Follow-up Star data were collected in the winter/early spring (e.g., February–March) of the year following SSIS-CIP implementation.

Teachers assigned to the intervention condition received training in the SSIS-CIP program (see DiPerna et al., 2015, 2016, for more information). The SSIS-CIP, which focuses on social skills that teachers identified as important for successful classroom learning, consists of 10 core units, each taught via three lessons. Program materials include a teacher manual with scripted lesson plans, brief video examples, and role-play activities for students. Lessons are delivered through direct instruction of skill steps, modeling and practice of steps, and monitoring and generalization activities. Fidelity of implementation of each lesson component was monitored via periodic real-time classroom observations by trained research assistants and via weekly self-report questionnaires completed by teachers. Fidelity reported by both trained independent observers and teachers was high (97–98%). APA ethical guidelines for research with human participants were followed, and all procedures were approved by the institutional review board.

Data Analysis

To address the first research question, about the stability of SSIS SELb CRPF levels, only data from participants in the BAU condition were used because their posttest scores were not affected by the intervention. Specifically, we cross-tabulated baseline and posttest CRPF levels for the BAU group and calculated the percentage of students at each baseline level who were at each CRPF level at posttest. The association between baseline and posttest CRPF levels was estimated with multiple indexes (the decision consistency indexes of percentage of agreement and Cohen’s kappa, as well as the phi coefficient and Spearman’s rho) and tested for statistical significance using a chi-square test. A stronger association indicates higher stability, that is, less movement across CRPF levels over the approximately 3-month interval.

To address the second research question and examine test-criterion relationships between CRPF levels and academic outcomes, we again used participants in the BAU condition, specifically those who completed the Star measures at baseline (concurrent relationship evidence) and at 1-year follow-up (predictive relationship evidence). CRPF levels defined at baseline were analyzed against academic outcomes at baseline and at 1-year follow-up, respectively. We first conducted a simple one-way ANOVA to examine CRPF level differences on each of the Star Math and Star Reading measures at each time point. Pairwise comparisons between CRPF level means were conducted using Tukey’s HSD to control the familywise Type I error rate. Next, we fit a random-intercept ANOVA model using the MIXED procedure in SAS (SAS Institute Inc, 2020a) to account for clustering by classroom and to test CRPF level mean differences in math and reading at each time point while controlling for participants’ demographic variables, including receipt of special education (1 = yes, 0 = no), receipt of supplemental services (1 = yes, 0 = no), race/ethnicity (1 = White student, 0 = racial-ethnic minority student), and gender (1 = male student, 0 = female student). The Benjamini and Hochberg (1995) correction was applied to control the false discovery rate across the six pairwise comparisons between CRPF levels. Significant differences among CRPF levels in baseline math or reading scores would provide concurrent relationship validity evidence for the CRPF, and significant CRPF level differences in math or reading scores at 1-year follow-up would provide predictive relationship validity evidence.
In addition, we fit a random-intercept ANCOVA model using the MIXED procedure in SAS (SAS Institute Inc, 2020a) for each of the follow-up math and reading measures, controlling for participants’ corresponding baseline math or reading scores and demographic variables, to investigate whether CRPF levels predicted relative change in academic outcomes over a year.
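The Benjamini–Hochberg adjustment applied to the six pairwise comparisons can be sketched as a step-up procedure over sorted p-values. The p-values below are hypothetical (unadjusted pairwise p-values are not reported in this section), and this is a generic implementation rather than the SAS code used in the study; `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"` is an established library alternative.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> tuple[list[bool], list[float]]:
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Returns (reject flags, adjusted p-values), both in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    reject = [adj <= q for adj in adjusted]
    return reject, adjusted


# Six HYPOTHETICAL pairwise p-values (not results from this study).
p = [0.010, 0.040, 0.030, 0.005, 0.200, 0.500]
reject, p_adj = benjamini_hochberg(p)
print(reject)                        # [True, False, False, True, False, False]
print([round(x, 3) for x in p_adj])  # [0.03, 0.06, 0.06, 0.03, 0.24, 0.5]
```

Rejecting every comparison whose adjusted p-value is at most q is equivalent to finding the largest rank k with p(k) ≤ (k/m)q and rejecting the k smallest p-values.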

For Research Question 3, regarding the sensitivity of CRPF levels to intervention, we used data from second-grade participants in the SSIS-CIP efficacy trial who had both baseline and posttest CRPF levels. First, baseline equivalence between treatment conditions was gauged for baseline CRPF levels and each demographic variable using chi-square tests of independence. We did not use more advanced methods to assess baseline equivalence because all of these baseline variables were included, and thereby controlled, in the analysis model. Then, we fit a random-component proportional odds model with a multinomial distribution and cumulative probit link to test whether students in the SSIS-CIP and BAU comparison conditions had different probabilities of moving up or down CRPF levels from baseline to posttest. Participants’ special education status, supplemental service status, race/ethnicity (White student or racial-ethnic minority student), and gender were included as covariates, and clustering by classroom was accounted for in the model using the GLIMMIX procedure in SAS (SAS Institute Inc, 2020b).
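To show how a cumulative probit model turns a treatment coefficient into probabilities of landing at each CRPF level, the sketch below computes category probabilities for one covariate pattern. It assumes the parameterization P(Y ≤ k) = Φ(θ_k − η), under which a positive linear predictor η shifts probability toward higher levels; software packages differ in sign convention and category ordering, so this is not necessarily the parameterization used by GLIMMIX. The thresholds θ are hypothetical, the classroom random effect is set to zero, and only the treatment coefficient b = .53 is taken from the Results.

```python
from statistics import NormalDist

LEVELS = ("Emerging", "Developing", "Competent", "Advanced")
PHI = NormalDist().cdf  # standard normal CDF (inverse of the probit link)


def level_probabilities(eta: float, thresholds: list[float]) -> dict[str, float]:
    """Category probabilities under a cumulative probit model.

    ASSUMED parameterization: P(Y <= k) = PHI(theta_k - eta), so a larger
    linear predictor `eta` shifts probability toward higher levels.
    `thresholds` must be increasing and have len(LEVELS) - 1 entries.
    """
    cumulative = [PHI(t - eta) for t in thresholds] + [1.0]
    probs = [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]
    return dict(zip(LEVELS, probs))


# HYPOTHETICAL thresholds for one fixed covariate pattern (baseline level and
# demographics held constant, classroom random effect set to 0).
theta = [-1.5, -0.5, 1.0]
B_TREATMENT = 0.53  # treatment coefficient reported in the Results

bau = level_probabilities(0.0, theta)
trt = level_probabilities(B_TREATMENT, theta)
for level in LEVELS:
    print(f"{level:<11} BAU {bau[level]:.3f}  SSIS-CIP {trt[level]:.3f}")
```

With these hypothetical thresholds, the treatment shift moves probability out of the Emerging and Developing levels and into the Advanced level, the same direction the Results describe for the model-estimated transition probabilities.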

Results

Listwise-deleted sample sizes (by analysis) and descriptive statistics for participants’ baseline SSIS SELb-T prorated scores are presented by CRPF level in Table 2. For the stability research question, no baseline CRPF levels were missing in the BAU comparison sample, but four students (1.4%) were missing posttest ratings. For the CRPF-academic outcome relationships, missingness was mostly due to attrition from the BAU comparison group at the 1-year follow-up (60 participants, or 20.9%). Another six to seven participants (2–3%) were missing baseline academic scores, and another 10 (3.5%) were missing demographic variables. For the treatment sensitivity question, 21 students (3.3%) were not assessed at baseline, 11 (1.7%) were lost to attrition at posttest, and 55 (8.6%) were missing demographic variables.

Table 2.

Sample Sizes and Baseline SSIS SELb-T Scores by Analysis and CRPF Level.

	Stability			Test-Criterion Relationships						Treatment Sensitivity
	N = 283			Math N = 210			Reading N = 211			N = 554
	n	Mean	SD	n	Mean	SD	n	Mean	SD	n	Mean	SD
Emerging	21	19.52	4.03	15	19.53	3.58	13	19.54	3.73	28	20.11	4.12
Developing	47	30.53	2.87	32	30.75	3.11	35	30.83	3.04	94	30.61	2.70
Competent	143	40.83	4.10	112	40.67	3.96	112	40.55	3.96	299	41.14	4.14
Advanced	72	55.04	3.54	51	54.71	3.74	51	54.80	3.81	133	54.80	3.44

As expected, SSIS SELb-T prorated total scores differed across CRPF levels, by roughly 10 to 11 points between adjacent levels from Emerging to Competent and by about 14 points between Competent and Advanced. SSIS SELb-T score variability was similar across CRPF levels, and the score pattern was similar across analysis samples.

Stability Evidence

The cross-tabulated frequencies of baseline and posttest CRPF levels for the BAU group are shown in Table 3, with the percentage of each baseline level at each posttest level in parentheses. At baseline, about 25.4%, 50.5%, 16.6%, and 7.4% of students were at the Advanced, Competent, Developing, and Emerging levels, respectively. Most students initially at the Advanced (75%), Competent (69%), and Emerging (57%) levels remained at the same level about 3 months later. A substantial percentage of students initially at the Competent (20%), Developing (43%), and Emerging (38%) levels moved up one level at posttest. Smaller percentages of students initially at the Competent (10%) and Developing (17%) levels moved down to lower levels, although 25% of students at the Advanced level also moved to a lower level. Overall, CRPF levels were relatively stable over the approximately 3-month interval (χ² = 217.17, df = 9, p < .0001), with a moderate to strong association between baseline and posttest CRPF levels (percentage of exact agreement = 64.7, kappa = .46, phi coefficient = .88, Spearman’s rho = .68).

Table 3.

Stability of CRPF Levels for BAU Students.

Baseline CRPF Level	Posttest CRPF Level
Baseline CRPF Level	Emerging (n = 24)	Developing (n = 40)	Competent (n = 135)	Advanced (n = 84)
Emerging (n = 21)	12 (57.14%)	8 (38.10%)	1 (4.76%)	0 (0%)
Developing (n = 47)	8 (17.02%)	18 (38.30%)	20 (42.55%)	1 (2.13%)
Competent (n = 143)	3 (2.10%)	12 (8.39%)	99 (69.23%)	29 (20.28%)
Advanced (n = 72)	1 (1.39%)	2 (2.78%)	15 (20.83%)	54 (75%)

Note. Cell entries are the number of students at each baseline level who were at each posttest level, with the percentage of the corresponding baseline level in parentheses. N = 283; χ²₉ = 217.17, p < .0001; percentage of exact agreement = 64.7, Cohen’s kappa = .46, phi coefficient = .88, Spearman’s rho = .68.
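The chi-square statistic and phi coefficient in the note can be reproduced from the cell counts. In the sketch below, phi is computed as √(χ²/N), which reproduces the reported .88; for tables larger than 2 × 2 this quantity is not bounded by 1, so Cramér’s V, which divides χ²/N by the smaller table dimension minus one before taking the square root, is added for comparison (V is our addition and is not reported in the article). Spearman’s rho is not reproduced. The column totals computed from these counts are 24, 40, 135, and 84, summing to N = 283.

```python
import math


def chi_square_phi(table: list[list[int]]) -> tuple[float, int, float, float]:
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (chi-square, df, phi, Cramer's V), where phi = sqrt(chi2 / N).
    """
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    chi2 = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )
    df = (len(rows) - 1) * (len(cols) - 1)
    phi = math.sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))
    return chi2, df, phi, cramers_v


# Baseline (rows) by posttest (columns) counts from Table 3.
table3 = [
    [12, 8, 1, 0],
    [8, 18, 20, 1],
    [3, 12, 99, 29],
    [1, 2, 15, 54],
]
chi2, df, phi, v = chi_square_phi(table3)
print(f"chi2({df}) = {chi2:.2f}, phi = {phi:.2f}, Cramer's V = {v:.2f}")
# chi2(9) = 217.17, phi = 0.88, Cramer's V = 0.51
```

For a p-value, `scipy.stats.chi2_contingency` returns the same statistic together with its p-value and degrees of freedom.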

Test-Criterion Relationships Evidence

Table 4 shows SSIS SELb CRPF level differences in Star Math and Star Reading scores by measurement time, with and without adjustment for clustering and demographic variables. The simple ANOVA for CRPF level differences at baseline was statistically significant for both math, F(3, 206) = 5.21, p = .002, and reading, F(3, 207) = 4.14, p = .007. CRPF level differences at 1-year follow-up were also statistically significant for both math, F(3, 206) = 5.88, p = .001, and reading, F(3, 207) = 3.21, p = .024. As expected, higher SEL performance levels were associated with higher math and reading scores. After adjusting for clustering and demographic differences, CRPF level differences remained statistically significant for math both at baseline, F(3, 199) = 3.17, p = .026, and at follow-up, F(3, 172) = 4.55, p = .004. Specifically, students at the Emerging SEL level scored significantly lower in math than their Advanced peers at baseline (averages below the Grade 2 fall Q2 benchmark vs. above Q3) and lower than students at all three other SEL levels at 1-year follow-up (averages still below the Grade 3 fall Q2 benchmark vs. above Q3). However, CRPF level differences in reading became statistically nonsignificant at both baseline, F(3, 195) = 1.61, p = .189, and follow-up, F(3, 191) = 1.21, p = .307, after accounting for demographic differences and clustering (although the Emerging group consistently scored below the Q2 benchmark, whereas the other groups scored around Q2 but below Q3). Moreover, CRPF levels did not significantly predict relative change at 1-year follow-up in math, F(3, 168) = 1.96, p = .122, or reading, F(3, 181) = 0.43, p = .735, when the corresponding baseline math or reading scores were also adjusted for.

Table 4.

CRPF Level Differences on Star Math and Star Reading Scaled Scores.

CRPF Level	Baseline Mean^a	Baseline SD	Baseline Adjusted Mean^b	One-Year Follow-Up Mean^a	One-Year Follow-Up SD	One-Year Follow-Up Adjusted Mean^b	Adjusted Follow-Up Mean^b,c
Star math scaled scores
Emerging (n = 15)	386.20	100.13	404.55	483.53	82.40	492.25	519.87
Developing (n = 32)	427.94	99.04	436.42	554.22	79.61	558.54	565.08
Competent (n = 112)	444.06	70.87	439.84	562.14	86.30	562.28	562.46
Advanced (n = 51)	471.90	76.05	467.95	586.06	80.60	580.14	567.79
Star reading scaled scores
Emerging (n = 13)	151.85	69.46	182.06	301.69	113.32	335.33	377.66
Developing (n = 35)	221.37	102.62	231.33	355.97	148.58	367.19	364.11
Competent (n = 112)	224.29	105.81	222.23	378.38	141.78	374.52	380.37
Advanced (n = 51)	259.82	101.12	244.13	418.24	110.96	404.36	387.44

Note. Italic means are significantly different from bold means in the same column within domain.

^aPairwise comparison across CRPF levels within domain using Tukey’s HSD.

^bAdjusted for receipt of special education, receipt of supplemental services, race-ethnicity (White student or racial-ethnic minority student), gender, clustering by classrooms, and false discovery rate (Benjamini & Hochberg, 1995).

^cIncluding baseline.
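Footnote b notes that comparisons of adjusted means were corrected for the false discovery rate (Benjamini & Hochberg, 1995). The article does not show how the adjustment was computed, so the following is only a generic Python sketch of the Benjamini-Hochberg step-up adjustment, applied to illustrative p-values that are not results from this study; the function name is ours.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps adjusted values at 1
    # Step up from the largest p-value: adjusted p_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative (hypothetical) p-values, not results from this study.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

A comparison is declared significant when its adjusted value falls below the chosen FDR level (e.g., .05). Unlike a Bonferroni correction, the multiplier m/j shrinks as the rank j increases, which is why the procedure is less conservative.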

Treatment Sensitivity

Based on chi-square tests of independence, the two treatment groups did not appear to be substantially unbalanced with respect to baseline CRPF levels (χ²₃ = 7.34, p = .062), special education status (χ²₁ = 2.60, p = .107), receipt of supplemental services (χ²₁ = 2.09, p = .148), or gender (χ²₁ = .05, p = .815). However, the treatment group had a larger proportion of racial-ethnic minority students than the BAU group (35.67% vs. 23.68%; χ²₁ = 9.26, p = .002). Results of the random-component proportional odds model showed that treatment condition was a statistically significant predictor of posttest CRPF levels after accounting for baseline CRPF levels, demographic variables, and clustering (b = .53, SE = .19, p = .007).
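In a proportional (cumulative) odds model, a single coefficient shifts every cumulative logit of the ordinal outcome by the same amount, so b = .53 corresponds to a cumulative odds ratio of exp(.53) ≈ 1.70 for being at or above any given CRPF level, holding the other terms constant. The Python sketch below illustrates this mechanism under stated assumptions: it uses the parameterization logit P(Y ≥ j) = θ_j + η (so a positive coefficient favors higher levels; the sign convention in a given software package depends on how response categories are ordered), the cutpoints θ_j and the comparison-group linear predictor are hypothetical placeholders rather than study estimates, and the classroom random effect is set to zero. It is not intended to reproduce Table 5.

```python
import math

LEVELS = ["Emerging", "Developing", "Competent", "Advanced"]


def logistic(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(cutpoints: list[float], eta: float) -> list[float]:
    """Category probabilities for a cumulative logit model.

    cutpoints[0], [1], [2] are the intercepts for P(Y >= Developing), P(Y >= Competent),
    and P(Y >= Advanced); they must be decreasing so the cumulative probabilities decrease.
    """
    # P(Y >= Emerging) is 1 and P(Y > Advanced) is 0; the middle terms come from the model.
    at_or_above = [1.0] + [logistic(c + eta) for c in cutpoints] + [0.0]
    # Each category's probability is the difference between adjacent cumulative probabilities.
    return [at_or_above[k] - at_or_above[k + 1] for k in range(len(LEVELS))]


# Hypothetical cutpoints and comparison-group linear predictor (placeholders, not study estimates).
cutpoints = [3.0, 1.0, -1.5]
eta_bau = 0.0
b_treatment = 0.53  # treatment coefficient reported in the text

bau = category_probs(cutpoints, eta_bau)
treated = category_probs(cutpoints, eta_bau + b_treatment)
for level, p_bau, p_treat in zip(LEVELS, bau, treated):
    print(f"{level:<10} BAU = {p_bau:.3f}  SSIS-CIP = {p_treat:.3f}")
print(f"Cumulative odds ratio for treatment: {math.exp(b_treatment):.2f}")
```

Because the same shift applies at every cutpoint, probability mass moves out of the lowest category and into the highest, which is the "proportional odds" assumption the model relies on.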

Table 5 shows the model-estimated conditional transition probabilities by treatment group. Students in the SSIS-CIP condition had higher probabilities of transitioning to higher CRPF levels, and lower probabilities of regressing to lower CRPF levels, from pretest to posttest than BAU comparison students. In an exploratory fashion, we also examined relationships between student demographic variables and the transition probabilities, given findings in the original efficacy study that student-level characteristics, such as receipt of supplemental services and initial skill levels, were related to intervention effects (DiPerna et al., 2015, 2016). Among the included demographic variables, students' gender and receipt of supplemental services were statistically significantly associated with transition probabilities. Holding other variables constant, female students had slightly higher estimated probabilities of transitioning to higher CRPF levels than male students (b = .23, SE = .11, p = .04), and students who did not receive supplemental services were more likely to transition to higher CRPF levels than those who did (b = .31, SE = .14, p = .03).

Table 5.

Conditional Transition Probabilities by Treatment Condition.

Baseline Levels	Posttest Levels
	Emerging (n = 26)		Developing (n = 75)		Competent (n = 255)		Advanced (n = 198)
	Treat	BAU	Treat	BAU	Treat	BAU	Treat	BAU
Emerging (n = 28)	.309	.514	.475	.393	.213	.093	.003	<.001
Developing (n = 94)	.043	.131	.291	.434	.610	.420	.056	.015
Competent (n = 299)	.002	.010	.047	.137	.591	.686	.360	.168
Advanced (n = 133)	<.001	<.001	.001	.003	.111	.239	.889	.757

Note. N = 554, number of classes = 51. Covariates include students’ special education status, supplemental service status, race-ethnicity, and gender. Treat = SSIS-CIP, BAU = Business-as-usual.
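The statement that SSIS-CIP students were more likely to move up and less likely to move down can be checked by summing each row of Table 5 above and below the student's baseline level. The Python sketch below uses only the tabled values; representing cells reported as "<.001" by 0.0005 is our assumption and does not change any comparison. The helper name `movement` is hypothetical.

```python
LEVELS = ["Emerging", "Developing", "Competent", "Advanced"]
LT_001 = 0.0005  # assumption: cells reported as "<.001" are represented by 0.0005

# Conditional transition probabilities from Table 5: rows = baseline level, columns = posttest level.
TABLE5 = {
    "Treat": [
        [0.309, 0.475, 0.213, 0.003],
        [0.043, 0.291, 0.610, 0.056],
        [0.002, 0.047, 0.591, 0.360],
        [LT_001, 0.001, 0.111, 0.889],
    ],
    "BAU": [
        [0.514, 0.393, 0.093, LT_001],
        [0.131, 0.434, 0.420, 0.015],
        [0.010, 0.137, 0.686, 0.168],
        [LT_001, 0.003, 0.239, 0.757],
    ],
}


def movement(row: list[float], baseline: int) -> tuple[float, float, float]:
    """Split one row into P(moved down), P(stayed), P(moved up) relative to the baseline level."""
    down = sum(row[:baseline])
    stay = row[baseline]
    up = sum(row[baseline + 1:])
    return down, stay, up


for b, level in enumerate(LEVELS):
    t_down, _, t_up = movement(TABLE5["Treat"][b], b)
    c_down, _, c_up = movement(TABLE5["BAU"][b], b)
    print(f"{level:<10} up: Treat {t_up:.3f} vs BAU {c_up:.3f}; "
          f"down: Treat {t_down:.3f} vs BAU {c_down:.3f}")
```

For example, students starting at the Developing level have an estimated probability of moving up of about .67 under SSIS-CIP versus about .44 under BAU, and a probability of moving down of .043 versus .131.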

Discussion

School-based interventions to promote student social-emotional competence have become widespread during the last decade (Bryant et al., 2021). However, the development of sound assessment tools and practices to inform intervention planning has lagged (Gross et al., 2019), and data-based decision making about screening and intervention remains a challenging task for MTSS teams (VanDerHeyden et al., 2018). Efficient SEL measures with practical interpretive frameworks are essential to facilitate SEL screening in schools and inform programming decisions (DiPerna et al., 2020). The Competency-Referenced Performance Framework (CRPF) was developed for the SSIS SEL Brief Scales to facilitate screening decisions (Elliott et al., 2020). However, the psychometric evidence for performance levels yielded by this framework has not been examined with samples beyond the original standardization data. This study examined evidence of stability, test-criterion relationships with academic outcomes, and treatment sensitivity of the CRPF for the SSIS SELb-T using data drawn from a large efficacy trial.

Key Findings

Results of the study showed that SSIS SELb-T CRPF levels appeared to be relatively stable despite longer-than-typical intervals between administrations for stability indices. Cohen's kappa appeared somewhat low because it is affected by the unequal distribution of students across CRPF levels. Test-retest stability for SSIS SELb-T composite scores reported in the manual (Elliott et al., 2020) was .78, presumably for a typical 2-week interval. Stability of SSIS SELb-T scores in our BAU sample was comparable at .77. Therefore, the slightly lower estimates for CRPF levels (e.g., Pearson r = .7, Spearman's rho = .68) appear to be due primarily to the categorization of scores rather than to the longer time interval. However, this might not hold for longer intervals (e.g., from the beginning to the end of the school year), as changes might occur when teachers naturally correct social interactions and reinforce appropriate peer-to-peer engagement in classrooms over time.
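Why kappa can look low even when exact agreement is reasonable: kappa subtracts the agreement expected by chance from the marginal distributions, and when most students fall into one or two categories, chance agreement is already high. The Python sketch below contrasts two hypothetical 2 × 2 tables (not study data) that have identical exact agreement (80%) but different marginal balance; the function name is ours.

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for a square agreement table (rows = time 1, columns = time 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_margins = [sum(table[i]) / n for i in range(k)]
    col_margins = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(r * c for r, c in zip(row_margins, col_margins))  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical tables with the same 80% exact agreement but different marginal balance.
balanced = [[40, 10], [10, 40]]
unbalanced = [[75, 10], [10, 5]]
print(f"balanced kappa = {cohens_kappa(balanced):.2f}")      # 0.60
print(f"unbalanced kappa = {cohens_kappa(unbalanced):.2f}")  # 0.22
```

In the unbalanced table, chance agreement is .745 rather than .50, so the same observed agreement yields a much smaller kappa.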

SSIS SELb-T CRPF levels were also related to standardized math achievement scores both concurrently and predictively, with or without adjustment for clustering and demographic differences. However, CRPF levels were not significantly related to standardized reading scores after controlling for clustering and demographic differences, nor did they predict relative change in math or reading outcomes. Although the omnibus tests for CRPF level differences in academic outcomes were not statistically significant when baseline differences were adjusted for, the gap in math outcomes between the Emerging level and the other CRPF levels was notably larger at the 1-year follow-up than at baseline.
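The widening of the Emerging-level gap in math can be read directly from Table 4. The text does not state which means were compared, so the Python sketch below assumes the covariate-adjusted means (footnote b) and computes each level's distance from the Emerging mean at baseline and at follow-up; `gaps_from_emerging` is a hypothetical helper name.

```python
# Covariate-adjusted Star Math means (Table 4, footnote b) by CRPF level.
ADJUSTED_MATH = {
    "baseline": {"Emerging": 404.55, "Developing": 436.42, "Competent": 439.84, "Advanced": 467.95},
    "follow_up": {"Emerging": 492.25, "Developing": 558.54, "Competent": 562.28, "Advanced": 580.14},
}


def gaps_from_emerging(means: dict[str, float]) -> dict[str, float]:
    """Difference between each other CRPF level's mean and the Emerging mean, in scaled-score points."""
    return {level: round(m - means["Emerging"], 2) for level, m in means.items() if level != "Emerging"}


for time, means in ADJUSTED_MATH.items():
    print(time, gaps_from_emerging(means))
```

Under this assumption, every gap is larger at follow-up; for example, the Developing-versus-Emerging difference grows from about 32 to about 66 scaled-score points.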

The lack of statistical significance could be due to low statistical power resulting from the small BAU sample when broken down by CRPF level (particularly for the Emerging category). However, the relative frequency distributions of students across CRPF levels in our samples are similar to the distribution in the standardization sample published in the manual and consistent with developmental expectations (Elliott et al., 2020). Specifically, the percentages of students at different CRPF levels are not expected to be equal, and the percentage at the Emerging level is expected to be relatively small. As such, the pattern of between-level differences would not likely change dramatically if the total sample size were increased. Given that the significant between-group differences parallel different benchmark quartiles for the Star scaled scores, we are confident that the findings are not substantially compromised by the relatively small size of one group. Nevertheless, the findings should be replicated in future research with larger samples to explore whether students with emergent SEL skills might require special attention beyond the SEL domain.

Previous research has highlighted differences between early math and reading skill development, suggesting that math skill acquisition may rely more heavily on active, high-quality learning environments in schools (Ginsburg et al., 2008; Rimm-Kaufman et al., 2007), be more compromised by behavioral problems in the classroom (Miller et al., 2017), and relate positively to students' behavioral skills (Ponitz et al., 2009). Similarly, improvements in math skills have been linked to interventions that promote self-regulation (Schmitt et al., 2015) and self-affirmation (Borman et al., 2016). Future work is needed to investigate whether SEL constructs are more salient to math than to reading and to examine factors that may mediate such relationships.

Moreover, SSIS SELb-T CRPF levels demonstrated sensitivity to the SSIS-CIP intervention, an important condition if the assessment is to be used to evaluate SEL programming outcomes. However, given the alignment between the skills assessed by the SSIS SELb-T and the skills targeted by SSIS-CIP lessons, the treatment sensitivity results may not generalize to other SEL programs. As such, future studies are needed to determine whether the CRPF is sensitive to SEL skill change resulting from the implementation of other universal SEL programs (e.g., Second Step [Committee for Children, 2016]; Promoting Alternative Thinking Strategies [PATHS; Kusche & Greenberg, 1994]). In addition, female students and students who were not receiving supplemental services had a higher probability of transitioning to a higher CRPF level. In reviewing the demographic characteristics of students reported in a large meta-analysis of universal SEL impact studies, Rowe and Trickett (2018) found that gender was the most frequently analyzed moderator of treatment effects, and 41% of studies that tested treatment-by-gender interactions found significant interactions. However, they noted that there was no consistent pattern in the direction of this moderation and that very few studies reported student disability status or receipt of services. Especially given these mixed results in previous research, the current findings need to be replicated with additional samples and programs in future studies.

Limitations and Future Directions

In addition to the limitations and future directions noted above for specific research questions, this study examined only the SSIS SELb Teacher form with Grade 2 students. As such, findings may not generalize to other SSIS SELb forms completed by other informants (students or parents) or to other grade levels. Students' behaviors might vary across contexts (e.g., home or community vs. school), and there might be developmental differences across grades or ages (e.g., differences in variability, not only in means, that may affect the strength of relationships among variables). It is important to investigate whether evidence of reliability and validity for CRPF levels is similar for other contexts, grades, or age levels.

In addition, teachers who taught the SSIS-CIP lessons also rated students' behavior outcomes in this study, an unavoidable limitation when gauging the treatment sensitivity of teacher ratings for a universal program that is meant to be facilitated by classroom teachers. As such, teachers may have been more primed to notice SEL skills after teaching the lessons. However, because teachers in both the treatment and BAU conditions rated students at baseline using the same rating form, teachers in both groups could have been primed to notice SEL skills after their first exposure to the form. One could also argue that being more primed to notice SEL skills is part of the program's impact; that is, the program could have changed teachers' perceptions of SEL (e.g., Domitrovich et al., 2016). Future investigations could include a qualitative component to better understand the program's impact on teachers and their perceptions of student behaviors.

Finally, it is important to acknowledge limitations associated with the items used to obtain CRPF levels in this study. First, because three new items were added to the Self-Awareness scale of the SSIS SELb measures after data collection for the efficacy trial, scores for only 17 of the 20 SELb items were available in the current database, and these were prorated to obtain total scores. This approach implicitly assumes that the missing items would function similarly to the available items, which may or may not be tenable. Second, Self-Awareness was underrepresented because all three unavailable items were from this subscale. Although CRPF levels were based on cut scores for the composite rather than subscale scores, and the Spearman-Brown stepped-up reliability was high for the composite, classification of participants might nonetheless differ when all items are completed. As such, it is crucial to replicate the current study with the intact SSIS SELb-T form and larger independent samples.
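The proration and Spearman-Brown step-up described in this paragraph can be sketched as follows. Both functions are generic textbook forms written in Python, not the authors' code; the article does not describe the exact proration rule, so simple mean-based proration (sum × 20/17) is assumed, and the item ratings and 17-item reliability below are hypothetical values, not study data. Prorating this way is equivalent to assuming the three missing items have the same mean as the 17 available items, which is exactly the assumption flagged above.

```python
ITEMS_ON_FORM = 20
ITEMS_AVAILABLE = 17


def prorated_total(item_scores: list[float], full_length: int = ITEMS_ON_FORM) -> float:
    """Scale the sum of the available items up to the full form length (mean-based proration)."""
    return sum(item_scores) * full_length / len(item_scores)


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Spearman-Brown prophecy: reliability of a test whose length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical ratings for the 17 available items (not study data).
ratings = [3, 2, 3, 4, 3, 3, 2, 3, 4, 3, 3, 3, 2, 3, 4, 3, 3]
print(f"prorated total = {prorated_total(ratings):.1f}")  # 51 * 20 / 17 = 60.0

# Hypothetical 17-item reliability, stepped up to the full 20-item length.
reliability_17 = 0.90
stepped_up = spearman_brown(reliability_17, ITEMS_ON_FORM / ITEMS_AVAILABLE)
print(f"stepped-up reliability = {stepped_up:.3f}")
```

The step-up formula, ρ* = kρ / (1 + (k − 1)ρ) with k = 20/17, assumes the added items are parallel to the existing ones, so a high stepped-up value does not guarantee that classifications would be unchanged with the intact form.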

Implications and Conclusions

As the demand for efficient and informative SEL assessments continues to grow in schools, criterion-referenced interpretive frameworks such as the SSIS SELb CRPF may help improve the practical utility of scores from SEL assessments. Understanding student functioning relative to a performance level is an approach with which teachers, families, and even students are already familiar. Criterion-referenced score interpretation frameworks are common in standardized tests of academic achievement, such as those used in statewide student assessment systems. As such, stakeholders are likely to find such scores easier to understand, interpret, and act upon than the norm-referenced approaches most commonly used in SEL assessments today. Furthermore, performance-level interpretation may help MTSS teams develop intervention decision rules that are intuitive, consistently applied, and easy to document and track. When considering adoption of an assessment for universal SEL screening in schools, members of MTSS teams have been encouraged to consider the reliability and validity of the scores yielded by the measure. We further encourage them to consider the type(s) of interpretive framework(s) offered by an SEL measure and how these may be used by school personnel and understood by other stakeholders. In addition, developers of SEL assessments should continue to investigate the potential use of criterion-referenced frameworks to support efficient, informative, and useful SEL assessment decisions for students.

Results of this study provide initial independent evidence relative to the use of the SSIS SELb-T CRPF for universal SEL screening and monitoring student response to implementation of the SSIS-CIP universal program in schools. Although future investigations with additional samples and intervention programs are needed, the current results complement and expand upon prior findings of reliability and validity evidence for SSIS SELb scores using data drawn from the standardization sample (Anthony et al., 2021).

Statement of Potential Conflicts of Interest: Pui-Wa Lei and James C. DiPerna are authors of the SSIS SEL Brief Scales and receive a royalty from the publisher.

The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A090438 to The Pennsylvania State University. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.

Footnotes

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Pui-Wa Lei and James C. DiPerna are authors of the SSIS SEL Brief Scales and receive a royalty from the publisher.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Institute of Education Sciences (R305A090438).

ORCID iDs

Pui-Wa Lei

Susan Crandall Hart


References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. AERA, APA, & NCME.

Anthony, C. J., Elliott, S. N., DiPerna, J. C., & Lei, P.-W. (2021). Initial development and validation of the social skills improvement system – social and emotional learning brief scales-teacher form. Journal of Psychoeducational Assessment, 39(1), 166–181. https://doi.org/10.1177/0734282920953240

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1), 289–300.

Bond (1996). Norm- and criterion-referenced testing. Practical Assessment, Research & Evaluation, 5(2). https://doi.org/10.7275/dy7r-2x18

Borman, G. D., Grigg, & Hanselman (2016). An effort to close achievement gaps at scale through self-affirmation. Educational Evaluation and Policy Analysis, 38(1), 21–42. https://doi.org/10.3102/0162373715581709

Bryant, Mainelli, Crowley, Glennen, & Edzie (2021). Finding your place 2021: Social emotional learning takes center stage in K-12. Tyton Partners. https://tytonpartners.com/post-pandemic-schooling-emphasizing-social-emotional-learning/

Burns, M. K., Peters, & Noell, G. H. (2008). Using performance feedback to enhance implementation fidelity of the problem-solving team process. Journal of School Psychology, 46(5), 537–550. https://doi.org/10.1016/j.jsp.2008.04.001

Buros Center for Testing–Spencer Foundation Project Scholars (2020). Social-emotional learning assessment technical guidebook. https://buros.org/sel-assessment-technical-guidebook

Clifford (2016). A rationale for criterion-referenced proficiency testing. Foreign Language Annals, 49(2), 224–234. https://doi.org/10.1111/flan.12201

Collaborative for Academic, Social, and Emotional Learning (2022). What is the CASEL Framework? https://casel.org/fundamentals-of-sel/what-is-the-casel-framework/

Committee for Children (2016). Second Step social-emotional programming.

DiPerna, J. C., Frey, J. R., & Hart, S. C. (2020). Social-emotional learning. In F. C. Worrell & T. L. Hughes (Eds.), The Cambridge handbook of applied school psychology (pp. 428–449). Cambridge University Press. https://doi.org/10.1017/9781108235532

DiPerna, J. C., Lei, P.-W., Bellinger, & Cheng (2015). Efficacy of the social skills improvement system classwide intervention program (SSIS-CIP) primary version. School Psychology Quarterly, 30(1), 123–141. https://doi.org/10.1037/e615512013-001

DiPerna, J. C., Lei, P.-W., Bellinger, & Cheng (2016). Effects of a universal positive classroom behavior program on student learning. Psychology in the Schools, 53(2), 189–203. https://doi.org/10.1002/pits.21891

DiPerna, J. C., Lei, P.-W., Cheng, Hart, S. C., & Bellinger (2018). A cluster randomized trial of the social skills improvement system-classwide intervention program (SSIS-CIP) in first grade. Journal of Educational Psychology, 110(1), 1–16. https://doi.org/10.1037/edu0000191

Domitrovich, C. E., Bradshaw, C. P., Berg, J. K., Pas, E. T., Becker, K. D., Musci, Embry, D. D., & Ialongo (2016). How do school-based prevention programs impact teachers? Findings from a randomized trial of an integrated classroom management and social-emotional program. Prevention Science, 17, 325–337. https://doi.org/10.1007/s11121-015-0618-z

Durlak, J. A., Weissberg, R. P., Dymnicki, A. B., Taylor, R. D., & Schellinger, K. B. (2011). The impact of enhancing students' social and emotional learning: A meta-analysis of school-based universal interventions. Child Development, 82(1), 405–432. https://doi.org/10.1111/j.1467-8624.2010.01564.x

Elliott, S. N., Anthony, C. J., DiPerna, J. C., Lei, P.-W., & Gresham, F. M. (2020). SSIS SEL Brief Scales: User guide & technical manual. SAIL CoLab.

Elliott, S. N., & Gresham, F. M. (2007). Social skills improvement system: Classwide intervention program. Pearson.

Ginsburg, H. P., Lee, J. S., & Boyd, J. S. (2008). Mathematics education for young children: What it is and how to promote it. Social Policy Report, 22(1), 1–24. https://doi.org/10.1002/j.2379-3988.2008.tb00054.x

Gresham, F. M., & Elliott, S. N. (1990). Social skills rating system. American Guidance Service.

Gresham, F. M., & Elliott, S. N. (2008). Social skills improvement system: Rating scales. Pearson.

Gresham, F. M., & Elliott, S. N. (2017). Social skills improvement system - social emotional learning edition rating forms. Pearson.

Gross, T. J., Farmer, R. L., & Ochs, S. E. (2019). Evidence-based assessment: Best practices, customary practices, and recommendations for field-based assessment. Contemporary School Psychology, 23(3), 304–326. https://doi.org/10.1007/s40688-018-0186-x

Hagermoser Sanetti, L. M., & Collier-Meek, M. A. (2019). Increasing implementation science literacy to address the research-to-practice gap in school psychology. Journal of School Psychology, 76, 33–47. https://doi.org/10.1016/j.jsp.2019.07.008

Hambleton, R. K., Swaminathan, Algina, & Coulson, D. B. (1978). Criterion-referenced testing and measurement: A review of technical issues and developments. Review of Educational Research, 48(1), 1–47. https://doi.org/10.3102/00346543048001001

Hart, S. C., DiPerna, J. C., Lei, P.-W., & Cheng (2020). Nothing lost, something gained? Impact of a universal social-emotional learning program on future state test performance. Educational Researcher, 49(1), 5–19. https://doi.org/10.3102/0013189X19898721

Hines, E. M., Mayes, R. D., Harris, P. C., & Vega (2022). Using a culturally responsive MTSS approach to prepare black males for postsecondary opportunities. School Psychology Review. Advance online publication. https://doi.org/10.1080/2372966X.2021.2018917

Hughes, T. L., Hess, Jones, & Worrell, F. C. (2020). From traditional practice to tiered comprehensive services for all: Developing a responsive school culture for the future. School Psychology, 35(6), 428–439. https://doi.org/10.1037/spq0000410

Jones, D. E., Greenberg, & Crowley (2015). Early social-emotional functioning and public health: The relationship between kindergarten social competence and future wellness. American Journal of Public Health, 105(11), 2283–2290. https://doi.org/10.2105/AJPH.2015.302630

Kettler, R. J., Glover, T. A., Albers, C. A., & Feeney-Kettler, K. A. (2014). An introduction to universal screening in educational settings. In R. J. Kettler, T. A. Glover, C. A. Albers, & K. A. Feeney-Kettler (Eds.), Universal screening in educational settings: Evidence-based decision making for schools (pp. 3–16). American Psychological Association.

Kim, E. K., Anthony, C. J., & Chafouleas, S. M. (2022). Social, emotional, and behavioral assessment within tiered decision-making frameworks: Advancing research through reflections on the past decade. School Psychology Review, 51(1), 1–5. https://doi.org/10.1080/2372966X.2021.1907221

Kusche & Greenberg (1994). PATHS: Promoting alternative thinking strategies. Developmental Research Programs Inc.

Lok, McNaught, & Young (2016). Criterion-referenced and norm-referenced assessments: Compatibility and complementarity. Assessment and Evaluation in Higher Education, 41(3), 450–465. https://doi.org/10.1080/02602938.2015.1022136

McKown (2019). Challenges and opportunities in the applied assessment of student social and emotional learning. Educational Psychologist, 54(3), 205–221. https://doi.org/10.1080/00461520.2019.1614446

Miller, Votruba-Drzal, McQuiggan, & Shaw (2017). Pre-K classroom-economic composition and children's early academic development. Journal of Educational Psychology, 109(2), 149–168. https://doi.org/10.1037/edu0000137

Neil, D. T., Wadley, D. A., & Phinn, S. R. (1999). A generic framework for criterion-referenced assessment of undergraduate essays. Journal of Geography in Higher Education, 23(3), 303–325. https://doi.org/10.1080/03098269985263

Panayiotou, Humphrey, & Wigelsworth (2019). An empirical basis for linking social and emotional learning to academic performance. Contemporary Educational Psychology, 56(1), 193–204. https://doi.org/10.1016/j.cedpsych.2019.01.009

Ponitz, C. C., McClelland, M. M., Matthews, J. S., & Morrison, F. J. (2009). A structured observation of behavioral self-regulation and its contribution to kindergarten outcomes. Developmental Psychology, 45(3), 605–619. https://doi.org/10.1037/a0015365

Renaissance Learning (2009). Star Math technical manual.

Renaissance Learning (2010). Star Reading technical manual.

Renaissance Learning (2015). Star benchmarks, cut scores, and growth rates.

Reynolds, C. R., & Kamphaus, R. W. (2004). Behavior assessment system for children (2nd ed.). Pearson.

Rimm-Kaufman, S. E., Fan, Chiu, Y. J., & You (2007). The contribution of the Responsive Classroom Approach on children's academic achievement: Results from a three year longitudinal study. Journal of School Psychology, 45(4), 401–421. https://doi.org/10.1016/j.jsp.2006.10.003

Rowe, H. L., & Trickett, E. J. (2018). Student diversity representation and reporting in universal school-based social and emotional learning programs: Implications for generalizability. Educational Psychology Review, 30(2), 559–583. https://doi.org/10.1007/s10648-017-9425-3

SAS Institute Inc. (2020a). SAS/STAT® 15.2 user's guide: The GLIMMIX procedure. https://documentation.sas.com/api/docsets/statug/15.2/content/glimmix.pdf?locale=en#nameddest=statug_glimmix_toc

SAS Institute Inc. (2020b). SAS/STAT® 15.2 user's guide: The MIXED procedure. https://documentation.sas.com/api/docsets/statug/15.2/content/mixed.pdf?locale=en#nameddest=statug_mixed_toc

Schmitt, S. A., McClelland, M. M., Tominey, S. L., & Acock, A. C. (2015). Strengthening school readiness for Head Start children: Evaluation of a self-regulation intervention. Early Childhood Research Quarterly, 30, 20–31. https://doi.org/10.1016/j.ecresq.2014.08.001

Sparrow, S. S., Cicchetti, & Balla, D. A. (2005). Vineland adaptive behavior scales (2nd ed.). American Guidance Service.

Taylor, R. D., Oberle, Durlak, J. A., & Weissberg, R. P. (2017). Promoting positive youth development through school-based social and emotional learning interventions: A meta-analysis of follow-up effects. Child Development, 88(4), 1156–1171. https://doi.org/10.1111/cdev.12864

U.S. Department of Education (2021). Supporting child and student social, emotional, behavioral, and mental health needs. https://www2.ed.gov/documents/students/supporting-child-student-social-emotional-behavioral-mental-health.pdf

VanDerHeyden, A. M., Burns, M. K., & Bonifay (2018). Is more screening better? The relationship between frequent screening, accurate decisions, and reading proficiency. School Psychology Review, 47(1), 62–82. https://doi.org/10.17105/SPR-2017-0017.V47-1

Examination of Psychometric Evidence for Criterion-Referenced Scores from the SSIS SEL Brief Scales
