Abstract
Background:
Schools are an ideal setting for policy, systems, and environmental approaches to obesity prevention. Although school health environment assessments exist for planning purposes, we developed and tested a comprehensive questionnaire that is suitable for both evaluation and planning.
Methods:
Reliability was measured by comparing data collected by school personnel from low-income elementary schools across California at two time points, an average of 2 months apart (n = 23). To assess convergent validity, school responses were compared with the responses completed by the research team (n = 28). A weighted kappa test statistic and percent agreement were calculated for each question and specific groups of questions (questionnaire section, item topic, and response type).
Results:
Test/retest reliability of the questionnaire yielded weighted kappa statistics that ranged from −0.14 to 1.00 (interquartile range [IQR] 0.36). Percent agreement for reliability ranged from 34.78% to 100.00% (IQR 21.7). Weighted kappa statistics for validity ranged from −0.14 to 1.00 (IQR 0.44). Percent agreement for validity ranged from 14.29% to 100.00% (IQR 39.2). Based on these findings, the tool was revised.
Conclusions:
Study findings indicate that the Site-Level Assessment Questionnaire as tested is a reliable and valid instrument for use in low-income elementary schools. Revisions may have further improved its validity and reliability. We therefore recommend either version for use to support low-income schools in assessing needs, evaluating progress, and creating action plans, and to supply high-quality, aggregable data for large-scale analysis. Additional testing is recommended to validate the revised version, increase generalizability, and determine sensitivity to change over time.
Introduction
Schools are widely regarded by the public health field to be a primary target for childhood obesity interventions and there has been an increasing focus on improving the health environment of schools through policy, systems, and environment (PSE) change efforts.1–4 Mounting evidence indicates that PSE interventions positively impact nutrition, fitness status, and weight of students.5–9
Initial assessment of a school's health environment, policies, and practices is critical for planning and identifying priority PSE interventions, while reassessment can be used to identify measurable change for ongoing planning and evaluation purposes.10,11 While instruments are currently available to help programs assess needs and plan health interventions, most have at least one limitation: they are designed exclusively for program planning and lack sufficient specificity for evaluating comprehensive obesity prevention programs at the school level, they focus on a single strategy (e.g., cafeteria/meal programs, physical activity), or they assess written policies rather than their implementation.12–16
Consequently, there remains a need for an instrument that comprehensively assesses school nutrition and physical activity practices and that also has the measurement properties (e.g., reliability, validity, sensitivity) needed to compare two time points and to detect meaningful differences between sites.17,18 There is a particular need for tools that can measure wellness policy implementation and compliance at the school level, as required by the Local School Wellness Policy Final Rule of the Healthy, Hunger-Free Kids Act of 2010.19–21 Recent research showed that the majority of local school authorities have not met this requirement and identified a need for assessment tools to overcome the challenges school administrators cited in monitoring and evaluating their local wellness policies.22
We developed the Elementary School Site-Level Assessment Questionnaire (SLAQ) as a comprehensive yet concise instrument that supports planning and evaluation of school-based nutrition and physical activity programs. In California, all local health departments using the Supplemental Nutrition Assistance Program Education (SNAP-Ed) funding to implement PSE interventions in low-income elementary schools are required to work with their partner schools to complete a SLAQ before implementation begins, and annually thereafter.
The questionnaire was designed to be straightforward for school personnel to complete, requiring minimal technical nutrition or physical activity knowledge, and to provide immediate feedback to school personnel and their partners to support an informed program planning process. Initial testing of these instruments demonstrated content validity and feasibility.23 Revisions identified during each stage of development and testing were incorporated before the present study, which aimed to test the convergent validity and reliability of the SLAQ for elementary schools. All aspects of instrument development and testing were funded by SNAP-Ed.
Materials and Methods
Study Sample
Low-income elementary schools (≥50% of students enrolled in free or reduced price meals) serving students in grades K-6 were recruited from across California between May 2019 and January 2020 through partnerships with local SNAP-Ed implementing agencies (such as health departments and cooperative extension agencies) that offered SNAP-Ed programming to the schools. The implementing agencies referred interested schools to the research team to schedule school site visits and interviews, collect relevant documents, and follow up on SLAQ completion. The research team contacted 41 schools, of which 13 either did not respond or declined to participate after receiving additional study details.
Therefore, a total of 28 schools were included in the validity analysis. Five schools did not complete a second questionnaire, reducing the reliability sample to 23 schools. Each school received a payment of $500 for its participation in this study. School staff participating in key informant interviews received a $5 gift card for each interview section completed. Interviewees included school food service managers, principals, wellness leads, physical education leads, and health education leads; interviewees varied from school to school depending on which staff were the most knowledgeable about a given aspect of nutrition and physical activity policies and practices at that school.
Because this study did not include human subjects and was not considered systematic research designed to contribute to generalizable knowledge, it did not require a human subjects research review by the university Institutional Review Board.
Materials
The Elementary School SLAQ was developed by a team of researchers (a subset of the authors) who conducted a scan of the existing environmental policy and practice assessment instruments from a variety of sources, including the SNAP-Ed Evaluation Framework's recommended evaluation tools and the National Collaborative on Childhood Obesity Research (NCCOR) Measures Registry.24–26 Items from existing instruments12,14,15,27–34 were adapted to a standardized statement-and-response format, new items were constructed to fill gaps, and all items were aligned with regulations and recommendations from state and national government and nongovernment organizations, including the Local School Wellness Policy Final Rule of the Healthy, Hunger-Free Kids Act of 2010 and USDA's SNAP-Ed Evaluation Framework.11,19–21
The version of the questionnaire tested in this study resulted from an expert panel review for content validity, pilot testing at a small sample of schools, and subsequent revisions.23 The questionnaire contained 129 items (grouped into 100 questions) in nine sections; each item consisted of a practice statement followed by response options. To optimize item sensitivity, wherever possible, response options indicated degree of implementation on a 5-point Likert-type scale; some items used a trichotomous scale or yes/no responses. Most items were scored on a scale of 0–4 (5-point: 0, 1, 2, 3, 4; trichotomous: 0, 2, 4; yes/no: 0, 4); a few items were scored 0–1 or 0–2 because these individual items were judged to merit less weight than the items scored 0–4.
In all cases, a high score represented the best practice for that item. Most items were phrased as best practices; for reading clarity, some items were phrased in the negative and reverse-coded. The nine questionnaire sections were organized by topic and respondent type (for example, items to be completed by school food service staff were grouped together) to facilitate questionnaire completion. This allows school personnel to skip to relevant sections of the questionnaire, reducing the completion time for each person. The types of personnel most likely to have the content expertise to complete each question were identified during pilot testing. Questionnaires were completed as fillable Adobe PDFs, programmed with skips to reduce error, with auto-calculated section and overall scores that allowed respondents to see results immediately upon completion.
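The scoring rules above can be illustrated with a short sketch. It is not the SLAQ's actual PDF or online scoring logic: the 5-point labels are those listed for question 2.7, while the trichotomous labels ("No", "Partially", "Yes"), the (response, scale, reverse) item tuples, and the choice to exclude skipped items from the points possible are assumptions made for illustration. Items scored 0–1 or 0–2 could be added as further entries in the scale table.

```python
# Minimal sketch of SLAQ-style item and section scoring (hypothetical structure).
from typing import Optional

# Points per response option. The 5-point labels follow question 2.7;
# the trichotomous labels are hypothetical.
SCALES = {
    "5-point": {"Never": 0, "Not usually": 1, "Sometimes": 2, "Usually": 3, "Always": 4},
    "trichotomous": {"No": 0, "Partially": 2, "Yes": 4},
    "yes/no": {"No": 0, "Yes": 4},
}


def item_score(response: str, scale: str, reverse: bool = False) -> int:
    """Score one item; negatively phrased items are reverse-coded so high = best practice."""
    points = SCALES[scale][response]
    max_points = max(SCALES[scale].values())
    return max_points - points if reverse else points


def section_percent(items) -> Optional[float]:
    """Section score as a percentage of points possible.

    `items` is a list of (response, scale, reverse) tuples. A response of None
    marks an item skipped by skip logic; excluding it from the denominator is
    an assumption of this sketch.
    """
    earned = possible = 0
    for response, scale, reverse in items:
        if response is None:
            continue
        earned += item_score(response, scale, reverse)
        possible += max(SCALES[scale].values())
    return 100 * earned / possible if possible else None


if __name__ == "__main__":
    section = [
        ("Always", "5-point", False),    # 4 of 4 points
        ("Sometimes", "5-point", True),  # negatively phrased: 4 - 2 = 2 points
        ("Yes", "yes/no", False),        # 4 of 4 points
        (None, "trichotomous", False),   # skipped, not counted
    ]
    print(round(section_percent(section), 1))  # 10 of 12 points -> 83.3
```

Reporting scores as a percentage of points possible lets sections with different numbers of items, or different item weights, be compared on one scale.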
Procedures
School personnel completed the Elementary School SLAQ at two time points, separated by ∼2 months. To determine convergent validity of the questionnaire, we compared school personnel's responses on the questionnaire at time one (original) with questionnaire responses completed by the research team (validation).
The research team conducted on-site observations, reviewed school wellness policies and procedures, reviewed district or school nutrition services menus and web pages, and interviewed district and school staff to collect the information needed to complete the questionnaires. The research team collected validation data within 112 days [mean 38 days (SD 28)] of each school's time one responses. To test reliability, we compared school personnel's time one (original) responses with their time two (reliability) responses, collected within 180 days [mean 66 days (SD 47)] of each school's time one responses.
Data Analysis
To assess the degree of agreement between the original and validation observations and between the original and reliability observations, we calculated the following statistics for each item. Percent exact agreement was calculated between original and reliability records, as well as between original and validation records, for all items, categorical or continuous. The average percent agreement among items belonging to the same questionnaire section, topic, and response type was also calculated to identify patterns of high or low validity and reliability.
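The percent agreement calculations described above can be sketched as follows. The item identifiers, group labels, and scores in the example are hypothetical, and dropping school pairs with a missing value on either side is an assumption, since the handling of missing responses is not described.

```python
# Minimal sketch: per-item percent exact agreement and group averages (hypothetical data).
from collections import defaultdict
from statistics import mean, stdev


def percent_agreement(first, second):
    """Percent of schools whose two records give the same response for an item.

    Pairs with a missing value (None) on either side are dropped (an assumption).
    """
    pairs = [(a, b) for a, b in zip(first, second) if a is not None and b is not None]
    if not pairs:
        return None
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


def average_by_group(item_pct, item_group):
    """Mean (SD) percent agreement for items sharing a section, topic, or response type."""
    groups = defaultdict(list)
    for item, pct in item_pct.items():
        if pct is not None:
            groups[item_group[item]].append(pct)
    return {
        group: (mean(values), stdev(values) if len(values) > 1 else None)
        for group, values in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical 0-4 scores from four schools at time one and time two.
    original = {"item_a": [4, 4, 3, 0], "item_b": [1, 2, 2, 2]}
    reliability = {"item_a": [4, 4, 4, 0], "item_b": [1, 2, 2, 2]}
    item_pct = {item: percent_agreement(original[item], reliability[item]) for item in original}
    print(item_pct)  # {'item_a': 75.0, 'item_b': 100.0}
    print(average_by_group(item_pct, {"item_a": "practice", "item_b": "practice"}))
```

With 23 reliability schools, each agreeing school adds about 4.35 percentage points, so item-level percent agreement moves in coarse steps.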
We also calculated weighted kappa statistics to examine the degree of agreement between original and reliability records as well as between original and validation records.35 To further understand how well the questionnaire's continuous section scores ranked sites, score tertile cut points were calculated separately for the original, reliability, and validation observations. These cut points were then used to classify each record into one of three categories according to whether its section score fell in the first, second, or third tertile.
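The tertile classification and weighted kappa comparison can be sketched in plain Python. The weighting scheme is not named in the text (it cites reference 35), so linear disagreement weights are an assumption here, with quadratic weights offered as an option. Computing the cut points with the default method of Python's `statistics.quantiles` and placing a score equal to a cut point in the lower tertile are also assumptions.

```python
# Minimal sketch: tertile ranking of section scores and Cohen's weighted kappa.
from statistics import quantiles


def tertile_labels(scores):
    """Classify scores into tertiles 1-3 using cut points computed from these same scores."""
    low_cut, high_cut = quantiles(scores, n=3)  # two cut points
    return [1 if s <= low_cut else 2 if s <= high_cut else 3 for s in scores]


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted kappa = 1 - observed disagreement / chance-expected disagreement.

    Returns None when expected disagreement is 0 (both records use one and the
    same category for every school), where kappa is undefined.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    observed = [[0.0] * k for _ in range(k)]  # joint proportions
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for the most distant categories
        return d if weights == "linear" else d ** 2

    obs_dis = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        return None
    return 1 - obs_dis / exp_dis


if __name__ == "__main__":
    # Hypothetical section scores (percent of points possible) for six schools.
    original = [40, 55, 60, 70, 80, 90]
    validation = [45, 50, 65, 75, 72, 95]
    print(weighted_kappa(tertile_labels(original), tertile_labels(validation), [1, 2, 3]))
```

Because each data set gets its own cut points, the comparison asks whether schools land in the same relative position, not whether their raw scores match.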
Weighted kappa statistics were then calculated to assess the degree of agreement in the ranking of records into these section score tertiles between original and reliability as well as between original and validation records. Following the method developed and tested by Muñoz and Bangdiwala,36 weighted kappa values for each item were assigned to the following strength categories: poor (<0), fair (0–0.2), moderate (0.2–0.45), substantial (0.45–0.75), and almost perfect (0.75–1).
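The strength categories can be applied mechanically, as sketched below. The published ranges share their endpoints (0.2, for example, bounds both fair and moderate), so treating each upper bound as exclusive is an assumption; so is computing the "moderate or better" share only over items whose kappa could be computed.

```python
# Minimal sketch: mapping weighted kappa values to strength categories.


def kappa_strength(kappa):
    """Return the strength category for a weighted kappa (upper bounds treated as exclusive)."""
    if kappa is None:
        return "NA"  # kappa could not be computed
    if kappa < 0:
        return "Poor"
    if kappa < 0.2:
        return "Fair"
    if kappa < 0.45:
        return "Moderate"
    if kappa < 0.75:
        return "Substantial"
    return "Almost perfect"


def share_moderate_or_better(kappas):
    """Percent of items with a computable kappa rated Moderate or better."""
    rated = [kappa_strength(k) for k in kappas if k is not None]
    good = sum(r in ("Moderate", "Substantial", "Almost perfect") for r in rated)
    return 100 * good / len(rated) if rated else None


if __name__ == "__main__":
    print([kappa_strength(k) for k in (-0.14, 0.06, 0.30, 0.69, 0.82)])
    # ['Poor', 'Fair', 'Moderate', 'Substantial', 'Almost perfect']
```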
Post hoc power analysis assessed the likelihood of observing differences in kappa based on study parameters, including sample size, number of observers, and observed values of a primary study outcome (question 2.7, more than one vegetable choice is offered at lunch). Using a significance level of 0.05 and the prevalence of responding “Never,” “Not usually,” “Sometimes,” “Usually,” and “Always” to question 2.7 to be 0.1071, 0.0357, 0.0357, 0.0714, and 0.7500, respectively, with 26 schools and 2 raters, we would have 80% power to detect a difference between validity kappa values of 0 and 0.275. Using a significance level of 0.05 and the prevalence of responding “Never,” “Not usually,” “Sometimes,” “Usually,” and “Always” to question 2.7 to be 0.0526, 0.0526, 0.000001, 0.1579, and 0.7368, respectively, with 19 schools and 2 raters, we would have 80% power to detect a difference between reliability kappa values of 0 and 0.344.
Results
Validity analyses included a group of 28 diverse elementary schools representing a range of California locales (Table 1); reliability analyses included 23 of the schools from the validity sample. In both samples, the majority of students attending the study schools were low income and Hispanic.
Table 1. Characteristics of Schools in Elementary School Site-Level Assessment Questionnaire Study Sample
FRPM, free or reduced price meals; N, number; SD, standard deviation.
Validity
Results of the validity analyses are reported for all items in Table 2. Weighted kappa statistics ranged from −0.14 to 1.00 across all items (interquartile range [IQR] 0.44). Weighted kappa could not be computed for seven items, for which original and validation observations were identical. The least valid items asked about using food for rewards and punishments (weighted kappa −0.14, 95% confidence interval [CI] −0.37 to 0.09) and student promotion of the meal program and healthy meals (weighted kappa −0.14, 95% CI −0.31 to 0.03). Two items with weighted kappa values of 1.00 asked about the presence of breakfast programs and noncompliant beverages. Moderate or better validity was obtained for 55.0% of items (Table 2). Percent agreement for validity ranged from 14.29% to 100.00% (IQR 39.2).
Table 2. Elementary School Site-Level Assessment Questionnaire Item Validity and Reliability Statistics with Strength Ratings and Decision About Items
Key: Poor, weighted kappa <0; Fair, weighted kappa 0–0.2; Moderate, weighted kappa 0.2–0.45; Substantial, weighted kappa 0.45–0.75; Almost perfect, weighted kappa 0.75–1.
Validity of the item assessed by comparing school personnel rating with research team member rating based on key informant interview.
Validity of the item assessed by comparing school personnel rating with research team member rating based on document review.
Validity of the item assessed by comparing school personnel rating with research team member rating based on observation.
CI, confidence interval; NA, not applicable; PE, physical education.
Validity statistics by questionnaire section, item topic (environment, policy, practice, or program participation), and response type are reported in Tables 3 and 4. Weighted kappa statistics for questionnaire section score tertiles ranged from −0.06 (95% CI −0.34 to 0.21) for section 3 (Food and Drink around the School) to 0.69 (95% CI 0.51 to 0.88) for section 4 (Gardens). Average percent agreement ranged from 43.30% (8.42) for section 5 (Nutrition Education and Student Involvement) to 72.09% (28.52) for section 4 (Gardens).
Table 3. Elementary School Site-Level Assessment Questionnaire Section Score Tertile Ranking Validity and Reliability Using Weighted Kappa
N, number; CI, confidence interval.
Table 4. Validity and Reliability of Items by Section, Item Topic, and Response Type Using Percent Agreement
SD, standard deviation.
Average percent agreement between validation and original records was lowest for policy items [52.92% (12.56)] and highest for environment items [70.98% (19.03)]. By response type, average percent agreement was lowest for 5-point scale items [48.02% (20.10)] and highest for dichotomous items [77.04% (17.87)], a pattern that would be expected because the probability of agreement by chance increases as the number of response options decreases.
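The expected ordering by response type follows from chance agreement: if two records choose options independently with response distributions p and q, the probability that they agree is the sum of p_i × q_i over options. The sketch below assumes, purely for illustration, that responses are spread evenly across options, which gives 1/k for k options. Real SLAQ responses were usually concentrated in one category, which raises chance agreement further.

```python
# Minimal sketch: chance agreement for 5-point, trichotomous, and dichotomous items,
# assuming (for illustration only) uniformly distributed, independent responses.
from fractions import Fraction


def chance_agreement(p, q):
    """Probability that two independent records agree, given their response distributions."""
    return sum(pi * qi for pi, qi in zip(p, q))


if __name__ == "__main__":
    for label, k in (("5-point", 5), ("trichotomous", 3), ("dichotomous", 2)):
        uniform = [Fraction(1, k)] * k
        print(label, float(chance_agreement(uniform, uniform)))  # equals 1/k
```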
Reliability
Results of reliability analyses are reported for all items in Table 2. Weighted kappa statistics ranged from −0.14 to 1.00 across all items (IQR 0.36). Weighted kappa could not be computed for six items, for which original and reliability observations were identical. The least reliable item asked how inviting the student eating area is (weighted kappa −0.14, 95% CI −0.28 to −0.01). Several items had the highest possible weighted kappa of 1.00, including one that asked about participation of students' family members on a school wellness committee. Moderate or better reliability was obtained for 86.8% of items. Percent agreement for reliability ranged from 34.78% to 100.00% (IQR 21.7).
Reliability statistics by questionnaire section, item topic (environment, policy, practice, or program participation), and response type are reported in Tables 3 and 4. Weighted kappa statistics for questionnaire section score tertiles ranged from −0.26 (95% CI −0.52 to 0.01) for section 3 (Food and Drink around the School) to 0.82 (95% CI 0.65 to 0.98) for section 4 (Gardens). Average percent agreement among items in the same section ranged from 60.33% (11.49) for section 8 (Parent and Family Involvement) to 81.16% (14.09) for section 4 (Gardens).
Average percent agreement for reliability was lowest for policy items [66.01% (13.02)] and highest for environment items [80.71% (14.37)]. By response type, average percent agreement was lowest for 5-point scale items [60.79% (11.69)] and highest for dichotomous items [80.98% (15.35)], the same pattern expected from the higher probability of chance agreement when there are fewer response options.
Based on these findings, the questionnaire was revised by removing items with low validity or reliability and, where appropriate, revising questions and response options based on feedback from respondents or observations the research team made when collecting validation data. Of the 129 items, 60 were kept with no changes, 39 were revised, and 30 were removed from the instrument (Table 2). Of the 39 revised items, 14 had changes to the stem only, 6 to the response options only, and 19 to both. Revisions were generally minor, reducing wordiness or providing clarification, such as changing “milk served on campus” to “milk served with meals.” Revisions improved the readability of the revised items, as measured by the Flesch–Kincaid Grade Level on the Online-Utility.org Readability Calculator [mean = 0.76 (0.37) grade-level reduction].41
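The Flesch–Kincaid Grade Level combines average sentence length and average syllables per word: 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The sketch below applies that formula to pairs of original and revised item texts. The vowel-group syllable counter and the rule that an unpunctuated item counts as one sentence are simplifying assumptions, so results will not exactly match the Online-Utility.org calculator used in the study.

```python
# Minimal sketch: Flesch-Kincaid grade level and mean grade-level reduction for revised items.
import re
from statistics import mean, stdev


def count_syllables(word):
    """Approximate syllables as runs of vowels, treating a final 'e' (but not '-le') as silent."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_level(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from word, sentence, and syllable counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def text_grade_level(text):
    """Grade level of one item; text must contain at least one word."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)  # unpunctuated item = 1 sentence
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_level(len(words), sentences, syllables)


def grade_level_reduction(pairs):
    """Mean (SD) reduction in grade level across (original, revised) item texts."""
    reductions = [text_grade_level(old) - text_grade_level(new) for old, new in pairs]
    return mean(reductions), (stdev(reductions) if len(reductions) > 1 else None)


if __name__ == "__main__":
    # Fragments quoted in the text; full item wording is not reproduced here.
    print(grade_level_reduction([("milk served on campus", "milk served with meals")]))
```

Very short texts can yield low or negative grade levels with this formula, so comparing the change between versions, as the study did, is more informative than the absolute values.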
Discussion
This article reports the reliability and convergent validity of a new instrument, the Elementary School SLAQ, developed for elementary schools to self-assess healthy eating and physical activity practices, for use in program planning and evaluation. The majority of items were found to have moderate or better reliability (86.8%) and validity (55.0%). These findings are comparable with, and at times better than, those from other measures of school and child care environments. For example, a similar proportion of items on the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) were reported to have moderate or better test/retest reliability (89%) and validity (52%).27 On the School Health Policies and Practices Study (SHPPS) questionnaires, reliability of categorical and ordinal questions was similar to that found for all items in our study (88% moderate or better), while a lower proportion of continuous and ordinal items from SHPPS (56%) were found to have acceptable reliability.29
A strength of the Elementary School SLAQ, compared with many existing assessments of institutional nutrition and physical activity practices, is that it was designed to be sensitive enough to detect differences between sites and change across time points. Using aggregated scores collected from multiple sites, organizations such as public health agencies and school districts can relate the strength of measured institutional practices to available nutrition and physical activity outcome data. When the same sites are assessed over time, changes in Elementary School SLAQ practice scores could be correlated with changes in measured outcomes. However, because this initial study represents a single year of data collection, it is not yet known how sensitive the items are to change. Future research is planned to measure the sensitivity of questionnaire items for detecting change following interventions, as well as differences between sites where interventions differ.18
Importantly, schools that self-assess using the Elementary School SLAQ benefit from collecting data closely aligned with the Final Rule of the Healthy, Hunger-Free Kids Act of 2010, which outlines school wellness requirements for all local educational agencies and schools participating in national school meal programs.19 This alignment means that schools can measure and report the extent to which they meet each school wellness requirement of the Final Rule and identify areas of Final Rule implementation that may be lacking. While school-level monitoring and reporting of local wellness policy implementation are required by the Final Rule, only a third of local education agencies have conducted monitoring or reporting activities, and administrators report that a lack of vetted tools usable by school and district personnel is a significant challenge to accomplishing this.22
The Elementary School SLAQ is well suited to fill this need, having been designed for use by school personnel and successfully feasibility tested with this audience.23 School and district personnel may also find useful the action planning tool that accompanies the Elementary School SLAQ, which guides them in turning SLAQ results into actionable steps to improve the school health environment (Supplementary Files 1 and 2).
Although the sample size was sufficient to reveal variability in item reliability and validity, a larger sample could offer stronger generalizability. In addition, while the sample included high-priority schools serving the low-income students most often in need of health promotion efforts, findings obtained from this sample may not be representative of all schools in the United States or beyond, and future testing of the instrument in a sample with greater diversity in income and ethnicity would increase generalizability. However, because this sample represents students of color and low-income populations in greatest need of high-quality obesity prevention programs, findings from this study can confidently be applied to planning and evaluating interventions for these priority populations.
The primary statistic used to assess both validity and reliability was weighted kappa, which adjusts for agreement due to chance.35 However, the kappa statistic is affected by the prevalence of the outcome: when nearly all responses fall in one category, expected chance agreement is high, and kappa can be low or even negative despite high observed agreement. In this study, there was very little variation in responses to some items, reflecting the widespread implementation of some practices, especially those mandated by state or federal law. These include items such as protecting the identities of students receiving free or reduced price meals and not selling sugary drinks, for which over 90% of respondents selected a single response category. In cases such as these, an alternate statistic is more appropriate. For this reason, this article presents percent agreement in addition to weighted kappa.
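A small hypothetical example shows why percent agreement is reported alongside kappa. Suppose a yes/no item at 20 schools, where each record marks the practice as absent at only one school, and not the same one. The two records agree 90% of the time, yet chance agreement is already 90.5%, so Cohen's kappa (for a two-category item, weighted and unweighted kappa coincide) is slightly negative. The data are invented for illustration.

```python
# Minimal sketch of the kappa prevalence problem with hypothetical yes/no data.


def percent_agreement(a, b):
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa; None when chance agreement is 1 (no variation at all)."""
    n = len(a)
    categories = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_chance = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_chance == 1:
        return None
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Each record says "No" at exactly one school, at different schools.
    original = ["No"] + ["Yes"] * 19
    reliability = ["Yes"] * 19 + ["No"]
    print(percent_agreement(original, reliability))  # 90.0
    print(round(cohen_kappa(original, reliability), 3))  # -0.053
```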
The findings presented here can be used to selectively remove or modify items with lower levels of reliability and/or validity in accordance with the user's needs and objectives. For application by local health departments implementing SNAP-Ed in California, the authors removed or revised items with low validity or reliability. This likely increased the validity and reliability of the questionnaire; ideally, the revised version should be retested. While a follow-up study to test the psychometric properties of revised items is not currently feasible, improved readability of revised items suggests that validity and reliability are unlikely to have declined, and key informant interviews are being conducted with school personnel using the revised instrument.
As seen in Table 5, a small number of items with fair or poor reliability or validity kappa strength ratings were retained. These items had good percent agreement and likely received low strength ratings as a result of the limitations of the kappa statistic outlined above. In addition to item-specific changes, the questionnaire has been moved from a fillable Adobe PDF form to an online survey platform as a result of feedback from schools and technical issues. We now recommend use of the revised questionnaire, available from the authors, for assessment in elementary schools.
Table 5. Validity and Reliability Measures of Items by Decision to Maintain with No Change, Revise, or Remove
N, number; SD, standard deviation.
The questionnaire presented here relies on self-report from school personnel, which has both advantages and disadvantages. Because self-report is subject to social desirability bias, it can be less accurate than other data collection methods such as direct observation or review of school menus. However, overall questionnaire scores ranged from 15% to 85% (mean: 58.6% [standard deviation: 13.4]) of points possible, demonstrating that responses were not uniformly high and respondents were willing to report their practices as less than optimal. The advantages of this self-assessment approach are compelling.
In the context of needs assessment and action planning, a self-assessment process that actively involves school personnel, particularly school administrators, is critical for ensuring that assessment results are used to inform sustainable change.42 In addition, while direct observation is necessarily limited to what can be seen during brief observation periods, a self-assessment questionnaire can collect a broader range of information that more comprehensively reflects the nutrition and physical activity practices at a school.
Conclusions
This article describes the psychometric testing of a new SLAQ for elementary schools that can support schools in assessing needs, creating action plans, and evaluating progress, while also offering valid and reliable data for larger scale analysis. Moreover, this new questionnaire fills an identified gap by providing a usable tool for school-level monitoring of local wellness policy implementation in elementary schools. Measured reliability and validity were reasonably good. Future research to measure the instrument's item sensitivity will further inform its usefulness as an evaluation tool.
Footnotes
Authors' Contributions
C.D.R.: conceptualization, methodology, investigation, writing—original draft, and project administration; J.K.: conceptualization, methodology, validation, investigation, writing—review and editing, and project administration; S.C.H.: formal analysis, data curation, and writing—review and editing; C.M.B.: conceptualization, methodology, software, investigation, data curation, and writing—review and editing; A.L.: conceptualization, methodology, and writing—review and editing; G.W.L.: conceptualization, writing—review and editing, supervision, and funding acquisition.
Acknowledgments
This study would not have been possible without the participation of personnel at participating schools, as well as the public health departments and cooperative extension agencies who partnered with us to recruit them. We also thank Evan Talmage for technical assistance with questionnaire deployment, Kaela Plank for assistance with study development, and Azraa Ayesha and Sana Khader for data collection. The California Department of Public Health Nutrition Education and Obesity Prevention Branch supported the effort. Our institution is an equal opportunity employer.
Funding Information
The project was funded by USDA's Supplemental Nutrition Assistance Program Education.
Author Disclosure Statement
No competing financial interests exist for any of the authors.
References
Supplementary Material
