Abstract
Background
Prior data suggest that the Mindfulness-Based Interventions: Teaching Assessment Criteria (MBI:TAC) has good inter-rater reliability, but many raters knew the experience level of the teachers they rated.
Objective
We sought to further evaluate the MBI:TAC’s inter-rater reliability and obtain preliminary data on predictive validity.
Methods
We videorecorded 21 MBSR teachers from academic and community settings. We trained 19 experienced MBI teachers in using the MBI:TAC. MBSR teachers were rated by three assessors; teachers and their assessors did not know one another. To assess predictive validity, MBSR students in courses taught by 18 of the MBSR teachers were invited to complete PROMIS-29 measures before the MBSR course, at the end of the course (month 2), and month 4.
Results
Intraclass correlation coefficients (ICCs) representing a single rater ranged from 0.33 to 0.56 on the 6 MBI:TAC domains. Using an average of two raters, ICC estimates ranged from 0.48 to 0.71 and ICCs generalizing to an average of three raters ranged from 0.6 to 0.8. Among n = 152 participating MBSR students, we found improvements from baseline to 2 months and 4 months in PROMIS measures of Anxiety, Depression, Fatigue, Sleep, and Social Role function (range in improvement 2.3 to 6.3, P < 0.0001 for all comparisons except Social Role at 2 months, P = 0.007). Higher MBI:TAC ratings were associated with greater improvements in anxiety among MBSR students from baseline to 2 months, with a −0.31 lower participant anxiety score per 1 unit increase in MBI:TAC composite teaching rating (95% CI −0.58, −0.05, P = 0.019), but we did not find statistically significant relationships with improvements in other PROMIS-29 domains.
Conclusions
ICCs indicated good reliability using an average of three ratings, but inter-rater reliability was only fair using a single rater. We found initial validation that higher MBI:TAC ratings predicted greater improvements in anxiety symptoms in MBSR participants.
Keywords
Mindfulness-based approaches such as Mindfulness-Based Stress Reduction (MBSR) have gained significant empirical support, showing benefits for the treatment of chronic pain, 1 substance use disorders, 2 , 3 anxiety disorders, 4 and depression. 5 Teacher skill is likely critical to the quality of mindfulness-based interventions (MBIs). In psychotherapy, there are clear indications that therapist skill influences outcomes. 6 In the case of MBIs, teaching competency may be even more important in influencing outcomes. MBI delivery has emphasized the centrality of the teacher’s capacity to embody mindfulness through their way of being in the teaching space, rather than conveying concepts cognitively. 7
Defining which teacher-related factors can both be feasibly and reliably assessed and shown to predict participant outcomes is potentially important in selection of teachers for MBI research studies and monitoring of intervention delivery quality. 8 Identifying teacher factors that influence outcomes may also be important for supporting the integrity of implementation of evidence-based MBIs, 9 and for strengthening teacher training for research and clinical programs. 10 The Mindfulness-Based Intervention Teacher Assessment Criteria (MBI:TAC) instrument may be a useful tool in assessing teaching competence in MBIs. We use “competence” in a specific way in this context: the knowledge, skills and attitudes relevant to leading MBIs. 11 Assessing the competence of MBI teachers is challenging because it is multi-dimensional and the key aspects need to be defined and validated. The MBI:TAC development involved a close analysis of the MBI teaching process by a group of teacher trainers from three university training centers. 12 These teacher trainers conducted a series of developmental stages in which the face and content validity of the tool were tested by practical application in training and research contexts. 12
In initial assessment of the reliability and validity of the MBI:TAC, the internal consistency was high (α = .94), as was the inter-rater reliability, with an overall intraclass correlation coefficient = .81 (range = .60-.81). 12 There are several limitations to this prior work, however. First, inter-rater reliability was tested using 16 assessors from three centers, some of whom developed the instrument. For broader dissemination, it is important to know whether training of a broader, more diverse group of assessors is feasible and still results in good inter-rater reliability. Second, the validity of the MBI:TAC in distinguishing experienced from novice teachers was tested in situations in which the assessor was usually aware of the experience of the teacher. Blinding assessors to the experience and background of teachers being rated provides a more rigorous assessment of whether more experienced teachers are rated as having greater skill. Third, there has been limited evaluation of whether teacher skill as measured by the MBI:TAC is related to participant benefit from mindfulness-based programs (predictive validity). We carried out the PrOMPT (Predictors of Outcomes in MBSR Participants from Teacher Factors) study, which we report on here, to address these three issues. The study aimed to assess the inter-rater reliability of the MBI:TAC when evaluating teachers who were not known to the assessor, using a panel of recently trained assessors from a variety of centers. 13 We also asked MBSR students in courses taught by the teachers being evaluated in the PrOMPT study to participate in surveys before and after the course so that we could obtain preliminary predictive validity data on whether teacher skill, as measured by the MBI:TAC, was associated with the amount of change in validated measures of outcomes such as depression, anxiety, and stress.
Methods
Participants and Study Procedures
This study was reviewed and approved by the Institutional Review Board of University of California, San Francisco, and all participants provided written, informed consent. We recruited MBSR teachers from several sites that agreed to provide information about the study to MBSR teachers and students. These sites consisted of MBSR programs at academic medical centers (University of California, San Francisco and University of Massachusetts Medical School) and community programs in Florida, New York, North Carolina, Texas, and Canada. The sites were selected to include both academic medical centers and community-based programs. MBSR teachers were eligible to be included in the study if they agreed to participate and were not participating in a training on the MBI:TAC instrument as assessors for this study. Participating teachers agreed to video record their 8-week MBSR courses. Recordings were made with the video camera pointing toward the teacher (and away from students). Students in the MBSR course were informed about the recording process and the procedures in place to protect students and teachers; these included use of a secure, password-protected server for digital storage, and the fact that recordings would only be used for research and carefully selected training purposes. MBSR students were also told that they could sit in locations that were not adjacent to the teacher to avoid having their faces included in the recording. Three MBSR teachers we recruited had video recordings of courses they had previously taught, which we used for assessment of inter-rater reliability in this study. Because they were not teaching courses from which we could recruit participants at the time we were enrolling MBSR participants, these three teachers were not included in the predictive validity component of the current study.
Once an MBSR teacher agreed to study participation and provided written informed consent, students who registered for their MBSR courses were provided with study information including links to a study website with further information about the study. Inclusion criteria for MBSR students were: (1) enrollment in an MBSR class taught by a participating MBSR teacher during the study period, (2) providing consent to participate in the study after receiving detailed information about what participation involved, and (3) age 18 years or older. Those who were interested in enrolling were asked to view a video describing the study and the importance of follow-up if they enrolled. After viewing the video, students who were interested in participating signed an online consent form and completed enrollment online. If preferred, potential research participants could also request to be contacted by phone, or could call a study coordinator directly to learn more about the study, and could complete study enrollment and assessment steps in person at University of California San Francisco or the Center for Mindfulness at University of Massachusetts rather than online.
Measures
We used a panel of assessors who were trained in using the MBI:TAC instrument to rate teaching skill in six domains. The training process has been described in detail previously. 13 In brief, we conducted a 7-session course to train experienced MBI teachers in using the MBI:TAC; 18 assessors provided ratings in the current study. Three assessors, none of whom knew the teacher, then rated each MBSR teacher. We randomly selected two recorded sessions from each teacher for rating, with one selected from the first four classes and the second recording from the last four classes of the teacher’s 8-week MBSR course. Assessors assigned an initial rating after viewing the first session, then made a final rating of the teacher after watching the second session. The MBI:TAC instrument has six domains: (1) coverage, pacing, organization; (2) relational skills; (3) embodying mindfulness; (4) guiding mindfulness practices; (5) conveying course themes through interactive inquiry and didactic teaching; and (6) holding the group learning environment. 12 Each of these domains is rated on a six-point scale, ranging from “1: incompetent” to “6: advanced.” There is also a summary score across all six domains.
MBSR students who agreed to be part of the study were asked to fill out an online survey three times: (1) prior to starting the MBSR course, (2) 2 months later (immediate post-MBSR), and (3) 4 months later (post-MBSR follow-up). The survey included the PROMIS-29 profile v1.0, 14 which has sub-scales for measuring fatigue, depression, anxiety, sleep disturbance, physical function, social role function, pain interference, and pain intensity. We also included the Perceived Stress Scale 4-item short form 15 to measure perceived stress.
Analysis
We used Stata (version 16) for statistical analyses. We evaluated inter-observer variability using the intraclass correlation coefficient (ICC). 2 The ICC provides a measure of how strongly ratings of the same item (e.g., a domain of the MBI:TAC) by different raters resemble each other, with values ranging from 0 (only random agreement) to 1 (perfect agreement). We calculated the ICC for each of the six domains in the MBI:TAC, using the ratings from the different assessors. We also calculated a composite score on the MBI:TAC by summing the six MBI:TAC domains, and calculated the ICC for the composite score. We made use of the pool of three different assessors’ ratings of each teacher in several different ways. First, we calculated the ICC using each assessor separately to estimate the ICC if a single assessor is used to rate a teacher. Second, because averaging the ratings from three different assessors reduces variability due to individual assessor differences and should therefore yield a higher ICC, we also used the three assessors’ ratings of each component of the MBI:TAC to model the ICC if ratings from three different assessors were obtained and averaged. As each teacher was rated by three different assessors, but the combination of assessors differed for each teacher, this is meant to provide an estimate of the ICC if a panel of three assessors is used to evaluate a teacher and the rating of each MBI:TAC domain is averaged. Third, to provide a robust estimate of the ICC values if we obtained ratings from two assessors, we made use of ratings from all three assessors and implemented a resampling procedure with 10,000 reps in which we randomly selected 2 of 3 ratings for each teacher to obtain estimates of ICCs if the average of 2 assessors was used for ratings. The recommended practice for MBI:TAC ratings has been to assess two classes before making a rating.
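The single-rater, three-rater average, and resampled two-rater ICCs described above can be illustrated with a minimal Python sketch. It assumes a one-way random-effects ICC, which is one reasonable choice when each teacher is rated by a different combination of assessors; the text does not state which ICC model was fitted in Stata, so this is an illustration rather than the study's code. The function names and the example ratings are hypothetical.

```python
import random
from statistics import mean


def one_way_icc(ratings):
    """Return (single-rater ICC, average-of-k ICC) from a one-way random-effects ANOVA.

    ratings: one list per teacher, each holding the same number k of ratings.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-teacher and within-teacher mean squares.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    icc_single = (msb - msw) / (msb + (k - 1) * msw)
    icc_average = (msb - msw) / msb
    return icc_single, icc_average


def resampled_two_rater_icc(ratings, reps=10_000, seed=0):
    """Mean 2-rater-average ICC over random draws of 2 of the k ratings per teacher."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        subset = [rng.sample(row, 2) for row in ratings]
        estimates.append(one_way_icc(subset)[1])
    return mean(estimates)


# Made-up ratings (1-6 scale) for 6 hypothetical teachers, 3 assessors each.
example = [[4, 5, 4], [2, 3, 3], [5, 5, 6], [3, 4, 2], [4, 4, 5], [2, 2, 3]]
single, avg3 = one_way_icc(example)
avg2 = resampled_two_rater_icc(example, reps=2000)
print(f"single={single:.2f}  avg of 2={avg2:.2f}  avg of 3={avg3:.2f}")
```

Averaging the estimate over many random draws of two ratings keeps all three assessors' information in play rather than arbitrarily discarding one assessor per teacher.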
To assess whether reviewing two MBSR classes rather than a single class improved the inter-rater reliability, we asked assessors to provide an initial rating after reviewing a single video recording of an MBSR course and calculated ICC values. We then compared the ICC after viewing a single class to the ICC values after viewing two classes. Based on prior examples, we pre-defined ICC values of at least 0.6 to be good agreement, with 0.75 or greater considered excellent agreement. 16 To assess whether MBI:TAC teacher ratings predicted change in MBSR student outcome measures, we used linear mixed effects models with participant PROMIS-29 measures as outcomes (one model per outcome), and MBI:TAC rating, time point, and their interaction, as predictors, and with random effects for students nested within teachers. Models were used to estimate marginal slopes of teacher ratings on PROMIS-29 outcome measures at 2 and 4 months. Pearson correlation coefficients were calculated for the correlation between participant outcome measures and teacher MBI:TAC ratings at follow up time points.
Results
Baseline Characteristics of MBSR Teachers in the PrOMPT-F Study.
a n = 1 teacher checked only “other” for race/ethnicity, and in the follow up text box listed “Chinese and White European”.
Baseline Characteristics of PrOMPT-F MBSR students in Courses Taught by Teachers who Were Rated using the MBI:TAC Instrument.
a Among 5 participants listed as race/ethnicity = “other,” n = 2 identified as both White and as having Hispanic/Latino ethnicity (for their self-reported “other” race, n = 1 self-reported “Nicaraguan”; and n = 1 self-reported “Hispanic” in a text follow up field). One of the 5 who selected “other” identified as White but did not identify as having Hispanic/Latino ethnicity, and this participant self-reported “Cape Verdean” as their other race/ethnicity. The remaining 2 participants who listed race/ethnicity as “other” did not select any other race category, and both identified as having Hispanic/Latino ethnicity; both also self-reported “Hispanic” in the “other” race text follow up field.
Intraclass Correlation Coefficients (ICCs) for MBI:TAC Domains When Rating Mindfulness-Based Stress Reduction Teachers.
Notes: ICC = intraclass correlation coefficient. The final rating represents the rating after the standard process of three assessors viewing two MBSR classes. We also asked assessors to make a rating after viewing the first class (1 video rating). The individual ICC represents the ICC if using a rating from a single assessor. The Average represents the ICC if using an average of 3 assessors. The estimates based on 2 assessors were generated by a random resampling procedure with 10,000 reps.
To assess how much reviewing two MBSR classes rather than a single class improved the inter-rater reliability, we asked assessors to provide an initial rating after reviewing a single video recording of an MBSR course (Table 3). The ICCs of ratings done after reviewing only one session were substantially lower when using a single assessor, with the highest ICC being 0.37. When we used the average of three assessors viewing a single class session, ICCs were lower than after viewing two sessions, but were still above 0.5 for all domains except domain 2 (relational skills, ICC = 0.32).
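As a rough cross-check on how the averaged ICCs relate to the single-rater ICCs, the Spearman-Brown formula projects the reliability of a mean of k raters from a single-rater value. The sketch below is an illustration only: the study's two-rater estimates came from resampling, not from this formula, and the function name is hypothetical. Applied to the reported single-rater range of 0.33 to 0.56, the projection gives roughly 0.50 to 0.72 for two raters and 0.60 to 0.79 for three, close to the reported ranges of 0.48 to 0.71 and 0.6 to 0.8.

```python
def spearman_brown(icc_single, k):
    """Project the reliability of the mean of k raters from a single-rater ICC."""
    return k * icc_single / (1 + (k - 1) * icc_single)


# Single-rater ICC range reported across the six MBI:TAC domains: 0.33 to 0.56.
for k in (2, 3):
    low, high = spearman_brown(0.33, k), spearman_brown(0.56, k)
    print(f"k={k}: {low:.2f} to {high:.2f}")
```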
Association of Composite MBI:TAC Teacher Rating With Change in MBSR Participant PROMIS Measures.
Notes: We report Pearson correlation coefficients for the association of participant outcomes (PROMIS-29 measures) at 2 and 4 months with teacher’s mean MBI:TAC composite score (the composite score was defined as the sum of scores across 6 domains; these scores were averaged across three assessors). The 95% confidence intervals for the correlation coefficient were based on Fisher’s transformation. The slopes of MBI:TAC teacher ratings on outcome measures at 2 and 4 months (with 95% confidence intervals and associated P-values) were derived from linear mixed effects models, and represent the change in participant outcomes with each one unit increase in the teacher’s mean MBI:TAC composite score.
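For readers unfamiliar with Fisher’s transformation, the following minimal Python sketch shows how a 95% confidence interval for a Pearson correlation is typically obtained. The function name and the input values are hypothetical and are not taken from the study; the calculation also treats observations as independent, so it does not account for students being clustered within teachers.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z transformation."""
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical inputs: r = -0.20 from n = 100 students (not study values).
low, high = pearson_ci(-0.20, 100)
print(f"r = -0.20, 95% CI ({low:.2f}, {high:.2f})")
```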

Title: MBSR participant Mean PROMIS Anxiety Score Over Time by Teacher MBI:TAC Rating. Legend: The y-axis shows the mean MBSR participant’s score on the PROMIS Anxiety measure at baseline (0 months), 2 months (end of MBSR course), and 4 months (two months after the end of course). Participants are divided into four quartiles based on the average composite score of their MBSR teacher across all six MBI:TAC domains. Participants with teachers in the 1st quartile of MBI:TAC ratings (highest rating) had the greatest decrease in PROMIS Anxiety scores, followed in order by each of the remaining quartiles (P = 0.019, linear mixed model of MBI:TAC score predicting change in PROMIS Anxiety measure). The change in PROMIS Anxiety score by teacher MBI:TAC rating was no longer statistically significant at 4 months (P = 0.96).
Discussion
We had several important findings in this study of the MBI:TAC instrument that may be particularly relevant for its use in research studies, but also have implications for other settings in which it is used to evaluate teaching competence. Overall, we found good inter-rater reliability for the average of ratings from three assessors, with ICCs ranging from 0.60 to 0.80 on different domains after viewing two MBSR sessions. This helps to further validate the instrument. However, ICCs were lower when using only one assessor, with ICCs above .50 for only three of the six domains, indicating limits on inter-rater reliability for several of the MBI:TAC domains when using a single assessor. This suggests that for purposes where a high degree of inter-rater reliability is needed with the MBI:TAC, averaging across several assessors is optimal. ICCs for an average of two assessors were lower than for an average of three assessors, but found good agreement (
The ICCs in this study are lower than in a prior report, where the ICC using a single assessor ranged between 0.60 and 0.81. 12 Several differences between the methods used in the current study and those used in earlier studies may account, at least in part, for the lower ICCs we observed. First, in the prior study, the teachers being rated were typically known to the assessor, including their level of experience in teaching. Knowledge of the teacher’s background may have provided additional information that assessors used when rating the teacher, resulting in more consistent or potentially biased ratings. In contrast, in the present study we selected assessors who did not know or recognize the teacher being assessed. Second, we used assessors who had gone through a standardized training in use of the MBI:TAC and were experienced teachers themselves, but in general assessors in our study had less experience using the MBI:TAC than in prior studies, and may have had less opportunity to develop a scoring approach that was closely calibrated to other assessors in the study. 13 This was planned intentionally to represent inter-rater reliability that might be obtained after training new assessors. Third, our assessors came from multiple countries and may have had greater diversity in their training experiences and approach to MBSR teaching as well as diversity in cultural and language backgrounds than assessors in earlier studies, most of whom were trained in the UK. Of note, however, assessors used in the current study had substantially higher ICCs when rating selected test videos at the end of their training, when ICC’s ranged between 0.67 and 1.0. The lower ICC in the current study using many of the same assessors might be due, in part, to greater challenges in rating the videos used in this study. The earlier test videos were shorter, focused on specific sections of a class, and selected to check calibration of ratings after training. 
It is possible that the longer and more “real life” MBSR class sessions, with greater diversity of class activities being evaluated in this study were more challenging to evaluate consistently. The ratings for this study were also done at least 6 months after the training was completed, and it is possible there was some loss of shared calibration of ratings over this time.
We also evaluated whether assessing a single MBSR session, rather than viewing two MBSR sessions as has been standard practice, yielded ICCs similar to those from watching two sessions. If reliable ratings could be obtained after viewing a single session, this could cut the time needed to obtain an MBI:TAC rating nearly in half. Unfortunately, we found that ICCs were substantially lower after viewing a single session, suggesting that viewing two sessions is preferable for inter-rater reliability.
There are several implications of this study for the use of the MBI:TAC in research studies. We believe our results suggest that two or more assessors should usually be used to get good inter-rater reliability for fidelity assessments in research studies. For good inter-rater reliability, two class sessions need to be viewed rather than one. This is labor intensive and thus resource intensive (e.g., about $250 or more per teacher evaluated by one assessor) and relies on the availability of trained assessors. This project has expanded the pool of trained assessors, and developed materials that can be used for future training of assessors, making it more feasible to have trained assessors available. The cost of obtaining ratings from skilled assessors may still make it challenging to use the MBI:TAC in studies with limited resources, however, such as in pilot studies. An important future direction may be the development and validation of instruments for teacher rating by participants, which could offer less expensive, if potentially less accurate, measures of teacher skill. Checklists of elements of teaching that can be evaluated by study staff may also provide an important measure of fidelity with lower cost.
This study was also intended to assess feasibility of a study design to assess whether MBI:TAC ratings predict participant benefit on selected outcomes, and to obtain preliminary data on associations between MBI:TAC teaching ratings and outcomes in students taking MBI courses in typical academic and community-based centers. While there was some loss to follow-up in our design, which we believe might be improved in future research, overall retention was adequate. We found that higher MBI:TAC teacher ratings predicted greater improvements in anxiety at 2 months, the end of the MBSR course, suggesting that teaching skill as rated by the MBI:TAC was important for reducing anxiety in course participants (Figure 1). By 2 months, average PROMIS Anxiety scores were no longer in the elevated range (< 55) in participants in MBSR courses taught by teachers in the upper half of MBI:TAC ratings (1st and 2nd quartiles). However, by 4 months these differences based on the MBI:TAC were no longer statistically significant. This occurred despite slight overall improvements in average PROMIS Anxiety scores between 2 and 4 months. By month 4, however, average PROMIS Anxiety scores were no longer elevated in participants across all four MBI:TAC quartiles of MBSR teachers. This could be consistent with more gradual improvements in anxiety after the MBSR course that were less dependent on teacher skill.
We did not find clear evidence of a relationship between participant improvements on other PROMIS-29 domains and MBI:TAC ratings. One factor may be that anxiety was the only PROMIS-29 scale elevated above the normal range among participants in this study; the other scales were generally in the normal range (between 45 and 55), leaving less room for improvement (floor effects). The importance of anxiety to the population we studied was also reflected in the answers to why participants had chosen to take the MBSR course: “reduce anxiety” and “stress reduction” were two of the four most common reasons for taking the course. In contrast, only 3% of participants noted pain as a reason for taking the course, and the PROMIS pain interference score (49) was slightly below average for a US population. The modest room for improvement in pain interference limited the utility of this measure in assessing predictive validity of the MBI:TAC for this outcome in the population we studied.
An important limitation of the current study in assessing whether MBI:TAC teacher ratings predict participant outcomes is that it was designed to collect preliminary data for designing future studies of the relationship between teaching skill and outcomes in MBIs, and was not designed to collect definitive data. For future studies of the relationship between teaching skill and MBI outcomes, our study suggests it may make sense to restrict the outcomes studied to those that are central for the population studied (e.g., assess pain in a population seeking the program for a pain issue). Alternatively, if a general population taking MBSR is studied, this may require a large population in which sub-sets of individuals seeking the program for specific reasons, such as chronic pain, can be used in testing the relationship between teacher skill and pain outcomes. Another potential limitation is that we evaluated seven different outcome measures from the PROMIS-29 and did not adjust for multiple comparisons. We did not plan adjustment for multiple comparisons for several reasons, including that we were collecting preliminary data, some of the outcome measures (e.g., anxiety and depression) are correlated which is not optimal for the assumptions of most multiple comparison adjustments, and the need for multiple comparison adjustments in this type of study is controversial. 17 The consistency of finding that five out of the six domains on the MBI:TAC had statistically significant associations with anxiety at 2 months provides additional reassurance in these associations. Nevertheless, further validation of these findings is needed for greater confidence in their implications. Another limitation of our study is that MBSR teachers and participants were mostly college graduates or had more advanced degrees, and were mostly white. 
This limited diversity reflects, in part, the demographics of current MBSR teachers and participants, but is an important limitation of the current study that the authors hope can be better addressed in future studies.
The experience of training MBI:TAC assessors in this study led to significant learning and subsequent adjustments to future training implementation practice. 13 This included recognizing the diversity of motivations for engaging in training to use the MBI:TAC and creating tailored trainings for these various aims. 18 Trainee motivations include building skills in training MBI teachers, 19 supervising MBI teachers, 20 conducting assessments of MBI teachers, 21 and as an informal tool to enable personal reflection on MBI teaching skills. 22
In summary, this study provides further data on the inter-rater reliability of the MBI:TAC instrument. Our data suggest that if higher ICCs are important, averaging ratings from more than one assessor is optimal, and that reviewing two MBSR course sessions rather than one provides a higher ICC. We found preliminary data that greater teaching skill, as measured by the MBI:TAC, predicts greater improvements in student anxiety at the end of the MBSR course, providing initial predictive validity of the instrument. Further research is needed to assess the relationship of MBI:TAC ratings to other student outcomes. Future evaluations of the relationship between teaching skill and participant outcome need to select research measures that are sensitive to the particular issues that are meaningful and relevant to the population in question.
The MBI field holds significant promise for addressing wellbeing in individuals and groups. Realizing this potential requires that the ‘thorny issue of clinician training’ 23 and subsequent teaching skill is engaged with and is folded into the research journey going forward.
Footnotes
Acknowledgments
This research was supported by the Predictors of Outcomes in MBSR Participants from Teacher Factors Project Period grant: R34AT008948 (Hecht/Brewer), and Mentoring and Research in Integrative Medicine: K24AT007827 (Hecht).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Institutes of Health, National Center for Complementary and Integrative Health 5R34AT008948 (Hecht/Brewer), K24 AT007827 (Hecht).
