Abstract
This study is a replication of Rousmaniere et al. (2014), who found no differences in client outcome between supervisors and few differences in client outcome attributable to supervisors’ degree level or supervisory experience. Hierarchical linear modeling was used to estimate the variance in client outcome accounted for by supervisors. The longitudinal archival data set consisted of 3,030 clients, 80 therapists, and 39 supervisors at a university counseling center in the Rocky Mountain region. Therapists practiced psychodynamic, strategic, cognitive behavioral therapy (CBT), solution-focused, and family systems approaches. Averaged across supervisors, clients improved by 7.91 points on the Outcome Questionnaire-45.2 (OQ-45.2). Consistent with Rousmaniere et al., clinical supervision accounted for less than 1% of the variance in client outcome. These findings suggest that supervision may be enhanced by a greater focus on supervisees’ professional development, and they highlight the need to clarify the role supervisors play in improving client welfare.
Clinical supervision, widely regarded as an essential part of psychotherapy training, has two related yet distinct goals: (a) facilitating the professional development of supervisees and (b) protecting and enhancing the welfare of psychotherapy clients (Bernard & Goodyear, 2014). Over three decades of research have demonstrated that supervisors can affect supervisees both positively and negatively (Bernard & Goodyear, 2014). For example, effective supervision has been associated with increased supervisee self-efficacy (Gibson et al., 2009), decreased supervisee anxiety (Inman et al., 2014), skill acquisition (e.g., Lambert & Arnold, 1987), increased supervisee autonomy and openness, and reduced confusion about professional roles (Ladany & Friedlander, 1995). At the other end of the spectrum, “inadequate or harmful supervision” (Ellis et al., 2014, p. 434) is associated with increased supervisee anxiety (Gray et al., 2001), decreased supervisee self-disclosure (Mehr et al., 2010), increased multicultural misunderstandings (Ladany & Inman, 2012), and even supervisees leaving the field entirely (Ellis, 2001).
Although the impact of supervision on supervisees has been clearly established, its impact on clients is still debated (e.g., Ladany & Inman, 2012; Reiser & Milne, 2014; Watkins, 2011). Given that the impact of supervision on client outcome has been called the “acid test” of good supervision (Ellis & Ladany, 1997, p. 485), there have been frequent calls for examining this topic (e.g., Hill & Knox, 2013; Watkins, 2011). Four recent reviews of the literature have been conducted, with mixed results. In a review of 10 studies, Freitas (2002) found some relationship between supervision and client treatment outcome. However, all studies in that review were found to have methodological problems significant enough to raise questions about the validity of the findings (e.g., use of outcome measures with poor psychometric properties, nonrandom assignment of participants). Milne et al. (2011) examined 11 studies and concluded that “the blend of training and supervisory methods . . . were effective in facilitating supervisor and supervisee (therapist) development, which . . . was associated with patient benefits” (pp. 61–62). However, the authors noted that only two of the 11 studies in the review examined the direct effects of supervision on client psychotherapy outcome, so the “clinical outcome estimate should be treated with great caution” (p. 62). In a third review, Watkins (2011) examined 18 studies and also found mixed results, concluding “we do not seem to be any more able to say now (as opposed to 30 years ago) that psychotherapy supervision contributes to patient outcome” (p. 235). In the fourth and most recent review, Reiser and Milne (2014) raised conceptual and methodological questions about supervision outcome research. The authors proposed a fidelity framework and examined 12 studies from that perspective. 
Of the four reviews on this topic, Reiser and Milne’s conclusions represent the strongest support for the supervision–outcome connection:
There is reason to believe that supervision that improves adherence to an empirically supported protocol appears likely to improve client outcomes . . . [and] we can conclude that supervision is likely to outperform no supervision in terms of client outcomes. (Reiser & Milne, 2014, p. 154)
An often-cited reason for the lack of empirical clarity in this area is the considerable methodological hurdles involved in examining supervisors’ impact on client outcome (e.g., Ellis & Ladany, 1997; Reiser & Milne, 2014). For example, supervisees are legally required to receive supervision, which precludes a no-supervision control group in an experimental study. In addition, many potentially mediating variables exist at the supervisor, therapist, and client levels (e.g., the therapeutic working alliance, the supervisory working alliance), all of which could attenuate supervisors’ impact on client outcome (Wampold & Holloway, 1997). It has been suggested that client variables themselves (e.g., symptom severity, treatment history) may account for most of the variance in psychotherapy outcome (Bohart & Tallman, 2010), making it difficult for supervisors to “reach through” and affect client outcome.
Despite these methodological challenges, empirical research in this area has recently increased. Six recent studies on this topic warrant particular attention. Perhaps the most highly regarded in the literature is Bambling et al. (2006). These authors were able to use a randomized experimental design because their supervisee participants were licensed therapists and therefore not required to receive supervision. One group of therapists (n = 67) received weekly supervision; the other group (n = 60) received no supervision. After 8 weeks of treatment, clients of therapists in the supervision group had significantly lower depression scores and higher working alliance scores than clients in the no-supervision group. Notably, clients in the supervision group had much lower rates of noncompletion at eight sessions (3.0% and 6.1%) than clients in the no-supervision group (35%).
In two studies, experimental designs were used to examine the effects of using client outcome feedback within supervision. Reese et al. (2009) performed a controlled study in which the outcomes of 28 second-year trainees (19 in a master’s-level marriage and family therapy program, nine in a doctoral clinical or counseling psychology program) who received supervision that included regular outcome feedback (n = 11 supervisees and four supervisors) were compared with the outcomes of trainees receiving supervision without regular outcome feedback (n = 17 supervisees and three supervisors). Data from 115 psychotherapy cases indicated that trainees in the supervision-with-feedback condition had significantly better outcomes than trainees receiving supervision without feedback. However, a larger replication study with 18 supervisors, 44 supervisees (15 master’s-level clinical/counseling psychology students and 15 master’s-level marriage and family therapy students in their first or second semester of practicum, and 14 doctoral-level counseling psychology students in their third or fourth semester of practicum), and 138 clients found that outcome scores were similar across supervision conditions.
In three studies, the supervision outcome question has been investigated using naturalistic data to examine whether client outcomes varied depending on who supervised the case. Callahan et al. (2009) examined the outcomes of 76 adult psychotherapy clients at a university training site who were randomly assigned to 40 pre-internship supervisees in a doctoral-level clinical psychology program, who were in turn supervised by nine supervisors. Supervisors had a moderate effect on client outcome, accounting for 16% of the variance. In a replication study with data from a different training site (Wrape et al., 2015), involving 23 supervisors, 75 supervisees (practicum students in doctoral-level clinical or counseling psychology), and 310 clients, supervisors were again found to have a moderate effect on client outcome. Wrape et al. (2015) also examined supervisor demographics, training era, and faculty status and found that none correlated significantly with client outcome.
In a third recent naturalistic study, Rousmaniere et al. (2014) used 5 years of data from a private nonprofit mental health center (23 supervisors, 175 supervisees, and 6,521 clients) to examine whether outcomes varied by supervisor. Supervisees were an unspecified mix of practicum students (in social work, psychology, marriage and family therapy, and pastoral counseling) and interns who had graduated with a master’s degree in psychology, social work, or marriage and family therapy. In contrast to the other two naturalistic studies, analysis via hierarchical linear modeling (HLM) indicated that supervisors accounted for only 0.04% of the variance in client outcome. This lack of variability between supervisors held across variables at the supervisee level (interns vs. residents) and the supervisor level (supervisory experience; field: social work vs. psychology; degree level: MS vs. PhD).
In summary, experimental and naturalistic studies provide conflicting evidence about the influence of supervision on client outcome and about whether client outcome varies among supervisors. The purpose of this study was to replicate Rousmaniere et al. (2014) by examining the amount of variance in client treatment outcome, as measured by the Outcome Questionnaire-45.2 (OQ-45.2), accounted for by supervisors and therapists, and to extend the analysis to client treatment duration at the supervisor and therapist levels. Examining treatment duration adds context to the effect of supervisors on the client outcomes of their supervisees. Although treatment duration does not in itself indicate a favorable or unfavorable therapy experience, we wished to examine whether supervisors might affect duration independently of treatment outcome. Because duration of treatment was not examined in the original study, we expected that exploring this variable would help clarify the overall potential impact of supervision on client treatment. If the results of this study are similar to those of Rousmaniere et al. (2014), they may also help explain whether supervision affects some of these variables more than others.
Method
Participants
The archival data in this study were collected at a university counseling center located in the Rocky Mountain region over a 6-year period (August 2009 to August 2015) as part of routine care and practice. As in the original study by Rousmaniere et al. (2014), the naturalistic data included client outcome, which was tracked using the OQ-45.2 throughout the course of treatment. Clients completed the OQ-45.2 as a self-report measure before each session, and the results were available to therapists prior to the session so that they could inform the course of therapy throughout treatment. The final sample included data from 3,030 clients, who were seen by 80 therapists supervised by 39 supervisors. The sample included only therapists who saw two or more clients and supervisors who supervised two or more therapists; because of the assumptions involved in HLM, supervisors who supervised a single therapist and therapists who saw a single client were excluded.
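The inclusion rule above (therapists with two or more clients, supervisors with two or more therapists) can be sketched in Python. This is an illustrative sketch, not the authors’ procedure: the record layout, the function name `filter_for_hlm`, and the choice to reapply both rules until nothing else drops out are assumptions, since the source does not say whether exclusions were applied once or iteratively.

```python
from collections import defaultdict


def filter_for_hlm(records, min_clients=2, min_therapists=2):
    """Keep only therapists with >= min_clients clients and supervisors
    with >= min_therapists retained therapists.

    `records` is a list of (client_id, therapist_id, supervisor_id) tuples
    (a hypothetical layout). Dropping a therapist can push a supervisor
    below the threshold, so both rules are reapplied until the data stop
    changing.
    """
    kept = list(records)
    while True:
        clients_per_therapist = defaultdict(set)
        therapists_per_supervisor = defaultdict(set)
        for client, therapist, supervisor in kept:
            clients_per_therapist[therapist].add(client)
            therapists_per_supervisor[supervisor].add(therapist)
        filtered = [
            (c, t, s)
            for c, t, s in kept
            if len(clients_per_therapist[t]) >= min_clients
            and len(therapists_per_supervisor[s]) >= min_therapists
        ]
        if len(filtered) == len(kept):
            return filtered
        kept = filtered


# Hypothetical example data, not study data.
example_records = [
    ("c1", "t1", "s1"), ("c2", "t1", "s1"),
    ("c3", "t2", "s1"), ("c4", "t2", "s1"),
    ("c5", "t3", "s2"),                      # t3 sees only one client
    ("c6", "t4", "s2"), ("c7", "t4", "s2"),
]
print(sorted(c for c, _, _ in filter_for_hlm(example_records)))
# prints ['c1', 'c2', 'c3', 'c4']
```

In the hypothetical records, dropping therapist `t3` leaves supervisor `s2` with a single therapist, so the remaining clients of `s2` are excluded on the second pass.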
Clients
The average age of clients was M = 22.95 years, SD = 3.53, ranging from 17 to 60 years. A majority (57.1%) identified as female and 42.9% identified as male. The vast majority identified as White (90.2%), with 3.9% identifying as Hispanic, 2.4% Asian, 0.8% Black, 0.5% Pacific Islander, 0.4% American Indian/Alaska Native, 0.1% Hawaiian, and 1.0% other. Two thirds were single (66.6%), 32.4% were married, 0.9% were divorced, and 0.1% were widowed. All clients were full-time students at the participating university who were seeking services at the university counseling center. Presenting concerns ranged from clinical diagnoses such as anxiety and depression (52% and 47%, respectively) to more developmental concerns such as self-esteem (16%) and relationship problems (24%), with nearly all clients endorsing more than one area of concern. Only clients who received individual therapy alone were included; those receiving additional services were excluded. The average intake OQ-45.2 score was M = 69.47, SD = 22.98. Clients attended an average of 6.86 sessions (SD = 6.38), ranging from 1 to 63 sessions. During this period, there were no session limits at the University Counseling Center, and approximately 20% of clients (a figure rounded up to the nearest percent) attended a single session. Historically, students at this clinic have averaged five to seven sessions, with outliers attending hundreds of sessions (nearly weekly therapy for several years).
Therapists
Each of the 80 therapists (unlicensed supervisees) in this study belonged to one of two distinct groups. The first group consisted of practicum students and predoctoral interns who were acquiring the experience needed to complete their doctoral degree in counseling or clinical psychology. The second group consisted of residents who had already completed their degree in counseling or clinical psychology and were working toward licensure. Supervisees were assigned to supervisors through a preference process: the most experienced supervisees (residents) chose supervisors first, and first-year practicum students were assigned with little choice. The amount of supervision supervisees received depended on the requirements of their program of study or internship. Beginning supervisees (first- and second-year) received 1 hr of supervision per week; more advanced supervisees (third- and fourth-year students, predoctoral interns, and postdoctoral residents) received 2 to 3 hr each week. All supervisees received supervision from one supervisor at a time and did not change supervisors during the study. Therapists saw an average of 37.88 clients (SD = 22.31), ranging from four clients for one therapist to 108 clients for another. First- and second-year supervisees did not receive formal group supervision during data collection; however, supervisees did participate in practicum courses of six to eight students each, in which training and case review occurred. Because this study was archival and naturalistic, much of the demographic information on supervisees was unavailable and could not be included in the description of the sample.
Supervisors
Supervisors oversaw an average of 77.69 supervisee clients (SD = 70.79), ranging from six clients for one supervisor to 353 clients for another. All supervisors held doctoral degrees in counseling or clinical psychology and ranged in age from 30 to 71 years. Because of the archival nature of this study, information on supervisors’ gender and years of experience was not available. A diverse set of theoretical modalities was employed, including psychodynamic, strategic, solution-focused, cognitive behavioral, and family systems approaches, with no modality used more frequently than the others. Every supervisor and therapist was free to practice from whichever modality they preferred; supervisors and supervisees collaboratively determined which modality to work from, and supervisees used that modality in their treatment of clients.
Measure
The outcome measure used in this study was the OQ-45.2 (Lambert et al., 2004), a self-report measure of 45 items rated on a 5-point Likert-type scale that tracks clients’ weekly progress. The measure is administered before each therapy session, allowing therapists to observe changes over the course of treatment. Total scores range from 0 to 180, with higher scores indicating greater disturbance or distress in the client’s life and functioning. Scores of 63 and below fall in the nonclinical range, and scores of 64 and above fall in the clinical range. The items form three subscales: (a) Social Role Performance (e.g., “I feel stressed at work/school”), (b) Interpersonal Relationships (e.g., “I feel lonely”), and (c) Subjective Distress (intrapsychic functioning; e.g., “I feel blue”). The psychometric properties of the measure are well established: the OQ-45.2 has a test–retest reliability of .87 and an internal consistency of .93. In addition, Lambert and colleagues (2004) have shown that the measure has high concurrent validity with a number of other measures, including the Social Adjustment Scale (Weissman, 1999), the Beck Depression Inventory (Beck et al., 1996), the Inventory of Interpersonal Problems (Horowitz, 2002), the State-Trait Anxiety Inventory (Spielberger et al., 1970), and the Symptom Checklist-90 Revised (Derogatis, 1986). Vermeersch et al. (2000) found evidence that the measure reliably and validly detects small changes in clients over the course of treatment, and that scores remain stable over time for individuals who are not receiving therapeutic services. Other findings suggest that there are no gender discrepancies on the OQ-45.2 (Lambert et al., 2004).
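A minimal scoring sketch for the OQ-45.2 as described above: 45 items scored 0 to 4, a total from 0 to 180, a clinical cutoff at 64, and an improvement score taken as intake total minus final total, so that positive values mean less distress. The function names are hypothetical, and the sketch assumes any reverse-keyed items have already been recoded; it does not reproduce the official scoring key.

```python
ITEM_COUNT = 45        # OQ-45.2 items
MAX_ITEM_SCORE = 4     # 5-point scale scored 0-4, so totals run 0-180
CLINICAL_CUTOFF = 64   # totals of 64 and above fall in the clinical range


def oq_total(item_scores):
    """Sum 45 item scores that are already keyed 0-4 (assumption:
    reverse-keyed items were recoded before calling this)."""
    if len(item_scores) != ITEM_COUNT:
        raise ValueError(f"expected {ITEM_COUNT} items, got {len(item_scores)}")
    for score in item_scores:
        if not 0 <= score <= MAX_ITEM_SCORE:
            raise ValueError(f"item score out of range: {score}")
    return sum(item_scores)


def is_clinical(total):
    """True when a total score falls in the clinical range."""
    return total >= CLINICAL_CUTOFF


def improvement(intake_total, final_total):
    """Positive values mean the client reported less distress at the end."""
    return intake_total - final_total
```

Because higher totals mean more distress, a client whose total drops from 70 to 62 shows an improvement of 8 points, while a negative value corresponds to the deterioration reported for some supervisors in the Results.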
Procedure
The data for this study were naturalistic and archival. Hence, therapy was conducted under treatment-as-usual conditions for both therapists and clients. Most clients were assigned on a “next available appointment” basis; that is, when clients called, they were assigned to the first clinician whose availability matched their schedule. A small minority (less than 3% of clients) chose particular therapists because of requests for specific treatment modalities, therapeutic style, personal recommendations, and so on. All therapy was provided at no cost. As part of routine practice, each client completed the OQ-45.2 before each therapy session, and the scores were available for review by supervisors and supervisees before each session. However, no data indicate whether, or how often, supervisors or supervisees reviewed clients’ OQ-45.2 scores during treatment.
Data Analysis Plan
In this study, we tested whether clients’ treatment outcomes and treatment duration, the dependent variables, differed between supervisors. HLM was used to test for these differences, with clients (Level 1) nested within therapists (Level 2) and therapists nested within supervisors (Level 3). The Level 1 dependent variables were continuous: the number of treatment sessions attended and the OQ-45.2 score at the client’s final session, with the intake OQ-45.2 score included as a fixed Level 1 predictor in the outcome models. For each set of analyses, three models were built. The first was an empty model with a fixed intercept, in which client treatment duration or treatment outcome was not allowed to vary between therapists or supervisors. The second model added therapists at Level 2, allowing client treatment duration and outcomes to vary between therapists. A comparison of the −2 log likelihood (−2LL) values was used to test whether the second model fit the data better, which would indicate that therapists differed from one another in their clients’ treatment outcomes. In addition, the Wald Z test of the random intercept was used to further evaluate whether therapists differed in their clients’ outcomes. The third model added supervisors at Level 3, allowing client treatment duration and outcomes to vary between therapists and between supervisors. The −2LL value from the third model was compared with the value from the second model; a significant difference would indicate that allowing client outcomes to vary between supervisors and therapists fit the data better than allowing them to vary between therapists alone. The Wald Z values from this model were also examined to test whether therapists and supervisors each explained a significant amount of client variance.
Similar to Rousmaniere et al. (2014), we also tested whether the amount of variance explained by supervisors in client outcomes differed depending on the type of therapist (practicum students vs. postdocs). This was done by entering therapist type at Level 2 and comparing a model in which supervisor variance was held constant across therapist type with a model in which supervisor variance was allowed to vary by therapist type. Analyses were conducted in SPSS.
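The −2LL comparisons described above are likelihood-ratio (deviance) tests. The following is a minimal sketch that assumes each successive model adds one variance parameter, so the difference in −2LL is referred to a χ² distribution with 1 degree of freedom (the source does not state the degrees of freedom). Because a variance cannot be negative, this df = 1 p-value is often regarded as conservative for variance components, and a 50:50 mixture of χ²(0) and χ²(1), which halves it, is a common alternative. `lr_test` is a hypothetical helper, not an SPSS function.

```python
import math


def lr_test(neg2ll_simpler, neg2ll_complex, df=1):
    """Likelihood-ratio test between two nested models fitted by ML.

    Returns (chi2, p). Only df = 1 is handled, using the closed form
    P(chi2_1 > x) = erfc(sqrt(x / 2)), so no SciPy dependency is needed.
    """
    if df != 1:
        raise NotImplementedError("this sketch handles one added parameter only")
    # The richer model cannot fit worse; clamp tiny negative rounding noise.
    chi2 = max(neg2ll_simpler - neg2ll_complex, 0.0)
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

Applied to the values reported in the Results, `lr_test(25730.85, 25716.78)` gives χ² ≈ 14.07 with p < .001, `lr_test(19737.54, 19736.03)` gives χ² ≈ 1.51 with p ≈ .22, and identical −2LL values give χ² = 0 with p = 1.0.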
Results
Treatment Outcome
To examine whether supervisors explained variance in client treatment outcomes, the OQ-45.2 total score from the last attended session was predicted, with the intake OQ-45.2 total score included in each model as a fixed Level 1 effect. Averaged across supervisors, clients improved by 7.91 points (SD = 4.04), ranging from an average deterioration of 2.64 points for clients seen under one supervisor to an average improvement of 18.29 points for clients seen under another (see Figure 1). No intake OQ-45.2 scores fell more than 3.5 SDs from the mean. However, three final-session OQ-45.2 scores were outliers. These scores were retained in the analyses because they did not appear to differ from the next closest nonoutlying scores and fit within the shape of the distribution. Thus, although these scores were extreme relative to the mean, they appeared reasonable given the other scores in the sample and the norms reported for the measure. The internal consistency (Cronbach’s alpha) of OQ-45.2 total scores in the study sample was .93. Although intake OQ-45.2 scores appeared to be normally distributed, D(3030) = .015, p = .113, final OQ-45.2 scores were not, D(3028) = .020, p = .006. However, the Kolmogorov–Smirnov statistic for the final OQ-45.2 score was small, skewness and kurtosis values were in the acceptable range, and visual inspection of the histogram showed an approximately normal distribution.
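The internal consistency value reported above is Cronbach’s alpha. A minimal sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), in plain Python follows; the function name and the toy data in the tests are illustrative, not study data.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Population variances are used for both items and totals; the ratio is
    the same as with sample variances because the n/(n-1) factors cancel.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))                      # one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Perfectly parallel items give α = 1, and items whose scores are uncorrelated give α near 0; a value of .93 across 45 OQ-45.2 items therefore indicates that the items largely move together.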

Figure 1. Differences in average client outcomes (OQ-45.2 total change), with 95% CI bars, by supervisor.
The −2LL value from the empty model was 25,730.85, and the residual variance estimate was significant, Estimate = 287.08, Wald Z = 38.91, p < .001, indicating that clients differed significantly in the amount of change they made on the OQ-45.2 over the course of psychotherapy (Table 1). The −2LL value for the second model, with therapists added at Level 2, was 25,716.78, which was significantly lower than the value from the first model, χ² = 14.07, p < .001. In addition, the variance in clients’ treatment outcomes due to therapists was significant, Estimate = 5.05, Wald Z = 2.54, p = .01, with differences between therapists explaining approximately 1.76% of the variance in total change made by clients. The −2LL value for the third model was 25,716.78, identical to that of the second model, χ² = 0.00, p = 1.0, indicating that supervisors did not explain a significant amount of variance in client treatment outcomes. Further supporting this, the variance attributable to supervisors was 0.00% and not significant, Estimate = 0.00, Wald Z = 0.00, p = 1.0, whereas the variance explained by therapists remained at 1.76% in this final model. We next tested whether supervisor variance in client outcomes differed depending on therapist type. The −2LL value was 25,716.78 both for the baseline model, in which supervisor variance was held constant across therapist type, and for the model in which supervisor variance was allowed to vary across therapist type. The difference between these models was not significant, χ² = 0.00, p = 1.0, indicating that the variance between supervisors in client outcomes did not differ between the practicum student and postdoc therapist groups.
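The variance percentages reported above can be reproduced by dividing each variance component by the residual variance of the empty model, treated as the total variance. This is an inference from the reported numbers rather than a formula the source states, and the helper `variance_shares` is hypothetical; a conventional intraclass correlation would instead divide by the sum of all components estimated in the same model, which can give slightly different values.

```python
def variance_shares(components, total_variance):
    """Percentage of total variance attributed to each random component.

    `components` maps a label (e.g., "therapist") to its variance estimate;
    `total_variance` is assumed here to be the empty model's residual
    variance. Results are rounded to two decimals, as in the text.
    """
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return {
        name: round(100.0 * estimate / total_variance, 2)
        for name, estimate in components.items()
    }
```

Under this assumption, `variance_shares({"therapist": 5.05}, 287.08)` returns `{"therapist": 1.76}`, matching the outcome model, and the same call with 1.83 and 40.67 returns 4.5, matching the 4.50% reported for treatment duration.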
Table 1. Outcome: Δχ² From Baseline/Previously Significantly Improved Model, and Significance, for Each Successive Model.
Treatment Duration
For treatment duration, clients attended M = 6.86 sessions, SD = 6.38, ranging from 1 to 63 sessions. Thirty-nine outliers (more than 3.5 SDs above the mean, i.e., more than 29 sessions) were identified in the number of sessions attended. This would be expected given the low mean and the fact that most clients attended five or fewer sessions. These scores were retained because they aligned with what might be expected for this variable and therefore remained meaningful to the data set; they also appeared reasonable when compared with nearby nonoutlying scores. However, the number of sessions attended was not normally distributed, D(3028) = .190, p < .001, so the following results should be interpreted with caution.
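The 3.5-SD outlier rule used above can be sketched as follows. With the reported mean of 6.86 sessions and SD of 6.38, the upper cutoff is 6.86 + 3.5 × 6.38 = 29.19 sessions, which matches the threshold of more than 29 sessions. The function names are hypothetical, and the sketch assumes the sample standard deviation, which the source does not specify.

```python
from statistics import mean, stdev


def outlier_cutoff(mean_value, sd, k=3.5):
    """Upper cutoff k standard deviations above the mean."""
    return mean_value + k * sd


def flag_outliers(values, k=3.5):
    """Return the values lying more than k sample SDs from their mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]
```

For session counts, only the upper cutoff matters in practice, because a value 3.5 SDs below a mean of 6.86 would be negative.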
Averaged across supervisors, clients attended 6.48 sessions (SD = 1.45), ranging from an average of 3.67 sessions for clients seen under one supervisor to an average of 9.37 sessions for clients seen under another (see Figure 2). The −2LL value for the empty model was 19,813.63 (Table 2). The residual variance estimate was significant, Estimate = 40.67, Wald Z = 38.91, p < .001, indicating that clients differed significantly in the number of sessions they attended. The −2LL value for the second model, with therapists added at Level 2, was 19,750.74, which was significantly lower than the value from the first model, χ² = 62.89, p < .001. In addition, the variance in clients’ treatment duration due to therapists was significant, Estimate = 1.83, Wald Z = 3.84, p < .001, with differences between therapists explaining approximately 4.50% of the variance in the number of sessions attended. The −2LL value for the third model was 19,737.54, which was significantly lower than the value from the second model, χ² = 13.20, p < .001. In addition, the variance in clients’ treatment duration due to both therapists, Estimate = 1.65, Wald Z = 3.41, p < .001, and supervisors, Estimate = 0.75, Wald Z = 2.06, p = .04, was significant. Differences between supervisors explained 1.83% of the variance in client treatment duration beyond the variance explained by therapists. We also tested whether supervisor variance in client treatment duration differed depending on therapist type. The −2LL value for the baseline model, in which supervisor variance was held constant across therapist type, was 19,737.54; the −2LL value for the model in which supervisor variance was allowed to vary across therapist type was 19,736.03. The difference between these models was not significant, χ² = 1.51, p = .22, indicating that variance between supervisors did not differ depending on whether the therapist was a student or a postdoc.
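The residual-variance Wald Z is the same (38.91) in the outcome and duration models even though the residual variances (287.08 and 40.67) differ greatly. One plausible explanation, offered as an assumption rather than something the source reports: under normality, the large-sample standard error of a variance estimate σ̂² is approximately σ̂²√(2/n), so Wald Z = σ̂²/SE ≈ √(n/2), which depends only on n. With n = 3,028 (the sample size in the normality tests), √(3028/2) ≈ 38.91. The helper names are hypothetical, and SPSS’s exact mixed-model standard errors may differ slightly from this approximation.

```python
import math


def approx_se_variance(variance, n):
    """Large-sample SE of a normal variance estimate: sigma^2 * sqrt(2 / n)."""
    return variance * math.sqrt(2.0 / n)


def wald_z(variance, n):
    """Wald Z = estimate / SE; sigma^2 cancels, leaving about sqrt(n / 2)."""
    return variance / approx_se_variance(variance, n)
```

Because σ̂² cancels, a large residual-variance Wald Z in a sample of this size mostly reflects the number of clients, so it says little on its own about how much clients differ.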

Figure 2. Differences in average number of sessions attended, with 95% CI bars, by supervisor.
Table 2. Duration: Δχ² From Baseline/Previously Significantly Improved Model, and Significance, for Each Successive Model.
Discussion
Findings from replication studies often do not reproduce the effects of the original studies or the magnitude of the original effects (Open Science Collaboration, 2015; Rogers, 2018). Given that “concerns about reproducibility are widespread across disciplines” (Open Science Collaboration, 2015, p. 25), including psychology, more replication studies are needed to either bolster or challenge the research findings from which the profession draws evidence. This follow-up to Rousmaniere et al. (2014) was an effort to confirm and extend existing findings in response to this need. To provide a comprehensive discussion of this replication, we review the findings of the original study, examine the effect of therapists on client outcome, consider the generalizability of the findings (even when routine feedback is provided to therapists and supervisors), explore the variance associated with treatment duration, and consider potential confounding factors. Each topic is reviewed in turn.
After analyzing the archival data set, we compared the findings of the original study with those of this replication. In the original study, Rousmaniere et al. (2014) found that supervisors accounted for less than 1% of the variance in client treatment outcome, a finding that warranted replication. This follow-up study used the methodology of Rousmaniere et al. (2014) at a different counseling center. In this replication, supervisors accounted for 0.00% of the variance in client outcome on the OQ-45.2 total score. Our results therefore appear to support those of Rousmaniere and colleagues (2014). The contribution of supervisors to client outcome in routine practice settings should be examined further, especially given that supervision in this study appeared to have a similar effect on outcome across all clients of supervisees.
To make sense of the relationship between supervisors and the outcomes of their supervisees’ clients, it is useful to consider the effect therapists have on the outcomes of their own clients. The findings from this replication are intriguing in light of the effect that therapists may or may not have on client outcome in general. For example, in their meta-analysis of 15 studies examining therapist contribution to client outcome, King et al. (2017) found that therapists may account for less than 1% of the variance in client outcome (though confidence intervals suggest it could be as high as 5%). In addition, clients receiving either self-help interventions (no therapist) or psychotherapy had equivalent outcomes and completion rates (N = 723). The findings of King et al. (2017) suggest that the effect therapists have on client outcome may be minimal to begin with, which may help explain the minimal effect supervisors had on the outcomes of their supervisees’ clients in this study. These results might also suggest an upper limit on the impact of therapists; because supervisors can influence clients only through their supervisees, supervision would be unlikely to exceed that amount of change variance.
Given these findings, their generalizability should be examined further. Results from this replication suggest that the effects found by Rousmaniere and colleagues (2014) were not site-specific: the lack of variance in client outcome accounted for by supervisors was also observed at a different counseling center with a similarly large data set. Even though treatment feedback was available to all supervisors and therapists throughout the therapy process (although, in this study as well, the extent to which supervisees reviewed the routine outcome measure remains unclear), the amount of client outcome variance accounted for by supervision was similar to that reported by Rousmaniere et al. (2014). These findings contrast with those of Callahan et al. (2009), perhaps because that study did not use an HLM approach to data analysis. The authors of this replication contend, however, that the HLM approach is more appropriate in this research context, given the nested nature of the data (Table 3).
Comparison Between Current Study and Rousmaniere et al. (2014).
Note. CBT = cognitive behavioral therapy; OQ-45.2 = Outcome Questionnaire-45.2.
Given the findings of this project, the role of treatment duration should also be examined. In this study, although supervision did not explain outcome variance, a significant amount of variance was attributable to treatment duration: 1.83% (p = .04). It is unclear why the variance associated with treatment duration was significant in this replication or whether it is clinically meaningful. Although supervisors may not differ in client outcome, some supervisors may be able to achieve similar outcomes in fewer sessions. Future researchers should continue to explore the role that supervision might play in treatment duration.
Overall, these findings more strongly suggest that the effect of supervision on client outcome may be limited and that variance in outcome may be more attributable to confounding variables at each level of this study (supervisor, supervisee, and client variables; Wampold & Holloway, 1997). This study also had a number of limitations, and alternative explanations for its results should be considered. Multiple variables at each level of analysis could have had unique moderating effects on supervision and on the results of this study. For example, several variables at the supervisee level may have moderated the effect of supervision on client outcome. These variables are similar to those identified in the original Rousmaniere et al. (2014) study.
One variable that might have had a moderating effect at the supervisee level is supervisees' tendency to rate their perceived collaboration with their supervisors as low (Rousmaniere & Ellis, 2013). Another is the high frequency of supervisee nondisclosure in the supervision process (Mehr et al., 2010). In addition, this study did not account for the match or mismatch between supervision methods/models and those of supervisees, or for how well supervisees were able to incorporate supervision feedback into their care for clients (Bernard & Goodyear, 2014). Finally, it is unknown what effect, if any, the supervisory alliance between supervisor and supervisee might have on client outcome (Inman et al., 2014).
Limitations
Limitations were also present at the supervisor level of this study. Interventions employed by supervisors would have to travel through each layer of confounding variables (supervisor, supervisee, and client) to noticeably affect client outcome; consequently, such interventions would have to be very effective to reach the client level. The unknown effect of group supervision on client outcome is a limitation that was also encountered in the Rousmaniere et al. (2014) study. It is difficult to say what effect, if any, supervisors and other supervisees may have on each other during the 2-hr period allotted to group supervision each week, because peer feedback could itself affect outcome.
Another limitation is that several client variables that may have moderated the effect of supervision on overall client outcome were not assessed in this analysis. In fact, client variables may account for the majority of variance in outcome between clients, more so than supervisor effects (Bohart & Tallman, 2010). For example, specific client variables that may affect the variance between supervisors include symptom severity and clients' previous treatment histories (Bohart & Tallman, 2010), clients' expectations for change and transformation in the treatment process (Swift et al., 2012), and clients' therapeutic alliance with their therapist (Norcross & Lambert, 2011).
Because of the potential legal and ethical ramifications of withholding clinical supervision, there could not be a control group of unsupervised providers. Therefore, as in Rousmaniere et al. (2014), we cannot determine experimentally whether supervisors had any effect on client outcome at all. This is a major limitation of this study. We can, however, conclude that client outcomes generally improved across supervisors in this data set: on average, clients improved 7.91 points on the OQ-45.2 over the course of treatment. This result suggests that clients received quality care during their treatment at this clinic.
It is also important to note that the Wald Z test for random effects is not entirely suitable here, given that the distributions in this study are only approximately normal. However, no alternative test is currently available in this instance. Hence, the Wald Z test serves only as a supplementary statistic, and emphasis is placed on the significance of the likelihood ratio test (i.e., the significance of the change in −2LL). This should be taken into account when interpreting the results of this study.
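To make the distinction between the two tests concrete, the following Python sketch computes both from quantities that mixed-model software typically reports: the likelihood ratio statistic as the difference in −2LL between a model without and a model with one random effect (for example, a supervisor-level variance), and a Wald Z as a variance estimate divided by its standard error. The function names and example numbers are hypothetical and are not taken from this study's output. The optional halving of the likelihood ratio p-value reflects a commonly used correction for testing a variance against the boundary value of zero; whether such a correction was applied in this study is not stated, so it is an assumption here.

```python
import math
from typing import Tuple


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def lrt_variance_component(neg2ll_reduced: float,
                           neg2ll_full: float,
                           boundary_correction: bool = True) -> Tuple[float, float]:
    """Hypothetical helper: likelihood ratio test for adding ONE random-effect variance.

    neg2ll_reduced: -2LL of the nested model WITHOUT the random effect.
    neg2ll_full:    -2LL of the model WITH the random effect.
    Returns (chi-square statistic, p-value).
    """
    # The "-2LL change"; it should be >= 0 for properly nested fits.
    stat = neg2ll_reduced - neg2ll_full
    p = chi2_sf_df1(stat)
    if boundary_correction:
        # Assumption: the null value (variance = 0) lies on the boundary of the
        # parameter space, so a 50:50 mixture of chi2(0) and chi2(1) is used,
        # which halves the usual 1-df p-value.
        p /= 2.0
    return stat, p


def wald_z(estimate: float, se: float) -> Tuple[float, float]:
    """Hypothetical helper: Wald Z = estimate / SE with a two-sided normal p-value.

    This relies on the estimate being approximately normally distributed, which
    is exactly the assumption questioned for variance components.
    """
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    stat, p = lrt_variance_component(1000.0, 996.16)
    print(f"LRT: chi2(1) = {stat:.2f}, p = {p:.3f}")
    z, pz = wald_z(0.5, 0.25)
    print(f"Wald Z = {z:.2f}, p = {pz:.3f}")
```

For example, a hypothetical −2LL change of 3.84 on one degree of freedom gives an uncorrected p of about .05, or about .025 with the boundary correction.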
Another possible limitation is that, although supervisees did not participate in shared group supervision, they did participate in a practicum course with other students that included activities similar to those found in group supervision (case presentations, case discussions, etc.). Practicum courses consisted of six to eight students in their first or second year of training. It is unclear whether participation in this course affected the overall improvement of client participants in this study, or whether there is variance among supervision groups rather than among supervisors. This aspect of the study differs slightly from the original Rousmaniere et al. (2014) study, in which “supervisees received one hour of individual and two hours of group supervision per week, from the same supervisor” (p. 4). Another difference is that the archival data for this study were gathered at a university counseling center, whereas Rousmaniere et al. (2014) collected data at a private nonprofit health center. Future research may benefit from controlling for variables such as participation in shared practicum courses and group supervision, and from exploring sites other than community health centers and university counseling centers.
This study, like the study conducted by Rousmaniere and colleagues (2014), analyzed archival data sets from naturalistic treatment settings. The use of only one outcome measure (OQ-45.2) and limited access to demographic and other information on supervisors, supervisees, and clients were further limitations. In addition, because of the retrospective nature of the study, some therapist and supervision variables could not be controlled. Future researchers might obtain more substantial findings by conducting a deliberately designed longitudinal study on this topic that purposefully includes a similar individual and group supervision process for all supervisee participants.
Implications and Future Directions
Information from this study may have implications for how the field understands and conducts the supervision of developing therapists. The goals of psychotherapy supervision center on (a) aiding the professional development of supervisees and (b) both protecting and ensuring client welfare in psychotherapy (Bernard & Goodyear, 2014). While ensuring client welfare should remain a central priority, these findings regarding the influence of clinical supervision warrant further exploration of the first goal. Taken together, the findings of this study suggest that client welfare was generally enhanced over the course of treatment; however, further clarification is needed as to what exactly is driving this improvement and how supervisors contribute to the development of their supervisees in ways that affect client outcome.
Footnotes
Authors’ Note
Jason Whipple is now affiliated with Alaska VA Healthcare System, USA.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
