Abstract
Aim:
To evaluate the evidence of the effectiveness of classroom-based Crew Resource Management training on safety culture by a systematic review of literature.
Methods:
Studies were identified in PubMed, Cochrane Library, PsycINFO, and Educational Resources Information Center up to 19 December 2012. The Methods Guide for Comparative Effectiveness Reviews was used to assess the risk of bias in the individual studies.
Results:
In total, 22 manuscripts were included for review. Training settings, study designs, and evaluation methods varied widely. Most studies reporting only a selection of culture dimensions found mainly positive results, whereas studies reporting all safety culture dimensions of the particular survey found mixed results. On average, studies were at moderate risk of bias.
Conclusion:
Evidence of the effectiveness of Crew Resource Management training in health care on safety culture is scarce and the validity of most studies is limited. The results underline the necessity of more valid study designs, preferably using triangulation methods.
Background
While health-care workers are educated in settings differing in expertise, educational level, and overarching perspective, in practice they have to work together and are expected to be good team players. Until a decade ago, health-care workers received hardly any training in the area of working in teams and corresponding non-technical skills, while the literature shows that many contributing factors to adverse events are related to miscommunication, a lack of communication and teamwork, and other non-technical skills. 1
The importance of non-technical skills was recognised four decades ago in aviation. As a result, specialised training programmes, like Crew Resource Management (CRM), aimed at minimising the effects of human error by improving non-technical skills, were developed to improve safety-critical behaviours on the flight deck. 2 CRM typically includes educating teams about the limitations of human performance. 3 Operational concepts include inquiry, seeking relevant operational information, assessing personal and peer behaviour, communicating proposed actions, conflict resolution, and decision-making.3–5
In common with others,6,7 Salas et al. 8 reported that CRM training in aviation resulted in positive reactions, enhanced learning, and desired behavioural change in the cockpit. Due to its face validity, the Institute of Medicine advocated the adoption of CRM to safety and error management in health care for creating the necessary safety culture. 9 Consequently, international health authorities placed a high priority on CRM training as a method to improve patient safety, especially in high-risk areas such as emergency departments, intensive care units, and surgery.9,10 As a result, efforts have started to be made to implement CRM in the health-care sector.6,8 Evaluations of these programmes generally focus on one or more of the four levels of Kirkpatrick and Kirkpatrick’s 11 framework for evaluating educational interventions: reactions, learning/knowledge, behaviour, and organisational impact.
Several reviews on medical team training exist.12–15 The current review focuses on organisational impact, more specifically patient safety culture, since the ultimate goal of CRM training is to alter safety culture. It is suggested that a positive, proactive safety culture will lead to fewer adverse events and less patient harm. 16 We systematically reviewed the effects of classroom-based CRM training courses given to health-care teams in hospitals on patient safety culture. The review includes an extensive description of the validity of the included studies.
Methods
This manuscript adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 checklist for reporting of systematic reviews. 17
Data sources
Studies were identified by online searches performed up until 19 December 2012 in four electronic databases: PubMed, Cochrane Library, PsycINFO (via EBSCO) and Educational Resources Information Center (ERIC) (via EBSCO). No limits were applied concerning year of publication or language, but only articles in English were included. References of relevant articles and related reviews were checked by hand.
Selection of articles
Using a predefined set of 10 CRM articles, we (M.C.d.B., I.V.N., and E.P.J.) developed a search strategy containing free-text terms only. Indexing terms (controlled terms) neither identified the relevant articles nor detected additional articles compared to the free-text terms. In general, indexation of CRM studies was very heterogeneous and lacked standardisation. The following terms were combined with Boolean ‘OR’: ‘crew resource management’, ‘Medical team training’, ‘Non-technical skills’, ‘teamwork training’, ‘team training’, ‘teamwork performance’, ‘team performance’, ‘team resource management’, ‘Medical team education’, ‘teamwork education’, ‘team education’, ‘team collaboration’, ‘team behaviour’, ‘team behavior’, ‘team skills’, ‘teamwork skills’, ‘team decision making’, ‘team effectiveness’, ‘team structure’, and ‘team competencies’. We excluded the term ‘crisis resource management’ since this term only revealed studies based on simulation techniques.
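As an illustration only, the combination of free-text terms described above can be sketched as a single OR-joined query string. The quoting style and the helper name `build_query` are assumptions; each database (PubMed, Cochrane Library, EBSCO) has its own query grammar, and the exact strings submitted are not given in this manuscript.

```python
# Free-text terms exactly as listed in the search strategy.
# The double-quote phrase syntax is an assumption, not the authors' query.
TERMS = [
    "crew resource management", "Medical team training",
    "Non-technical skills", "teamwork training", "team training",
    "teamwork performance", "team performance",
    "team resource management", "Medical team education",
    "teamwork education", "team education", "team collaboration",
    "team behaviour", "team behavior", "team skills", "teamwork skills",
    "team decision making", "team effectiveness", "team structure",
    "team competencies",
]


def build_query(terms):
    """Join quoted free-text phrases with Boolean OR (hypothetical helper)."""
    return " OR ".join(f'"{term}"' for term in terms)


QUERY = build_query(TERMS)
print(QUERY)
```

Both spellings, ‘team behaviour’ and ‘team behavior’, are kept as separate phrases because a free-text search does not normalise spelling variants.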
During the selection procedure, two researchers (I.V.N. and M.C.d.B.) assessed whether references were ‘relevant’, ‘uncertain’, or ‘irrelevant’ according to the inclusion/exclusion criteria based on title and abstract only. Full texts of abstracts judged relevant were requested. Abstracts judged ‘uncertain’ were re-evaluated. In the case of disagreement regarding a re-evaluated abstract, the full text was requested. Both researchers judged all full texts separately. In the case of disagreement, the full texts were discussed until consensus was reached.
Second, we performed a hand search on existing systematic reviews about team training and checked for relevant references in the included studies. In a final stage, we selected those studies that used safety culture as an outcome (Figure 1).

Figure 1. Flowchart of included articles.
Inclusion and exclusion criteria
Studies were eligible for inclusion when the training focussed on health-care teams in hospitals and covered at least two CRM topics (e.g. communication and leadership). We excluded studies evaluating CRM in pre-clinical medical education, outside health care, in primary care, and in dental care. CRM training courses based (partly) on high-tech simulation techniques were also excluded, as they are fundamentally different from classroom-based training courses. When evaluation studies compared classroom-based CRM training with simulation-based CRM training or no training, we disregarded the results from simulation-based training. Furthermore, we excluded manuscripts based on qualitative research.
Data extraction
Studies with a positive effect of CRM training on safety culture were defined as those studies with statistically significant changes from baseline and/or a control group or changes in safety culture dimensions of 10% or more. 18 Descriptive data (information about the type of training, participants, measurement instruments, follow-up, analyses, and implementation and sustainment strategies) were extracted by I.V.N., while data were checked by N.Z. to confirm whether this had been extracted correctly from manuscripts.
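The definition of a positive effect given above can be sketched as a small decision rule. This is a minimal sketch under stated assumptions: the significance level of 0.05, the reading of ‘10% or more’ as a change in the same units as the reported percentage, and the function and parameter names are all illustrative and are not taken from the manuscript.

```python
def is_positive_effect(improved, p_value=None, pct_change=None,
                       alpha=0.05, min_change=10.0):
    """Return True if a reported change counts as a positive effect.

    improved   -- True if the change is in the favourable direction
    p_value    -- p value of the change from baseline or control, if reported
    pct_change -- size of the change in percent, if reported
    alpha      -- assumed significance level (not stated in the manuscript)
    min_change -- relevance cut-off of 10% from the data extraction rule
    """
    if not improved:
        return False
    significant = p_value is not None and p_value < alpha
    relevant = pct_change is not None and pct_change >= min_change
    # Either criterion is sufficient ("and/or" in the definition).
    return significant or relevant


print(is_positive_effect(True, p_value=0.20, pct_change=12.0))
```

Because either criterion suffices, a dimension can count as improved on the 10% rule even when the accompanying statistical test is non-significant.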
Quality appraisal
The Methods Guide for Comparative Effectiveness Reviews of the Agency for Healthcare Research and Quality 19 was used to assess the risk of bias in the individual studies in the systematic review. The taxonomy of the five core biases of the Cochrane Handbook was used, namely, selection bias (including randomisation and blinding bias for randomised controlled trials (RCTs)), performance bias, attrition bias, detection bias, and reporting bias. For RCTs, and cohort- and cross-sectional studies, we used specific criteria according to the description in Table 4 of the Methods Guide for Comparative Effectiveness Reviews (Table 1). Every separate criterion is reported as well as a percentage score of the risk of bias.
Table 1. Definitions of the types of biases used for the risk of bias assessment, adjusted for training interventions.
Zero points were assigned to low risk of bias, one point to moderate risk of bias, and two points to high risk of bias. The sum of points was divided by the total points possible for all criteria together and multiplied by 100 (‘unclears’ were disregarded). As a result, 0%–33.2% indicates a low risk of bias, 33.3%–66.6% a moderate risk of bias, and 66.7%–100% a high risk of bias.
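The scoring rule above can be sketched as follows. The function names are illustrative, and rounding the percentage to one decimal before applying the cut-offs is an assumption made so that scores of exactly one-third and two-thirds fall into a defined category.

```python
POINTS = {"low": 0, "moderate": 1, "high": 2}


def risk_of_bias_percentage(ratings):
    """Percentage score over all rated criteria; 'unclear' is disregarded."""
    scored = [r for r in ratings if r != "unclear"]
    if not scored:
        return None  # nothing to score
    # Maximum possible is two points per scored criterion.
    return 100.0 * sum(POINTS[r] for r in scored) / (2 * len(scored))


def classify(pct):
    """Map a percentage score to low, moderate, or high risk of bias."""
    pct = round(pct, 1)  # assumption: cut-offs apply to one-decimal values
    if pct < 33.3:
        return "low"
    if pct < 66.7:
        return "moderate"
    return "high"


example = ["low", "moderate", "high", "unclear", "moderate"]
score = risk_of_bias_percentage(example)
print(score, classify(score))
```

In this hypothetical example, four criteria are scored (0 + 1 + 2 + 1 = 4 of 8 possible points), giving 50%, which falls in the moderate band.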
Results
We retrieved 1650 manuscripts from PubMed, 225 from PsycINFO, 110 from the Cochrane Database, and 537 from the ERIC database, resulting in 1926 unique references. The selection procedure resulted in 50 manuscripts evaluating one of the four levels of Kirkpatrick and Kirkpatrick’s 11 framework. Of these, 22 reported data on safety culture (Figure 1).
Study and training characteristics
All studies were published from 2006 onwards, with the majority being conducted in the United States. Half of the studies focussed on the operating theatre setting. Of the studies, 16 had uncontrolled designs, of which 10 were single-centre studies. Ten studies were multicentre studies, three of which had a controlled design. In total, there were six controlled designs: three in which a control site was used, two in which trained versus non-trained personnel were compared, and one in which the last trained cohort was compared to the first trained cohort before they received training. The TeamSTEPPS training curriculum was implemented on eight occasions. Follow-up measurement varied between 3 months and 4 years. Response percentages differed widely among the studies (range: 19%–96%), as did the number of trained individuals (range: 29–32,150). Implementation and sustainment strategies consisted of, among others, change or leadership teams (usually formed with champion figures), briefings/debriefings, coaching, comprehensive implementation plans, embedding of training within patient safety programmes, structural training of incoming employees, and train-the-trainer modules (Table 2).
Table 2. Characteristics of the included studies.
NS: not specified; VHA: Veterans’ Health Administration; VA: Veterans Affairs; NCPS: National Center for Patient Safety; MTT: Medical Team Training; USA: United States of America; UK: United Kingdom; CH: Switzerland; AU: Australia; OR: Operating Room; L&D: Labour and Delivery; PICU: Paediatric Intensive Care Unit; HSOPSC: Hospital Survey on Patient Safety Culture; SAQ: Safety Attitude Questionnaire; T-TPQ: TeamSTEPPS Teamwork Perception Questionnaire; PTS: Perception of Teamwork Survey; TAQ: Team Assessment Questionnaire; TTP: Team Training Programme; CRM: Crew Resource Management.
Training effects
Table 3 describes the effects of the different studies and their risk of bias. Predominantly, the Hospital Survey on Patient Safety Culture (HSOPSC, 8 times) or the Safety Attitude Questionnaire (SAQ, 11 times) was used to assess the effect of team training on patient safety culture. The results per questionnaire are given below.
Table 3. Effects of CRM training described in the included studies and the risk of bias assessment of the included studies.
NS: non-significant; L&D: Labour and Delivery; PICU: Paediatric Intensive Care Unit; SAQ: Safety Attitude Questionnaire; SICU: Surgical Intensive Care Unit; CRM: Crew Resource Management.
Only statistically significant or relevant (more than 10% change) effects are reported. If not all dimensions of the particular questionnaires are reported, this is mentioned, otherwise no effects were found.
0: low risk of bias; 1: moderate risk of bias; 2: high risk of bias; (–): unclear. Percentages are calculated by assigning zero points to low, one point to moderate, and two points to high risk of bias per criterion (criteria not shown separately in this table). The sum of points is divided by the total possible points for all criteria together times 100 (‘unclears’ were disregarded).
Note that the number of items changed when the outcomes were regarded as dichotomous. We could not determine which numbers are correct.
Safety Attitude Questionnaire
Regarding studies that used the SAQ as an evaluation method, 25- and 100-point scale scores, item-level differences, and differences in positive responses were reported. Four studies reported results on all SAQ dimensions, two of them finding mainly positive results34,38 and two of them finding mainly negative results.35,40 Four studies reported only teamwork climate and found positive increases. Safety climate was reported separately in one study, and a significant positive change was reported. 23 The study of Haller et al. 27 reported teamwork climate, safety climate, and stress recognition and found some improvements at item level. In sum, teamwork climate increased in six of the nine studies reporting this dimension. Safety climate changed in a positive direction in four of seven studies that reported that outcome.
Hospital Survey on Patient Safety Culture
As with the SAQ, not all dimensions were evaluated in all studies. Four studies reporting all dimensions found mixed results. Blegen et al. 22 found positive changes for all dimensions except for frequency of event reporting (staffing was not reported). Marshall and Manus, 18 in contrast, only found significant changes for four dimensions. Stead et al. 36 found statistically significant improvement on two dimensions, but when the cut-off of more than 10% change in positive responses was used, all but one (hospital management support for patient safety) improved. Thomas and Galla 37 found positive results in general, although different results were partly seen for the pilot hospital compared to the system-wide evaluation. Four studies reported only selected dimensions of the HSOPSC,20,26,30,39 and except the study of Weaver et al., 39 all found positive changes in those selected dimensions. Gore et al. 26 did not find positive changes when outcomes were regarded as continuous as opposed to dichotomous (positive answers); when regarded as dichotomous, all selected dimensions changed.
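The difference between treating responses as continuous and as dichotomous (positive answers) can be illustrated with a small sketch. The responses below are hypothetical and chosen only to show the mechanism. Counting scores of 4 or 5 on a 5-point scale as ‘positive’ is an assumption about how the dichotomy was formed.

```python
from statistics import mean


def percent_positive(responses, cutoff=4):
    """Share of responses at or above the cut-off, in percent."""
    return 100.0 * sum(r >= cutoff for r in responses) / len(responses)


# Hypothetical 5-point Likert responses from ten respondents.
before = [3, 3, 3, 3, 4, 4, 2, 5, 3, 3]
after = [4, 4, 4, 4, 4, 4, 1, 5, 2, 2]

print(mean(before), mean(after))                          # continuous view
print(percent_positive(before), percent_positive(after))  # dichotomous view
```

In this constructed example the mean moves only from 3.3 to 3.4, while the share of positive responses moves from 30% to 70%, because several respondents cross the cut-off while others move further away from it.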
Other questionnaires
Other questionnaires used to assess the effect of team training on patient safety culture were a Brief Teamwork Perception Questionnaire and the Team Assessment Questionnaire, in which we only considered the domains ‘Leadership’ and ‘Climate’ as relevant for patient safety culture. Castner et al. 25 found a result on Leadership. Mahoney et al. 29 found an improvement on Climate, whereas Armour et al. 20 did not. Halverson et al. 28 reported results at item level and observed that 14 of 19 items improved.
Quality appraisal
The studies showed an average risk of bias percentage of 43.7% (standard deviation (SD) 15.3%), indicating that on average the studies had a moderate risk of bias. Seven studies had a low risk of bias (<33.3%),21,22,25,27,31,38,39 but three of them did not describe all the bias criteria31,38,39 (Table 3); 14 studies had a moderate risk of bias (between 33.3% and 66.7%),18,20,23,24,26,28–30,33–37,40 11 of which18,20,23,24,28,30,34–37,40 did not describe all bias criteria (see Appendix 1 for criteria). One study had a high risk of bias (≥66.7%). 32 At first sight, it seemed that controlled studies had less risk of bias in general, but a non-parametric independent-sample test did not show this difference to be significant (p = 0.19).
Selection bias comprised items about allocation and the analyses and design regarding modifying and confounding variables. Nine studies26,28,29,32–34,36,37,40 had a high risk of selection bias, mainly because the design and analyses did not take into account possible confounding and modifying variables. Fisher’s exact test showed that single-centre uncontrolled studies more often had a high risk of selection bias compared to other study designs (p = 0.494). Performance risk of bias was high in two studies18,20 and considered low in three.21,31,39 Attrition bias concerned the loss of follow-up of respondents. If attrition is a concern, missing data have to be handled appropriately according to the risk of bias assessment format. A high risk of attrition bias was found on eight occasions,23–26,29,30,39,40 and in seven studies, it was unclear how missing data were handled and/or what the response rates were.20,28,31,32,34,35,37 The risk of detection bias was considered high in five studies,18,32,36,37,40 while it was considered low in seven.21,22,25,26,35,38,39 Fisher’s exact test revealed that multicentre controlled design more often had a low risk of detection bias compared to the other study designs (p = 0.225). Reporting bias concerned predefined and reported outcome variables. In two studies,32,35 outcome variables were prespecified but not all reported, which gave them a high risk of reporting bias classification.
Discussion
This systematic review explored the effects of a classroom-based CRM training on safety culture. In total, 22 studies were included whose effect was mainly evaluated by means of the SAQ or the HSOPSC. Uncontrolled studies in our systematic review all found positive effects, although the magnitude of effects varied across the studies. Two controlled studies that used a control group found no training effects. All but one of the cross-sectional controlled studies found some effects. Risk of bias assessment revealed that in general, studies were at moderate risk of bias, with selection bias and attrition bias being the most common biases. Results also showed that single-centre uncontrolled designs were at higher risk of selection bias than other designs and that multicentre studies had a lower risk of detection bias than other designs.
There are several possible reasons why the results of the controlled studies were different from those of the uncontrolled studies. First, single-centre studies usually involve hospital-driven initiatives. In essence, these differ from externally evaluated training programmes since internally driven improvement projects are likely to have more support from upper and middle managers and staff. According to Salas et al., 41 organisational support is one of the critical success factors determining the success of a training course. Additionally, uncontrolled studies allow more variance in content and implementation strategies than controlled studies. Embedment in organisational goals and aims will motivate frontline care leaders and managers to commit to CRM training principles. 41 Second, compared to controlled designs, training effects in uncontrolled designs may have been more confounded by context factors (e.g. staffing profiles, departmental activities concerning patient safety) or by the fact that patient safety is on the national research agenda in many countries. Context may influence the interpretation of results due to possible beta statistical (type II) errors since outcomes could have been influenced by factors other than the intervention alone. Nevertheless, our risk of bias assessment did not reveal that this aspect played a role. Additionally, context may influence the fidelity of the implementation of the trained principles in practice. 42 Third, the timing of the measurement may have contributed to the mixed results of the studies, although no clear pattern could be discerned between the magnitude of effects found in studies in our systematic review and their follow-up periods.
We noted some striking findings in this review. First, none of the multicentre or multisite studies made corrections for the clustering of responses within units. The precision of the reported effects may therefore be overestimated, since clustering of responses reduces the effective sample size, and thus power, when intraclass correlation coefficients (ICCs) are high. Concerning the clustering of responses within units or hospitals, Smits et al. 43 reported ICCs for the HSOPSC that ranged from 4.3 to 31.7 for the unit level and 0.0 to 6.2 for the hospital level, with 15 as a threshold for high clustering. 44 Second, the relatively high frequency of high risk of attrition bias suggests that the biggest challenge lies in preventing losses to follow-up of respondents or in achieving a high response in the case of cross-sectional studies. Third, the outcome data were often handled as a dichotomous outcome, and consequently, results moved sooner towards statistical significance, as shown in the study by Gore et al. 26 Furthermore, when the cut-off of a 10% change in positive responses is used, there is a chance that results are regarded as relevant while statistical tests do not show any significant results, as shown in the study by Stead et al. 36
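To make the clustering concern concrete, the sketch below applies the standard design effect for equally sized clusters, DEFF = 1 + (m − 1) × ICC, which is not a formula taken from the reviewed studies. The cluster size of 20 and the 400 respondents are hypothetical, and the ICC of 0.15 assumes that the threshold of 15 quoted above is a percentage.

```python
def design_effect(cluster_size, icc):
    """Design effect for equally sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Number of independent respondents that n clustered ones are worth."""
    return n / design_effect(cluster_size, icc)


# Hypothetical survey: 400 respondents in units of 20, ICC of 0.15.
deff = design_effect(20, 0.15)
n_eff = effective_sample_size(400, 20, 0.15)
print(round(deff, 2), round(n_eff, 1))
```

Under these assumptions the 400 respondents carry roughly the information of 104 independent ones, so an analysis that ignores clustering would report confidence intervals that are too narrow.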
In general, results of the studies varied widely, and keeping in mind the possible bias, we are cautious in drawing firm conclusions. The possibility of publication bias supports this feeling. When an intervention study does not find an effect, study quality evaluations are more stringent, focussing on the appropriateness of the study design, measurements, and methodology. Thus, intervention studies with negative results have a lower chance of publication. 45 By contrast, single-centre controlled studies have more chance of publication as these studies show larger effects. 46
Regarding generalisability, we must take into account that safety culture is a subcomponent of organisational culture and will thereby be influenced by the dominant organisational culture. Organisational culture reflects shared behaviours, beliefs, attitudes, and values regarding goals, functions, and procedures. 47 As with organisational culture, beliefs, values, and attitudes towards safety culture could vary between individuals. The same variance is possible between departments and within disciplines, which hampers the generalisability of results to other departments and types of departments, as those settings will have profound differences in initial safety culture as well as contextual features. The variety of evaluation methods and designs used to track changes in culture elements aggravates this problem.
In addition, the variety in CRM training concepts and the manner in which they are trained makes it hard to pinpoint which training elements are related to specific culture changes. CRM, in general, is focussed on non-technical skills, but different training courses underscore different concepts or use different training forms. Moreover, expert trainers might adapt their programme, exercises, and feedback to the knowledge, skills, and dynamics in the group. Extensive descriptions of CRM training interventions are therefore a prerequisite when more in-depth analyses of correlations or causation are desired. Another recommendation would be to provide thorough descriptions of the trained participants, to gain more insight into the teams for which CRM training potentially works.
With respect to methodology, using questionnaires to assess safety culture might not always be the best choice. This method is appropriate within the analytical approach, which assumes that culture is something an organisation
Internationally, the impact of team training on secondary outcomes such as adverse outcomes and safety culture change has to be evaluated with highly reliable study designs. This is challenging, especially in the health-care environment where clinical practice is influenced by a variety of highly uncontrollable factors. 49 Non-controlled before–after evaluations of the specified secondary outcomes might seem the most realisable of all study designs. However, as mentioned previously, in these studies, effects could possibly be attributed to other developments within or outside the organisation since it is harder to distinguish between cause and effect. 50 One might suggest that controlled clustered (randomised) studies would be a suitable solution. A sufficiently large intervention and control group is a prerequisite. 50 For example, Nielsen et al. 51 have considered that 11–13 sites per group are needed to have 80% power to detect a 40% reduction in adverse outcomes at labour and delivery units. Another appropriate design for the evaluation of CRM effects may be the stepped wedge design in which sites will act as their own control. This will also reduce the risk of bias. Advantages and disadvantages of the stepped wedge design are mainly of a practical nature, that is, its design suits situations in which interventions allow for a phased introduction, but it demands a laborious and extended amount of data collection. 50
We are aware of the delay in the publication process since December 2012. We performed a quick scan of the literature in 2013 and the beginning of 2014 in MEDLINE, the database in which all the included articles can be found. Probably four studies have been published since December 2012 that we would have included in this review.52–55 Three studies were quasi-experimental52,53,54 and one study was a randomised controlled trial, 55 two of them being small.53,55 Since these studies did not show extreme findings at first sight, we do not expect that an update of our review will materially change the results.
In sum, we conclude that evidence of the effectiveness of CRM training in health care in terms of improved safety culture is scarce and the validity of most studies is limited, due to the predominant use of uncontrolled study designs. Although it might be easier to comply with critical success factors for team training with single-centre evaluations, the results underline the necessity of a control group to reduce the risk of bias. In addition to that, more in-depth measurements of context and triangulation methods to analyse these in combination with primary outcomes will help to acquire insight into the working mechanisms of the CRM training and the influential role of context.
Appendix
Risk of bias criteria used for the risk of bias assessment (Viswanathan et al. 19 )
| Type of bias | Criterion | Study design: RCT | Study design: CCT/cohort study | Study design: cross-sectional |
|---|---|---|---|---|
| Selection bias | Was the allocation sequence generated adequately? | x | | |
| | Was the allocation of treatment adequately concealed? | x | | |
| | Were participants analysed within the groups they were originally assigned to? | x | x | |
| | Did the study apply inclusion/exclusion criteria uniformly to all comparison groups? | x | x | |
| | Did the strategy for recruiting participants into the study differ across study groups? | x | | |
| | Does the design or analysis control account for important confounding and modifying variables through matching, stratification, multivariable analyses, or other approaches? | x | x | x |
| Performance bias | Did researchers rule out any impact from a concurrent intervention or an unintended exposure that might bias results? | x | x | x |
| | Did the study maintain fidelity to the intervention protocol? | x | x | x |
| Attrition bias | If attrition was a concern, were missing data handled appropriately? | x | x | x |
| Detection bias | Was the length of follow-up different between the groups or was the time period between the intervention/exposure and outcome the same for cases and controls? | x | x | |
| | Were outcomes assessors blinded to the intervention or exposure status of participants? | x | x | x |
| | Was intervention exposure assessed using valid and reliable measures implemented consistently across all study participants? | x | x | x |
| | Were outcomes assessed using valid and reliable measures implemented consistently across all study participants? | x | x | x |
| | Were confounding variables assessed using valid and reliable measures implemented consistently across all study participants? | x | x | |
| Reporting bias | Were the potential outcomes prespecified by the researchers? Are all prespecified outcomes reported? | x | x | x |
Acknowledgements
This study was part of the National Patient Safety Program for Hospitals in the Netherlands.
Declaration of conflicting interests
The authors declare that they have no competing interests.
Funding
Financial support for this study was provided by the Dutch Ministry for Health, Welfare and Sports (grant no. 1075879).
