Psychologists have a duty to develop and maintain the knowledge and skills needed to practice effectively and safely. Psychology training programs (psychology programs) are responsible for determining whether psychology students are capable of practicing professionally in the real world (Paparo et al., 2021). Central to this decision are the attainment and demonstration of clinical competencies, which ensure students have the skills and attributes required to practice psychology competently and safely (Australian Psychology Accreditation Council, 2019).
While many methods are available to psychology programs to evaluate skills and knowledge, they are not without limitations. Traditional forms of assessment such as essays and examinations only assess whether knowledge has been acquired and generally lack ecological validity and fidelity to practice (Lichtenberg et al., 2007). Similarly, supervisor ratings have been found to be subject to leniency and bias (Gonsalvez & Freestone, 2007), and heterogeneity within clinical placement experiences makes reliable determination of competency difficult (Paparo et al., 2021). Furthermore, while knowledge of psychotherapeutic models and techniques is essential, possessing it does not guarantee that it can be applied with clients (Beccaria, 2013; Pachana et al., 2011). For example, a student who performs well on an essay assessing knowledge of clinical interviewing may still face difficulty interviewing a real client. As such, assessing real-world clinical skills in a consistent, fair, and impartial manner is a complex challenge (Goodie et al., 2021).
Objective Structured Clinical Examinations
In recent times, there has been an increased uptake of Objective Structured Clinical Examinations (OSCEs) as a form of competency-based assessment (Sheen et al., 2015; Yap et al., 2021). During OSCEs, students role-play clinical vignettes designed to replicate real-world interactions with simulated or standardized patients (SPs; the terms are used interchangeably) (Khan et al., 2013b). Examiners observe the students' performances and evaluate competencies using objective measures such as checklists, rating scales, or both (Khan et al., 2013a). As such, OSCEs are better suited to assessing the demonstration of competencies or skills that cannot be adequately captured by traditional assessment tools (e.g., conducting a mental state examination or administering an assessment) (Khan et al., 2013a). The OSCE is widely regarded as a fair and relevant examination that has been well received by students and examiners, due to factors such as the controlled exposure and proximity to "actual clinical practice" and the requirement to translate knowledge into practice (Patrício et al., 2013). OSCEs are also increasingly utilized as an assessment tool alongside simulated learning activities in competency-based psychology programs (Rice et al., 2022). OSCEs have been used within medical and physical health schools for over four decades (Brannick et al., 2011) and continue to be relevant as a formative or summative assessment across 94% of medical schools (Barzansky & Etzel, 2018).
Implementation Guidelines
Reliability and validity profiles of OSCEs are highly influenced by the way they are facilitated (Harden, 2016; Khan et al., 2013b). For instance, studies have shown that increasing the number of examiners (Brannick et al., 2011), stations (Khan et al., 2013a), and metrics (Ilgen et al., 2015) increases the overall reliability of OSCEs. A synthesis of implementation methods that improve the psychometric qualities of OSCEs is provided by Khan et al. (2013b) as a set of Quality Assurance Guidelines (QAGs; Figure 1). These guidelines are referenced across studies evaluating the development and implementation of OSCEs in medicine (Daniels & Pugh, 2018), pharmacy (Cheema & Ali, 2021), dentistry (Salawu et al., 2022), social work (Bogo et al., 2012), and psychology (Goodie et al., 2021; Sundström & Hakelind, 2022; Yap et al., 2012). However, it should be noted that, owing to the early adoption and continued widespread use of OSCEs within medical training, the QAGs were developed from medical program OSCEs. As OSCEs expand across health professions, including psychology, it is important to ensure that while the content of the examination may differ, the essential quality indicators are present. Khan et al.'s (2013b) QAGs provide a blueprint for training programs to assess OSCE quality by identifying each of the core components (as shown in Figure 1). As such, training programs can utilize QAGs during the development, implementation, and evaluation of OSCEs to maximize the strengths and psychometric profile of the assessment (Khan et al., 2013b).

Figure 1. Quality assurance guidelines (Khan et al., 2013b).
In contrast to the depth of studies investigating medical OSCEs, there is a dearth of literature pertaining to the evaluation of OSCEs within psychology programs (Sheen et al., 2015). The adoption of OSCEs as a competency-based assessment within psychology programs was largely based on assessments within adjacent fields such as psychiatry, social work, and mental health nursing (Goodie et al., 2021; Plakiotis, 2017). Psychometric evidence for OSCEs in psychology has started to emerge from two broad sources: (1) face validity and (2) predictive and construct validity.
Face Validity
According to Sheen et al. (2021), OSCEs have the potential to enhance students' educational experiences by providing learning opportunities that are authentic and proximal to their clinical work environments. This is supported by studies investigating student and staff perspectives of the OSCE as "preferable to other forms of assessment" in addition to being "valid," "realistic," "fair," and "authentic" (Melluish et al., 2007; Roberts et al., 2020; Sheen et al., 2015; Yap et al., 2012).
Predictive and Construct Validity
While this evidence suggests that OSCEs are favorably appraised, the ability of OSCEs to offer insights into the clinical capabilities of students is still under debate. OSCE performance and scores have been shown to significantly correlate with supervisor ratings of communication and interpersonal skills in placement performance (Glatz et al., 2022). However, students who failed communication tasks within an OSCE were still found by supervisors to meet minimum expectations or higher (Meghani & Ferm, 2021). Similarly, OSCE scores converge with other measures of competence for some clinical skills (diagnosis and clinical assessment) but not others (ethical standards and communication skills) (Meghani & Ferm, 2021). Lastly, Sundström and Hakelind (2022) conducted an OSCE in which they assessed the generalizability of performance scores across stations and found large variability in scores across and within stations, concluding that some tasks were either too difficult or marked less consistently. This large variability in scores across different OSCE stations could indeed mirror the real-world variability in clinical scenarios that students will face in practice. Thus, the measurement of reliability in this context should not be solely predicated on consistency across stations. Instead, it could be more beneficial to evaluate reliability in terms of a student's ability to consistently meet the demands of diverse clinical scenarios. This perspective not only guides the design of OSCEs but also assists in interpreting their outcomes. Notably, Sundström and Hakelind found the reliability of their OSCE to be on par with OSCEs in other mental health areas, such as psychiatry (α = 0.51; Hodges et al., 1998) and social work (α = 0.55; Bogo et al., 2011), but far below the acceptable reliability values within medicine (α = 0.66–0.80; Brannick et al., 2011; Khan et al., 2013a). Combined, the initial evidence suggests that while OSCEs may enhance educational experiences, their results should be interpreted with caution.
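For reference, the reliability coefficients cited above are internal consistency (Cronbach's alpha) values. For an OSCE with $k$ stations, alpha is computed from the variance of scores on each station and the variance of students' total scores:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)$$

where $\sigma^{2}_{Y_i}$ is the variance of scores on station $i$ and $\sigma^{2}_{X}$ is the variance of total scores. Weak correlations between stations, such as those reported by Sundström and Hakelind (2022), shrink the term in parentheses and hence the overall coefficient.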
More importantly, it is necessary to determine whether these findings suggest that OSCEs are more effective than existing assessment methods in distinguishing students who have demonstrated the required competency, or whether some psychological competencies are simply less reliably assessed than others. Systematic reviews of the reliability of OSCEs administered within medical schools confirmed that skills that require subjective and idiosyncratic judgments on performance, such as communication skills and cultural competence, are scored less reliably on rating scales (Brannick et al., 2011; Cömert et al., 2016; Halman et al., 2020; Piumatti et al., 2021). This presents further challenges for the assessment of psychological competencies, which require intuitive and situational judgments (Ilgen et al., 2015). Balancing reliability and validity in OSCEs is crucial. Overemphasis on reliability, through overly objective rubrics, may oversimplify complex skills and risk compromising predictive validity. Ideally, assessments should be both reliable (i.e., ensuring consistent results) and valid (i.e., accurately measuring the intended skills). Achieving this balance necessitates thoughtful design, clear conceptualization of the assessment's goals, and commitment to ongoing review for iterative improvements. However, there are no studies or reviews investigating the psychometric qualities of OSCEs within psychology programs.
Objective
In the absence of direct comparison data, there remains a need to evaluate OSCE quality. Evidence suggests there is a relationship between the psychometric properties of an OSCE (including reliability, validity, and educational impact) and the methodology employed in its implementation (Harden, 2016; Khan et al., 2013b). This may present a way forward through examination of quality via adherence to QAGs. Higher degrees of adherence are likely to represent psychometrically sound OSCEs, while lower degrees may indicate poor psychometric quality. Accordingly, the current systematic review assessed OSCEs within psychology programs against QAGs to infer their overall psychometric quality.
Methods
Protocol and Registration
A systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations, and the protocol was registered with PROSPERO.
Eligibility Criteria
Studies were included based on the following criteria: (1) peer-reviewed articles published in English; (2) psychology students; (3) OSCE-style assessment or similar; and (4) OSCE examination data or feedback provided. Studies that provided only a general overview of the OSCE or contained a mixed-discipline cohort of students were excluded from the systematic review. It was necessary to screen out studies that did not provide examination data so that implementation could be assessed. Similarly, as the clinical competencies of psychologists differ from those of other mental health professions such as counselors, psychiatrists, social workers, and mental health nurses, it was important to include only studies with a sample of psychology students within a psychological training setting.
Information Sources
The key indexing databases Scopus and Web of Science were used to identify relevant studies, with additional searches repeated in the psychology-specific databases PsycInfo, PsycArticles, and ProQuest Psychology. Relevant references of studies were manually searched. The final search was conducted in September 2022.
Search Strategy
Initial searches were designed in consultation with a research librarian with expertise in literature searching, using the key terms "psychology program", "objective structured clinical examination", "clinical competency", and "quality". The APA Thesaurus of Psychological Index Terms was used to identify additional relevant terms. The initial strategy was revised upon evidence that psychotherapy and simulation-based assessments met inclusion criteria. Each term and its synonyms were searched independently within abstracts, headings, and study texts and then combined. Reference lists of retrieved articles were also manually searched (see Table 1 for a sample search strategy).
Table 1. Example Search Strategy Used in the Electronic Database Search.
Selection Process and Study Risk of Bias Assessment
All search results were imported into reference management software (EndNote) and then into a web-based review platform (Covidence). After duplicates were removed, two reviewers independently performed title and abstract screening followed by full-text screening against the inclusion and exclusion criteria. Disagreements between reviewers were resolved through discussion, and consensus was reached across all studies.
Appraisal of Methodological Quality
Methodological quality and risk of bias of included studies were assessed using the Mixed Methods Appraisal Tool (MMAT), version 2018 (Hong et al., 2018). The MMAT is a critical appraisal tool designed for systematic mixed studies reviews and was chosen based on its usage within systematic reviews of OSCEs in other professions (Boland et al., 2020; Nataša Mlinar et al., 2017; Vincent et al., 2022) and its appropriateness given the diverse study designs included within the review. For quantitative studies, the MMAT evaluates sampling, construct measurement, nonresponse bias, and statistical analyses; for qualitative studies, it evaluates data collection, interpretation of findings, and overall coherence. Additionally, if studies utilized a mixed methods approach, the overall study was evaluated for rationale, integration, and inconsistencies. An overall score was calculated based on the number of criteria satisfied by each included study (maximum of 5) and represented by a rating indicative of risk of bias: "High" (0 to 1), "Medium" (2 to 3), and "Low" (4 to 5). For mixed methods studies, the appraisal was conducted on the basis that the overall quality of a combination cannot exceed the quality of its weakest component (Hong et al., 2019). Thus, the overall quality score is the lowest score of the study components. The initial quality assessment was independently conducted by AV and RLD.
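To illustrate, the scoring logic described above can be sketched in a few lines of code (the component scores are hypothetical; the thresholds and weakest component rule follow the description above):

```python
def risk_rating(score: int) -> str:
    """Map an MMAT criteria count (0-5) to a risk-of-bias rating."""
    if score <= 1:
        return "High"
    if score <= 3:
        return "Medium"
    return "Low"

def mixed_methods_score(qual: int, quant: int, mm: int) -> int:
    """Weakest component rule: the overall score of a mixed methods study
    is the lowest of its component scores (Hong et al., 2019)."""
    return min(qual, quant, mm)

# Hypothetical study scoring 4 (qualitative), 2 (quantitative), 3 (mixed methods)
overall = mixed_methods_score(qual=4, quant=2, mm=3)  # -> 2
print(risk_rating(overall))                           # -> "Medium"
```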
Data Extraction and Synthesis
Data were extracted by one author (AV) using a custom-designed template. Items included study characteristics (country, year, study type, sample, aim, and key findings), commonly reported OSCE characteristics, and the QAGs (Khan et al., 2013b).
Data Synthesis
Quantitative synthesis was not possible as the included studies were predominantly quasi-experimental, uncontrolled designs using convenience samples and were heterogeneous in the implementation and reporting of OSCEs (Cheung & Vijayakumar, 2016). As such, findings are summarized in narrative form rather than through direct comparison (such as a meta-analysis), in accordance with the Synthesis Without Meta-analysis (SWiM) guidelines (Campbell et al., 2020). Information is presented within text and tables to summarize and explain the characteristics of the included studies.
Assessment of OSCE Implementation
Implementation of OSCEs was assessed against the eight QAGs (Khan et al., 2013b). Adherence to the guidelines was scored according to the number of criteria met and represented by a descriptive rating. In the absence of existing cut-off descriptors, scores were categorized according to the number of criteria satisfied: Poor (0 to 2), Fair (3 to 5), and Good (6 to 8).
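For transparency, the adherence tally can be expressed as follows (a minimal sketch; the guideline labels are abbreviated from Khan et al.'s (2013b) eight QAGs as reported in Table 5, and the example study is hypothetical):

```python
QAGS = ["Validity", "Peer review of stations", "Objective measures",
        "External examiners", "Examiner training",
        "Standardized patient training", "Post-hoc psychometrics", "Evaluation"]

def adherence(met: dict[str, bool]) -> tuple[int, str]:
    """Count satisfied guidelines and map the tally to a descriptive rating."""
    score = sum(met.get(g, False) for g in QAGS)
    label = "Poor" if score <= 2 else "Fair" if score <= 5 else "Good"
    return score, label

# Hypothetical study meeting five of the eight guidelines
example = {g: True for g in QAGS[:5]}
print(adherence(example))  # -> (5, 'Fair')
```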
Results
Study Selection
The electronic database search yielded 514 studies. Of these, 345 duplicates were removed and 119 studies were excluded following title and abstract screening. The full texts of the remaining 50 studies were assessed for eligibility, and 38 were excluded by applying the inclusion and exclusion criteria. This resulted in 12 studies for the systematic review. Manual searches of reference lists yielded no additional results. Figure 2 presents the study selection process as a PRISMA flowchart.

Figure 2. PRISMA flow chart depicting study selection process.
Study Characteristics
Studies were conducted across a number of countries. In descending order, these included Australia (n = 5), Sweden (n = 3), North America (n = 2), the United Kingdom (n = 1), and Germany (n = 1). Study samples ranged from 9 to 91 students and varied in the inclusion of students only (n = 8) or students and examiners (n = 4). OSCEs were delivered within postgraduate (master's) degrees (n = 7), PhD degrees (n = 3), or both (n = 2). The included studies were mixed in their designs: mixed methods designs were most common (n = 6), followed by descriptive quantitative studies (n = 4) and, lastly, qualitative studies (n = 2). A summary of study characteristics is provided in Table 2.
Table 2. Characteristics of Included Studies.
Critical Appraisal of Study Methodology
Studies were critically appraised utilizing the MMAT (Hong et al., 2018). With the exception of one (Yap et al., 2012), studies that utilized qualitative designs and elements were more likely to satisfy MMAT criteria. Only four of a potential 10 studies met the criterion for appropriate statistical analysis (criterion 4.5) (Goodie et al., 2021; Roberts et al., 2017; Sheen et al., 2021; Sundström & Hakelind, 2022). Incongruence was observed between quantitative and qualitative criteria within the six studies that utilized mixed methods designs (criterion 5.5). The weakest component rule affected all six studies, resulting in lower overall scores (Hakelind & Sundström, 2022; Nikendei et al., 2019; Roberts et al., 2017; Sheen et al., 2015, 2021; Yap et al., 2012). Of these six studies, three performed better on qualitative criteria (Hakelind & Sundström, 2022; Sheen et al., 2015, 2021) and two on quantitative criteria (Roberts et al., 2017; Yap et al., 2012). Overall, risk of bias was rated as "Low" within seven studies (Goodie et al., 2021; Melluish et al., 2007; Nikendei et al., 2019; Roberts et al., 2017, 2020; Sheen et al., 2021; Sundström & Hakelind, 2022), "Medium" within two (Glatz et al., 2022; Meghani & Ferm, 2021), and "High" within three (Hakelind & Sundström, 2022; Sheen et al., 2015; Yap et al., 2012). See Table 3 for the critical appraisal of studies according to the MMAT (Hong et al., 2018).
Table 3. Critical Appraisal of Studies Using the MMAT.
Note. ** Overall risk for mixed methods study utilized the weakest component rule (i.e., QUAL = 1 OR QUANT = 1 OR MM = 1) (Hong et al., 2019).
MMAT: Mixed Methods Appraisal Tool.
OSCE Characteristics
Design and Content
OSCE content development varied across studies. All studies developed stations according to program curricula, psychological literature, or both, and the number of OSCE stations ranged from 1 to 12. See Table 4 for OSCE characteristics.
Table 4. OSCE Characteristics Within Studies.
Note. * Not reported.
CH: checklist; CP: clinical psychologists; E: external examiners; F: faculty staff; FQ: Feedback Questionnaire; FT: feedback training; GRS: Global Rating Scale; GVB: General Vignette Briefing; OSCE: Objective Structured Clinical Examination; PC: program curriculum; PL: psychological literature; PT: pilot training; ST: standardization training; TA: trained actors; VM: validated measure (State-Trait Anxiety Inventory [STAI] or Cognitive Therapy Scale - Revised [CTS-R]).
Examiner Characteristics
Examiners largely shared the same characteristics across studies. Numbers ranged from 4 to 13 per examination. Three studies used external examiners (Nikendei et al., 2019; Sheen et al., 2021; Yap et al., 2012), while others used faculty staff, experienced clinical psychologists, or both. Three studies did not report training protocols (Glatz et al., 2022; Melluish et al., 2007; Sheen et al., 2015); for those that did, training workshops were the most common type of training provided, followed by training manuals, briefings, and practice OSCEs.
SP Characteristics
Apart from studies that did not report on SP characteristics (Sheen et al., 2015, 2021; Sundström & Hakelind, 2022), all studies hired trained actors as SPs. Actors were recruited from specialized university SP databases and trained to varying degrees. Standardization training was most common, while close to half of the studies incorporated pilot testing of clinical vignettes and a few provided feedback training, whereby the SP gave direct feedback to the student following the OSCE. Two studies used all training approaches (Goodie et al., 2021; Meghani & Ferm, 2021).
Type of Objective Measures
Apart from two studies that did not report objective measures (Sheen et al., 2015; Yap et al., 2012), all studies used global rating scales as their objective measure of clinical competency. Global rating scales are individualized objective measures tailored to the specific requirements of the OSCE and can assess a broad range of outcome domains, including professionalism, clinical skills, knowledge, and reasoning. Two studies also used checklists (Hakelind & Sundström, 2022; Sundström & Hakelind, 2022); Sheen et al. (2021) used a validated measure to assess clinical intervention skills, and Yap et al. (2012) assessed levels of anxiety. Ten studies paired objective measures with feedback measures. Additionally, all studies but one (Sheen et al., 2021) used a student feedback questionnaire, which typically explored students' perceptions of the design, validity, and utility of the OSCE assessment. For instance, in Roberts et al. (2017), students were asked to reflect on statements such as "the feedback I received as part of the OSCE facilitated my learning."
Clinical Competencies
The clinical competencies assessed within programs varied; however, some competencies were assessed more frequently than others. The most common competencies, in descending order of frequency, were intervention skills (motivational interviewing, psychodynamic therapy, cognitive behavioral therapy [CBT], psychoeducation, and functional analysis), clinical interviewing and assessment (mental state examination [MSE], clinical interviewing, diagnostic assessment, differential diagnosis, and risk assessment), communication and interpersonal skills, and psychological testing. Figure 3 presents the frequency of competencies assessed across all studies.

Figure 3. Frequency chart of competencies assessed within studies. Note. Subcompetencies within each heading are grouped as follows: Intervention Skills (motivational interviewing, psychodynamic therapy, cognitive behavioral therapy, psychoeducation, and functional analysis); Clinical Interviewing and Assessment (mental state examination, clinical interviewing, diagnostic assessment, differential diagnosis, and risk assessment); Communication and Interpersonal Skills; Psychological Testing; Other (family assessment, knowledge, reasoning, ethical and legal standards, cultural diversity, and note writing).
Adherence to QAGs
Assessment of studies revealed varied adherence to Khan et al.'s (2013b) QAGs. No examination satisfied all guidelines. Instead, studies fulfilled between one and six guidelines, with an overall average close to five. In accordance with the scoring criteria, degree of adherence was rated as "Good" for four studies (Goodie et al., 2021; Meghani & Ferm, 2021; Nikendei et al., 2019; Sundström & Hakelind, 2022) and "Fair" for seven studies (Glatz et al., 2022; Hakelind & Sundström, 2022; Melluish et al., 2007; Roberts et al., 2017, 2020; Sheen et al., 2021; Yap et al., 2012). Overall adherence across studies was "Fair." The least reported criteria were Validity (Q1), Peer Review of Stations (Q2), External Examiners (Q4), and Post-Hoc Psychometrics (Q7). On the other hand, all studies met the criterion for Evaluation (Q8), followed by Objective Measures (Q3) and Standardized Patient Training (Q6), and more than half satisfied Examiner Training (Q5). See Table 5 for study adherence to QAGs.
Table 5. Study Adherence to Quality Assurance Guidelines (Khan et al., 2013b).
Notes. * Average degree of adherence across all studies. ** Number of Objective Structured Clinical Examinations that fulfilled criterion.
Discussion
Research on OSCEs highlights that the way the assessment is constructed and applied can influence the validity and reliability of this method of evaluation. Accordingly, the current systematic review aimed to infer the quality of OSCE assessments within professional psychology training by assessing the implementation of OSCEs against QAGs (Khan et al., 2013b). This review found that overall adherence was "Fair" and, by association, the same could be said for the likely psychometric quality of the exams. Analysis of the QAGs met across studies revealed varied patterns of adherence for particular components. These are discussed in descending order of adherence.
Evaluation
In contrast to post hoc analysis, evaluation considers the qualitative experience of key stakeholders within the examination process in order to improve the quality and organization of future examinations (Oxlad et al., 2022). According to this definition, this review found that all studies sought feedback from key participants, with some exceptions: feedback was not obtained from SPs, and how feedback was used as part of the quality improvement process was not discussed. This is a lost opportunity for programs, as the benefits often associated with simulation depend on the realistic recreation of assessment scenarios within the examination (Yap et al., 2012).
Balancing the objective, standardized, and authentic aspects of OSCEs is critical. Therefore, it is recommended that, following completion of OSCEs, studies continue to invite examiners, students, and SPs to provide feedback on their experience. Factors such as the flow of the examination, clarity of instruction, appropriateness of tasks, and real-world applicability of stations should be gathered at this time (Daniels & Pugh, 2018; Halman et al., 2020). SP feedback also has utility in forecasting the experiences of students as practicing psychologists, through questions such as "Based on your experience within this exam, how comfortable would you feel seeing this psychologist in real life?" and "Based on your experience within this exam, how likely would you be to refer a friend or family member to see this psychologist?"
Objective Measures
Most studies utilized custom-designed global rating scales (GRS). GRS are better at assessing the quality of multiple skills performed concurrently (Ilgen et al., 2015); however, their reliability is questionable for skills that require more subjective and pragmatic judgment, such as communication and interpersonal skills (Brannick et al., 2011; Cömert et al., 2016; Piumatti et al., 2021). Assessment of psychological competencies is more likely to require idiosyncratic judgments in comparison to some other health professions (Goodie et al., 2021). One strategy is to utilize multiple metrics (Ilgen et al., 2015). Checklists and rating scales are commonly used to mark different types of assessments, including OSCEs (Wood & Pugh, 2020), and subjectivity can be mitigated by adding task-specific checklists to global rating scales, which together improve overall validity (Khan et al., 2013a). Another strategy is to provide a more definitive rubric that has been adapted to the competency or task being assessed and contains behavioral anchors, allowing for greater discrimination by assessors (Donohoe et al., 2020).
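As a concrete illustration of the multiple-metrics strategy, the sketch below blends a task-specific checklist with a behaviorally anchored global rating; the items, anchors, and equal weighting are hypothetical and not drawn from any reviewed study:

```python
# Hypothetical checklist for a clinical interviewing station
CHECKLIST = {
    "introduced_self": True,
    "obtained_consent": True,
    "screened_risk": False,
}

# Illustrative behavioral anchors for a 1-5 global rating scale
GRS_ANCHORS = {
    1: "No rapport; interview disorganized",
    3: "Adequate rapport; mostly coherent structure",
    5: "Strong rapport; fluent, client-led structure",
}

def station_score(checklist: dict[str, bool], grs_rating: int) -> float:
    """Equal-weight blend of checklist completion and a 1-5 GRS rating,
    rescaled to a 0-100 station score."""
    checklist_pct = sum(checklist.values()) / len(checklist)
    grs_pct = (grs_rating - 1) / 4  # rescale 1-5 to 0-1
    return 100 * (0.5 * checklist_pct + 0.5 * grs_pct)

print(round(station_score(CHECKLIST, grs_rating=4), 1))  # -> 70.8
```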
Examiner Training
The majority of examiners underwent some form of training in advance of the OSCE; however, reporting of training content varied. Nevertheless, training examiners has been shown to reduce variation and improve consistency of scoring (Schüttpelz-Brauns et al., 2019), as the reliability of scores depends in part on examiners' knowledge of OSCEs (Khan et al., 2013b). Accordingly, standardization of training protocols according to key learning outcomes may assist with maintaining quality, identifying knowledge gaps, and minimizing assessor bias. Several authors have drafted key learning outcomes for examiners within OSCEs (Khan et al., 2013b; Newble, 1988; Schüttpelz-Brauns et al., 2019).
Standardized Patients
All studies hired actors as SPs and provided different types of training. The variation in intensity and type is consistent with the understanding that SPs require varying degrees of input to strike a balance between portraying clinical conditions in a realistic, repeatable, and reliable manner (Cleland et al., 2009). This is critical because psychological conditions and vignettes differ significantly from the roles designed for medical examinations, from which SPs are usually hired (Goodie et al., 2021). Consequently, it would greatly benefit training programs to develop a database of SPs with expertise in role-playing psychological conditions across different age ranges and severities. Furthermore, developing key competencies for SPs may allow staff and students to be trained without compromising authenticity (Goodie et al., 2021). Finally, all SP performances need to be quality tested in advance of OSCEs to ensure portrayals are reliable and realistic (Khan et al., 2013b).
Validity
There was evidence suggesting that programs were invested in designing and implementing OSCEs as authentically as possible. This included narrative descriptions of rigorous protocols followed throughout the implementation process to ensure alignment with program curricula and broader psychological competencies. This is necessary because psychology OSCEs have relied on descriptions from medical and other related professional programs, which do not cover the range of competencies required in clinical psychology, such as psychometric assessments and model-specific clinical intervention skills (Goodie et al., 2021; Harden, 2016). Furthermore, despite current findings around face validity (Roberts et al., 2017; Sheen et al., 2015; Yap et al., 2012), a limitation of "face-content" validity is that it is primarily an indicator of fairness and relevance and is not convincing unless supported by further evidence (Downing & Haladyna, 2004). Therefore, in the absence of quantifiable data, there is still uncertainty as to whether these OSCEs are valid for the field.
When assessing or evaluating OSCEs or pilot examinations, correlations with other measures of competence are a good indicator of validity (Khan et al., 2013a). This remains true in light of some competencies being assessed differently depending on whether they are tested within a written exam or demonstrated within an OSCE (Halman et al., 2020). Therefore, one recommendation is to compare OSCE results with other measures of competency (Vincent et al., 2022). Second, like the studies in this review, psychology programs need to blueprint OSCE stations and tasks against the program curriculum or standards relevant to the psychology profession (Sundström & Hakelind, 2022). Third, given that some competencies are assessed less reliably than others, it is necessary to determine which psychological competencies should be assessed by OSCEs. Last, investigating the association between OSCE scores and future placement performance or licensing exams has been recommended by several authors (Meghani & Ferm, 2021; Sheen et al., 2015; Sundström & Hakelind, 2022).
Peer Review of Stations
Peer review of stations was poorly adhered to throughout the review. Testing the test is vital because OSCEs are complex, multifaceted, resource-intensive, and costly projects relative to other assessments (Daniels & Pugh, 2018). The large number of moving parts is likely to contribute to error variance unless identified and corrected in advance (Pell et al., 2010). As such, peer review should screen (using either a checklist or a standardized review form) for the most likely sources of bias, such as proactive simulated patients whose questions act as prompts to students, inconsistent application of marking rubrics by assessors, and inconsistencies in design, equipment, and instructions across stations (Pell et al., 2010). And as with all large-scale events, the flow and smooth operation on the day is directly related to the number of rehearsals and "dry runs" performed in advance of the OSCE (Khan et al., 2013b).
External Examiners
Most studies utilized faculty staff and did not hire external examiners. This presents a threat to the "objective" in OSCEs, as studies investigating supervisor feedback have found ratings to be positively skewed, unreliable, and influenced by student characteristics and contextual factors (Gonsalvez & Freestone, 2007). Accordingly, staff who are assessing familiar or known students may be more likely to feel pressure to pass them, even if they do not fully meet the competencies (Meghani & Ferm, 2021). In contrast, external examiners are more likely to apply assessment rubrics in a rigorous, objective, and fair manner (Khan et al., 2013b). On the other hand, the convenience and cost-saving benefits of using faculty staff can be maintained if the learning outcomes of OSCE examiners are demonstrated ahead of OSCEs (such as through pilot OSCEs and trial scoring) (Khan et al., 2013b). Second, utilizing multiple raters at each station is likely to improve reliability (Brannick et al., 2011). Third, video or audio recording of student performances allows blinding procedures to be implemented to negate the influence of examiner relationships with students (Roberts et al., 2020). Another potentially cost-effective option is to train student raters, as recent studies have found inter-rater reliabilities comparable to expert raters (Donohoe et al., 2020).
Post Hoc Psychometrics
Post hoc analysis of OSCEs was poorly represented across all included studies. Where data were provided, they were either insufficient or inconsistently reported to support the reliability or validity of the exam. This finding is consistent with a 2012 review of 104 medical OSCEs, which reported that close to two-thirds of studies failed to report validity and reliability data (Patricio, 2012). Nevertheless, the heterogeneity in reporting can be understood as resulting from the lack of standardized procedures for measuring and collecting data (Brannick et al., 2011), and the lack of structure in reporting and of standardized quality metrics (Patricio, 2012). Accordingly, standardized guidelines on reporting examination data would allow for meaningful analysis at the institutional level and across the broader psychological field. Pell et al. (2010) provide OSCE examination metrics that can be used to measure the overall quality of assessments, identifying six key metrics including internal consistency, the coefficient of determination, and between-group variation. While based on medical OSCEs, these metrics provide a checklist that guides the reporting of results and factors to consider when measuring and analyzing the effectiveness of OSCEs. Until psychology-specific metrics emerge, it is strongly recommended that future programs design, analyze, and report according to these metrics. These metrics may also be useful in facilitating the development of psychology-specific criteria.
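To indicate how two of these metrics might be computed from routine examination data, the sketch below derives internal consistency (Cronbach's alpha) and a coefficient of determination from a students-by-stations score matrix; the data are simulated, and regressing a single station's scores on totals is one common formulation rather than Pell et al.'s exact specification:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (students x stations) score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def coefficient_of_determination(station: np.ndarray, total: np.ndarray) -> float:
    """R^2 from a simple linear regression of one station's scores on totals."""
    r = np.corrcoef(station, total)[0, 1]
    return r ** 2

# Simulated cohort: 40 students x 6 stations sharing a common ability factor
rng = np.random.default_rng(seed=1)
ability = rng.normal(0, 5, size=(40, 1))
scores = 70 + ability + rng.normal(0, 4, size=(40, 6))

print(round(cronbach_alpha(scores), 2))
print(round(coefficient_of_determination(scores[:, 0], scores.sum(axis=1)), 2))
```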
Methodological Quality and Review Limitations
The review findings should be considered in light of the possible risk of bias in the included studies, as most indicated some risk. Methodological quality and risk of bias were assessed using the MMAT, version 2018 (Hong et al., 2018). Overall, the majority of studies were categorized as "Low" to "Medium" risk of bias. This is promising, as poor methodological quality is a common criticism within the literature across programs (Bobos et al., 2021; Brannick et al., 2011; Ilgen et al., 2015). Additionally, quantitative studies and the quantitative components of mixed methods studies carried more risk. There may be several explanations for this finding. First, appropriateness and feasibility were significant considerations around which many study designs were created, and the novel and exploratory nature of the aims may have resulted in confounds and biases not being controlled. Second, post hoc psychometrics was a weakness across all studies but disproportionately impacted quantitative studies. Third, mixed methods designs primarily focused on qualitative objectives such as perceptions and experiences of OSCEs and accordingly scored higher on qualitative components.
There were also several limitations of this review. In particular, the small number of included studies limits the generalizability of findings, and the inclusion of varied study designs makes comparison subjective. However, this was to be expected given the dearth of literature examining OSCEs within psychology programs (Sheen et al., 2015). There was also a need to review the study selection criteria in response to insights derived from the results. Expanding the inclusion criteria may have facilitated greater coverage; however, without an overriding set of principles guiding OSCE development, heterogeneity in the reporting of data would nevertheless complicate meaningful comparison.
While a significant strength of the current review was its broad examination of studies against OSCE-specific QAGs, the way these guidelines were interpreted and appraised may be open to bias. The current review noted whether the guidelines were reported (like a checklist) instead of quantifying the extent of adherence to them. While this may have prevented extraction of more meaningful data, it limited the degree of subjectivity. That said, the finding of "Fair" psychometric quality is consistent with findings from other reviews (Bobos et al., 2021; Brannick et al., 2011; Cömert et al., 2016; Ilgen et al., 2015; Yap et al., 2021).
Implications, Recommendations, and Future Research
The current review has several implications. First, there is a need to authenticate perceptions of validity, authenticity, and reliability within OSCEs with quantitative data. Second, greater focus needs to be placed on conducting statistical analysis of OSCE results and using the outcomes to enhance the quality of the examinations. And third, psychology programs should aim to standardize OSCE implementation and reporting of results.
While this review recommends utilizing the QAGs of Khan et al. (2013b), psychology-specific recommendations carrying similar themes are available. These were not chosen for this review because they were not derived from analysis of examination data and also apply to other assessment types. For example, Yap et al. (2021) provide seven "tips" for implementing psychological OSCEs, and Paparo et al. (2021) provide a set of guidelines for simulated learning assessments. It is important that psychology programs reach a consensus on the optimal guidelines and work toward standardizing the implementation and reporting of future clinical examinations, with consistent quality assessment and adherence to standardized indicators.
One of the major criticisms of OSCEs is their financial cost in comparison to other forms of assessment (Yap et al., 2012). OSCEs are expensive in terms of the resources, time, and personnel required (Kaslow et al., 2009). However, they also enhance students' learning experiences and provide an opportunity to safely translate knowledge into practice (Sheen et al., 2021). Although OSCEs are an expensive assessment, if the decision is made to assess using an OSCE, an "all-in" mindset is required to ensure maximum return on investment. This will ensure adequate allocation of resources toward each guideline and increase the likelihood that benefits will be realized (Khan et al., 2013b). Furthermore, there are cost-effective strategies available to assist with overcoming barriers to implementation. For instance, psychology programs could share a library of vignettes, OSCE stations, and objective measures. Similarly, objectivity can be enhanced by creating an external examiner and SP database whereby faculty from one program assess OSCEs from another program and vice versa. Examiner and SP training can be run across institutions and should aim to satisfy learning outcomes proven to improve OSCE implementation (Khan et al., 2013a).
Nevertheless, the current findings suggest there is still significant scope and need for studies to pilot or evaluate OSCEs using quantitative designs and to report data in accordance with guidelines (Pell et al., 2010). Another avenue is to investigate which psychological competencies are best assessed within the OSCE format. For instance, this could involve studies evaluating the effectiveness of OSCEs in assessing competencies such as crisis intervention skills or therapeutic alliance building, compared with traditional methods like written examinations or essays. A further objective should be to validate the psychometric properties of the GRS and checklists used within OSCEs and to demonstrate validity through convergence with other measures of clinical competency; this might involve comparing OSCE performance with performance on assessments such as case-based discussions of clinical decision making. Moreover, longitudinal research could assess whether OSCE performance during training predicts future performance during clinical placements.
Conclusion
The current systematic review broadly examined OSCEs within psychology programs against a set of QAGs, finding overall adherence to be "Fair." This finding raises concerns about current decisions on student competency being made within psychology programs using OSCEs. At the same time, there is promise in recognizing that OSCEs are relatively new within the field and there is substantial scope for refinement (Sheen et al., 2015). It is hoped that this review will motivate future psychology programs to standardize the development, implementation, and evaluation of OSCEs.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
