Abstract
Evaluation is a practice with increasing global demand. However, an understanding of who wants to learn evaluation and their learning needs related to evaluator competencies is relatively unexplored. This article describes a research effort designed to address those needs using an evaluator competency self-assessment. Despite their validity challenges, self-assessments can yield valuable information for a variety of audiences. Respondents and their organisations can use this method to understand evaluation capacity strengths and needs and to create professional development plans. For those conducting evaluation capacity building in or for organisations, and those developing formal evaluation education programs, self-assessment results can provide a road map for planning, development and delivery. This article describes the process of developing and piloting the Learn Evaluation Assessment Portal, an evaluator competency self-assessment tool, with the Australian Evaluation Society. The article reflects on the lessons learnt from the development and testing of the tool and will be valuable for a range of stakeholders, from practitioners to commissioners, and in particular Voluntary Organisations of Professional Evaluators, which are committed to developing evaluation capacity and are working towards professionalisation of the field.
Keywords
What we already know
• Evaluators are on a continuous path of professional development.
• Information about strengths and development areas can enable focussed professional development.
• Publicly available, free instruments for evaluators to assess their competency are rare.
The original contribution the article makes to theory and/or practice
• Documents the value and challenges of self-assessment in the field of evaluation.
• Consolidates and presents an updated set of principles for good self-assessment instruments for evaluators.
• Details the process of developing a publicly available online self-assessment tool for evaluators that can be adapted to any Voluntary Organisation of Professional Evaluators.
Introduction
Evaluation is a practice that is increasingly in demand in Australia and globally. The Australian Centre for Evaluation, for example, has been established to ‘put evaluation evidence at the heart of policy design and decision making’ (Department of the Treasury, 2023). How to educate evaluators is an increasingly hot topic, the subject of multiple recent special issues and peer-reviewed articles across the evaluation literature (e.g., Chouinard & LaVelle, 2021; Kallemeyn et al., 2022; LaVelle et al., 2023). However, an understanding of who wants to learn evaluation and their learning needs related to evaluator competencies is relatively unexplored. This article explores a research effort designed to address those needs.
In this article, we present the design (Phase 1) and testing (Phase 2) of an evaluator self-assessment instrument, the Learn Evaluation Assessment Portal (LEAP), developed by the authors. The research team was formed through collaborative work for the International Society for Evaluation Education (https://www.isee-evaled.com/) which identified a gap in research on evaluation learners – particularly knowledge of their backgrounds and existing competencies. The team decided that an online self-assessment would simultaneously provide data for our research and provide a critical service to evaluation learners and their organisations. Despite the recognised issues with self-assessment, the team saw this technique as the most useful short-term method of gathering data on evaluators.
The pilot of LEAP involved a partnership between the researchers, Matilda Tech, who designed and hosted the Portal, and the Australian Evaluation Society Pathways Committee Competencies Working Group (AES Working Group). The research team designed the initial instrument, and then the two groups collaborated over three years to refine and test it. The University of Melbourne Centre for Program Evaluation and the Australian Evaluation Society (AES) provided financial support to Matilda Tech to develop and launch the self-assessment platform using the AES Evaluators’ Professional Learning Competencies (AES Professional Learning Committee, 2013). These will be referred to as the AES Competencies from this point forward. The updates to the competencies required for them to be suitable for self-assessment are discussed in Gullickson et al. (2024a, this issue).
The LEAP study was primarily a Methodological Study which Mouton (2022) describes as ‘aimed at developing new methods such as questionnaires, scales and tests of data collection and sometimes also validating a newly developed instrument through a pilot study’ (p.159). The article focuses on the design considerations and piloting of the self-assessment instrument only, not the competencies used in the tool.
We began our work with the following three criteria for a successful instrument: (a) it had to provide a service to the individuals who completed it; (b) where possible, it should provide a service to the respondents’ Voluntary Organisations for Professional Evaluation or organisation; and (c) it had to contribute to the knowledge base on evaluation learners and learning, therefore, articles and the final anonymised data should be made publicly available. The research team was committed to ensuring we were not conducting extractive research that did not directly benefit those who participated, as has often been the case (Cram et al., 2018; Smith, 2021), and that we upheld the principle of reciprocity (Australasian Evaluation Society, 2013). These criteria enabled us to focus on building a high-quality assessment tool that provided mutual benefit to participants and researchers. The development of the tool occurred during the COVID-19 pandemic and so we used online events for pilot testing.
Conceptual framework
During the conceptualisation and design phase of LEAP, we looked at the literature to understand the benefits and challenges of self-assessment generally and consider how those would apply to a tool developed for the field of evaluation. We also focussed on case studies where self-assessment had been used in evaluation or related fields to discover principles, ideas, and functionality for LEAP. In this section, we summarise our findings related to those areas.
Benefits
Self-assessments provide several benefits related to learning. They establish the grounds for what knowledge and skill is considered important in a specific context; can contribute to the development of performance standards for a field; and assist with the development of learning pathways for individuals (Dellai et al., 2009; Jahan et al., 2011; Stufflebeam & Wingate, 2005). Self-assessments can therefore contribute significantly to promoting a learning environment. These contributions are now discussed in more detail.
Firstly, the content of a self-assessment, by its very nature, establishes the grounds for what knowledge and skill is considered important in a particular context. What is included has been deemed important enough for the opportunity cost of assessment. In the field of evaluation, evaluator competencies have emerged over the last 20 years as one way to articulate the parameters of what knowledge and skills are considered important in the field. At the time of this publication, the research team found 25 sets of evaluator competencies from across the world, which have similar core components but also varied context-specific components. The development and adequacy of those competency sets are discussed briefly in Gullickson et al. (2024a, this issue).
A second benefit of self-assessments is that they often require the setting of performance standards. Articulated standards create a shared understanding of what can be expected from different levels of expertise. For instance, a novice to the field can focus on acquiring surface knowledge (e.g. describe the parameters of experimental designs), whereas an expert should be able to integrate and apply that knowledge (e.g. design and conduct an experimental design that is fit for the context and the relevant evaluation questions). This has not yet been a strong focus of current evaluation competency frameworks and further research is required to develop performance standards for the field. The development of LEAP and its resulting data can contribute to the development of performance standards for the field of evaluation.
The creation of standards also links to the third benefit: creating a pathway for learning. Diaz et al. (2020) describe the process of the setting of standards as ‘a road map for those who develop evaluation capacity building programs for CE [Credentialed Evaluator] educators and other non-formal educators’ (p. 8). With the parameters of knowledge and skills, and the performance standards in place, taking a self-assessment can facilitate a process for recognising individual strengths and specific areas for development. Self-assessment, consequently, allows for focused instruction (Honken, 2013) and prioritisation of training needs. The combination of parameters, performance standards, and self-rating can be a powerful contribution to the promotion of a learning environment – the fourth benefit.
The fact that self-assessment, although problematic, was shown to be highly suitable for learning, confirmed our initial research idea – to develop a self-assessment tool for the key purpose of supporting evaluation learners and contributing to a greater understanding of the competencies and standards required for the field. We noted the important challenges of carrying out self-assessments, and these are discussed in the next section.
Challenges
Many of the challenges and limitations of using self-assessments to assess knowledge and competence (Lichtenberg et al., 2007; Sitzmann et al., 2010) stem from the nature of expertise – that it develops with time and experience, and that it is a prerequisite of accurate self-assessment. Persky and Robinson (2017) indicate that ‘experts develop through years of experience and by progressing from novice, advanced beginner, proficient, competent, and finally expert’ (p. 72). Expertise is also not necessarily transferable – ‘each time individuals acquire a new skill, they start at the novice stage where they need to learn the facts and the rules for determining action’ (Honken, 2013, p. 5). Dreyfus and Dreyfus (2005) distinguish between ‘crude skills’ and ‘subtle skills’ (p. 788). Crude skills, like walking, can be performed without conscious thought, have a larger margin for error, and allow time to make corrections. Subtle skills, like music and surgery, require intense concentration with little time for reflection or deliberation while using the skill, and instead require acting swiftly with little or no time to correct mistakes. This distinction has implications for the kinds of skills we try to assess, particularly in the field of evaluation.
The conundrum of self-assessments is that they are developed to help inform people from novice to expert about their abilities in particular domains, but as Kruger and Dunning (1999) argue, the skills that produce competence in a particular domain are often the very same skills necessary to evaluate competence in that domain. Persky and Robinson (2017) indicate that ‘due to a high level of understanding of how they know what they know and what they do not know, experts have greater metacognitive awareness than novices. This awareness allows experts to be “sensitive to task demands (e.g. time, effort, resources needed)”’ (p. 78), which results in more realistic assessments of their expertise. Kruger and Dunning (1999) highlighted that people overestimate their abilities particularly because their incompetence ‘robs them of the metacognitive ability to realize it’ (p. 1121): ‘incompetent individuals lack what cognitive psychologists term metacognition (Everson & Tobias, 1998), metamemory (Klin et al., 1997), metacomprehension (Maki et al., 1994), or self-monitoring skills (Chi et al., 1982)’ (Kruger & Dunning, 1999, p. 1121). As a result, they claim that ‘unaccomplished individuals’ (p. 1122) do not possess the metacognitive skills necessary for accurate self-assessments. This results in novices inflating their self-appraisals and experts under-rating their expertise, what has become known as the ‘Dunning-Kruger effect’. This ‘under-rating’ of expertise can also result in what Stufflebeam and Wingate (2005) label as ‘ceiling effects’ for evaluation ‘elders’ (p. 552). An individual’s views of self can also influence their ratings, depending on whether they see themselves as able or not in the area being assessed (Ehrlinger & Dunning, 2003).
Case studies
At the time the researchers were exploring self-assessment, four evaluation related examples existed. The researchers reviewed these studies to see how they dealt with some of the practical elements of developing and implementing self-assessment for evaluators. Following the review of literature and the case studies, the team generated a set of principles for best practice in the development of self-assessments for evaluation which are presented at the end of this section.
Materials development training and support program (Stufflebeam & Wingate, 2005)
Stufflebeam and Wingate (2005) describe a study that was carried out ‘to assess learning gains in the 1999 to 2004 Project MTS (Materials Development, Training, and Support Services) Summer Evaluation Institutes at The Evaluation Center at Western Michigan University’ (p. 545). The Institutes were three-week evaluation capacity building programs; the self-assessment was used to replace a traditional pretest-post-test and focused on course content. The instrument the researchers developed was called the ‘Self-Assessment of Program Evaluation Expertise’ and was reviewed by 15 prominent evaluators. The self-assessment was completed by three different cohorts – (a) a group of nationally recognised evaluators, (b) participants in the 1999 Institute and (c) participants in the 2000 to 2004 Institutes.
Extract from rating scale example. Source: Stufflebeam and Wingate (2005, p. 549).
They found that self-assessment met their needs in gauging knowledge, skills and application of skills without direct assessment which would not have been feasible, particularly as a pre-test. They also discovered that self-assessment of general competencies was sufficient to indicate learning gains across various types of content instruction over the life of the Institute.
Stufflebeam and Wingate (2005) describe some of the lessons learnt in developing their self-assessment. They found the instrument must include: (a) a brief discussion of the purpose, domain, and caveats on the cover page; (b) simple, streamlined instructions and wording; (c) clear items; and (d) defined anchor points for the rating scale. Defined anchor points are necessary to accommodate the complexity of representing expertise in a rating scale, and the complexity of the competencies being rated with that scale. ‘Typical quality ratings of poor, good, fair, and excellent are general and subject to idiosyncratic interpretation’ (Stufflebeam & Wingate, 2005, p. 558). The burden on participants of interpreting scales is exacerbated by the Dunning-Kruger effect, so the ratings must make the distinctions between each level clear. In addition, the scale should be the same across items and domains so respondents can compare their ratings across the various domains. The authors also paired each personal report with the anonymous aggregated results from the self-assessment. They found this practice reduced the likelihood that people would over-rate their expertise to impress others and allowed participants to see how their expertise related to that of the group.
Essential competencies for program evaluators (Ghere et al., 2006)
Ghere et al. (2006) produced a two-hour professional development unit for reflecting on the Essential Competencies for Program Evaluators. The self-assessment activity used a three-page pen-and-paper rating sheet which included all competencies grouped by domain. Participants were asked to rate themselves on a 7-point expertise scale across the competencies (Figure 1). The authors did not provide information about the origin of the rating scale. They, like Stufflebeam and Wingate, felt that the ratings should be known only to the individual.
Figure 1. Rating scale from Ghere et al. (2006).
The Magenta Book (Her Majesty’s Treasury, 2020)
Her Majesty’s Treasury (United Kingdom) developed the Magenta Book in 2011 and updated it in 2020 (HM Treasury, 2020c) to provide guidance on evaluation for the United Kingdom Central Government. Alongside the Magenta Book they produced the Government Analytical Evaluation Capabilities Framework (HM Treasury, 2020a), which ‘…describes the skills, attitudes and practices which enable effective and high-quality Government evaluations’ (p. 3). They also produced a self-assessment tool: an Excel spreadsheet with a list of competencies by domain in one tab; a collated results tab with scores charted by domain and by domain and competency; and an action plan tab which was not linked to the results (HM Treasury, 2020b). The tool used a shortened version of the full framework, and no information was provided about how the statements to include were chosen. The worksheet is completed offline, so no data set on overall competency ratings is automatically generated from those who complete it. Competency statements are rated using the following scale: 1. No knowledge or experience; 2. Basic or limited knowledge or experience; 3. Working knowledge and practical experience; 4. Detailed knowledge and significant experience; and 5. Expert knowledge and experience. No information was provided about the basis of the rating scale, and there is no information on lessons learnt from the process, but we reviewed it as a case study due to the value of examining its evaluation content domains. It is also the only self-assessment instrument the research team found that was developed by a government agency.
Social Value (UK)
Social Value UK developed an online self-assessment tool which practitioners and organisations can use to assess themselves against its Seven Principles of Social Value (Social Value International, n.d.). Whilst values are a key consideration for evaluators, this self-assessment tool does not utilise any other evaluation competencies. Users complete the assessment by selecting, under each Principle, the one of six tailored statements which best describes their practice or project. For example, for Principle #1 (Involve Stakeholders), users are asked to select the most applicable statement, from ‘Stakeholders are not involved at any stage [of the measurement]’ to ‘Representatives from each stakeholder group are involved [in all aspects of measurement]’. The results of the self-assessment are provided as scores under each Principle, displayed online as a spider chart, and can be downloaded. Users can also benchmark their scores against averages for their country, their industry, the age of their organisation/program and the turnover of their organisation/program. A set of generic suggestions for improvement under each Principle is also provided.
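To make the benchmarking idea concrete, the sketch below shows how a respondent’s per-Principle scores might be compared against peer-group averages. It is a minimal illustration in Python under our own assumptions: the function name, the second principle label and the scoring approach are hypothetical and are not the Social Value UK implementation.

```python
# Minimal sketch of benchmarking per-Principle scores against a peer group.
# Not the Social Value UK implementation; names and data are illustrative.
from statistics import mean

def benchmark(user_scores: dict[str, float],
              peer_scores: list[dict[str, float]]) -> dict[str, dict[str, float]]:
    """Compare one respondent's scores with the average of a peer group
    (e.g. same country, industry, organisation age or turnover band)."""
    report = {}
    for principle, score in user_scores.items():
        peer_mean = mean(p[principle] for p in peer_scores if principle in p)
        report[principle] = {"you": score, "peer_average": round(peer_mean, 2)}
    return report

# Hypothetical example: scores derived from the six tailored statements.
print(benchmark(
    {"Involve Stakeholders": 4, "Principle 2 (hypothetical)": 3},
    [{"Involve Stakeholders": 3, "Principle 2 (hypothetical)": 5},
     {"Involve Stakeholders": 2, "Principle 2 (hypothetical)": 4}]))
```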
Summary
Principles for good self-assessment instruments.
Whilst all the case studies provided information on the purpose and target group for their self-assessments, not all of them discussed the structure or provided detailed information on the rating scales or lessons learnt. Stufflebeam and Wingate (2005) and Ghere et al. (2006) used written instruments, whilst Her Majesty’s Treasury (HM Treasury, 2020b) systematised the self-assessment with a shortened list of competencies and an electronic tool (Excel). While these two approaches (paper and Excel) were effective for individual reflection, the researchers involved in those studies missed the opportunity to create an aggregate data set that could inform evaluation education efforts more broadly. The Social Value (UK) tool (Social Value International, n.d.) took that step and added benchmarking. Creating an aggregate data set was a key need for LEAP and, together with the good self-assessment parameters drawn from the case studies, formed the basis of the design of an online tool.
Research design and questions
The next step in the research was to design, develop and pilot test the self-assessment. That work was split into two phases with aligned research questions:
1. Phase 1 (design): How can we design a self-assessment that meets the needs of individuals, organisations, and our research team?
2. Phase 2 (pilot testing):
a. Will there be interest in taking a self-assessment on evaluator competencies?
b. Will people judge that it is worth their time to complete?
c. Can doing the self-assessment expand people’s frame of reference around what is required for an evaluator to know and be able to do?
d. Can this self-assessment tool assist evaluators at any level of expertise in identifying areas for professional development?
e. How can we improve the LEAP to make the self-assessment more effective and user friendly?
The methods, findings and discussion are presented in the following sections.
Methodology and methods
An overview of the LEAP study is presented in Figure 2. The research team used the review of the literature to understand the strengths and limitations of self-assessments, and the case studies to identify principles for assessment development and the products that result. In this section we describe the methods for developing and pilot testing LEAP.
Figure 2. Purposes and methods of study.
Phase 1: Instrument development
The researchers used the information from the literature review to inform design choices for the instrument and then, in collaboration with the AES Working Group and the team at Matilda Tech, developed LEAP using the AES Competencies. This required revisions to the competency framework described in Gullickson et al. (2024a, this issue). The researchers’ design choices are described in the findings section.
Phase 2: Pilot testing
In Phase 2 of the study, the researchers in collaboration with the AES Working Group piloted LEAP and then carried out analysis of the qualitative feedback from participants to make changes, a common component of Methodological Studies. Due to COVID-19, in September 2020 and 2021, the AES replaced its annual conference with free online events: FestEVAL. The AES Working Group and the research team led four sessions (two each year) to refine the self-assessment and reports using crowdsourcing. The following sections provide further details on the aims and methods for the alpha and beta tests.
2020 alpha test
In 2020, the draft instrument was alpha tested on the online platform (Gullickson et al., 2020a, 2020b). The alpha test consisted of two sessions. The first session had 120 participants, and 90 of those participants engaged in the live test and feedback session. Provision was made for completing the self-assessment after the session. The presenters used Rogers’ (2003) diffusion of innovations model to position the alpha test as a raw instrument that would be best tested by innovators. This framing was apparently successful, as by the time the self-assessment closed, 116 people had completed the task.
The researchers asked participants in the piloting process to provide feedback on their experience of using LEAP and the use of the AES competency items which were embedded in the tool for the self-assessment process. Feedback was provided through three sources: emails (4) to the researchers, messages (226) from the chat function of the online meeting hosted by AES and messages (8) from a comment function in the tool itself. The feedback was placed into ATLAS.ti (computer-assisted qualitative data analysis software) and inductively coded by the researchers. The AES Working Group analysed the demographic data and produced heat maps of the expertise distribution across the domains and individual competencies. In the second session, the AES Working Group representatives shared and discussed the demographics and competency findings (see Gullickson et al., 2024a, this issue). All data was analysed and integrated into the 2021 beta tests. In the period between the alpha and beta tests, the research team and the AES Working Group developed the individual and organisation level report formats which were included in the beta testing.
2021 beta tests
In 2021, the AES group mentoring project and AES FestEVAL 2021 provided beta testers. Seven participants in the mentoring project took it as a pre-test; fifty-eight completed it at FestEVAL. In the first 2021 FestEVAL session, the working group and the research group presented the feedback from the alpha test and how it had been addressed (Wildschut et al., 2021). In the second FestEVAL session, the respondents examined and commented on the content and format of the individual and organisation level reports (Gullickson et al., 2021). Feedback from FestEVAL 2021 was used to make minor changes to the self-assessment competencies and demographic items, and to modify the report format. The revised instrument (version gamma) was then launched publicly on learnevaluation.org for use at AES’ annual in-person conference in Adelaide in 2022 (Gullickson & Siddiqi, 2022).
Sample
LEAP data timeline and sources.
Ethics
Our motivation for this project was research on evaluation learners. Using a self-assessment survey with individual and organisation level reports provided reciprocal value for participants and us in keeping with the AES Ethical Guidelines (Australasian Evaluation Society, 2013). Reciprocity was the research team’s first ethical consideration related to the project.
The learnevaluation.org self-assessment portal (LEAP) and associated research is part of an ongoing research project on evaluation learners, located at the University of Mississippi. Before the research team conducted a preliminary study at the 2019 American Evaluation Association conference, the project was reviewed by the University of Mississippi Human Ethics Institutional Review Board. The Board determined the research did not meet the regulatory definition of human subjects research at 45 C.F.R. 46.102 and was eligible for exemption because ‘any disclosure of the human subjects’ responses outside the research would not reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects’ financial standing, employability, educational advancement, or reputation’ (Protection of Human Subjects, 46.104, 2018). This aligns with the NHMRC Guidelines related to evaluation and quality control (National Health and Medical Research Council, 2014).
At FestEVAL 2020, the AES Working Group and research team verbally briefed session participants about the use of their portal data and feedback via the Zoom chat. Comments from the Zoom feedback shared below have been de-identified as per the verbal briefing. In response to the large number of participants in the session, the researchers developed a retrospective consent process for the LEAP data. The AES Working Group approved the process and the University of Mississippi Human Ethics Institutional Review Board reviewed it again under the existing research project with no change to their determination. Therefore, when alpha test participants logged in to view their individual report, they were asked to provide consent to use their self-assessment data in publications. Based on alpha test feedback, the research team added a privacy statement which is now embedded in the Portal registration process. Only employees of the technology company (Matilda Tech) running the portal can see identified data; the research team and any organisation level users can see only de-identified data unless special arrangements have been made with the learnevaluation.org team and Matilda Tech.
Findings
Our literature review informed design of the LEAP, and crowdsourcing through the alpha and beta tests enabled us to refine and test the instrument and reports. In this section we describe the design considerations and how they were enacted, the findings from the qualitative data analysis of participant feedback, and the report design and testing.
Phase 1: Designing the LEAP
In Phase 1, we focused on the question: How can we design a self-assessment that meets the needs of individuals, organisations, and our research team? In this section we discuss the key design considerations, including how we used lessons learnt from the case studies. The section is ordered based on the principles presented in Table 2. In terms of overall process, the research team first created the basic self-assessment tool and then worked with the AES Working Group on the competencies that were used in the LEAP.
Principle 1: State purpose and domain
In 2019, when we began the process of developing the self-assessment instrument, there were 25 evaluator competency sets globally. Since we intended our tool to be internationally applicable, we considered looking across all the competency lists from the various Voluntary Organisations for Professional Evaluation and creating our own, shorter list (with commonalities) to use in the self-assessment. However, we agreed with the literature that there was value in the contextual nature of each of the evaluator competency sets (AES Professional Learning Committee, 2013; Aotearoa New Zealand Evaluation Association, 2011; Stevahn et al., 2005). Keeping in mind our key criteria of benefit to those who took the self-assessment, we thought learners would want to self-assess on a competency set relevant to the contexts in which they worked. As a result, we decided to provide functionality for any set of competencies to be used in the tool in the future. We included a statement at the beginning of the tool describing the purpose of the self-assessment and that we would be using the AES Competencies for the current version.
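As a rough illustration of that design choice, any competency framework can be represented as plain data that the portal loads at runtime. The sketch below is ours, not the LEAP schema: the field names, the purpose wording and the placeholder domains are assumptions made for illustration only.

```python
# Illustrative only: a pluggable competency-set structure the portal could
# load for any VOPE's framework. Field names are assumptions, not LEAP's schema.
AES_FRAMEWORK = {
    "name": "AES Evaluators' Professional Learning Competencies",
    "purpose": ("Self-assessment against the AES Competencies; results are "
                "reported to you and only shared in anonymised, aggregated form."),
    "domains": [
        {
            "title": "Example domain (placeholder)",
            "competencies": [
                "Example competency statement 1 (placeholder)",
                "Example competency statement 2 (placeholder)",
            ],
        },
        # ... further domains from whichever framework is loaded
    ],
}

def load_framework(framework: dict) -> list[str]:
    """Flatten a framework into the ordered list of statements to be rated."""
    return [c for domain in framework["domains"] for c in domain["competencies"]]
```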
Principle 2: Use simple, streamlined instructions and wording
Developing the LEAP for use in an online platform meant we had to consider text and layout for both phone and computer. We chose to use minimal text at the start, clear navigation and progression visuals, and a caveat about the length of the survey (Figure 3). Competency domains were presented one per page, with the rating scale and descriptions above. Due to the length of the competency statements, we told respondents the survey was best completed on a computer.
Figure 3. First page of the LEAP alpha test.
Principles 3 & 4: Choose full or reduced set of competencies
We decided to include the full set of AES competencies as the researchers and the AES Working Group agreed domain level ratings would be too general to be useful for the intended purposes of the self-assessment. We considered asking participants to rate the importance of the competencies to generate a shorter list for future self-assessments but, based on the diversity inherent in evaluation, we decided against this. Those who practice evaluation come from a variety of backgrounds and operate with a range of evaluation definitions that inform their practice (Gullickson, 2020). Therefore, if we asked people to rate competencies on their importance, we believed we were likely to get equal ratings across all competencies, or lower ratings for competencies they did not use based on their operating definition of evaluation.
Principle 5: Ensure items are clear
Most items were based on the domains and competency statements from the AES Evaluators’ Professional Learning Competency Framework. The original framework, launched in 2013, was deliberately designed not to meet measurement requirements, so the statements needed to be revised to suit self-assessment. The process to address this need is described in Gullickson et al. (2024a, this issue). The measurement-ready statements were then entered into the LEAP.
Principle 6: Clear and justified rating scale
2020 Alpha Test Expertise rating scale adapted from Dreyfus and Dreyfus (2005).

Example Domain rating page from alpha test.
At the end of the self-assessment, we asked a few more questions related to the respondents’ learning. We then used a matrix item and asked them to indicate their preferred type and mode of learning. We believed that the use of the rating scale together with open-ended questions at the end of LEAP would provide the required data on evaluators’ competencies and learning needs.
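A minimal sketch of how an explicitly anchored expertise scale, shared across every domain page, might be represented is shown below. The anchor labels and descriptions are our assumptions, loosely based on Dreyfus and Dreyfus (2005), and are not the published LEAP wording; the page structure is likewise hypothetical.

```python
# Illustrative sketch only: anchor labels and descriptions are assumptions
# loosely based on Dreyfus and Dreyfus (2005); LEAP's actual wording may differ.
from dataclasses import dataclass

@dataclass(frozen=True)
class Anchor:
    value: int        # numeric rating stored for analysis
    label: str        # short name shown to respondents
    description: str  # explicit behavioural anchor, identical across domains

EXPERTISE_SCALE = [
    Anchor(1, "Novice", "Aware of the competency but relies on rules and guidance to apply it."),
    Anchor(2, "Advanced beginner", "Can apply the competency in routine situations with support."),
    Anchor(3, "Competent", "Applies the competency independently in familiar contexts."),
    Anchor(4, "Proficient", "Adapts the competency to new or complex contexts."),
    Anchor(5, "Expert", "Integrates the competency fluidly and can guide others in it."),
]

def rating_page(domain: str, competencies: list[str]) -> dict:
    """Assemble one domain-per-page rating item, reusing the same anchors throughout."""
    return {
        "domain": domain,
        "scale": [(a.value, a.label, a.description) for a in EXPERTISE_SCALE],
        "items": competencies,
    }
```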
Principle 7: Avoid respondents’ temptation to over-rate by being anonymous or individual
On registration to LEAP, individuals were informed that they would be able to see their own report, and that any data shared in the public domain would be anonymised and aggregated.
Principles 8–10: Individual reports for learning/Benchmark reports of the individual versus the group/Anonymous aggregate reports to direct learning and coordinate expertise in organisations and teams
The nature of the reports produced by LEAP was addressed in the Phase 2 beta test, when individual and aggregate reports were discussed with AES members and then embedded in the LEAP. Funding limitations meant benchmarking was not possible in these first iterations.
Demographic considerations
None of the self-assessments we reviewed discussed collecting demographic information on their respondents. Since this was an important element of our research, we considered carefully which demographic items would help us understand learners and their needs. We asked a range of workforce-related questions at the start of the tool and then basic demographics at the end. We asked participants if there was anything else they would like to tell us about their evaluation learning needs (open text) and any feedback they had about using LEAP and challenges experienced.
Phase 1 summary
In the design of the platform, we addressed the lessons learnt from the literature. LEAP began with the purpose, domain and caveats on the opening page. We worked with AES to ensure simple, streamlined instructions and wording, and the AES team revised the competency list to increase clarity. We used an adapted Dreyfus and Dreyfus (2005) rating scale for explicitly defined anchor points across all domains. Individuals set up personal accounts to take the self-assessment and received individual reports for learning; any reports created used only anonymous aggregate information. We were ready to test the LEAP with the AES.
Phase 2: Testing the LEAP
LEAP went through two tests – the alpha test at FestEVAL in 2020 and the beta test in 2021 – to produce the gamma version in 2022. Most of the data collection, analysis and revision to the platform happened through the alpha test in 2020. Therefore, the remainder of this article focuses on the methods and findings from that event, with occasional mention of additional information from the 2021 testing.
Alpha test 2020
Alpha test findings
The initial data analysis done by the AES Pathways Committee members, the self-assessment response data and the feedback were handed over to the LEAP team for further analysis and interpretation. Some of the feedback related to specific questions in the tool and allowed the researchers to address each question specifically, whilst other comments related to the AES competencies, which would require reformulation for the tool. An overview of the breakdown of these messages and emails by topic is shown in Figure 5.
Figure 5. Amount of feedback by category.
Many of the messages – content unclear (63), add items (46) and AES competencies (7) – related to the AES competencies and their formulation. Those shortcomings are addressed in Gullickson et al. (2024a, this issue). However, there were also clear recommendations for improvements to the formatting of the tool (46), technical issues (25) and instructions (9). The feedback related to the process of the pilot is discussed first, and then we move on to the revision implications for the LEAP tool.
Improvements made to LEAP
Changes made to LEAP based on feedback.
Beta test 2021: Testing the report
The individual and organisational reports were designed to address Principle 8: Individual reports for learning, and Principle 10: Anonymous aggregate reports to direct learning and coordinate expertise in organisations and teams. The reports were shared and tested at AES FestEVAL 2021 (Gullickson et al., 2021) and then updated based on user feedback. The individual and organisational reports used strengths-based language and the same colour scheme to indicate domains/competencies where ratings were competent or above, or not yet competent. The organisation level report was based on all the beta test responses collected via the AES link to the LEAP. The first page of the reports summarised the ratings by domain organised from highest to lowest (Figure 6 – individual).
Figure 6. Example individual report page 1: Overall Summary.
Top and lowest rated competencies were flagged as key strengths and areas for improvement. On the following pages, the ratings were listed for each competency grouped by domain (Figure 7).
Figure 7. Example individual report page 2: Strengths and opportunities.
A key difference in the organisation level report was the inclusion of heat maps for the domains overall, and for competencies within domains (Figure 8). The use of heat maps rather than statistical reporting was a deliberate choice to keep the reporting as simple as possible for all users and to make it easier for respondents to benchmark their ratings against the group.
Figure 8. AES Organisation report excerpt with heat map colouring.
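To make the reporting logic described above concrete, here is a minimal sketch of how per-competency ratings could be rolled up into the individual domain summary and the organisation-level heat-map bands. The column names, the ‘competent or above’ cut-point and the colour bands are our assumptions for illustration, not the exact LEAP implementation.

```python
# Minimal sketch of the report logic described above. Column names, the
# "competent" cut-point (>= 3 on a 5-point scale) and the colour bands are
# assumptions, not the exact LEAP implementation.
import pandas as pd

def individual_summary(ratings: pd.DataFrame, top_n: int = 5) -> dict:
    """ratings: one row per competency with columns 'domain', 'competency', 'rating'."""
    by_domain = (ratings.groupby("domain")["rating"]
                        .mean()
                        .sort_values(ascending=False))   # page 1: domains, highest to lowest
    ranked = ratings.sort_values("rating", ascending=False)
    return {
        "domain_summary": by_domain,
        "strengths": ranked.head(top_n),                  # flagged key strengths
        "opportunities": ranked.tail(top_n),              # flagged areas for improvement
        "competent_or_above": ratings["rating"] >= 3,     # drives the colour scheme
    }

def organisation_heatmap(responses: pd.DataFrame) -> pd.DataFrame:
    """responses: de-identified rows, one per respondent per competency."""
    means = (responses.groupby(["domain", "competency"])["rating"]
                      .mean()
                      .reset_index())
    # Simple colour banding instead of statistical reporting, so respondents
    # can easily benchmark their own ratings against the group.
    means["band"] = pd.cut(means["rating"], bins=[0, 3, 4, 5],
                           labels=["developing", "competent", "strong"])
    return means
```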
Response to LEAP by participants
This section discusses the findings of the analysis of feedback provided by participants. The feedback provided aligned closely to the benefits and challenges related to self-assessment which were identified in the earlier discussion of the literature review.
Benefits of self-assessment raised by participants
The positive comments from the analysis of the alpha test were aligned with the benefits related to learning which were highlighted in the literature review. Participants in the pilot acknowledged the value of a process which allowed them to recognise individual strengths and areas for development.
‘It takes time to go through the questions rating competencies, but I like it because it compels us to self-reflect and be more specific in identifying needs/gaps each of us needs to address.’
Eighteen comments from those participating in the alpha test identified key areas for professional development. These comments focussed on the following needs: research methods; fundamentals of evaluation; quantitative data collection; quantitative analysis methods; knowledge of Indigenous evaluation methods; evaluation for policy makers; and resources for learning. Participants also reported that mentoring was critical to their learning, as evaluation capacity building opportunities were limited in their organisations, and indicated that learning communities should be set up for emerging evaluators.
‘One of my main challenges is the low level of capability in my organisation at every level so I do not have an accessible brains trust. I am very active in seeking out mentoring opportunities and tapping into other professional forums, e.g., communities of practice, AES courses, but I feel my learning is stunted by the fact I have to work more or less in isolation.’
Challenges of self-assessment raised by participants
The participants raised three key challenges to carrying out the self-assessment.
Self-reporting
Sixteen of the responses aligned with the challenges regarding self-assessment discussed earlier in this article and related primarily to the Dunning-Kruger effect.
‘I wonder if there is scope for bias because it is a subjective self-assessment. An evaluation of capacity within my organisation showed that a lot of people thought they were very knowledgeable and capable when it came to evaluation. As the resident “expert” this does not appear to be the case in my interactions with my colleagues.’
Lack of internationally accepted standards
Two responses also referred to the lack of a ‘benchmark’ or ‘standards’ for the various levels of evaluators (from novice to expert).
‘Challenging to consider what the benchmark might be for the discipline, especially considering it does vary considerably across a variety of domains (NFP, government, private etc).’
‘It is tricky to self-assess many of these competencies as I am not so sure on where the “bar” is set and how I would compare to the bar and to my peers.’
These challenges were also raised in the literature and will remain a challenge for evaluator self-assessments until international benchmarks are developed.
Timing of the self-assessment
One of the participants raised the issue of the timing of the self-assessment.
‘… my take on all these is different since the pandemic, and likely won’t change any time soon, but might. Hard to predict and I would have put in different answers in January.’
While the issue of timing is not particular to self-assessment instruments per se, the timing of completion does need to be considered. In the earlier discussion of the case studies, Stufflebeam and Wingate (2005) indicated that they timed their self-assessments to act as a pre-test and post-test within the timeframe of a course.
We now discuss our responses to the findings and updates to the instrument. Changes to the AES competencies that occurred due to the feedback on tests of the LEAP are discussed in Gullickson et al. (2024a, this issue).
Discussion
The research study involved two phases which focussed on the design and pilot testing of LEAP.
Phase 1 (design)
Self-assessments come with limitations and benefits. Accuracy is the main limitation of self-assessments, related to the nature of the knowledge and skills being assessed and the metacognitive ability of the person doing the assessment. Crude skills are easier to assess than subtle skills; experts will tend to rate themselves lower, while non-experts will rate themselves higher. However, self-assessments provide benefits by setting boundaries for important knowledge and skills, and by contributing to the development of performance standards and learning pathways. The researchers and the AES Working Group found this combination tenable: the benefits addressed key needs in evaluation, and the limitations could be managed with a caveat statement in the tool that self-assessments should not be used as summative reports of ability unless they are triangulated with direct methods of assessing skills and knowledge.
LEAP’s performance on the characteristics for good self-assessment instruments.
Phase 2 (pilot testing)
Phase 2 findings by research question.
Engagement with the free online self-assessment has produced benefits for the AES and other organisations, one of which is described in this issue (Gullickson et al., 2024b, this issue). It has also laid the foundation for collaboration with further organisations; the South African Monitoring and Evaluation Association (SAMEA) will launch their competencies in the LEAP in late 2024. The free online platform is accessible to the public. Through the platform and the organisational portals, the research team continues to pursue its overarching goal of learning more about those who want to learn evaluation.
Future research
Several future research questions emerged from this study. Some relate to the development of the competency sets used in self-assessment tools – such as ‘How can we identify the domains that differentiate evaluators and generate measures that are useful?’ and ‘How might we distinguish between “crude” and “subtle” skills within evaluation competencies?’. Others relate to the use of self-assessment tools themselves – such as ‘Is a rough idea of competence or competency sets good enough?’ and ‘How do we allow for benchmarking in self-assessments?’ Some of these research questions will be tackled in the next iteration of LEAP, but we also hope that evaluation researchers will partner with us as we work towards understanding more about evaluation learners and their need for professional development pathways.
Conclusion
The co-design and development of LEAP through the partnership between the AES and the researchers provided learning opportunities for all involved. Reflection by evaluators on their strengths and weaknesses and their preferred modes of learning allows the AES to develop appropriate learning pathways for participants. The research also contributed to making evaluator competencies more measurable and provided the research team with valuable data on the competencies and development needs of evaluation learners. The authors believe that further development of, and research on, the LEAP will contribute significantly to the use of self-assessments internationally as a formal process for assessing the expertise of practitioners. The collaboration in the development and refinement of LEAP has created a platform that can provide benefit to learners and their organisations beyond the AES and can continue to generate data for research on evaluation learners.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
