Abstract
Evaluation has become an institutionalised feature of public and social sector administration. With the advent of digital-era governance, it is evident that further changes to the evaluation landscape are imminent. This study seeks to identify how evaluation practices will change, evolve or persist with advancing technologies. Drawing on discursive institutionalisation’s conceptualisation of change, we conducted a discourse analysis of 25 qualitative interview transcripts, focusing on objects, legitimacy frames and values. Our findings indicate a significant shift towards digital-era evaluation through the automation of data collection, analysis and reporting, shifting control from external evaluators to in-house functions. Despite these shifts, traditional evaluation features remain entrenched, potentially limiting the future scope and utility of evaluation for social sector organisations.
Introduction
Evaluation as the ‘careful retrospective assessment of public-sector interventions, their organisation, content, implementation and outputs or outcomes’ (Vedung, 2010: 264) has become an institutionalised feature of public-sector administration (Dahler-Larsen, 2014). The desire for and application of evaluation methods that quantify service outcomes and provide instrumental information to decision-makers has become such a strong norm that rejecting it is perceived as a sign of inadequacy and is no longer acceptable (Benjamin et al., 2023; Dahler-Larsen, 2014). This extends to social sector organisations that, in neoliberal contexts, deliver public services on behalf of the state (Benjamin et al., 2023). The institutionalisation of evaluation is attributable to several interrelated factors, including a widespread evaluative mindset throughout society, the concentration of research and development around organisational activity and a structural transformation towards managerialism (Benjamin et al., 2023; Dahler-Larsen, 2011; Meyer and Rowan, 1977).
One theory of the institutionalisation process is that it occurs through discourse. In this way, society’s institutions are created and understood through the ways we communicate about and within them (Berger and Luckmann, 1967; Phillips et al., 2004). As Phillips et al. (2004) explain, ‘institutions are not just social constructions, but social constructions constituted through discourse’ (p. 5). Put more plainly, using evaluation as an example, institutionalisation of evaluation did not occur ‘because actors in the various organisations directly observed it in action, but because of the accumulation of business, professional and academic texts that explained, legitimated, validated, and promoted it’ (Phillips et al., 2004: 12).
Although evaluation is an institution, it is still shaped by the norms, values and logics of other institutions in the broader environment, in particular democracy and governance (Vedung, 2010; Zucker, 1983). As outlined by Vedung (2010), within the iterations of ‘evaluation-coupled-to-governance doctrines, there have been strong currents from both the left and the right of the political spectrum’ (p. 264). Changes in evaluation practice, typified by dominant norms and rituals within the field, have been influenced by the overarching authority of governance values. As a result, what is considered legitimate evaluation practice has evolved to align with the priorities and expectations of this hierarchically superior institution (Ahearn and Parsell, 2024; Vedung, 2010; Zucker, 1983).
The utility of evaluation is a persistently contested element in this discourse (Benjamin et al., 2023; Kupiec et al., 2023; Raimondo, 2018). For example, when Chen and Rossi (1983) wrote that the ‘domination of the experimental paradigm in the program evaluation literature has unfortunately drawn attention away from a more important task in gaining understanding’ (p. 2) their discourse served both to challenge the existing experimental paradigm and to conform with the emerging more managerial paradigm. Through discourse, they radically loosened their evaluation approach’s ties to previous purely empirical methodologies by selectively coupling (Suddaby and Greenwood, 2005) with emerging pragmatic managerial values (Ahearn and Parsell, 2024).
A managerial approach to evaluation, characteristic of the neoliberal wave and based on a principal-agent model, has positioned accountability as a foremost objective (Ahearn and Parsell, 2024; Dahler-Larsen, 2014; Vedung, 2010). However, accountability expectations complicate the evaluation process, creating a ‘problematic trade-off’ where funder-oriented accountability takes precedence over learning (Ebrahim, 2005; Reinertsen et al., 2022). Furthermore, measurement systems for upwards accountability can reshape programmes to meet the expectations set by these measures, rather than the original intent or beneficiaries’ needs (Ahearn and Parsell, 2024; Dahler-Larsen, 2014; Ebrahim, 2019). Despite the harms that aligning evaluation practices with neoliberal and managerial expectations appears to cause, outright rejection is not a viable option. Compliance with the broader institutional environment upholds the legitimacy of evaluation and ensures the subsequent use of evaluative machinery to enhance the credibility of evaluated entities (Dahler-Larsen, 2011; Suchman, 1995; Zucker, 1983).
However, significant changes are underway in the broader institutional environment due to digital transformation and ‘digital-era governance’ (Dunleavy et al., 2006). Digital technologies are driving radical changes in public work and, by extension, in the social sector (Cordery et al., 2023; Petrakaki, 2018).
Digital-era governance encompasses a range of information-technology-driven changes that streamline not only ‘back-office processes’ but also the ‘terms of relations between government agencies and civil society’ (Dunleavy et al., 2006: 478). Digital transformations involve transitioning from analogue to digital interactions and data exchange. They also include broader reconfigurations and changes at the organisation or system level to support this transition (Cordery et al., 2023). They introduce ‘novel actors (and actor constellations), structures, practices, values, and beliefs that change, threaten, replace or complement existing rules of the game within organizations and fields’ (Hinings et al., 2018: 53).
In the public sector, the reconfiguration of information systems – encompassing data collection, storage and transmission – has considerable impacts on existing public accountability arrangements. Petrakaki (2018) presented three primary changes facilitated by digital technologies: the linking of organisations, the simultaneous centralising and decentralising of data, and the reorientation of power as new actors are afforded ‘the right to know and the responsibility to act’ (p. 36). These processes, referred to as re-structuring, re-organisation and reorientation (Petrakaki, 2018), have the potential to disrupt established power structures within the evaluation process (Hanberger, 2022).
Social service organisations are ‘acutely affected [by digital transformations] due to the diversity of their stakeholders, regulatory and funder demands, and ongoing resource constraints’ (Cordery et al., 2023: 1). Those with contractual obligations to the state are increasingly utilising digital tools to collect, analyse, and communicate performance and outcomes data (Adhikari et al., 2023; Alshurafa et al., 2023; Cavicchi and Vagnoni, 2022; Cordery et al., 2023). This technology offers significant opportunities to ‘generate and analyse data at a fraction of the cost and time required by conventional data collection’ (Bamberger et al., 2016: 229). However, there are concerns that digital unaffordability and illiteracy could further marginalise specific groups (Alshurafa et al., 2023; Bamberger et al., 2016). Digital transformations have also been shown to amplify trends towards upwards accountability (Cordery et al., 2023).
The inclusion of digital technology in the evaluation process has been observed, though it is rarely the central focus of evaluation studies (Ahearn and Parsell, 2024; Bamberger et al., 2016; Henman, 2022). In many cases, technologies have quietly integrated into our daily processes, such as data processing software, telecommunications and digital information transfers, evolving from rudimentary forms to advanced systems now increasingly embedded with artificial intelligence. However, change does not occur on its own; there must be desire for it, and actors must create space for it by legitimising it in their discourse. Animated by this changing, albeit under-examined, context, this study contributes to the body of evaluation knowledge by examining how digital technologies are being incorporated into evaluation processes in social sector organisations. Using a lens of discursive institutionalisation, we examine how actors discuss evaluation and digital technology use, exploring what these discussions reveal about the potential to transform the current institution of evaluation.
Theoretical positioning
Our approach to conceptualising the integration of digital technologies to evaluation draws from institutional theory and its associated conceptualisations of change. Institutional theory predominantly examines how organisations enact change within their institutional environment (Greenwood et al., 2011; Greenwood and Hinings, 1996). However, the institutional environment consists of organisations responding to the responses of other organisations within the same environment (DiMaggio and Powell, 1983). By studying how actors position themselves within this context, we can understand how the institutional environment is being actively constructed and established (Zucker, 1983).
Evaluation itself is institutionalised and a pivotal aspect of various other institutional arrangements (Dahler-Larsen, 2011). Highly institutionalised organisations or templates are less likely to undergo evaluation as doing so threatens the perception of their inherent legitimacy (Dahler-Larsen, 2011; Meyer and Rowan, 1977). Conversely, less institutionalised entities utilise evaluation to enhance their legitimacy, not just through findings but also by signalling a commitment to quality and evidence (Kupiec et al., 2023; Meyer and Rowan, 1977). This institutionalisation not only makes evaluation indispensable for many organisations but also constrains what we understand evaluation to be and the formats it can legitimately adopt (Zucker, 1983). Specifically, the currently institutionalised format of evaluation necessitates ‘a special set of methods known as “evaluation methods” and special people called “evaluators”’ (Dahler-Larsen, 2011: 17). However, we know that digital transformations warrant the introduction of new ‘actors, structures, practices, values, and beliefs’ (Hinings et al., 2018: 55).
Change can be classified as either evolutionary (also known as convergent) or revolutionary (also known as radical) (Greenwood, 2008; Greenwood and Hinings, 1996). Radical change ‘involves the busting loose from an existing orientation’, while convergent change is the iterative ‘fine tuning of the existing orientation’ (Greenwood and Hinings, 1996: 1024). However, the feasibility of fine tuning or busting loose is constrained by discourse which ‘“rules in” certain ways of talking about a topic . . . and also “rules out”, limits and restricts other ways of talking, of conducting ourselves in relation to the topic or constructing knowledge about it’ (Hall (2001) as cited in Phillips et al. (2004: 5)). If an institutional environment, and its constitutive discourse, is too tightly constrained, whereby iterative dissent is not allowed, radical change becomes more likely (Greenwood, 2008). For this to occur, there must be ‘intense pressure for change arising from dissatisfaction . . .’ (Greenwood, 2008).
Methods
Participant sample and interviews
Qualitative interviews were conducted with two cohorts of participants: the first with 20 key evaluation actors, and the second with five technology sector leaders.
In the first cohort, interviewees were selected based on their direct control over commissioning, design, and conduct of large-scale evaluations in the Australian social and public sector. To facilitate this, we focused on Australian not-for-profit organisations with more than $100 million AUD in annual income, and evaluation consultants who work with them. Participants held titles such as Evaluation Officer, Evaluation Manager, Chief Executive Officer, or Research Lead. Public LinkedIn profiles, organisation websites and snowball sampling were used to identify potential participants. Through open coding in the initial stages of the analytical process, we identified the central theme of standardised, digitised and automated monitoring and evaluation systems. We thus returned to the field to conduct further data collection with key stakeholders working at the interface of digital technology and evaluation.
For the second cohort, we conducted qualitative interviews with five technology sector leaders whose platform or services support not-for-profit organisations in data collection, analysis or reporting of evaluations. These participants held titles such as Chief Executive Officer, Founder or Not-for-Profit Technology Lead.
Both sets of interviews were loosely structured. Evaluation actors were asked a broad introductory question similar to ‘Tell me what evaluation looks like in your organisation’. Technology leaders were asked to provide an overview of what their technology aims to achieve for evaluation in the social service sector.
We used the active interview style (Holstein and Gubrium, 1995) where informants were engaged as co-constructors of knowledge. This approach allowed informants to actively narrate their understanding of their environment and actions in a manner that is ‘simultaneously substantive, reflective, and emergent’ (Holstein and Gubrium, 1995: 30).
Discourse analysis
Corresponding to the three elements of discourse identified in our theoretical framework, we conducted a discourse analysis on the data, which comprised the 25 qualitative interview transcripts.
Legitimacy frames
The first configuration of discourse we examined and coded was the framing of legitimacy, defined as the salient premise of the desirability of the object being discussed (Fairclough, 1992; Suddaby et al., 2016). When discussing a specific aspect of evaluation (an object), participants arranged their discourse to support, strengthen or weaken the perceived legitimacy of that object. Legitimacy frames were identified throughout the text and coded as ‘maintain’, ‘enhance’ or ‘undermine’. Maintaining frames adopted a ‘status quo’ positioning whereby the object held a natural and unquestioned merit. Enhancing frames took a highly positive yet prospective tone to affirm the legitimacy of the incoming object. Finally, undermining frames adopted a negative tone, often drawing on comedic rhetorical devices such as sarcasm or metaphor to delegitimise the object. A demonstration of these frames and their codes is included in the first paragraphs of our findings. Legitimacy frames were coded in the first stage, as doing so distinguished text containing value patterns from supplementary discussion between the interviewer and participant.
Objects (templates)
Objects of discourse are the targets of statements, or the ‘thing’ being discussed. Objects are formed and re-formed through discourses which define, describe or explain them (Fairclough, 1992). In our interviews, objects were the templates of evaluation such as methods, data collection approaches, analysis techniques, reporting, and use of findings. The second stage of our analysis was to deductively code objects into a series of broad codes which were then categorised into a smaller set of codes. The final codes in order of prevalence were standardisation, external evaluators, internal evaluators, automation, findings use, epistemologies and lived experience.
Values
Values are ‘conceptions of the desirable’ (Parsons, 1968: 136). Actors employ values within their discourse to align with a course of action by appealing to higher-order institutional values. Values refer to the object to which the actor is orientated, rather than to the actor (Parsons, 1968). The combination of an object, legitimacy frame and value forms a value-pattern that defines a direction of choice and commitment to action (Greenwood and Hinings, 1996; Parsons, 1968). Examining value-patterns is crucial to understanding the potential for radical change (Greenwood and Hinings, 1996). The final stage of our analysis required considerable contextualisation, as values were not always explicit within value statements. Thus, after assigning text to both a legitimacy frame and an object, we re-examined the intersections to identify any missed values. The final value codes, in order of prevalence, were utility, accountability, care, validity, rigour, impartiality and innovation.
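To make the structure of this coding scheme concrete, the following minimal sketch expresses a value-pattern as the triple described above: an object, a legitimacy frame and a value. It is written in Python purely for illustration; the class, controlled vocabularies and example coding are our own notation for exposition, not software used in the analysis.

```python
from dataclasses import dataclass
from typing import Optional

# Controlled vocabularies mirroring the final codes reported above.
FRAMES = {"maintain", "enhance", "undermine"}
OBJECTS = {"standardisation", "external evaluators", "internal evaluators",
           "automation", "findings use", "epistemologies", "lived experience"}
VALUES = {"utility", "accountability", "care", "validity",
          "rigour", "impartiality", "innovation"}

@dataclass
class ValuePattern:
    """One coded excerpt: object + legitimacy frame (+ value, coded last)."""
    excerpt: str                 # verbatim text from a transcript
    frame: str                   # stage 1: legitimacy frame
    obj: str                     # stage 2: object (evaluation template)
    value: Optional[str] = None  # stage 3: value, added on re-examination

    def __post_init__(self):
        # Guard against codes falling outside the agreed scheme.
        if self.frame not in FRAMES:
            raise ValueError(f"unknown frame: {self.frame}")
        if self.obj not in OBJECTS:
            raise ValueError(f"unknown object: {self.obj}")
        if self.value is not None and self.value not in VALUES:
            raise ValueError(f"unknown value: {self.value}")

# Illustrative (hypothetical) coding of a single excerpt.
vp = ValuePattern(
    excerpt="they're using a load of disparate systems",
    frame="undermine",
    obj="standardisation",
    value="utility",
)
```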
Findings
Digital-era evaluation, as the latest wave of evaluation practice, emerged as a striking theme from our data. Through our discourse analysis we identified three contested areas related to digital-era evaluation: automation, the reconfiguration of information systems and evaluation templates. These three areas were the primary focus of value statements. Our analysis was facilitated by NVivo-generated hierarchy charts and concept maps, which we used to visualise the targets of value statements. We present these areas in turn.
Digital-era evaluation
When describing evaluation practice within their organisations, all social sector participants spoke to the standardisation of data collection and analysis. This involves digital technologies that simplify or entirely manage the collection of data from beneficiaries, including automated survey distribution, pre-configured analysis and ‘real-time’ dashboards. An executive from a large technology organisation (Wayne) gave a clear description of the function and purpose of this technology, one that captured the responses of our research participants:
So, what does the platform offer to charities?
Automated data collection. I think one of the challenges that we see for nonprofits, especially if they’re doing measurement or data collection, is that they’re using a load of disparate systems. So typically, it’s survey tools, SurveyMonkey, or even paper surveys. They use Excel and then they have to either go to consultants to help with that analysis or try to piece it together themselves with analytics engines or whatever else. So really our spiel is that we’re a purpose-built platform for outcomes measurement. We enable long-term outcomes measurement, not just piecemeal studies. We automate it and make it simpler from a data collection management perspective to help increase response rates. The governance of data is better, and there are benefits from using technology and systems.
Does your platform integrate those different data sources?
Absolutely. Our platform has a CRM at its heart and then we’ve built components that enable organisations to embed their theory of change or framework. They can build in the data collection, define the program activities, the data they want to collect from individuals at specific points in time, and we have dashboards and reports that pull in the data and provide results.
The above exchange demonstrates what we heard from both the social sector and the digital technology participants in terms of objects, legitimacy frames and values. The three core objects of discourse are demonstrated in this exchange. First, the ‘automation’ of data collection and analysis through data integration and dashboards. Second, the reconfiguration and reorientation of data away from consultants and its simplification for in-house use. Third, the pervasiveness of specific evaluation norms and templates, in this example, theory of change models.
Undermining, enhancing and maintaining frames are also evident in this exchange. The legitimacy of ‘loads of disparate systems’ is undermined by presenting them as a ‘challenge’. The alternatives, where organisations ‘have to’ rely on consultants or ‘piece it together themselves’, are equally devalued. The contrast drawn with ‘piecemeal studies’ further demonstrates the delegitimisation of an outgoing template in favour of the emerging alternative. Conversely, the ‘purpose-built’ technology enables ‘long-term outcomes measurement’, indicating a commitment to ongoing comprehensive measurement and thereby enhancing the legitimacy of technology solutions. Technology is often framed as ‘simpler’ than the alternatives, particularly highlighting the benefit of ‘automation’. The second passage from Wayne, regarding the Customer Relationship Management (CRM) platform and other data integration, demonstrates a maintaining frame: the tone is generally positive, but the framing suggests a taken-for-granted belief in natural benefit.
Underlying these legitimacy frames is a core value of utility. This was observed throughout the discourse of both social and technology sector participants. By utility, we mean the quality of being useful or beneficial. Participants drew frequently on this value to frame the legitimacy or illegitimacy of a particular template. Utility was outlined as arising through two familiar objectives: accountability and improvement. Measurement and evaluation provided utility for accountability purposes when it could be ‘utilised for the reporting side and for understanding the collective impact of the organisation’ (Hyacinth, NFP Research Executive). Conversely, utility through improvement was met when measurement and evaluation produced ‘really actionable findings’ (Avery, National Evaluation Manager).
In alignment with previous discourse analyses in this space, the logic of caritas, care or social welfare often emerged as a foundational position (Mason, 2012; Nicholls, 2010). By this, we mean that when participants were challenged on their position, or were struggling to communicate the utility of something, they drew on language, metaphor and narrations of care and goodwill. The collection of data was consistently presented as an ‘obligation’ (Colin, Evaluation Manager) for social organisations to ensure ‘evidence [can] show this is the best possible thing we can do [for the community]’ (Gareth, Public Sector Evaluator). This emphasis was further highlighted by the focus on value for money and cost-efficient utility, which is not surprising given the resource constraints in the social sector.
Overall, utility was a stable value within the discourse, although the means to achieve it were contested. The ability of an evaluation template to render utility was often contested in relation to the additional values of rigour, impartiality and validity. Although these values are traditionally associated with empiricism or science, they were frequently adopted to delegitimise previous or current norms in evaluation practice in favour of automated and digital solutions. We interpret this dynamic as indicating that utility, as a joint value between empirical and market logics, is not under threat. However, values previously aligned with the empirical institution are being contested and negotiated to facilitate legitimate alignment with digital technology.
Automation
Participants legitimised the automation of evaluation processes including data collection, analysis, and reporting by emphasising greater visibility and control over data. In particular, the people who can use the data to directly inform governance and practice can access it when needed, as opposed to previous evaluation dynamics and data management:
The issue for us is really the lack of sophistication in our current systems, particularly with analysis of information in pulling together variables that sit in discrete systems [. . .] It’s not so much that we can’t access the information that we need. It’s just a very convoluted and manual process [. . .] we spend a quite a significant amount of time swapping and chopping and manipulating data to get it into the right format.
Immediate access to data and automated analysis was also presented as expanding the potential ‘for more intelligent purposes than just compliance’. Participants contrasted this incoming potential to the previous approaches to data management and analysis which were framed as a reactive response to compliance requirements and precluded ‘learning from our data’ (Lou, Outcomes and Improvement Director).
The first stage in achieving this was to establish a standardised outcomes evaluation framework, which was either already in place or in the process of being established in the majority of our informants’ organisations. Many of these organisations had embarked on the second stage of automating this framework through custom-built digital technologies.
Once established, these standardised outcome measurement frameworks, whether automated or not, are seen as delivering marked utility from both accountability and learning perspectives. As described by Theo (Senior Evaluation Consultant):
[It’s] the old adage of if it doesn’t get measured and reported, it won’t get managed. So I think then any sort of tool, report, or dashboard it’s the quality of the conversation that surrounds it.
Some participants were more sceptical, as described by Mak (Research and Evaluation Lead):
I think in the context of this drive towards dashboards, I think there’s a real perception that if you put data in front of managers, they will know what to do with it. And I can tell you that that is absolutely not true. Even the right data put in front of the right managers in the wrong format will have no benefit.
The above use of the phrase ‘drive towards dashboards’ signals a shift in the larger institutional environment. However, Mak’s sceptical framing suggests he is doubtful of its added utility, later referring to it as a shift towards ‘being data decorated’. Similarly, while the push for standardised measurement and dashboards was Board-driven, several participants questioned the practicality of this given the limited time for reviewing evaluation findings in quarterly or monthly Board meetings. Nonetheless, many participants were hopeful about the learning potential of digital technologies in providing data to frontline service staff.
Reconfiguration of information
The reconfiguration of the location and accessibility of data towards ‘real time access . . . to be able to see what’s going well, but also what’s going badly’ (Xander, Tech Founder) has created subsequent shifts in who has power over the evaluation process. Such technology centralises evaluation data from various sources while simultaneously decentralising access. This reconfiguration has reduced the reliance on external consultants for data collection, storage, analysis and interpretation, transferring these responsibilities to the technology itself.
Mak (Research and Evaluation Lead) described how this had generated an entirely new way of understanding evaluation in his organisation:
Only recently has it occurred to us that as a research and evaluation unit, we have a business-as-usual function because you would expect that by and large research and evaluation is a project focused entity. But yeah, in the context of this type of work, where we can provide this continual improvement monitoring, there is capacity for us to have that business-as-usual kind of component.
This has also led to shifts within some organisations which formerly had ‘Evaluation’ units but now have dedicated ‘Impact’ or ‘Outcomes’ units, with staff focused on the evaluation framework, associated common data set and dashboards rather than evaluation projects. We observed that although this had disrupted evaluators and their teams, even those displaced by the changes largely enhanced or maintained the legitimacy of this shift. This legitimisation was largely enacted through the values of rigour and data quality, which they presented as achievable through automated collection and dashboards:
And so there are some tensions there. But I do think that common data set is a really good foundation because you know back in the day we couldn’t say ‘excellent, we serve X number of people of these varieties and these ways’ we couldn’t say it because individual programs might know. But across the organisation it wasn’t [collected], so I think that’s been a good thing.
In many cases, this shift was facilitated by internal evaluation workshops where frontline staff collectively reviewed data and identified areas for learning and improvement. Participants described the learning, enthusiasm, and practical insights gained from this process as ‘really amazing . . . to have that kind of conversation and the anecdotes and on the ground, experience coming out’ (Petra, Monitoring Evaluation and Learning Lead).
Accompanying the legitimisation of this new template was a sustained delegitimisation of the former template of evaluation projects conducted by external evaluators. As Steph (Research and Strategy Lead) described,
gone are the days where the academics are kind of the experts and hold all that expertise . . . I don’t think that’s beneficial for anybody, it doesn’t meet anybody’s needs and what it means is that as a service provider you’re left with something that maybe misses the mark and isn’t what you were after.
Although this quote directly targets academic evaluators, all external evaluators including consultants were equally scrutinised. Key concerns included separation from the programme, lack of impartiality and poor quality, together highlighting a perceived limited utility of evaluation projects by external evaluators. The core dissatisfaction and delegitimising concerns raised by participants are well captured within this longer excerpt from Ilya, a Principal Evaluation Lead with several decades of experience in the sector:
Do you ever outsource work to externals?
We have, and honestly, sometimes consultants they can, they can be quite crap. And even academics. Sometimes you get this sense that . . . I don’t know . . . They think it’s community sector and they can deliver crap.
Yeah.
At times it’s quite cross making and yeah, because come on.
Absolutely.
Which you start to then think, well often you might do external and particularly with the university people, it’s basically to have the university name for the credibility and sometimes that is a requirement of the funder, that they want a university partner, just to . . .
Yeah, OK.
And so that’s a bit irritating.
Yeah for sure.
So it can be problematic. And if you get consultants, it can be just cookie cutter, you know, they’re coming from a profit perspective to minimise the work really. So, they’ll bring some people in on it, just to write the proposal. So, you have you have the proposal with the people who are kind of senior and then you never see them again and it’s all the juniors who do the work.
As Ilya states, a university-produced evaluation can be seen by funders as holding more gravitas. Likewise, Brenden (Evaluation Director) commented that for an evaluation report to have ‘credibility’ it must be produced by an external evaluator. On the other hand, participants noted that evaluators’ credibility should be questioned, as their reputation and repeat business depend on the satisfaction of the commissioning organisation:
You know before, we would say we’d go for a consultant because you want an independent view. But I’m increasingly really realising that that independent view isn’t independent, and we can actually be much more critical, and much more, umm, able to get down to the deep issues if we do the evaluations ourselves.
The references to low quality and limited critical analysis in this excerpt demonstrate the perceived shortcomings of traditional evaluation models and external evaluators that digital-era evaluation is idealised to address. As Ilya later commented,
[they] don’t have that kind of connection. Like we work really closely with the programs, we know the programs, and I suppose we’re sort of invested in them in some way because we want the programs to be good and sometimes the best way of getting something to be good is to point out what’s not so good.
These quotes reveal social sector staff’s dissatisfaction with evaluators and their projects, attributing a lack of evaluation utility to the previous template of external evaluators conducting standalone projects. The alternative template, in this case the use of digital technology to automate evaluation, is the final element in institutional change:
There’s a whole kind of infrastructure there which will help us to be able to do much more intelligent historic comparisons and run reports in a much more a complete way. And I guess that’s the next level because then that enables us to make more informed decisions around particular client groups and cohorts.
Evaluation templates
Despite the shift away from external evaluation projects, some elements, in particular theory of change models and upwards accountability pressures, remain resilient. Previous research shows that visual models associated with theory-driven evaluation have become entrenched in the field through their alignment with practical empiricism and evidence-based management (Ahearn and Parsell, 2024). According to Wayne and all other technology sector participants, a theory of change model is the mandatory first step for social sector organisations before starting to build an automated digital platform.
Reinertsen et al. (2022) highlighted that theory of change models, intended for flexibility and analytical reflection, may paradoxically lead to more standardisation and rigidity. Embedding these models into technology design further institutionalises this approach, both through discourse that legitimises and promotes it (Phillips et al., 2004) and through the literal encoding of the model in the technology itself. Several of our participants spoke to the inability to revise theory of change models once designed because of ‘limitations with the funding for social services in Australia’ (Hyacinth, NFP Research Executive). The resourcing required to redesign both the theory of change and the digital platform would be larger still.
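To illustrate what this encoding can look like, consider the following hypothetical configuration sketch (ours, not drawn from any participant’s platform; all keys and measures are invented). Once data collection points and dashboard metrics are generated from the encoded model, revising the theory of change means rewriting the schema, migrating stored data and rebuilding reports, which is precisely why redesign is so resource-intensive.

```python
# Hypothetical sketch: a theory of change hard-coded into a platform's
# configuration. Survey items, timepoints and dashboards all reference
# these keys, so revising the model ripples through the whole system.
THEORY_OF_CHANGE = {
    "activities": ["housing support", "counselling"],
    "outcomes": {
        "short_term": [{"id": "housing_stability", "measure": "survey_q3",
                        "collected_at": ["intake", "12_weeks"]}],
        "long_term": [{"id": "wellbeing", "measure": "wellbeing_scale",
                       "collected_at": ["intake", "12_months"]}],
    },
}

def dashboard_metrics(toc: dict) -> list[str]:
    """Dashboards are generated from the model, coupling reports to it."""
    return [outcome["measure"]
            for horizon in toc["outcomes"].values()
            for outcome in horizon]

print(dashboard_metrics(THEORY_OF_CHANGE))  # ['survey_q3', 'wellbeing_scale']
```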
Furthermore, participants described ‘piggybacking on’ funders’ ‘already established standard measures’ (Petra, Monitoring Evaluation and Learning Lead) when designing a theory of change and organisation-wide standardised measurement frameworks. Specifically, organisations often prioritised existing key performance indicators set by funders within the design of their standardised and automated evaluation framework:
Even if an organisation is large, it’s still that programmatic data from a particular intervention or initiative where the data flows through [. . .] because that’s still driven initially from government reporting requirements. So first of all that’s going to need to be able to collect that data to meet those, because largely we’re talking about government funded entities with a small philanthropic component.
While participants had often incorporated funder-mandated measures into their frameworks, they also frequently delegitimised these measures, citing their misrepresentation of true programme impact and potential inappropriateness. These value patterns often drew back to the logic of care:
Yes, we have to collect all these metrics, but that doesn’t tell you anything about if we’re meeting their other needs . . . and you know, there’s no point asking somebody ‘Do you like the food here?’ if they’ve got a peg feed [a gastric feeding tube]. That’s a useless question for them. But you know, they’re the mandated questions, so . . .
To mitigate this, several organisations incorporated lived experience consultations during the theory of change and impact framework design. They held focus groups with current or former clients to review the proposed models. As Lou (Outcomes and Improvement Director) explained, ‘But see who benefits if we get this right? The customer. So, we need to check with them, yeah’.
It is not an oversight that we have not discussed the qualitative paradigm until now. Rather, it was rarely raised by participants when asked about evaluation and outcomes measurement. When prompted, they often referred to qualitative data as ‘stories’ and dismissed it as biased and lacking rigour. However, this perceived lack of gravitas led some participants to suggest the need for external evaluators to capture qualitative data meaningfully. Even technology sector participants noted that while some platforms offer automated word count and sentiment analysis, funders prefer quantitative data largely due to ‘a time problem, so people want annual reports with more hard numbers’ (Xander, Tech Founder).
Xander and other technology participants described how automated graphs in their platforms use embedded cumulative calculation and data visualisation to infer increased outcomes. Social sector participants, like Noah (Data Officer), noted the performative accountability norms in quantitative measurement. He described a ‘faux philosophy of science’ held by public funders: ‘they say they want to make evidence-based decisions, they say they want to spend the taxpayers’ money in a good way, but actually all they really want is [that the] line go up’.
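The arithmetic behind this framing is simple but worth making explicit: a cumulative series can only rise. A minimal sketch with hypothetical figures shows how a dashboard plotting cumulative outcomes makes the line ‘go up’ even while per-period results decline.

```python
# Hypothetical figures: quarterly outcomes decline, yet the cumulative
# series that many dashboards plot is monotonically non-decreasing.
quarterly_outcomes = [40, 30, 20, 10]  # per-quarter results, worsening

cumulative, total = [], 0
for q in quarterly_outcomes:
    total += q
    cumulative.append(total)

print(quarterly_outcomes)  # [40, 30, 20, 10] -> downward trend
print(cumulative)          # [40, 70, 90, 100] -> the line still 'goes up'
```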
Discussion
Although evaluation is institutionalised, its approaches, execution and the conduct of evaluators remain open to criticism and change. Our analysis examined how actors frame the shift towards digital-era evaluation and revealed an impending radical change for the evaluation field, alongside evolutionary changes and points of stagnation.
Consistent with existing literature in the field, utility is a central value of evaluation (Alkin, 2004; Benjamin et al., 2023; Kupiec et al., 2023; Raimondo, 2018). However, the means of achieving it are contested. Although rigour, impartiality and validity are values traditionally associated with empiricism, they have been co-opted to legitimise digital technologies in the evaluation process. External evaluators, such as academics and consultants, were once seen as having greater expertise and accuracy. Participants in our study, on the other hand, undermined these qualities as part of their positioning of digital technologies as a means to enhance what can be learnt from evaluation. They undermined the quality, impartiality and connectedness of external evaluators’ work while elevating the capacity of technology to deliver more complex and nuanced in-house analysis. This shift was promoted as allowing those directly involved to have greater access to and ownership of findings, enabling more effective use of the data.
Although complaints about external evaluators being disconnected from the evaluand, and thus less able to produce meaningful findings and changes, are not new (Patton, 2008), digital technology offers an alternative that was previously unavailable. As outlined by Greenwood and Hinings (1996), radical change occurs when there is deep dissatisfaction with the current system and an alternative system can be identified. While we hesitate to label this shift revolutionary without the benefit of hindsight, current discourse strongly suggests that emerging changes in evaluation practice are radical.
Senior consultant Theo noted that while technology could allow evaluators to engage in more interesting, in-depth work by automating mundane tasks, these tasks are traditionally how junior evaluators are trained and introduced to the field. Nevertheless, the relationship between the technology sector and evaluation researchers indicates further opportunities for growth. Indeed, the technology sector participants have observed upskilling in evaluation and social science research among their developers. Other participants spoke to the need for evaluators to become more ‘tech savvy’. Therefore, although the profession of evaluation will certainly be affected by this radical change, it is possible that it will be a realignment rather than an extinction if ‘novel actors’ (Hinings et al., 2018: 53) with evaluative skillsets integrate into the technology industry.
A secondary source of potential change arises from the reorientation of power and ownership over data through digital technology transformations. This shift in access and control over evaluative data was framed as a key strength, providing individuals closer to the evaluand with direct access to findings and the ability to make immediate changes, potentially remedying past underutilisation.
Alongside this, we observed strong delegitimising discourse against top-down accountability and mandatory performance measures set by funders. While the limiting and derailing influence of such measures is well documented (Christensen and Ebrahim, 2006; Dahler-Larsen, 2014; Mathys et al., 2024), this awareness was heightened among our social sector participants. Indeed, technology sector informants reported that the charities they had worked with often revised their surveys and intake forms after dashboards provided them with their first-ever visualisation of their data. Social sector participants noted that direct access to data allowed them to identify and address deficiencies in their surveys and measurement approaches they had previously not been aware of. The subsequent strong opposition to ‘abysmal’ (Noah, Data Officer) standardised measures from funders indicates another potential source of change. By raising awareness of the deficiencies of standardised data among social sector actors who have more investment in social programmes, and negotiating power with public funders, they may be better able to advocate for improved or loosened measurement requirements. Indeed, identifying these issues through digital platforms may provide a strong counter-lever against entrenched forms of managerialism, especially given the significant legitimacy currently afforded to digital technologies (Hinings et al., 2018).
Despite greater data visibility and recognition of deficient standardised measures through digital technologies like dashboards, some sources of resistance to change persist in evaluation. All technology sector informants noted that having a theory of change model is mandatory before commencing technology builds. These models, imbued with market norms around causal change, privilege certain programme types (Ahearn and Parsell, 2024). A preference for longitudinal quantitative data was also observed, even though the qualitative paradigm may be better suited to incorporating lived-experience perspectives. Therefore, larger programmes with more data are prioritised within the outcomes framework and digitisation process, making it easier for them to meet accountability standards and benefit from digitisation, potentially creating a meta-constitutive effect. Consequently, large programmes that conform to standardised measures are more likely to show positive performance and gain from digitisation, while smaller, less conforming programmes may be left behind.
However, many social sector participants indicated that incorporating mandatory funder measures is the first step in designing their theory of change models. In other words, the mandatory measures enforced by a funder of one social service within an organisation are being picked up and implemented across the entire organisation in an attempt to measure common outcomes. This poses a significant threat to the utility of quantitative measures for both accountability and learning, as the original intent behind prescribing the measure is entirely lost. Furthermore, because performance measures not only assess but also define what constitutes ‘good’ performance, they inherently shape social programmes towards that definition (Dahler-Larsen, 2014). Thus, while the technology offers opportunities for streamlining processes and saving time, we should use that time to carefully trace back and critically examine what is being embedded into these systems.
A limitation of this study is its focus on accountability within digital transformations, as seen in prior literature (Adhikari et al., 2023; Cavicchi and Vagnoni, 2022; Cordery et al., 2023). However, we also offer some insights into how digital technology facilitates learning and improvement by ‘driving information back to services [that] can action that information readily [and] provide managers with the ability to see things that they might not otherwise see at the level of service provision’ (Mak, Research and Evaluation Lead). In addition, as digital evaluation systems aim to increase the quality and consolidation of programme data (Bamberger et al., 2016), there is potential for future research to examine their impact on programme performance improvement.
It is important to note that this analysis reflects the practices of affluent Australian social sector organisations. The preference for digitisation and the legitimacy given to these efforts may be specific to the Australian context; it is also possible that this manifestation of the digitalisation of evaluation is unique to the social service sector. Given the limited contemporary empirical research demonstrating this digital turn in the broader evaluation sector, we encourage scholars of evaluation to examine how, if at all, similar movements towards the digitalisation of evaluation are taking place in other sectors.
Unlike the large organisations examined in this study, smaller organisations with less funding may not yet be able to achieve this level of digitisation. However, technology sector informants mentioned supporting both large and small organisations. One informant’s technology organisation is entirely volunteer run, providing free services to small organisations, both in Australia and internationally. In addition, a social sector informant described how their organisation sells its standardised and digitised measurement system at-cost to others in their sub-sector. Although digitisation may currently be out of reach for some organisations, efforts are being made to reduce technology costs. Furthermore, mimetic isomorphic pressures are likely to drive the adoption of this innovation throughout the sector (DiMaggio and Powell, 1983; Han and Ito, 2024; Haveman, 1993).
Conclusion
Through examining how actors frame the legitimacy of evaluation within their discourse, we observe an inclination towards digital-era evaluation. Our theoretical orientation was that evaluation templates, when presented through discourse, are framed as more or less legitimate in comparison to alternatives. This framing is achieved by appealing to, and aligning with, specific dominant values. In the case of digital-era evaluation, the legitimacy of existing evaluation methods was undermined and presented as encumbering the utility of evaluation in social sector organisations. Digital technologies, including automated data collection and analysis platforms like dashboards, are positioned to simplify and enhance the quality and control of evaluation findings. While digital transformation shifts power within evaluation projects, elements of the old evaluation templates persist. Specifically, standardising data as a first step in these digital transformations may further entrench undesirable evaluation practices. We urge evaluation theorists and practitioners to embrace digital-era evaluation so as to better understand its benefits and challenges, especially in the social sector. Ensuring effective evaluation is crucial for the continuity of the social sector and, more importantly, the quality of services delivered to beneficiaries, making it essential to optimise the use of both evaluative skill and technology.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Australian Research Council’s Centre of Excellence for Children and Families over the Life Course (Project ID CE200100025) and an Australian Government Research Training Program Scholarship.
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
