Abstract
Traditionally, animal testing has been considered the ‘gold standard’ in determining potential effects of chemicals on humans. However, this dogma is increasingly being questioned, not only due to ethical and financial implications, but also because of the poor translatability of the results of animal tests to humans. Therefore, there is a need to modernise the concept of the gold standard and ensure that any new approach is flexible, adaptable and as future-proof as possible. Herein, we reflect on recent suggestions for updating, or redefining, the traditional gold standard of animal testing and propose a new definition. This proposal focuses more on the process of answering a specific question, using all available tools, rather than seeking to recapitulate an animal test. New Approach Methodologies (NAMs) provide an ever-expanding array of tools that can assist product development and play a key role in chemical assessment strategies. Ten recommendations for developing fit-for-purpose NAMs, to increase their acceptance and accelerate their adoption, are presented.
Introduction
For decades, results from animal testing have been heralded as the ‘gold standard’ for determining safety, efficacy and internal exposure of chemicals of interest. The journal ATLA (Alternatives to Laboratory Animals) recently invited submissions to a Virtual Special Collection (VSC) that aimed to capture current opinions on how the gold standard could be refined. 1 Herein, we reflect on the historical use of animal testing and key themes that are emerging from recent proposals to redefine, or modernise, the gold standard. Recommendations are also provided regarding the development, optimisation and promotion of New Approach Methodologies (NAMs) to increase their acceptance as replacements for animal testing. Condensing our argument to minimalist terms, we propose that a new gold standard could be defined simply as: a process by which a meaningful answer to a specific, accurately-defined question is obtained without using animal testing or animal-derived material.
Key to this proposal is the term “accurately-defined question”. For decades, researchers, regulators and other stakeholders have accepted answers that are readily obtainable as a surrogate for the answers that are desirable. For example, the efficacy of a new drug candidate in a mouse may be readily obtainable — however, the question we wish to answer is: How effective is the drug in treating disease in humans? Similarly, the half-life of a toxicant in the body may provide some pertinent information — however, the question that we wish to answer is: How much of the toxicant can reach its specific site of action and what is its concentration–time profile at that site? Deconstructing a ‘global’ research question into more specific sub-queries, answerable through the use of a strategic collection of alternatives, can help to replace animal use and provide more relevant information for a given query. What is needed is a definable process by which all potential alternatives are thoroughly explored, and whereby pieces of information are brought together to enable a decision to be made regarding product development or safety assessment.
Thus, implicit in our proposed definition is the need to develop and adopt fit-for-purpose NAMs that can provide the foundation for a future-proof gold standard. NAMs are evolving rapidly, and a review of all available techniques is beyond the scope of this article. The reader is referred to the indicated references for further information on: a general overview of NAMs and their applications;2–4 in silico modelling (including (quantitative) structure–activity relationships ((Q)SARs), similarity searching and read-across); 5 microphysiological systems, including organoids and (multi)organ-on-a-chip technology;6–8 in chemico testing; 9 and (multi)omics.10,11
To date, NAMs have been used in product development, particularly within the plant protection and pharmaceutical industries12,13 — however, the main application of NAMs has been in safety assessment. Notably, the EU Cosmetics Products Regulation (Regulation (EC) No. 1223/2009), banning the marketing in the EU of cosmetic products or ingredients tested on animals, 14 has served as a key driver for NAM development. Many lessons have been learned from using NAMs to replace or reduce animal testing in safety assessment, and there is now increasing potential to broaden the application of NAMs to other areas. Looking into the future, ideally NAMs would be used as replacements for animal testing across all sectors and applications where animals are still used. Whilst we are not currently in a position to replace all animal experiments — let alone replace all animal-derived materials in assays — it is important to recognise these as key goals that need to form part of a future-proof, globally-harmonised Three Rs strategy. Maintaining a focus on the replacement of both animals and animal-derived materials in the long term, will help to ensure that current efforts and resources are directed accordingly.
Animal assays as the gold standard?
Historically, the term ‘gold standard’ was used to refer to a monetary system where currency was linked to a fixed quantity of gold. Whilst this international standardisation served its purpose well for almost 100 years, it was ultimately succeeded by other methods. No country now uses the gold standard as a direct link to currency value. This practice has become obsolete, as alternatives that are more suited to the intended purpose have become available — a fitting metaphor for animal testing and its replacement. Historically, when there was a need to determine potential effects of chemicals on humans, animal tests were the default method of assessment. These tests served their purpose well at the time, and provided very valuable insights into the interactions occurring at the interface of chemistry and biology. They remain of value where no suitable alternatives are currently available — however, the accolade of being a gold standard is increasingly being challenged for several reasons, including: ethical concerns; financial costs; and the poor translation of results from observations in animals to those in humans.15,16 Delay in product development is a significant ethical concern in the pharmaceutical industry, where misdirection of resources (due to over-reliance on animal testing) and subsequent delays in getting new drugs to market result in fewer treatments being available. This carries a significant cost in terms of human health and longevity. 17
A revision of what we perceive as a gold standard is required, and this needs to be demonstrably valid for contemporary applications in research, development and safety assessment. Validity of individual assays is generally assessed in terms of their relevance, reproducibility and reliability. In the development of NAMs, these criteria are often stated as being key to their acceptance — however, confidence in the relevance, reliability and reproducibility of animal assays is arguably misplaced. Animal studies may lack relevance where significant inter-species differences in uptake, transport, metabolism and efficacy render a chemical non-toxic in humans, despite demonstrable toxicity in animals. Conversely, many drug candidates that have been shown to be of therapeutic benefit in animals, ultimately show no efficacy — or may even elicit significant toxicity — in humans. Sewell et al. also question the use of animals as a gold standard, given that the use of rodents in safety assessment has been reported to have a true positive human toxicity predictivity rate of only 40%–65%. 18 Similarly, Ouedraogo et al. report lack of concordance, both within and between species, for a range of toxicity endpoints. 19 Should the methods designed to replace animal testing be held to a higher standard than the original test methods themselves, or are there more appropriate criteria against which to judge the validity of an approach?
Humans are exposed to a plethora of chemicals, such as pharmaceuticals, environmental contaminants, cosmetics, industrial and household chemicals, constituents of medical devices, food additives, food contact materials, etc. There is an ever-increasing demand from both the public and regulators to (re-)evaluate chemicals by using state of the art methods. Consequently, there is a need to establish robust and reliable mechanisms for assessing safety and efficacy, that will remain relevant in the future as science and society evolve. There is currently significant momentum in this area, with many stakeholder groups and organisations promoting workshops or roadmaps focused on the replacement of animals and animal-derived material.20–22
Some key themes arising from these efforts are summarised below, as highlighted within ATLA’s VSC on Redefining the Gold Standard and other relevant publications. These themes focus on some of the identifiable needs that are crucial to the successful future development of the science, namely:
— formulating the relevant question;
— identifying the means by which the question may be answered;
— engaging with people and organisations who can instigate change;
— seeking legislative and regulatory change; and
— investing in NAMs.
Key themes in redefining the gold standard
Formulating the relevant question
The gold standard must be able to provide robust and reasoned answers to questions concerning chemicals of interest, whilst recognising that protection of humans, animals and the environment is of paramount importance. Relevant questions are context-dependent; examples include:
— Is this chemical safe under the anticipated conditions of use? (This question is applicable to all chemicals, from those present in cosmetics to those used for industrial manufacturing).
— How should the Point of Departure (or Reference Point), to be used in safety assessment, be selected?
— Is this drug effective and selective in action?
— Does this drug have a suitable pharmacokinetic profile in humans?
— Is there a potential for this chemical to bioaccumulate in environmental species, or persist in compartments such as soil, water or atmosphere?
— Where repeat dosing scenarios are likely, how may effects change over time?
— Is aggregate exposure from multiple sources a concern?
A method that predicts the outcome of a particular animal assay is unable to answer these more pertinent and complex questions. As stated by Ankeny et al., “the questions that we pose and how they are framed are as important as the answers that result.” 23 This philosophy resonates with recent innovations in industrial practice, where protection of human health is considered the motivation, rather than the ability of a new assay to predict results from an animal test.24,25 Whilst it was initially considered logical to compare data from newly developed non-animal methods to results from animal tests, problems with this approach soon became evident. Simple endpoints such as skin or eye irritation may be amenable to this approach, but for the more complex endpoints, such as developmental and reproductive toxicity, suitable comparisons from assay outputs are not possible. 26 Generally, one-to-one replacement of animal tests with non-animal alternatives, with validation using animal data, is neither realistic nor practicable. The desire is not to predict the outcome of the animal test, but rather to reach a decision based on human-relevant science.
Worth et al. demonstrate how this may work in practice with the concept of ‘equivalent protection’, arguing that, rather than attempting to redefine a gold standard, it may be better to bypass the concept entirely. Instead, the authors focus their strategy on questioning how to conduct classification and labelling, and risk assessment exercises, to obtain an answer that is protective of human health whilst avoiding animal testing. They proposed ‘read-across with a twist’ — i.e. reading across the risk management outcomes, rather than the more standard approach of reading across adverse outcomes. This would enable new methods to be judged on “mechanistic relevance, reproducibility and importantly their ability to inform the right decisions, without trying to predict the outputs of dated and highly variable animal studies”. Worth et al. assert that truly focusing on protection rather than prediction, would help to expedite validation and acceptance of alternatives and would aid implementation of the EU Commission’s Roadmap toward the phasing out of animal testing in chemical safety assessments. 26
Identifying the means by which the question may be answered
In seeking to answer any question regarding safety, efficacy or exposure to a chemical of interest, it is essential to avoid any innate bias regarding how best to find relevant answers. NAMs can provide a wealth of information that, when combined within a weight-of-evidence or tiered testing strategy, can provide meaningful answers. As the array of NAMs expands, it is increasingly difficult for individual researchers to maintain a sufficient breadth of knowledge to encompass all new developments. However, support is available to assist researchers in identifying appropriate sources of information. These may include use of historic or biomonitoring data, literature searching or formal systematic reviews. Ratajeski and Miller have published guidance specifically on conducting literature searches for alternative methods. 27 Literature searching tools are increasingly exploiting artificial intelligence (AI) to perform more detailed searches, with some tools explicitly focusing on identifying alternatives to animal testing. 28 However, AI for literature searching needs to be used with caution. Tosi conducted a study to compare results from generative AI reviews with human-led reviews. Use of AI expedited the review process, enabling researchers to focus more time on critical analysis, but AI suffered from hallucinations, inaccurate referencing and lack of comprehensiveness. Consequently, a hybrid approach, combining AI searching with human oversight, was recommended. 29
Effective literature searching may identify suitable alternatives, but it is essential to ensure a thorough investigation is conducted. To assist the process, Dukes et al. have published a practical ‘Replacement Checklist’ to be used by researchers, Animal Welfare and Ethical Review Bodies (AWERBs), Animal Welfare Bodies (AWBs), Animal Ethics Committees, and journal editors and reviewers. 30 The checklist prompts users to critically assess whether or not replacement opportunities have been thoroughly explored and provides a template for recording the process. The questions are formulated as ‘What, Where, How, When, Who and Why?’:
— What subject area(s) did the search(es) cover?
— Where was information obtained?
— How was the search conducted?
— When was info published, and search(es) completed?
— Who was approached for advice?
— Why were results of the search(es) rejected?
Under each of these main question headings are a series of sub-queries to provide more explicit guidance on searching; the complete checklist is available online. 30
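One way a research group might keep such a record is sketched below in Python. This is only an illustration: the class name, field names and example entries are hypothetical and are not taken from the published checklist, which contains more detailed sub-queries under each heading.

```python
from dataclasses import dataclass, fields


@dataclass
class ReplacementSearchRecord:
    """Hypothetical record of one replacement search, one field per checklist heading."""

    what: str = ""   # subject area(s) the search(es) covered
    where: str = ""  # sources from which information was obtained
    how: str = ""    # how the search was conducted
    when: str = ""   # publication dates covered, and date the search(es) were completed
    who: str = ""    # people approached for advice
    why: str = ""    # reasons the results of the search(es) were rejected

    def unanswered(self) -> list[str]:
        """Return the headings that are still blank (whitespace counts as blank)."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative entries only; a real record would be far more detailed.
record = ReplacementSearchRecord(
    what="skin sensitisation",
    where="two bibliographic databases",
)
print("Headings still to be completed:", record.unanswered())
```

A record of this kind could be reviewed before an application is submitted, so that any heading left blank is addressed rather than overlooked.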
Implicit in any search is the assumption that the data or methods are accessible to a wide audience. The FAIR Principles (Findability, Accessibility, Interoperability and Reusability) were originally defined for data 31 and later expanded to encompass in silico tools for toxicology and other areas. 32 The philosophy behind the FAIR Principles is relevant to a range of alternative methods, with findability and accessibility being of paramount importance. The European Partnership for the Assessment of Risks from Chemicals (PARC) has been proactive in developing tools to assist the implementation of the FAIR Principles in the context of the Safe and Sustainable by Design framework. 33
Several initiatives have aimed to organise and/or map existing NAMs so that they are accessible, and their potential uses and applications are more apparent. The adverse outcome pathway (AOP)-Wiki project provides a framework for organising information available on NAMs, as well as other resources that may assist in filling data gaps. 34 Ontologies, providing standardised vocabularies that support semantic interoperability and machine readability of AOP components, can be combined with in silico methods (e.g. QSARs). These well-annotated, ontology-driven AOP data enhance predictive toxicology and support regulatory decision-making with reduced reliance on animal testing. There are several organisations that provide lists of available NAMs and their applications. For example: the 3Rs resource library — available from the NC3Rs; 4 Norecopa’s fully searchable databases and guidelines; 3 and the Re-Place database, 35 which provides an overview of NAMs, along with key names and organisations with expertise in use of the techniques.
Engaging with people and organisations who can instigate change
Public opinion has been indispensable in focusing attention on the Three Rs and driving changes in policy. This has been achieved through exercising consumer choice in purchasing and direct petitioning of those representing the public. For example, in 2023, a European Citizens’ Initiative, ‘Save cruelty-free cosmetics — Commit to a Europe without animal testing’, resulted in the European Commission committing to the creation of an EU Roadmap toward the phasing out of animal testing for chemical safety assessments. 22 Various workshops, with the aim of identifying challenges and solutions to animal-free safety assessment, have brought together hundreds of relevant stakeholders, including Members of Parliament, representatives of EU Commission agencies, animal welfare non-governmental organisations (NGOs), industry representatives, academics and leaders of large international research consortia, all of whom can make an effective contribution to change. (Regulatory or legislative changes are critical to the process and are discussed separately below.)
Changes in policy begin at a grass-roots level, with pressure applied through individuals and organisations taking collective action. Advocacy groups have the advantage of being flexible, adaptable and able to forge connections between disparate groups, encouraging buy-in from stakeholders and facilitating progress in replacement. 36 Groups that advocate for the use of animals in research, efficacy and safety testing (where no alternatives are yet available) also play a key role in determining trust in NAMs, establishing their potential use and limitations, and supporting decisions as to where future efforts are best directed. 37 Charitable organisations that promote animal welfare are too numerous to mention. However, these organisations play an important role in advocacy and galvanising action toward replacement. Activities such as organising workshops and conferences, help bring together a wide range of experts with a common aim, who can collectively move the science forward. One example is the recent conference on ‘Best practice in non-animal methods’, held in York, UK in March 2025, which was jointly organised by Replacing Animal Research, 38 The Humane Research Trust 39 and the Centre for Human Specific Research. 40
Another interdisciplinary expert workshop was recently held in Switzerland, and specifically focused on how universities could take the lead in accelerating animal replacement. 41 Workshop attendees identified common issues affecting universities that can result in inertia, rather than action, in replacing animals. These issues include the continued use of animals in taught courses and a bias toward traditional (animal) methods in research. Early career researchers (ECRs) are more likely to adopt practices that they are familiar with, when they later become independent researchers. They are often reluctant to challenge the practices of more senior colleagues in their establishment, leading to a vicious circle of continued animal use. Therefore, there is a clear need to ensure that ECRs are aware of, and trained in the use of, NAMs. Deckha et al. propose that universities use their role as pioneers, to steer future research toward the use of alternatives. This could be achieved through: incorporation of pro-replacement terminology in university statements; redirecting the curriculum; encouraging changes of values within science faculties; and establishing units tasked with transitioning to non-animal alternatives. 41
Continuing use of animal testing, despite alternatives being available, is referred to as ‘animal-methods bias’. This has been identified in research conducted at universities and other organisations, 41 as well as among those making funding decisions, 42 and in the publishing of research findings.43,44 In this context, results from animal methods are viewed more favourably than those from non-animal alternatives. This has led to requests from reviewers for results from NAMs to be validated against animal assays, which negates the entire exercise of using alternatives. The Coalition to Illuminate and Address Animal Methods Bias (COLAAB) provides advice for authors on opposing such requests. 45 Similarly, Madden et al. provide advice for reviewers and editors to avoid such bias when assessing manuscripts. 46
National centres for the Three Rs are important organisations that advocate for change at a national level, whilst working collectively to harmonise Three Rs strategies internationally. Neuhaus et al. recount the history of these organisations and summarise key activities of more than 50 Three Rs centres across Europe.47,48 Notably, the UK NC3Rs coordinates a NAMs network that brings together researchers and developers in academia and industry, as well as regulatory end-users, to accelerate the adoption of NAMs through promoting dialogue and information exchange (see https://nc3rs.org.uk/3rs-resource-library/nams-network). 49
APCRA (Accelerating the Pace of Chemical Risk Assessment; https://apcra.net/) is an international collaboration between governments, to share knowledge regarding new hazard, exposure and risk assessment methods for chemical safety evaluation. Their work involves creating a common understanding of the current status of NAMs, including an assessment of realistic benchmarks of performance in different regulatory contexts. The organisation arranges workshops and case studies of mutual interest to address limitations to the uptake of NAMs. Recently, Wambaugh et al. published an overview of how to integrate toxicokinetic modelling to understand how chemical exposures translate into tissue concentrations, how long substances persist in the body, and through which routes they are eliminated. However, for most chemicals, toxicokinetic (TK) data remain unavailable. To help bridge these critical data gaps, researchers and regulatory experts from the international APCRA consortium presented a flexible framework to evaluate the applicability of NAMs in toxicokinetics for addressing key chemical risk assessment needs using a high-throughput toxicokinetic (HTTK) approach. 50
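To make these quantities concrete, the simplest textbook case can be sketched in Python: a one-compartment model with first-order elimination after a single intravenous bolus dose. This is not the HTTK framework described by Wambaugh et al.; the function names and the numerical values are assumptions chosen purely for illustration.

```python
import math


def half_life(k_elim: float) -> float:
    """Half-life (h) for first-order elimination with rate constant k_elim (1/h)."""
    if k_elim <= 0:
        raise ValueError("k_elim must be positive")
    return math.log(2) / k_elim


def concentration(dose_mg: float, volume_l: float, k_elim: float, t_h: float) -> float:
    """Concentration (mg/L) at time t_h (h) after an intravenous bolus dose.

    One-compartment model: C(t) = (dose / V) * exp(-k_elim * t).
    """
    return (dose_mg / volume_l) * math.exp(-k_elim * t_h)


# Hypothetical parameter values, for illustration only.
dose_mg, volume_l, k_elim = 100.0, 50.0, 0.1

t_half = half_life(k_elim)
for t in (0.0, t_half, 2 * t_half):
    print(f"t = {t:5.2f} h  C = {concentration(dose_mg, volume_l, k_elim, t):.3f} mg/L")
```

In this model the concentration halves with every elapsed half-life. Real toxicokinetic assessments typically need more than this, for example absorption, distribution to specific tissues and multiple elimination routes, which is where physiologically-based and high-throughput approaches come in.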
Large-scale international research consortia, such as the ASPIS cluster of projects, can also exert significant influence due to the collective expertise of so many partners working together on Three Rs activities. 51 ASPIS itself comprises three projects — ONTOX, 52 PrecisionTox 53 and Riskhunter 54 — with the collaboration of over 300 scientists and 70 organisations. The PrecisionTox consortium (Working Group 6; Regulatory Analysis & Application) undertook semi-structured interviews with 32 stakeholders, including representatives from industry, regulators and policy makers from both the European Union (EU) and other regions. Their report highlights numerous socio-technical barriers to the uptake of NAMs. Interviewees reported that a focus on hazard assessment, rather than exposure-based approaches, was one of the barriers. 55 Such issues could be addressed through the integration of NAMs that harness information not only on hazard, but also on exposure, providing greater scope for their future application and regulatory acceptance.
International organisations, such as the OECD (Organisation for Economic Co-operation and Development) can provide harmonised guidance on the use and acceptance of results from NAMs. The OECD plays a leading role in advancing the development and regulatory acceptance of NAMs through the coordinated efforts of its Working Party on Hazard Assessment (WPHA) and the Working Party of the National Coordinators of the Test Guidelines Programme (WNT). 56 The WPHA focuses on promoting the integration of NAMs into chemical hazard assessment frameworks, supporting scientific collaboration and data sharing among member countries. In parallel, the WNT works to develop and validate OECD Test Guidelines that incorporate NAMs, facilitating international harmonisation and regulatory uptake of non-animal testing strategies. 56
The OECD’s Mutual Acceptance of Data (MAD) system underpins international regulatory collaboration by ensuring that safety data generated in one member country, in accordance with OECD Test Guidelines and Good Laboratory Practice (GLP), are accepted in all other member countries. As NAMs become increasingly incorporated into OECD Test Guidelines, the MAD framework plays a vital role in facilitating global acceptance and reducing duplicative testing, particularly supporting the transition toward more ethical and efficient chemical safety assessments. Ultimately, it is the regional regulatory authorities that are required to make decisions based on NAMs. These organisations therefore have significant influence on NAM use and there is a plethora of publications evidencing their commitment to the transition to alternatives.
The European Food Safety Authority (EFSA) has significantly advanced the integration of NAMs into chemical risk assessment. In 2022, EFSA published a strategic roadmap outlining its vision for adopting NAMs in next-generation risk assessments by 2027. This roadmap emphasises the use of mechanistic data, AOPs, and technologies such as in vitro models, PBK (physiologically-based kinetic) modelling, omics and in silico tools, to reduce reliance on animal testing. 57 EFSA has applied NAMs in several pilot projects, including an evaluation of the immunotoxicity of perfluorinated substances (PFAS) by using immune cell assays and computational models, 58 and the use of 3D liver spheroid models for genotoxicity testing of bisphenol A (BPA) alternatives. 59 It has also developed case studies in pesticide and nanomaterial assessments, 60 exploring the use of QSARs, exposure modelling, and AI-driven tools. In parallel, EFSA has worked with international partners, including the OECD and US Environmental Protection Agency (EPA), on guidance for developmental neurotoxicity testing using NAMs. 61 Further to this effort, Wood et al. have recently published an overview of NAMs applicable for food safety assessment and a strategy to increase implementation of NAMs for this purpose. 62 Within the UK, the Food Standards Agency (FSA) and Committee on Toxicity of Chemicals in Food, Consumer Products and the Environment (COT) have organised a series of workshops on how to implement NAMs in safety assessment. 63 This has led to the development of a UK roadmap toward the greater regulatory use and acceptance of NAMs. 64
A change in mindset for those involved in instigating change needs to occur, whereby NAMs are viewed as the default option, rather than seeking to conduct or recapitulate animal tests. Capacity building is required in order to increase the number of personnel who are trained in the use and understanding of NAMs. 64 This need can be partially met by modernising the education provided by universities, 41 but more targeted training from in-house courses and learned societies will also be required. To address the lack of training in NAMs, the British Toxicology Society has developed its ‘Skills Gap Initiative’ to identify and address training needs. 65
Seeking legislative and regulatory change
One over-riding conclusion from Ankeny et al.’s article on what can be learned about replacement from history, philosophy and social studies of science, is that “NAMs are most likely to be rapidly adopted if they are encouraged or required by regulation”, and this has been demonstrated previously. 23 A notable example of regulatory changes leading to adaptation is the EU Cosmetics Products Regulation (Regulation (EC) No. 1223/2009), which banned the marketing in the EU of cosmetic products or ingredients tested on animals. 14 Despite alternative methods being available for many years (e.g. for identifying skin sensitisers), uptake was not consistent until the ban came into force. It is currently reported that 45 countries — including all EU states, the USA, Canada, Australia, New Zealand and Brazil, among others — have now banned the testing of cosmetics on animals. 66
A more recent example of regulatory intervention is the testing of parenteral therapies for pyrogens. Pyrogenicity testing is compulsory for all injectable drugs, vaccines, medical devices, etc. The rabbit pyrogen test, which uses live adult rabbits, was developed in 1942; in 2021 it was estimated that 400,000 rabbits were still being used worldwide for this test, despite an alternative method being approved for use in 1977. 67 Initially, alternative methods relied on the use of clotting factors obtained from horseshoe crabs. Concern has been raised regarding the ethical use of these crabs, which are listed as ‘vulnerable’ on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. 68 Additional concerns regarding sustainability and variability led to the development of recombinant bacterial endotoxins tests (rBET). Following the establishment of rBET in the European Pharmacopoeia, the Physicians Committee for Responsible Medicine hosted discussions to address barriers to using this test for US Food and Drug Administration (FDA) requirements. 69 The rabbit pyrogen test will now be omitted from the European Pharmacopoeia as from 1st July 2025, with global efforts ongoing to replace the use of animals or animal-derived products in pyrogenicity testing entirely.
These examples demonstrate how continued public pressure and engaging with stakeholders can result in legislative changes, albeit at a rather slow pace. They also show that disparity of regulatory practice across different geographical regions, whilst problematic, can be usefully leveraged, as change in one region may precipitate change in another region. The need for internationally harmonised agreements and legislative changes has been highlighted previously.18,70
The pharmaceutical industry has made significant progress in the Three Rs — however, Harrell et al. suggest that industry and regulators would be more accepting of alternatives if they were validated and ‘globally accepted’. They report numerous examples of where the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) guidelines have contributed to the Three Rs. 13 At workshops to discuss the EU Commission’s Roadmap toward the phasing out of animal testing in chemical safety assessment, participants also emphasised the importance of international harmonisation of legislation or regulatory guidance. It was noted that future regulatory frameworks would need to be sufficiently flexible to accommodate ongoing improvements in the science. 22
Investing in NAMs
Unsurprisingly, calls to increase funding for the development of NAMs were ubiquitous among the articles discussed above. The need for better funding for NAMs was previously highlighted in a UK All-Party Parliamentary Group (APPG) report in 2022. 71 The report found that funding for NAMs in the UK represented 0.2–0.6% of biomedical research funding (i.e. 0.02% of the total research and development funding) for 2019–2020. Fortunately, in the following five years, investment in NAMs increased, with schemes such as the non-animal methods infrastructure grants (via the UK NC3Rs) that specifically focus on projects to develop replacement methods such as NAMs. 72 The need for investment to accelerate the development, standardisation and application of NAMs in the shorter term, has been recognised. However, it has been proposed that this initial outlay would, in the longer term, result in a significant return on investment, as NAMs will save time, resources and animal use in developing products in the future. 18
A new gold standard
Given the breadth and complexity of questions to be addressed in relation to the development and safe use of multiple product types, it would be a fallacy to attempt to define any individual non-animal technique, or collection of techniques, as a new gold standard to replace the erroneously assigned ‘gold standard’ of animal testing. What is needed is a flexible, adaptable process for obtaining an answer, based on the use of a carefully selected collection of NAMs, to meet a specific information requirement (e.g. product development, safety assessment, environmental exposure, etc.). The process should be consistently applied, ideally globally-harmonised, and future-proof. Whilst NAMs are currently used more for chemical safety assessment, ideally in the future, the process would be applicable to any scenario where information is sought that would traditionally have been obtained by using an animal test. The ultimate goal would be to only use alternatives that avoid the use of all animal-derived materials (i.e. complete replacement alternatives); however, it is recognised that this would be a longer-term ideal. Hence, we propose that a new gold standard could be defined as: “a process by which a meaningful answer to a specific, accurately-defined question is obtained without using animal testing or animal-derived material.” Such a process is outlined, in principle, in Figure 1.
Figure 1. A new gold standard: a process by which a meaningful answer to a specific, accurately-defined question is obtained, without using animal testing or animal-derived material.
Key to this definition is that every aspect of how the data are obtained remains flexible and adaptable to the changing landscape of NAMs. As new NAMs are developed, these may be selected, or older assays deprecated, without changing how the process operates. Constantly performing a thorough evaluation of potential new sources of data is fundamental to this process, as indicated in step 2 of Figure 1. Any, and all, techniques can be used to address the information requirement, provided their selection is adequately justified in step 5. Historical data, biomonitoring or human data are all potential sources of information; however, NAMs are likely to be the most commonly-used tools. There is an ever-expanding array of NAMs available, but if these assays are to be used to generate the required information, then a concerted effort is still needed to increase the acceptability of the methods. The review by Hope and Bailey identifies barriers to the wider utilisation of NAMs, including concern over the lack of validation and standardisation, the lack of funding, and reliability concerns. 73 Hope and Bailey also list some recommendations for future action, centred on addressing these issues. 73
Ten key recommendations for increasing the acceptance and accelerating the adoption of NAMs are presented below; these have been developed from consideration of some of the major themes identified above.
Recommendations to increase the acceptance and accelerate the adoption of new approach methodologies (NAMs)
Many organisations have been working for decades on developing and/or promoting NAMs. Numerous strategies have been trialled for combining results from NAMs in order to reach a decision. These include: Next Generation Risk Assessment; 74 Defined Approaches; 75 and Bioactivity Exposure Ratios. 76 However, there remains reluctance, from some researchers and in particular from regulators, to accept results from individual NAMs or combinations thereof. Herein, we present ten recommendations to increase confidence in the use of NAM data and promote acceptance of the results from these methods.
Define the current landscape of NAMs
A scoping exercise to establish what is currently available could be achieved in the short term, although continual horizon scanning would be needed to monitor developments, trends and identified needs in the future. Data concerning existing NAMs can be captured in terms of:
- What information does the NAM provide?
- For which purposes may this information be used?
- Are the data standalone, or how best can they be integrated with other information?
- For which chemistries can this method be applied, and is it possible to broaden the chemical scope of the method?
- Is the technology reliable, reproducible and scalable to high-throughput?
- What are the cost implications for further development and scale-up?
- Has the method been formally ‘validated’ or evaluated in terms of fitness for a given purpose?
- Is the method considered fit for regulatory purposes? If so, in which regions and how can this be harmonised globally? If not, what would be required to bring it up to an acceptable standard?
Aside from organising the information on available NAMs, there is also a need to identify where there are gaps in the technology — i.e. those data needs for which no NAMs are currently available. Through the mapping of the existing NAM landscape, we can identify where the gaps exist and focus future NAM development on addressing these gaps.
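The mapping exercise described above can be illustrated with a minimal Python sketch. It is not an established schema: the `NamRecord` fields are simply one possible encoding of the questions listed above, and the record names, data needs and the `find_gaps` helper are hypothetical, introduced only to show how gaps could be identified once the landscape has been captured in a structured form.

```python
from dataclasses import dataclass, field


@dataclass
class NamRecord:
    """One entry in a (hypothetical) inventory of available NAMs."""
    name: str
    information_provided: str      # what the NAM measures or predicts
    purposes: set                  # data needs the NAM can address
    standalone: bool               # usable alone, or only when integrated
    chemical_scope: str            # chemistries the method applies to
    high_throughput: bool          # scalable to high-throughput use
    validated: bool                # formally validated or evaluated as fit for purpose
    accepted_regions: set = field(default_factory=set)  # regulatory acceptance


def find_gaps(data_needs, records):
    """Return the data needs that no recorded NAM currently addresses."""
    covered = set()
    for record in records:
        covered |= record.purposes
    return sorted(set(data_needs) - covered)


# Hypothetical inventory, for illustration only.
inventory = [
    NamRecord("example_in_chemico_assay", "protein reactivity",
              {"skin sensitisation"}, standalone=False,
              chemical_scope="organic electrophiles",
              high_throughput=True, validated=True,
              accepted_regions={"EU"}),
    NamRecord("example_in_silico_model", "mutagenicity prediction",
              {"genotoxicity"}, standalone=False,
              chemical_scope="small organic molecules",
              high_throughput=True, validated=False),
]

needs = ["skin sensitisation", "genotoxicity", "developmental toxicity"]
gaps = find_gaps(needs, inventory)
print("Data needs with no available NAM:", gaps)
```

Keeping the inventory as structured records, rather than free text, means the same data can be re-queried as new NAMs are added during horizon scanning.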
Ensure correct problem formulation
Correct problem formulation or ‘asking the right question’ is more complex than it may appear. There are many diverse questions, depending on the context. For example, in chemical safety assessment, questions may be: What data would we need to ensure protection of human health and the environment? or Are there specific groups in a population that may be more at risk from this chemical? In drug development, a common question is: Which drug candidate should be advanced to the next stage? In product optimisation, in many industries, the question may be Which formulation is optimal? Thus, we need to determine the ultimate purpose of the investigation, and which data could provide a satisfactory answer.
The question needs to be clearly defined and specific, addressing What precisely do we need to know? or What is the knowledge gap? This avoids more misleading questions, such as: What can we measure? or Which assay is commonly used? The solution needs to focus on how best to answer the question at hand, avoiding entrenched bias toward traditional (animal-based) approaches. Recent articles have highlighted the problem, particularly for ECRs, of trying to introduce use of alternative methods in place of animal tests at institutions where animal use is firmly established. 41 This issue is exacerbated where the publication of results is an important part of the process — for example, in the academic sector. Publication bias, where certain journals or reviewers are more favourably disposed to animal assays, or may request that animal assays are conducted to confirm the results of a NAM, can negatively impact a researcher’s future prospects. 45 All such bias should be avoided in formulating the appropriate question.
Engage with relevant personnel, stakeholders and organisations
The above questions can be summarised as What specific information is really needed? This leads to two related questions: Who can help us obtain the information? and Who else needs this information? To answer these questions, it is essential to communicate effectively with a range of personnel — in-house, as well as external organisations and stakeholders.
Given the rapid evolution of NAMs, it would be impossible for individual researchers to keep abreast of all developments in the various fields. However, prior to beginning any research study, it is important to determine where and how NAMs may be employed. Librarians and research support personnel have a wealth of expertise to leverage. They may assist in finding the most up-to-date information on alternatives and their applications, so improving study design, particularly if approached early in project development. 27
Contract Research Organisations (CROs) that specialise in NAMs can provide answers to research and development questions, as well as engage with regulatory authorities to promote acceptance of the results from these methods. The number of research projects developing NAMs, generating data or validating NAMs has vastly increased in recent years. Examples include the ASPIS cluster of projects, 51 EU-ToxRisk (focused on chemical safety assessment) 77 and eTRANSAFE (focused on drug development). 78 This translates to hundreds of organisations with shared interests, as well as great potential for expert knowledge elicitation. Meetings, workshops and conferences can bring together experts across different disciplines and provide a platform for discussing NAM-related developments. This includes the success stories others can learn from, as well as the more salutary lessons learned from failure.
Learned societies, charities, advocacy groups, and small and large research organisations, can all contribute to these efforts. In many cases, it is ultimately the decision of a regulatory authority as to whether results from a NAM are considered acceptable. Liaising with these stakeholders throughout the process helps to develop a mutual understanding of what would be acceptable, and this helps to direct future efforts. It must be recognised that each of these stakeholders will have a different mission and varying needs, and hence a different vision for how a particular NAM should be utilised. Researchers are attempting to find new ways to answer societally important questions; regulators need to protect consumers and the environment; CROs need to establish a working business model; and funding organisations need to demonstrate that their funds are being used appropriately to effect change. Aligning these different expectations, as far as possible, will help to ease the transition to NAMs.
Ensure available NAMs are reliable, reproducible, fit-for-purpose and accessible
It is implicit in developing NAMs for real-world use that these will be evaluated according to research community standards, relevant to the specific area. Whilst universal agreement on what makes a NAM ‘valid’ is not easy to obtain, within a specific field it is easier to determine ‘fitness-for-purpose’ by using accepted criteria for reliability and reproducibility. Note that a given NAM may be suitable for one purpose (such as screening or guiding decisions in product development) but unsuitable for another purpose (such as regulatory decision making). General guidance for good practice in conducting and reporting in vitro assays has been published, 79 as well as guidelines for more specific assay types.80,81 Similarly, guidance for the development, assessment and reporting of in silico models has been available for many years.82,83
NAMs must also be accessible to users. The FAIR Principles, established in 2016, called for scientific data to be findable, accessible, interoperable and reusable by both humans and automated systems. 30 More recently there have been efforts to apply these Principles to in silico predictive toxicology models, with these models being made more accessible through initiatives such as BioModels and QSARdb.org. 32 The FAIR Lite Principles (derived from the original FAIR Principles) provide a simple and useful checklist, comprising four criteria. These are applicable to all types of computational toxicology models, to establish the extent of their adherence to the FAIR Principles. 84 Other sources of information on available alternative methods include: COLAAB; 85 the RE-place project; 35 EURL ECVAM’s dataset on alternative methods to animal experimentation; 86 Norecopa; 3 and the UK NC3Rs 3Rs resource library. 4 Quality assurance of existing NAMs, to ascertain their reliability, reproducibility and accessibility, should be achievable within the short- to medium-term. It is important to note that we are not seeking to identify a single NAM (or collection of NAMs) that could be considered a ‘gold standard’, as any such method would become obsolete in time. The proposed new gold standard is the process by which we can obtain relevant data without using animals. The use of well characterised NAMs is central to this, although the specific NAMs that should be selected depend on the question to be addressed.
Increase confidence in NAMs by establishing benchmarks and accounting for uncertainty
Despite an abundance of new and increasingly sophisticated techniques, there is still a lack of confidence in the results that NAMs can provide, or how these can be interpreted in the context of risk assessment or product development. The formal process of validating an assay or model is cumbersome and outdated. The process is too time-consuming and inflexible to be applied in the rapidly expanding field of NAMs. Pragmatic ‘benchmarking’ would be more appropriate, whereby the approach adopted could vary depending on use or regulatory requirement. This would involve demonstrating suitability for purpose (including a definition of the applicability domain) and ensuring that either the method is fully transferable between laboratories, or that trusted laboratories (e.g. national laboratories for standards or CROs) are approached and agreement secured to conduct studies in a reproducible manner.
A framework for establishing confidence in NAMs, published by van der Zalm et al. in 2022, 87 comprised the following five elements: fitness for purpose; human biological relevance; technical characterisation; supporting information regarding data integrity and transparency; and independent review. These elements should demonstrate how the evidence could be integrated to support (or oppose) use of NAMs for a given purpose. The aim of the framework was to encourage more holistic chemical assessment without relying on comparisons to animal testing. Increased confidence should be achieved through better understanding of the biological mechanisms underpinning a (toxicological) response, leading to improved protection of human health.
Any method will be associated with a degree of uncertainty in the results obtained, and this needs to be identified and characterised. Patterson and Whelan describe two types of uncertainty: (i) aleatory, or irreducible, uncertainty due to randomness or variability in the system; and (ii) epistemic uncertainty that can be reduced by learning more about the system. 88 Uncertainty needs to be minimised as far as possible, with any residual uncertainty, and its source, being clearly communicated. Achar et al. report on a framework for categorising sources of uncertainty in in silico models, 89 and there are specific examples of methods to identify, assess and reduce uncertainties in QSARs and read-across.90,91 Magurany et al. provide a pragmatic framework for the application of in vitro NAMs in risk assessment. 92 Within this framework they identify multiple sources of uncertainty in both in vitro assays and within human populations, and provide guidance on how these can be managed in a NAM-based risk assessment strategy.
Develop meaningful, real-world exemplar studies
Case studies are commonly used to demonstrate the effectiveness of a new method — however, these need to be carefully designed such that their robustness and ability to answer a specific question can be showcased. Another effective demonstration of suitability can be achieved via ‘retrospective comparator studies’. These involve comparing the outcomes for a decision based on traditional methods (using available animal data) to outcomes for a decision based on NAM-derived data. These can help to identify where the approaches may be readily applicable, or where further work is needed to ensure protection of humans and the environment. A notable example is the work of Baltazar et al., who reported a next-generation risk assessment case study for coumarin. 25 Here the authors used maximum concentration in plasma (Cmax), derived from a physiologically-based kinetic (PBK) model, and a Point of Departure derived from in vitro assays to determine a margin of safety for coumarin. Comparing the margin of safety derived using NAMs to that obtained with traditional methods showed that using NAMs was “at least as protective as the risk assessment based on traditional approaches”. 25 Publishing such case studies, which show that the methods are protective when compared to in vivo studies, and submitting these as evidence to regulators, increases exposure of the methods and helps to increase acceptance. Another possibility is the submission of ‘dual data packages’, where NAMs data are compared to traditional data types to determine whether the risk assessment outcomes are equally protective. Overall, the broad regulatory acceptance of NAMs is predicted to be a longer-term achievement. 18
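The arithmetic behind a margin-of-safety comparison of this kind can be sketched as follows. This is an illustration only and not the calculation reported by Baltazar et al.: the function name `margin_of_safety`, the assay labels, the units and every numerical value are hypothetical assumptions. The sketch simply divides the lowest in vitro Point of Departure by a PBK-predicted plasma Cmax expressed in the same units.

```python
def margin_of_safety(pod_um: float, cmax_um: float) -> float:
    """Ratio of an in vitro Point of Departure to the predicted plasma Cmax.

    Both concentrations must be in the same units (micromolar is assumed here).
    """
    if pod_um <= 0 or cmax_um <= 0:
        raise ValueError("PoD and Cmax must be positive concentrations")
    return pod_um / cmax_um


# Hypothetical Points of Departure from three in vitro assays (micromolar).
pods_um = {"assay_a": 40.0, "assay_b": 12.0, "assay_c": 95.0}

# Hypothetical Cmax predicted by a PBK model for the exposure scenario (micromolar).
cmax_um = 0.5

# Taking the lowest PoD is the more conservative (protective) choice.
lowest_pod = min(pods_um.values())
mos = margin_of_safety(lowest_pod, cmax_um)
print(f"Lowest PoD: {lowest_pod} uM; margin of safety: {mos:.1f}")
```

In a retrospective comparator study, a value obtained in this way would be set against the margin derived from traditional animal data, to check whether the NAM-based decision is at least as protective.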
Use appropriate tools for reporting and organising NAM data
NAMs can provide a wealth of information regarding the internal exposure and potential effects of chemicals on humans, other animals and the environment. An individual NAM often represents only a single aspect of the system. The power of integrating NAM data is that putting together a collection of inputs and responses provides unparalleled insight into the mechanisms behind a given response. This is best exemplified by Adverse Outcome Pathways (AOPs), where the route from exposure to response is broken down into a sequence of events. This starts from a Molecular Initiating Event (MIE) — i.e. the interaction of a chemical with a biomolecule (such as a protein) — which then elicits an effect (such as an inflammatory cascade or activation of the immune system). This ultimately leads to an adverse outcome within the organism.
Each step in this pathway can be associated with a particular NAM. Data derived from a sequence of NAMs, each associated with an individual component of the pathway, provide evidence of the effects observed. The AOP-Wiki provides a template for organising and storing information relating to the AOP in an openly available format for any researcher to use. 34 AOPs are useful for hazard identification; however, quantitative relationships are required for hazard characterisation and risk assessment. Paini et al. have published a workflow for organising evidence into a framework for quantitative AOP (qAOP) characterisation. 93 These tools enable incremental increases in knowledge to be harnessed and applied across other areas. This helps to advance our mechanistic understanding and enable more robust predictions of response. Whichever NAMs are employed, it is essential to capitalise on the output of the assays, with appropriate tools used for data capture and reporting, to make the acquired data or methods accessible to others.
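The idea of associating each step of an AOP with a NAM can be sketched as a simple ordered data structure. This is a hypothetical illustration, not the AOP-Wiki format or the qAOP workflow of Paini et al.: the `KeyEvent` class, the event descriptions, the assay names and the `supported_up_to` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyEvent:
    """One step in a pathway, optionally linked to a NAM and its result."""
    label: str
    description: str
    nam: Optional[str] = None        # NAM assigned to this step, if any
    observed: Optional[bool] = None  # whether the NAM showed the event


# Hypothetical pathway, ordered from the Molecular Initiating Event (MIE)
# to the adverse outcome (AO).
pathway = [
    KeyEvent("MIE", "chemical interacts with a protein",
             nam="example in chemico reactivity assay", observed=True),
    KeyEvent("KE1", "inflammatory cascade",
             nam="example in vitro cytokine assay", observed=True),
    KeyEvent("KE2", "activation of the immune system"),
    KeyEvent("AO", "adverse outcome within the organism"),
]


def supported_up_to(events):
    """Labels of consecutive events, starting at the MIE, with positive NAM evidence."""
    supported = []
    for event in events:
        if event.nam is None or not event.observed:
            break
        supported.append(event.label)
    return supported


# Steps with no NAM assigned indicate where further evidence is needed.
unassigned = [event.label for event in pathway if event.nam is None]

print("Evidence chain:", supported_up_to(pathway))
print("Steps without a NAM:", unassigned)
```

Storing the pathway in order makes it straightforward to see how far along the route from exposure to response the available NAM evidence extends.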
Make changes to legislation or regulatory requirements
Previous studies have shown that technological innovations alone generally do not elicit behavioural changes in researchers with regard to their animal use. Therefore, despite the availability of suitable NAMs, they are less likely to be adopted unless there are changes to regulation, legislation and/or policy. 23 In relation to the testing of chemicals, EU Directive 2010/63/EU states “The use of animals for scientific or educational purposes should therefore only be considered where a non-animal alternative is unavailable.” 94 Similarly, the UK legislation specifies that researchers must “ensure that the specified programme of work does not involve the application of any regulated procedure to which there is a scientifically satisfactory alternative method or testing strategy not entailing the use of a protected animal”. 95 In the USA, the FDA Modernisation Act authorises the use of alternatives in drug safety and efficacy testing and removes the requirement for animal testing to obtain a licence for biosimilars. 96 More recently, in April 2025, the FDA announced a plan for the phasing out of animal testing requirements for monoclonal antibodies, and other drugs, to be replaced with NAMs. 97
When tasked with assuring the long-term health and safety of consumers, animals and the environment, it is understandable that regulators wish to err on the side of caution. However, it must be appreciated that NAMs are not animal experiments and will not provide the same information. Regulators need to determine whether or not the information from NAMs is sufficient to answer a given question without relying on animal testing data, which are fraught with problems of variability and lack of fidelity. Any changes to legislation require a realistic assessment of what the science is capable of, as well as a framework that is sufficiently flexible to adapt to the changing landscape of NAMs. Attempts to set deadlines for the phasing out of animal testing, without determining how information needs will be met, will ultimately lead to setbacks. For example, the US EPA’s plans to reduce mammal testing by 30% by 2025, and end mammal testing by 2035, were at one time abandoned, due to concerns that the alternatives could not meet the required standards. 98 It is heartening to note that the US EPA has recently re-committed to the goal of ending animal testing by 2035. 99
Clearly, the protection of consumers and the environment remains of paramount importance. However, regulators have a moral obligation to use their authority to promote and use NAMs, where these are available, and drive the development of new NAMs where methods are lacking.
Increase education and capacity building in NAMs
Another long-term goal is to better educate researchers and stakeholders as to the capabilities and applications of NAMs, in order to overcome the reluctance to move away from animal methods. The (re-)training of researchers toward the mindset of ‘alternatives first’, and how to formulate a research question in the context of what can be gained from NAMs, is essential. It is inherent in human nature to be more comfortable with the familiar, and this presents an additional psychological obstacle to replacement. Although UK and EU legislation mandate that alternatives to animals must be used where a scientifically satisfactory method is available, researchers may perform inadequate literature searches due to lack of training or experience. A ‘Replacement Checklist’ has been devised, in order to overcome these shortcomings. 30
Aside from being unaware of the diversity and capabilities of NAMs, there is a more serious threat to future scientific research. This has been referred to as the ‘skills decline’ and has been noted across a range of scientific disciplines. 63 Across all sectors, there is a need for targeted training opportunities to address the skills shortage. Colleges and universities cannot address this shortfall alone; it will require concerted effort from all stakeholders, including industry, regulators and learned societies.
Selectively invest in NAMs
Investment in NAMs will be essential to success; however, careful consideration is required regarding how best to direct future research efforts. In the short term, it may be beneficial to focus efforts on those methods that are likely to yield high value results more rapidly. For example, efforts could focus on those technologies that are at a stage where they can be more readily applied to real-world scenarios for filling data gaps, rather than on developing techniques of more ‘academic’ interest. Researchers should foster a collaborative, rather than competitive, approach to developing relevant NAMs and avoid potential redundancy of techniques. It is not intended that this approach should stifle innovation or ‘blue-sky’ thinking; rather, demonstrating successful applications of NAMs increases their credibility overall. Lessons learned from earlier case studies can provide a template for the successful development of improved NAMs in the future.
Allocation of resources is typically determined by governments and research funding organisations (including research councils) and charities. Here again, we observe the tendency to default to the familiar, rather than embracing new ideas. This can be a vicious circle, whereby researchers may continue to write applications biased toward traditional testing, as applications based on newer techniques can be perceived as higher risk and thus less likely to be funded.
Historically, NAMs have been less well-funded when compared to research involving animals. 71 Charitable organisations and animal advocacy groups have played a key role, for many years, in supporting novel research into alternatives. More recently, mainstream funding organisations have shifted focus toward a greater emphasis on alternatives. It is important to recognise that funding NAM development is investing in the best science for now and into the future. It is no longer acceptable to continue to use, without question, animal testing methods that are approaching 100 years old, when new techniques are available. Investment in NAMs presents a tremendous opportunity for governments and industries to demonstrate their commitment to green chemistry, sustainability and the principles of ‘safer by design’. 100 NAMs not only offer a better way of conducting scientific enquiries, but also present an opportunity for more ethical, targeted and future-proof investments. Investment criteria need to be sufficiently flexible to exploit new opportunities presented by the developing science, as well as being responsive to economic circumstances and to societal and political agendas.
Conclusions
Given the failings of animal tests, including their lack of reproducibility and poor translatability, it is questionable whether these should ever have been considered a ‘gold standard’. Hence, a new definition of what should be considered the gold standard in determining safety, efficacy and internal exposure of chemicals is long overdue. Here we have proposed the following definition: “a process by which a meaningful answer to a specific, accurately-defined question is obtained, without using animal testing or animal-derived material”. The concept of a one-to-one replacement of an animal test with a single in chemico or in vitro assay or in silico prediction is unrealistic as, in most cases, this would not accurately capture the complexity of the response from a living organism. However, we can logically integrate information from an array of resources, in order to answer more pertinent questions.
NAMs are already used successfully to fill data gaps in product development and, in particular, in chemical safety assessment. There is much knowledge to be leveraged from our current use of NAMs; this should be exploited to facilitate the application of NAMs to all scenarios where animals are used. Ensuring that all opportunities to use alternatives are fully explored forms part of the strategy for transitioning to NAMs. Further investment in developing appropriate, fit-for-purpose NAMs, as well as demonstrating their suitability, is essential for increasing their acceptance and ensuring their wider adoption in the future.
Footnotes
Ethical approval
Ethics approval was not required for this article.
Informed consent
Informed consent was not required for this article.
Funding
This work was supported by The Humane Research Trust CIO and the project RISK-HUNT3R: RISK assessment of chemicals integrating HUman centric Next generation Testing strategies promoting the 3Rs. RISK-HUNT3R has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 964537 and is part of the ASPIS cluster.
Declaration of Conflicting Interests
The Authors declare that there is no conflict of interest.
Disclaimer
This work reflects only the authors’ views; it does not reflect the views of the European Food Safety Authority, and the European Commission is not responsible for any use that may be made of the information it contains.
