Abstract
Artificial Intelligence (AI) has pervaded everyday life, reshaping the landscape of business, economy, and society through the alteration of interactions and connections among stakeholders and citizens. Nevertheless, the widespread adoption of AI presents significant risks and hurdles, sparking apprehension regarding humans' trust in AI systems. Lately, numerous governmental entities have introduced regulations and principles aimed at fostering trustworthy AI systems, while companies, research institutions, and public sector organizations have released their own sets of principles and guidelines for ensuring ethical and trustworthy AI. Additionally, they have developed methods and software toolkits to aid in evaluating and improving the attributes of trustworthiness. The present paper aims to explore this evolution in analysing and supporting the trustworthiness of AI systems. We commence with an examination of the characteristics inherent in trustworthy AI, along with the corresponding principles and standards associated with them. We then examine the methods and tools that are available to designers and developers in their quest to operationalize trusted AI systems. Finally, we outline research challenges towards end-to-end engineering of trustworthy AI by-design.
Introduction
Artificial Intelligence (AI) has permeated every aspect of daily life and is fundamentally altering the landscape of business, economy, and society, redefining interactions and connections among stakeholders and citizens [89, 43]. Organizations utilize AI advancements to enhance predictions, refine products and services, foster innovation, boost productivity and efficiency, and reduce costs, among various other advantageous applications. Many forecasts from market analysts estimate considerable increases in investments in AI software that are expected to reach approximately $300 billion in 2027 [70, 54].
It is essential to emphasize, however, that the utilization of AI also presents significant risks and obstacles, prompting concerns regarding the trustworthiness of AI systems, encompassing data, algorithms, and applications [58]. Instances of biased, discriminatory, manipulative, unlawful, or human rights-violating AI deployments have exacerbated these concerns, leading to low levels of trust and acceptance, as evidenced, for example, by a recent study of more than 17,000 people from 17 countries [57]. The study found that while individuals express confidence in the ability of AI systems to deliver accurate and dependable results and offer valuable services, they doubt the safety, security, and fairness of AI systems, as well as their commitment to upholding privacy rights. Moreover, although people acknowledge the numerous advantages of AI, only half of the respondents believe that these benefits outweigh the associated risks. To address these concerns, the study found that individuals expect regulatory measures to be implemented for AI.
Indeed, various governmental bodies have recently issued regulations and principles for trustworthy AI systems. As an example, within its digital strategy, the European Union enacted the AI Act [48] to govern AI, aiming to create improved conditions for the advancement and application of this technology. Similarly, the US White House issued an executive order on AI [135], setting forth fresh standards to enhance AI safety and security, safeguard privacy, and promote equity and civil rights. In addition, during the AI Safety Summit in November 2023, 28 countries from across the globe agreed to the Bletchley Declaration on AI safety [4], which establishes a shared understanding of the opportunities and risks posed by frontier AI and the need for governments to work together to meet the most significant challenges.
Numerous companies, research institutions, and public sector organizations have released principles and guidelines aimed at promoting ethical and trustworthy AI practices. One study analysed 84 documents on ethical principles for AI [77], while another examined and compared 22 guidelines, highlighting overlaps but also omissions [61].
However, despite this proliferation of guidelines and frameworks from different organizations, it is still a challenge to implement and operationalize trustworthy AI in practice due to its complexities [25]. The implementation and operationalization of ethical AI encompass various facets in both theoretical and practical research. This includes the design, development, deployment, testing, and evaluation of approaches, all of which are underpinned by advanced AI techniques [91, 38]. Recently software toolkits, approaches, and algorithms have been developed in order to support the assessment and enhancement of several trustworthiness attributes, such as fairness, explainability, etc. [80, 88, 137, 119, 24].
The present paper aims to explore this evolution in supporting the trustworthiness of AI systems. Our analysis starts from an overview of the properties of trustworthy AI, and the related principles and standards. We then examine the various methods and tools that are available to designers and developers in their quest to operationalise trusted AI systems. Finally, we outline research challenges towards end-to-end engineering of trustworthy AI by-design.
Trust in AI and trustworthy AI systems
Trust in AI systems
The attitudes of users toward AI systems are important in real-world AI applications. The level of trust that final end-users put in an AI system directly impacts the degree of adoption of the system. “Trust” is a multifaceted term with diverse definitions across different scientific disciplines, including psychology, sociology, economics, and computer science. Presently, there exists no standardized definition of trust [2]. Related research has found more than 300 definitions in various research areas [125]. Trust can take various forms; for example, interpersonal trust has been described as “if A [the trustor] believes that B [the trustee] will act in A’s best interest, and accepts vulnerability to B’s actions, then A trusts B” [75].
Nevertheless, when trust is directed towards a technological artifact rather than an interpersonal relationship, the extent to which individuals place trust in technology hinges upon their beliefs concerning its technical attributes. This distinction has sparked debate, with some suggesting that technology (and consequently an AI system) cannot be trusted but only relied upon; thus it has been argued that we can only talk about the reliability, and not the trustworthiness, of technology [75]. Related literature addresses this objection by recognizing the ‘duality of trust’ in technology, whereby humans rely on the technology itself and trust the technology supplier; hence trust in technology is shaped by perceptions of its functionality, utility, and reliability [40, 96]. Therefore, trustworthiness implies that the trust placed in an AI system is grounded in reliable evidence.
In contrast to other technological artifacts, AI systems present a peculiarity: they employ machine learning to recognise patterns in the training data which are then used to generate algorithms for supporting or making decisions. Since these algorithms were not explicitly developed by humans but depend on the training data, it may be the case that end users assign intention to the AI systems. Arguments have been made that viewing AI systems as possessing intentions runs the risk of treating them as moral entities, implying that they bear ethical responsibility for their decisions and actions [78]. This perspective obscures the ethical obligations of AI developers and could potentially enable developers to evade responsibility and accountability for the systems they create. While AI systems may bear causal responsibility for decisions or actions, it must be clarified that it is the AI developers who are ethically accountable for them [78].
The definition of trustworthiness of AI according to the International Standards Organization is “the ability to meet stakeholders’ expectations in a verifiable way” [72]. This definition underscores the unique characteristics of AI systems, particularly their potential autonomy and their complex interactions with the social environment. The inherent uncertainty in the impact of AI systems emphasizes the need for their trustworthiness, to ensure that they act in alignment with the expectations of their users and the society at large.
Trust in AI systems is considered “layered” [83, 115], since one has to consider all layers: trust in the data [81], in the technology [129], in the humans supervising or relying on it [27], in the organizations developing and deploying it [85], and finally in the bodies regulating it [56].
Therefore, it becomes crucial to gain a clear understanding of the complexities underlying the trust dynamics between AI systems and their users, along with the prerequisites for crafting and implementing systems that demonstrate trustworthy attributes. The questions then become (i) which are the properties of trustworthy AI; and (ii) how can these properties be verified systematically?
Properties of trustworthy AI systems
According to the OECD, trustworthy AI refers to AI systems that adhere to the OECD AI Principles. These principles encompass AI systems that uphold human rights and privacy, are equitable, transparent, explainable, resilient, secure, and safe while ensuring accountability among all involved actors [111]. Representing the first AI standard at the intergovernmental level, these principles were endorsed in May 2019 by the 37 OECD member countries and five non-member countries, and further backed by the G20 in June 2019 [111]. The OECD AI Principles advocate for five values-based principles to guide the responsible management of trustworthy AI: inclusive growth, sustainable development, and well-being; human-centric values and equity; transparency and explainability; resilience, security, and safety; and accountability.
On the other hand, the European Union and the United States have developed a joint roadmap on evaluation and measurement tools for trustworthy AI and risk management [140]. This roadmap takes practical steps to advance trustworthy AI and uphold the shared commitment of the EU and the US to the OECD Recommendation on AI. The roadmap aims to establish a shared repository of metrics for assessing the trustworthiness of AI and methods for managing risks. Additionally, it has the potential to facilitate the development of collaborative strategies within international standards organizations focusing on Artificial Intelligence. The roadmap is informed by the efforts of the National Institute of Standards and Technology of the US Department of Commerce, which has already developed the NIST AI Risk Management Framework and its related guides and tools [108], and by the work related to the EU AI Act and the related deliverables of the EU High-Level Expert Group [47], such as the ALTAI Assessment List for Trustworthy AI [46].
The terminology of trustworthiness in AI is vague and ambiguous, and sometimes overlapping terms are used for its various characteristics, which leads to confusion. For example, another commonly used term is ‘responsible AI’, which addresses similar concepts, methods, and tools; see e.g. [10, 13, 18, 39, 124]. Indeed, it has been argued that there is a risk that policymakers and the technical community could find themselves in what is referred to as the ‘Inigo Montoya problem’, after a character in the novel and film ‘The Princess Bride’, specifically a scene in which Inigo Montoya understands the word ‘inconceivable’ differently than another character, Vizzini, and says “You keep using that word, I do not think it means what you think it means” [118]. Similarly, the policymaking and technical communities ascribe different meanings to the same terms, leading to obvious problems in ensuring the trustworthiness of AI and its application. This may be due to several reasons, ranging from the fact that related research is expanding and evolving rapidly, to the very nature of trustworthy AI: an emerging, multifaceted concept that is not bound within a singular research area.
Fortunately, one of the first outcomes of the EU-US joint roadmap is an initial draft of terminology and taxonomy for Artificial Intelligence [45]. The draft issued a list of 65 key AI terms essential to understanding risk-based approaches to AI. According to this terminology, trustworthy AI (a definition combining the EU HLEG ALTAI [46] and the NIST AI RMF 1.0 [108]) should be lawful, ethical, and robust.
In parallel to the NIST AI RMF and the implementation of the EU AI Act, the International Standards Organization has recently (December 2023) published the ISO 42001 [73], which – similarly to the NIST AI RMF – is voluntary and is intended to be adaptable and scalable based on the needs of the organization that adopts it and its size. ISO 42001 outlines the requirements and offers guidance for instituting, executing, sustaining, and enhancing an AI management system within an organization’s framework. It can be regarded as a complement to the NIST AI RMF, since both ISO 42001 and NIST address policies and governance procedures that organizations should contemplate to comprehensively oversee AI systems.
Principles, standards, methods, and tools
Principles for trustworthy AI
Currently, there are more than 1,000 AI policy initiatives from 70 countries around the world, as well as over 170 emerging initiatives that address topics of reliable and trustworthy AI [113]. These have been registered in the database of national AI Policies and strategies of the OECD.AI Policy Observatory.
Table 1 presents some well-known approaches and guidelines from international and national organizations, while a recent development on the international scene is the establishment of a “Global Challenge to Build Trust in the Age of Generative AI” [112] by organizations such as the OECD, the Global Partnership on AI, the IEEE Standards Association, and UNESCO.
Standards for trustworthy AI
There are also significant efforts towards the standardization of trustworthiness elements by standards organizations and certification societies, while the European Commission has issued a new standardisation request to support the recent EU AI Act [17]. Table 2 presents the main standardization approaches of organizations such as ISO and IEEE.
Methods for trustworthy AI
Table 1. International and national approaches, principles, and guidelines for trustworthy AI.

Table 2. Standards and certification programs for trustworthy AI.

Although the policy initiatives, principles, and guidelines are important, designers and developers of AI systems lack clear, actionable instructions on how to implement them in practice [102]. The vague terminology and abstract nature of some of the principles and policy guidelines have not been helpful to the data scientists, machine learning engineers, and designers who are the ones involved in operationalizing these principles in practice [21]. Hence, several structured procedural frameworks, methods, and auditing procedures have been developed to assist organizations in their efforts to design, develop, and audit the trustworthiness of their AI systems; Table 3 presents indicative approaches.
The UC Berkeley Center for Long-Term Cybersecurity published a taxonomy of 150 trustworthiness properties and mapped them across the AI lifecycle stages of the NIST RMF, thereby creating a resource and tool for organizations developing AI, as well as for standards-setting bodies, policymakers, independent auditors, and civil society organizations working to evaluate and promote trustworthy AI [109].
The large number of diverse trustworthiness properties, and the different approaches that allow e.g. the mitigation of negative effects, increase the need for structured approaches to the design and development of trustworthy AI systems. A recent initiative [107] aimed at aiding the selection of the most suitable framework examined and categorized over 40 existing responsible AI frameworks. These frameworks were then mapped onto a matrix, facilitating organizations in comprehending, selecting, and implementing responsible AI in alignment with their specific requirements. The matrix comprises two dimensions: the user dimension, representing individuals responsible for implementing frameworks within organizations involved in building or utilizing AI systems, and the utility dimension, which organizes frameworks into three categories: components, AI lifecycle, and trustworthiness characteristics.
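The two-dimensional selection matrix can be pictured as a simple lookup. In the sketch below, the framework entries are placeholders invented for illustration, not the actual frameworks catalogued in [107]:

```python
# Hypothetical entries illustrating the matrix of [107]:
# each framework is indexed by a user role and a utility category.
matrix = [
    {"framework": "FrameworkA", "user": "data scientist",
     "utility": "AI lifecycle"},
    {"framework": "FrameworkB", "user": "risk officer",
     "utility": "trustworthiness characteristics"},
    {"framework": "FrameworkC", "user": "data scientist",
     "utility": "components"},
]

def select(user=None, utility=None):
    """Return the frameworks matching the given dimensions."""
    return [e["framework"] for e in matrix
            if (user is None or e["user"] == user)
            and (utility is None or e["utility"] == utility)]

print(select(user="data scientist"))  # ['FrameworkA', 'FrameworkC']
```

An organization would query along whichever dimension constrains its situation, e.g. all frameworks aimed at a given role, or all frameworks covering trustworthiness characteristics.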
Another review outlined in [117] analyzes over 100 frameworks, process models, and proposed solutions and tools aimed at facilitating the transition from principles to implementation. Building upon the work in [105], this analysis underscores the emphasis of existing approaches on a select few ethical concerns such as explainability, fairness, privacy, and accountability. It also introduces a more refined segmentation of the AI development process and identifies areas necessitating further scrutiny from researchers and developers.
The abundance of conceptual principles, guidelines, and methods has been recently accompanied by many concrete software tools that attempt to address the need to move from ‘what’ to ‘how’, i.e. to move beyond ethical AI guidelines to concrete operational mandates and tools that enable better oversight mechanisms in the way AI systems are developed and deployed.
Various survey papers review the related technologies and tools. For example, [105] reviews tools and methods in order to help translate principles into practice, while [86] introduces a framework that consolidates the existing fragmented approaches to trustworthy AI into a unified, systematic approach. This approach encompasses the entire lifecycle of AI systems, spanning from data acquisition to model development, system development and deployment, and ultimately to continuous monitoring and governance. [138] focuses on four categories of system properties that are considered instrumental in achieving the policy objectives of AI trustworthiness, namely fairness, explainability, auditability and safety & security (FEAS). They further review the main technologies and tools with respect to these four properties, for data-centric as well as model-centric stages of the machine learning system life cycle. The authors of [88] concentrate on six dimensions crucial for attaining trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-being. For each dimension, they assess the associated technologies, outline their real-world applications, and explore the corresponding and conflicting interactions among these various dimensions. On the other hand, [80] analyses trustworthiness requirements (fairness, explainability, accountability, reliability, and acceptance) adopting a human-centered approach by examining different levels of human involvement in making AI systems trustworthy.
Table 3. Procedural methods for trustworthy AI.
Software toolkits for trustworthy AI
In addition to academic research efforts, major technology companies have started providing technologies and toolkits to support trustworthy AI. Table 4 outlines the efforts of major, well-known corporations, some of which (like IBM and Microsoft) provide open-source versions of their toolkits.
Two open-source initiatives deserve particular mention. The first is the AI Verify Foundation [6], a not-for-profit subsidiary of IMDA, the Infocommunications Media Development Authority of Singapore. This initiative seeks to leverage the collective expertise and efforts of the global open-source community to create AI testing tools that promote responsible AI practices. Key members of the foundation include Google, IBM, Microsoft, RedHat, Aicadium, and Salesforce, among others. The foundation is responsible for the development of AI Verify, a framework and software toolkit designed for AI governance testing. AI Verify validates the performance of AI systems based on a set of principles and aligns with AI governance frameworks such as those established by the European Union, OECD, and Singapore.
The second is the LF AI and Data Foundation, a Linux Foundation project that supports and sustains open-source projects within the AI and data space [87]. Of particular relevance is its Trusted AI category, which hosts LF AI Foundation projects in the area of Trusted and Responsible AI, among them IBM toolkits such as AI Fairness 360, AI Explainability 360, Adversarial Robustness 360, and AI Privacy 360.
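Toolkits such as AI Fairness 360 operationalize fairness as concrete metrics computed over model decisions. As a plain-Python illustration of what such a metric measures (this is a sketch of disparate impact and statistical parity, not the AIF360 API, and the decision data are invented):

```python
def selection_rate(outcomes):
    """Fraction of favourable (positive) decisions in a group."""
    return sum(outcomes) / len(outcomes)

def disparate_impact(privileged, unprivileged):
    """Ratio of selection rates; values below ~0.8 are often
    flagged as potential adverse impact (the 'four-fifths rule')."""
    return selection_rate(unprivileged) / selection_rate(privileged)

def statistical_parity_difference(privileged, unprivileged):
    """Difference of selection rates; 0 means demographic parity."""
    return selection_rate(unprivileged) - selection_rate(privileged)

# Invented example: 1 = favourable decision, 0 = unfavourable
priv = [1, 1, 1, 0, 1, 1, 0, 1]    # selection rate 0.75
unpriv = [1, 0, 0, 1, 0, 1, 0, 0]  # selection rate 0.375

print(disparate_impact(priv, unpriv))               # 0.5
print(statistical_parity_difference(priv, unpriv))  # -0.375
```

The production toolkits additionally provide bias-mitigation algorithms that intervene on the data, the model, or the predictions to move such metrics toward parity.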
While all these tools primarily concentrate on evaluating the trustworthiness of the AI system itself, there are recent research endeavours that redirect attention towards assessing the perceived trustworthiness of the development process. The rationale behind this shift is that while trustworthy AI is defined by system requirements, its practical implementation necessitates an understanding of its connection to specific measures throughout the development process. For example, [64] presents a concept for establishing a trustworthy development process for AI systems, introducing a framework derived from a semi-systematic analysis of AI governance activities. This framework aims to identify obligations and measures necessary to meet established AI ethics requirements and align them with the AI development lifecycle. Another effort in [121] focuses on requirements engineering and examines the applicability of ethical AI development frameworks for performing effective requirements engineering during the development of trustworthy AI systems.
Research challenges

Although much work has been done, there are still considerable research challenges to be tackled in order to guarantee trustworthy AI systems. In the following, we examine five such challenges: (i) the need to shift from human-in-the-loop approaches to modelling and supporting the teaming of humans and AI systems; (ii) the quantification and monitoring of key trustworthiness indicators; (iii) the potential that recent trends in neuro-symbolic AI may hold for human-centric trustworthy AI; (iv) the methods and tools for supporting end-to-end trustworthy AI system engineering; and (v) the need to explicitly address the complexities and particularities of generative AI.
Shift from human-in-the-loop to human-AI teams
It has been argued that in order to guarantee trustworthy, responsible, and ethical AI, researchers and practitioners have to adopt a Human-Centered AI (HCAI) approach and consider hybrid human-AI intelligence [7]. This approach strives to develop AI solutions that mitigate discrimination and uphold fairness and justice, while also accurately reflecting human intelligence. It places explicit emphasis on human factors design to ensure that AI solutions are explainable, understandable, useful, and user-friendly [127, 128, 145]. For example, [131] introduces the Human-Centered Trustworthy Framework, which aims to elucidate the connections among user trust, socio-ethical considerations, technical and design elements, and user attributes. This framework is designed to offer AI providers, designers, and other stakeholders straightforward guidelines for integrating user trust considerations into AI design.
A research challenge lies in transitioning from the ‘human-in-the-loop’ approach to the ‘human-AI teaming’ approach. In this paradigm, AI systems collaborate with humans to accomplish tasks, often within larger teams comprising both humans and AI systems. These AI systems may exhibit varying levels of autonomy, operate in different contexts, and handle diverse tasks, leading to a broad design spectrum to consider. Consequently, designing and deploying AI systems that effectively collaborate with humans pose considerable challenges, including ensuring adequate levels of AI transparency and explainability to facilitate human situational awareness, and supporting seamless collaboration and coordination between humans and AI systems [41, 49].
Recent research has been exploring the role of agent reliability on human trust, the methods of communicating intent between human and AI agents, and the ways that AI agents either work together with humans or work as trainers of humans [42]. New approaches that pave the way for innovative research in this direction include, for example, considering coordination in hybrid teams of humans and autonomous agents in many-to-many situations (multiple humans and multiple agents) with the use of trustworthy interaction patterns [97]; modelling the maturity of collaborative human-AI ecosystems [106]; and separating the design choices in terms of the different decision tasks while evaluating the efficacy, usability, and reliance of approaches [84].
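One idea from this line of work, the effect of agent reliability on human trust, can be illustrated with a toy simulation. The exponential-smoothing update rule and its parameters below are our own illustrative assumptions, not a model taken from the cited studies:

```python
def update_trust(trust, agent_correct, rate=0.2):
    """Move trust toward 1 after a correct agent action,
    toward 0 after an error (simple exponential smoothing)."""
    target = 1.0 if agent_correct else 0.0
    return trust + rate * (target - trust)

def simulate(outcomes, trust=0.5, rate=0.2):
    """Track trust over a sequence of agent successes/failures."""
    history = [trust]
    for ok in outcomes:
        trust = update_trust(trust, ok, rate)
        history.append(trust)
    return history

# A mostly reliable agent that fails once: trust dips, then recovers
h = simulate([True, True, False, True, True])
print([round(t, 3) for t in h])
```

Even this caricature captures a phenomenon studied empirically: trust is earned gradually but damaged quickly, which is why calibrating reliance on imperfect agents is a core design problem for human-AI teams.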
Such approaches hold significant potential, particularly in critical domains such as healthcare, transportation (including driving and aviation), military operations, and search and rescue missions. In these fields, the development of effective methods for seamlessly integrating AI with human operations is of utmost importance [104].
The path towards establishing trustworthy human-AI teaming, spanning from initial conception to the formation of high-performing teams, encompasses a wide array of areas including bi-directional situational awareness, human-AI interaction, intelligent decision-making, and human-AI operations [20]. Furthermore, it is noted that this research landscape is defined more by open research questions than by existing knowledge. Many of the questions that researchers will confront along this path are likely yet to be posed, let alone answered [20].
Quantify and monitor trustworthy indicators
A critical topic that generates interesting research relates to the quantification and monitoring of indicators of AI trustworthiness [144]. Although there may exist Key Performance Indicators (KPIs) and metrics for each separate attribute of trustworthiness, like fairness or privacy, the challenge is twofold: (a) to adopt a holistic approach for measuring most (if not all) trustworthiness characteristics in an integrated manner; and (b) to explicitly take into account the peculiarities and specific features of the AI system in use.
The first challenge centers on quantifying the trustworthiness of AI systems by building upon existing metrics and indices and assigning scores to the various attributes. For example, [67] focuses on supervised machine and deep learning models, develops an algorithm that considers twenty-three metrics grouped into four pillars of trusted AI (fairness, explainability, robustness, and accountability), and aggregates the metrics to calculate a global trustworthiness score. Likewise, research conducted within the French Confiance.ai program [93] delineates various attributes contributing to the concept of trustworthiness. It delves into each attribute to identify associated Key Performance Indicators (KPIs), assessment methods, or control points, and establishes an aggregation methodology for these attributes.
The second challenge refers to the need to consider the context of the use of the AI system. Contextual factors influencing trustworthiness attributes include the criticality level of the application, the domain of the application, the anticipated use of the AI system, and the involved stakeholders, among others. This implies that in different contexts, certain attributes may take precedence, while additional attributes may be introduced to the list. For instance, a medical imaging system tailored for medical professionals may entail distinct trustworthy requirements compared to a human resource management application. In this case, multi-criteria decision support methods are well suited for assessing the various characteristics and may provide the appropriate instruments for aggregating individual preferences and scores; see e.g. [9, 90, 94].
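A minimal sketch of such context-dependent, multi-criteria aggregation follows; the pillar scores and weight profiles are invented for illustration and are not taken from [67] or [93]:

```python
# Pillar scores in [0, 1] (invented values), e.g. produced by
# per-attribute metric suites
scores = {"fairness": 0.82, "explainability": 0.64,
          "robustness": 0.91, "accountability": 0.70}

# Invented context profiles: each assigns weights summing to 1,
# reflecting which attributes take precedence in that context
profiles = {
    "medical_imaging": {"fairness": 0.2, "explainability": 0.4,
                        "robustness": 0.3, "accountability": 0.1},
    "hr_screening":    {"fairness": 0.5, "explainability": 0.2,
                        "robustness": 0.1, "accountability": 0.2},
}

def trust_score(scores, weights):
    """Weighted-sum aggregation, the simplest MCDA scheme."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * scores[k] for k in weights)

for ctx, w in profiles.items():
    print(ctx, round(trust_score(scores, w), 3))
```

The same pillar scores yield different global scores under different profiles, which is the point: richer MCDA methods (e.g. outranking or preference-elicitation approaches) refine how stakeholder priorities enter the aggregation, but the context dependence is already visible in the weighted sum.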
Exploit neuro-symbolic AI
Many researchers have identified the need to integrate well-founded knowledge representation and reasoning with deep learning [62, 126]. This has spurred the development of neuro-symbolic computing, which has emerged as a promising area of research aiming to integrate robust learning within neural networks with symbolic knowledge representation and logical reasoning [63]. This trend seeks to leverage the parallels often drawn by AI researchers between Kahneman’s investigations into human reasoning and decision-making, as detailed in his book ‘Thinking, Fast and Slow’ [79], and the concept of ‘AI systems 1 and 2’. In this model, deep learning would correspond to AI system 1, responsible for intuitive and rapid decision-making, while symbolic reasoning would correspond to AI system 2, handling deliberate and logical reasoning [53]. Although some basic ingredients of neuro-symbolic AI have already been suggested, there are still many outstanding challenges, like the choice of an appropriate language, the need for standard benchmarks, etc. [53].
The adoption of neuro-symbolic AI holds promise as a potential solution for enhancing the reliability, robustness, and trustworthiness of AI systems. Furthermore, neuro-symbolic AI can aid in enhancing integration by addressing bias, improving data quality, aligning AI with human values, and furnishing human-comprehensible explanations for AI-generated predictions. A recent review [99] yielded a total of 54 papers that employed neuro-symbolic methods with an emphasis on trustworthiness (and identified a clear focus on interpretability), while [120] claims that neuro-symbolic AI can assist operations across critical domains with high assurance and trust by helping to provide robustness to adversarial perturbations and assurance by analysing heterogeneous evidence towards safety and risk assessments. Furthermore, [55] seeks to showcase the suitability of neuro-symbolic AI for crafting trustworthy AI by introducing the CREST framework. This framework illustrates how Consistency, Reliability, user-level Explainability, and Safety are established through neuro-symbolic methods, which leverage both data and knowledge to meet the demands of critical applications such as healthcare and well-being.
Although the review in [99] highlights a noteworthy dedication of trustworthy neuro-symbolic AI to enhancing interpretability, it also identified a noticeable portion of work aimed at improving other aspects of trustworthiness, which underscores an opportunity for future research to extend the benefits of neuro-symbolic AI to other critical dimensions of trustworthiness.
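The ‘system 1 / system 2’ division can be caricatured in a few lines: a (mocked) neural scorer proposes labels, and a symbolic rule layer vetoes candidates that violate domain constraints. All names, scores, and rules here are invented for illustration:

```python
# Mocked 'system 1': stand-in for a trained model's scores
def neural_scores(features):
    return {"approve": 0.55, "refer_to_human": 0.30, "reject": 0.15}

# 'System 2': symbolic domain constraints as predicates
def violates_rules(label, features):
    # Invented rule: never auto-approve when a mandatory
    # document is missing
    if label == "approve" and not features.get("document_complete"):
        return True
    return False

def decide(features):
    """Take the highest-scoring label that passes the rules."""
    ranked = sorted(neural_scores(features).items(),
                    key=lambda kv: kv[1], reverse=True)
    for label, _ in ranked:
        if not violates_rules(label, features):
            return label
    return "refer_to_human"  # safe fallback

print(decide({"document_complete": False}))  # refer_to_human
print(decide({"document_complete": True}))   # approve
```

Real neuro-symbolic systems integrate the two components far more tightly (e.g. by injecting logical constraints into training), but even this filter pattern shows how symbolic knowledge can bound the behaviour of a learned model and yield user-level explanations ("vetoed by rule X").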
Develop and adopt trustworthy AI system engineering methods
To guarantee the trustworthiness of their AI systems, designers and developers should not treat trustworthiness as a requirement to be satisfied ex post, after the system is deployed; on the contrary, the main attributes of trustworthiness need to be addressed from the very start of the conception and design of the system. The engineering of trustworthy AI systems needs to consider the trustworthiness requirements of their stakeholders. This is not the current status quo: machine-learning applications, for instance, are typically defined according to optimization and efficiency criteria rather than quality requirements that align with the needs of stakeholders.
Rigorous specification techniques for the development and deployment of AI applications are essential. These techniques are being explored within the U.K. Research and Innovation (UKRI) Trustworthy Autonomous Systems (TAS) program. This program undertakes cross-disciplinary fundamental research to guarantee that systems are safe, reliable, resilient, ethical and trusted [1]. This work gives rise to notable research challenges, including the formalization of human-understandable knowledge to render it interpretable by machines; the specification and modelling of human behaviour, intent, and mental state; and the modeling of social and ethical norms pertinent to human-AI interaction [1].
Another notable approach in this direction is followed by the French Confiance.ai project [28], which adopts model-based systems engineering for the assessment of trustworthiness from the early stages of design up to the deployment and operation of the AI system [12, 11, 93]. The approach is an adaptation of the Arcadia method [143] and is hence built around four perspectives: Operational Analysis (the engineering methods and processes, and the operational need around the Trustworthiness Environment), System Analysis (the functions of the Trustworthiness Environment), and Logical and Physical Architecture (the abstract and concrete resources of the Trustworthiness Environment, respectively). The system of interest in this case is the Trustworthiness Environment, the tooled workbench to be delivered by the Confiance.ai research programme, and the overall ambition is to obtain an applicable end-to-end engineering method [12].
Towards trustworthy and responsible generative AI
The recent explosion of foundation models and generative AI models and applications, and their deployment across a wide spectrum of industries, is expected to have a tremendous impact on productivity and the way we work. For example, McKinsey and Company estimate that generative AI could add the equivalent of $2.6 trillion to $4.4 trillion annually to the global economy; by comparison, the United Kingdom's entire GDP in 2021 was $3.1 trillion [95]. Nevertheless, generative AI presents various ethical and social concerns. Problems such as the absence of interpretability, bias and discrimination, privacy breaches, lack of model robustness, dissemination of fake and misleading content, copyright infringement, plagiarism, and environmental consequences have been linked to the training and inference processes of generative AI models.
Installing guard rails with appropriate tooling has become imperative in order to address these problems, and has expanded the requirements for AI trustworthiness. New requirements include, for example, verifying and validating GenAI models before they become available, and reporting on the testing that has been done in a more interoperable and standardized way so as to improve transparency and accountability.
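As a minimal illustration of such tooling, the sketch below wraps a generative model behind a simple output-policy check and records every decision in an audit log, supporting the transparency and accountability reporting mentioned above. The pattern list, function names, and log format are hypothetical assumptions; production guard rails are considerably more sophisticated.

```python
import re

# Toy policy: block responses that appear to disclose sensitive identifiers.
BLOCKED_PATTERNS = [r"\b(?:ssn|social security number)\b"]

def guarded_generate(model, prompt, audit_log):
    """Call the model (any callable prompt -> text), screen its output
    against the policy patterns, and log the decision either way."""
    text = model(prompt)
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            audit_log.append({"prompt": prompt, "action": "blocked",
                              "reason": pattern})
            return "[response withheld by policy]"
    audit_log.append({"prompt": prompt, "action": "allowed"})
    return text
```

Because every call produces a structured log entry, the same mechanism that enforces the guard rail also generates the standardized evidence needed for incident disclosure and external audit.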
In response to these concerns, the US National Institute of Standards and Technology launched a Generative AI working group to devise a profile of the AI Risk Management Framework (RMF) tailored specifically for generative AI. This initiative aims to establish four sets of guidelines covering pre-deployment verification and validation of generative AI models, digital content provenance, incident disclosure, and governance of generative AI systems [110]. The AI Verify Foundation proposed a novel model AI governance framework for generative AI, aiming to introduce a systematic and balanced approach to address the concerns associated with generative AI [5]. This framework outlines nine dimensions to be collectively considered to cultivate a trusted ecosystem: accountability, data, trusted development and deployment, incident reporting, testing and assurance, security, content provenance, safety and alignment R&D, and AI for social good.
Ongoing research is dedicated to aiding data scientists and machine learning developers in constructing generative AI systems that prioritize security, privacy preservation, transparency, explainability, fairness, and accountability. The objective is to mitigate unintended consequences and address compliance challenges that could pose harm to individuals, businesses, and society at large [82].
Conclusions
The remarkable advancements in Artificial Intelligence and its widespread integration into nearly every aspect of daily life underscore the importance of prioritizing ‘trust’ as a fundamental principle in the design, development, and monitoring of AI, rather than considering it optional [19]. Various national and international bodies have introduced a range of principles and regulations that AI systems must adhere to in order to earn trust. While different terms like AI validation, assessment, auditing, or monitoring may be used, the essential objective remains to ensure that AI systems function effectively within their intended operational parameters and comply with regulatory guidelines [136, 132].
This paper explored the alternative approaches, standards, methods, and tools that have recently emerged in efforts to enable trustworthy AI. Although much work has been done already, there are still open research issues for end-to-end AI engineering that enables 'trust-by-design' [92, 98]. The field of AI trustworthiness assessment is rapidly evolving; new attributes and evaluation metrics may emerge, and new avenues for research will be explored. Ensuring AI trustworthiness will be a critical element in our quest to develop AI for social good [52].
Acknowledgments
This work is funded by the EU Horizon Europe programme CL4-2022-HUMAN-02-01 under the project THEMIS 5.0 (grant agreement No. 101121042) and by UK Research and Innovation under the UK government's Horizon funding guarantee. The work presented here reflects only the authors' view, and the European Commission is not responsible for any use that may be made of the information it contains.
