Abstract
The right to artificial intelligence (AI) explainability has consolidated as a consensus in the research community and in policy-making. However, a key component of explainability has been missing: extrapolation, which reveals whether a model is making inferences beyond the boundaries of its training. We report that AI models frequently extrapolate outside their range of familiar data, and that they do so without notifying users and stakeholders. Knowing whether a model has extrapolated is fundamental information that should be included in explanations of AI models, in the interest of transparency, accountability, and fairness. Instead of dwelling on the negatives, we offer ways to clear the roadblocks to promoting AI transparency. Our commentary is accompanied by practical clauses that could be included in AI regulations such as the AI Bill of Rights and the National AI Initiative Act in the United States, and the AI Act by the European Commission.
Introduction
A consensus has consolidated in the research community and in policy-making about the right to reasonable explanations for people affected by decisions made by Machine Learning (ML) and Artificial Intelligence (AI) models (Coyle and Weller, 2020; Wachter and Mittelstadt, 2019). In 2020, the National Artificial Intelligence Initiative Act in the United States recognized the need to improve the reliability of AI methods. The European Commission (2021) released a highly sophisticated product safety framework to rank and regulate the risks of AI-driven systems, following up on the General Data Protection Regulation of 2018. Most recently, The White House Office of Science and Technology Policy (2022) released the AI Bill of Rights. These efforts aim to establish the right to explanation. However, one fundamental element of that right has been neglected: extrapolation, which AI and ML models frequently perform. We propose that regulations incorporate articles requiring AI and ML models to report, for each decision or prediction they make, whether they have extrapolated and, if so, in which directions.
The fact that ML models often extrapolate is a foundational insight for any future research. Under current regulations, AI models may be required to provide an explanation of their decisions, sometimes in the form of counterfactuals, that is, a recourse explaining what actions can be taken to change the decision or outcome (Wachter et al., 2018). However, the information as to whether a model has extrapolated can still be kept hidden by default, concealing more fundamental information about the suitability of the model.
AI and ML, broadly defined, are a set of mathematical methods that automate the learning process. A model learns from a training set, then uses what it has learned to make decisions and predictions in the world at large. In a medical setting, a model may learn from the clinical outcomes of a cohort of patients and then predict, with some accuracy, outcomes for new patients who walk through the door of a hospital. It would be commonsensical to ask how a new patient compares with the cohort of patients in the training set and whether the new patient's information falls within the range of information in the training set. Extrapolation is the mathematical concept describing just that. In an extreme case of extrapolation, a new patient could have some rare and complicated form of disease that the model has never seen, and therefore the model's output for this patient may not be reliable. If a nurse encounters a patient with features that they have never seen before, they may escalate the case to an expert physician. Likewise, an ML model should report when it has gone beyond its training to obtain a result.
Measuring extrapolation
In mathematics and computational geometry, there are well-defined algorithms for verifying whether a model is extrapolating and, if so, in which directions and dimensions and to what extent. Any AI model is trained on a training set, that is, a finite dataset from which the model learns. A training set, however small or large, forms a convex hull. The convex hull of a set is the unique minimal convex set containing all the points in the set. In three-dimensional space, one can think of a convex hull as a dome structure: any point falls either within or outside the dome, and the outer shell of the dome corresponds to the boundary of the convex hull. In mathematical terms, any point in the space either belongs to a convex hull or does not, and this distinction, convex hull membership, defines the contrast between interpolation and extrapolation. Note that in common applications of AI, one usually deals with many more dimensions than in the dome example above, and to determine these relationships we have to rely on algorithms from computational geometry. The concept of the convex hull dates back at least to Isaac Newton (Newton, 2008). Extrapolation also has a rich literature in pure and applied mathematics (Brezinski and Zaglia, 2013) and in cognitive science and psychology (Yousefzadeh and Mollick, 2021).
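For readers who prefer a formal statement, the convex hull of a training set $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$ can be written as

\[ \mathrm{conv}(X) = \Bigl\{ \sum_{i=1}^{n} \lambda_i x_i \;:\; \lambda_i \ge 0, \ \sum_{i=1}^{n} \lambda_i = 1 \Bigr\}, \]

and a query point $p$ is an interpolation point exactly when such coefficients $\lambda_i$ exist with $p = \sum_{i=1}^{n} \lambda_i x_i$; otherwise, a model must extrapolate to process $p$.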
Going back to our main subject: after being trained, an AI model will be deployed to process new inputs, such as information about a new patient. Any new input has a geometric relationship with the convex hull of the dataset on which the model was trained: it falls either within the convex hull of the training set or outside it. When a new data point is outside the convex hull of its corresponding training set, the model has to extrapolate to process it. Conversely, the model interpolates when a new data point is within the convex hull of its training set.
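To make this concrete, the membership question can be answered with a small linear-programming feasibility problem. The following Python sketch is illustrative only and is not the authors' implementation; the function name, the synthetic data, and the use of NumPy and SciPy are our own assumptions.

import numpy as np
from scipy.optimize import linprog

def inside_convex_hull(X_train, x_new):
    # Feasibility problem: does a weight vector lam >= 0 exist with
    # sum(lam) = 1 and X_train.T @ lam == x_new?
    n = X_train.shape[0]
    A_eq = np.vstack([X_train.T, np.ones((1, n))])
    b_eq = np.concatenate([x_new, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0  # status 0: feasible (inside the hull); status 2: infeasible (outside)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))                    # synthetic stand-in for real training data
print(inside_convex_hull(X_train, np.zeros(5)))         # near the bulk of the data: likely inside
print(inside_convex_hull(X_train, 10.0 * np.ones(5)))   # far from the data: the model would extrapolate

For large training sets and high-dimensional data, more scalable algorithms from computational geometry are preferable, but the underlying question remains the same.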
Moreover, projecting a sample onto the convex hull of a training set shows how a single case relates to the whole set. The farther the case is from the convex hull, the larger the extent of extrapolation; the extent of extrapolation for a data point is its distance from the corresponding convex hull. For example, consider a model trained on data gathered from 1000 patients who are all white, male, and aged between 30 and 60 years. If this model makes a decision for a 20-year-old black woman, it has to extrapolate in three dimensions: race, gender, and age. If the model processes the information of a 2-year-old white male, its extent of extrapolation will be 28 years in the dimension of age because this patient is 28 years younger than the youngest (closest) sample used in training the model. Would any of these extrapolations be appropriate? Only medical experts can provide a reliable answer about the suitability of these extrapolations.
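The extent and directions of extrapolation can be computed by projecting the new sample onto the convex hull, that is, finding the nearest point of the hull. The following Python sketch is again only an illustration under our own assumptions (a toy training set, hypothetical feature names, and a generic off-the-shelf optimizer rather than the authors' method).

import numpy as np
from scipy.optimize import minimize

def project_onto_hull(X_train, x_new):
    # Nearest point of conv(X_train) to x_new, found by solving a small
    # quadratic program over the convex-combination weights lam.
    n = X_train.shape[0]
    objective = lambda lam: np.sum((X_train.T @ lam - x_new) ** 2)
    res = minimize(objective, x0=np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, 1.0)] * n,
                   constraints={"type": "eq", "fun": lambda lam: lam.sum() - 1.0})
    return X_train.T @ res.x

# Hypothetical features: [age in years, years of education, weekly work hours]
X_train = np.array([[30, 12, 40], [45, 16, 50], [60, 18, 35], [50, 14, 45]], dtype=float)
x_new = np.array([20.0, 20.0, 60.0])
projection = project_onto_hull(X_train, x_new)
print("extent of extrapolation:", np.linalg.norm(x_new - projection))
print("difference in each dimension:", x_new - projection)  # sign and size indicate direction and extent per feature

In practice, features measured on very different scales would typically be normalized before such distances are interpreted.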
Addressing the concerns of domain experts instead of circumventing them
In the past two decades, data collection from various realms of life, together with growing computational power, has allowed the practice of learning from data to spread widely, leading to the emergence of a field variously known as data science, ML, and AI. This widespread practice can be viewed as a democratization of mathematical modeling and data analysis, as researchers from one discipline often contribute to other fields by deploying AI and ML tools. Yet, this democratization has sometimes happened at the expense of domain expertise and interpretability.
Our transparency proposal will help address domain experts’ concerns and thereby increase their trust in automated systems. Determining extrapolation adds a negligible computational burden while helping resolve issues of distrust. For example, physicians may not be willing to use automated systems in the medical setting unless a model provides adequate explanations for its recommendations and abstains from making decisions when it encounters unfamiliar data. Increased transparency can be a step toward earning the trust of domain experts. From this perspective, our proposal should not be considered a roadblock but rather a path to addressing the concerns of domain experts.
Not all decisions are extrapolations: When do extrapolations matter?
In a given case, whether a model has extrapolated is a piece of information lying at the heart of the right to explanation. In automated decision making, if a model is making vital predictions about a patient whose features are not similar to any sample it has seen before, the model should be mandated to report that fact, and in certain applications it may be proper to escalate the extrapolated case to human experts.
We propose to make it transparent whether a model extrapolates for a particular input. Opponents of this recommendation might request evidence showing that all extrapolations are undesirable. We argue that this request is beside the point. Some extrapolations can be reasonable; the point is that this evaluation should be done by domain experts, and such evaluations require information about extrapolation, which is what we hope to make available. Consider laws that require governments to make all expenditures known to the public. Transparency in this context does not mean that all spending by a government is inappropriate; rather, it allows the public and stakeholders to identify inappropriate expenses and reveal corruption. Similarly, the information about extrapolation should be made available to the stakeholders first, before any assessment of its desirability. Resistance to the transparency proposal means keeping the extrapolation information hidden.
As a proposal for transparency, the extrapolation framework is also interlocked with arguments about accountability and fairness. Transparency precedes the other two, yet extrapolations performed by models are concealed by default. Fairness may have many complex and competing definitions (Barocas et al., 2019). Nevertheless, this lack of transparency can be conceived of as one aspect of unfairness, as it would be unfair to subject people to automated systems while basic information about the suitability of those systems remains hidden.
Why is it important to scrutinize the extent of extrapolation?
In the research community, there have been discussions about whether ML models interpolate or extrapolate. Some assume that models predominantly interpolate between their training samples (Belkin et al., 2019; Webb et al., 2020) and do not often extrapolate. In all the datasets we have investigated, as we explain below, extrapolation occurs frequently enough to be taken seriously.
On the other hand, a group of researchers recently reported that in datasets with more than 100 features, learning always amounts to extrapolation (Balestriero et al., 2021). This notion is realistic, but two issues arise. First, it leaves out many applications where datasets have fewer than 100 features. Second and more importantly, this position can be used to trivialize extrapolation: some scholars have argued that since extrapolation happens frequently, it must be trivial. Our results show the opposite. If extrapolation continues to be regarded as trivial, the people affected by it may never be entitled to know about this fundamental issue.
Many applications of AI and ML are based on datasets with 10 to 50 features. Extrapolation in such applications is neither trivial nor negligible. When we studied, for example, the adult income dataset (Dheeru and Karra Taniskidou, 2017), a benchmark case for studying social applications of ML, about half of its testing samples required some extrapolation. Some of these extrapolations might be considered negligible, but for a considerable portion of testing samples, the extent of extrapolation is far from negligible. We have seen the case of a woman in the US workforce, originally from Thailand, with postgraduate education and in a managerial position, but in the lower-income bracket. The training set of this dataset did not have any sample close to her, so significant extrapolation in the dimensions of age, native country, race, education level, and weekly work hours had to be performed by any model trained on this dataset. We projected this woman's information onto the convex hull of the training set and saw that in these dimensions, both collectively and individually, the projection differed significantly from her values. This case is not just an outlier: such levels of extrapolation are neither rare nor predominant.

Consider another case in the healthcare domain. We investigated a dataset from the Veterans Affairs Healthcare System (Justice and Tate, 2019) with more than one million patient records. Having performed a five-fold cross-validation, we found that about 15% of patient records in the testing set required extrapolation. For many cases, the extrapolation was too extensive, from a medical perspective, to be considered proper. These trends persist in all the other datasets we studied. Extrapolation cannot be dismissed as trivial. In any case, the affected person should have the right to know that the model extrapolated when it made its decision about her.
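As an illustration of how such measurements could be carried out, the following Python sketch combines the convex hull membership test with k-fold cross-validation and counts the held-out samples that fall outside the hull of each training fold. It runs on synthetic data and is our own sketch, not the pipeline used for the results reported above.

import numpy as np
from scipy.optimize import linprog
from sklearn.model_selection import KFold

def requires_extrapolation(X_train, x_new):
    # An infeasible linear program means x_new lies outside the convex hull of X_train.
    n = X_train.shape[0]
    A_eq = np.vstack([X_train.T, np.ones((1, n))])
    b_eq = np.concatenate([x_new, [1.0]])
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.status != 0

# Synthetic tabular data standing in for a real dataset such as adult income.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))

fractions = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    outside = [requires_extrapolation(X[train_idx], x) for x in X[test_idx]]
    fractions.append(float(np.mean(outside)))
print("share of held-out samples requiring extrapolation, per fold:", fractions)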
The AI Act
Article 10, paragraph 2(g), of the AI Act by the European Commission requires “identification of any possible data gaps or shortcomings.” Extrapolation can be considered such a “shortcoming.” Paragraph 3 then mentions that datasets should be relevant and representative, but the article does not suggest a way to quantify relevance and representativeness. Cases of extrapolation may be considered a measure of a model's shortcomings in representation.
We suggest appending a clause to article 13 of the AI Act requiring AI systems to report, for each decision or prediction they make, whether they have extrapolated beyond their training set, in which dimensions, and to what extent. If article 13 thus reveals whether a model has extrapolated, article 10 can then serve as the basis for scrutinizing the extrapolation information.
The AI Bill of Rights
The AI Bill of Rights, published in October 2022, recognizes the right of individuals to adequate explanations. The bill takes one step further and, under the section “Human alternatives, consideration, and fallback,” proposes that individuals should be able to opt out of automated systems in favor of a human alternative, where appropriate.
The bill or similar legislative efforts will gain more clarity if clauses about AI extrapolations such as the following are included: A stakeholder should know whether the automated system has extrapolated when making decisions about that stakeholder, in which dimensions, and to what extent. Extrapolation can be a basis for a stakeholder to opt out of an automated system.
Reveal extrapolation, do not prohibit it
Extrapolation may lead to good or bad decisions, and we do not suggest prohibiting it. In certain applications, extrapolation may be inevitable. Certain types of extrapolation may be justified by experts, while others may not. Mathematically, there are ways to ensure that a model extrapolates in a desirable way in certain directions, but for a complex model it may not be easy to determine beforehand which extrapolations are acceptable. The problem is that no magic criterion can tell us when extrapolation is undesirable; determining when and where extrapolation is justified requires domain expertise. Instead of prohibition, we suggest transparency, to pave the way for experts to scrutinize cases of extrapolation.
In the example of transparency for government expenditures, it is impossible to devise a magic rule that automatically identifies all misappropriations and corruption. However, making expenditures known to stakeholders (in this case, the public) is possible. Once such information is available to the stakeholders, they can scrutinize it.
Conclusions
The community recognizes that AI and ML models may have shortcomings and biases (Eshete, 2021; Rudin, 2019; Rudin, 2022). Models that fall under the umbrella of AI and ML are usually complex mathematical functions that are difficult to interpret (Yousefzadeh and O’Leary, 2022), hence the apt name “black-box models.” Requiring explanations of the rationale behind a model's decisions has entered the public policy domain and regulations, but the knowledge of whether a model has extrapolated has been neglected. This shortcoming undermines the effectiveness of the current versions of AI regulations. Our proposed articles and clauses for the AI Act and the AI Bill of Rights would make a modest contribution to the collective efforts behind these documents.
In the absence of AI regulations on extrapolation transparency, individuals and civil society should consider using the existing legal system and regulations to seek transparency and scrutinize the decisions made for them by automated systems.
Extrapolation is not the only way a model may make unjustified decisions, and it is only one of the many pieces of information that one needs to know about AI and ML models. Nevertheless, transparency about extrapolation can be a crucial step toward fairness, empowering the people affected by automated systems.
Acknowledgement
The authors thank the anonymous reviewers for helpful comments.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
