Abstract
Christoph Molnar. 2020. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.
The book
The book starts with the Preface, in which the author writes about his motivation for the book and gives a brief overview of it. The book includes many short stories, which make it more engaging to learn the various concepts related to interpretable machine learning. It then describes machine learning and gives a formal definition: “Machine learning is a set of methods that computers use to make and improve predictions or behaviors based on data.” For better understanding and readability, the author defines in the introduction a set of terms that are used throughout the book.
In the next part of the book, the author discusses interpretability and its importance. There is no mathematical definition of interpretability. One (non-mathematical) definition is that interpretability is the degree to which a human can understand the cause of a decision. Another is the degree to which a human can consistently predict the model’s result. Interpretability is important because humans do not simply trust machine learning models; they always look for reasons. The author classifies interpretability methods based on various criteria, such as intrinsic or post hoc and model-specific or model-agnostic. He then defines the scope of interpretability, which can be broadly grouped into local and global. Evaluating interpretability is difficult, as there is no real consensus about what interpretability means in machine learning. The author also gives a brief introduction to the properties of explanations and what makes them human-friendly. To demonstrate the interpretability of various machine learning models, he uses real datasets that are freely available online, with different datasets for different tasks: classification, regression, and text classification. Most of the datasets are from the UCI machine learning repository.
In the next part, the author gives a brief description of various interpretable models. He starts with linear regression. The linear regression model predicts the target as a weighted sum of the feature inputs. The linearity of the learned relationship makes the interpretation easy. He discusses various technical details of linear regression. Next, he discusses logistic regression. Logistic regression models the probabilities for classification problems with two possible outcomes. It is an extension of the linear regression model for classification problems. He discusses the need for logistic regression and how it works. Next, he discusses generalized linear models (GLMs) and generalized additive models (GAMs). GLMs are extensions of linear regression that work on non-Gaussian outcomes. Similarly, GAMs are also extensions of linear regression, but they allow non-linear relationships between the features and the outcome. Next, he discusses the well-known decision tree. He gives an example to explain the workings of a decision tree and also mentions its advantages and disadvantages. Next, he discusses decision rules. A decision rule is a simple IF-THEN statement consisting of a condition (also called an antecedent) and a prediction. For example: if it rains today AND it is April (condition), then it will rain tomorrow (prediction). A single decision rule or a combination of several rules can be used to make predictions. Next, he discusses RuleFit. The RuleFit algorithm learns sparse linear models that include automatically detected interaction effects in the form of decision rules. Lastly, he discusses other interpretable models, such as the naïve Bayes classifier and the k-nearest neighbors method.
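The interpretability of linear regression described above can be made concrete with a short sketch (my own illustration, not code from the book; the rent example and all numbers are invented for demonstration): each learned weight is read as the expected change in the prediction for a one-unit increase in its feature, all other features held fixed.

```python
import numpy as np

# Toy data: predict apartment rent from size (m^2) and number of rooms.
rng = np.random.default_rng(0)
size = rng.uniform(30, 120, 200)
rooms = rng.integers(1, 5, 200).astype(float)
rent = 300 + 8.0 * size + 50.0 * rooms + rng.normal(0, 20, 200)

# Ordinary least squares via numpy's least-squares solver.
X = np.column_stack([np.ones_like(size), size, rooms])
beta, *_ = np.linalg.lstsq(X, rent, rcond=None)
intercept, w_size, w_rooms = beta

# Interpretation: each weight is the expected change in rent
# for a one-unit increase in that feature, all else fixed.
print(f"+1 m^2  -> rent changes by about {w_size:.1f}")
print(f"+1 room -> rent changes by about {w_rooms:.1f}")
```

The recovered weights sit close to the true generating coefficients (8 per square meter, 50 per room), which is exactly the kind of direct reading that makes linear models interpretable.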
In the next part, the author discusses various model-agnostic methods. The great advantage of model-agnostic interpretation methods over model-specific ones is their flexibility. The author starts with the partial dependence plot (short PDP or PD plot), which shows the marginal effect one or two features have on the predicted outcome of a machine learning model. Next, he discusses individual conditional expectation (ICE) plots, which display one line per instance, showing how the instance’s prediction changes when a feature changes. Next, he discusses accumulated local effects, which describe how features influence the prediction of a machine learning model on average. Accumulated local effects plots are a faster and unbiased alternative to PDPs. Next, he discusses feature interaction. If a machine learning model makes a prediction based on two features, we can decompose the prediction into four terms: a constant term, a term for the first feature, a term for the second feature, and a term for the interaction between the two features. Next, he discusses permutation feature importance, which measures the increase in the prediction error of the model after we permute the feature’s values, which breaks the relationship between the feature and the true outcome. Next, he discusses the global surrogate. A global surrogate model is an interpretable model that is trained to approximate the predictions of a black box model. Next, he discusses local surrogate models, which are interpretable models used to explain individual predictions of black box machine learning models. Local interpretable model-agnostic explanations (LIME) is a popular local surrogate method. Lastly, he discusses Shapley values and SHAP (SHapley Additive exPlanations). Shapley values are a method from coalitional game theory that tells us how to fairly distribute the “payout” among the features. SHAP is based on the game theoretically optimal Shapley values.
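Permutation feature importance is easy to sketch in a model-agnostic way, since it only needs a prediction function, not the model's internals. The following is a minimal numpy-only illustration (my own sketch, not the book's code; the function name and the toy model are assumptions):

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Model-agnostic: rise in MSE when one feature's column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    importances = []
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the feature-target relationship
            errs.append(np.mean((predict(Xp) - y) ** 2))
        importances.append(np.mean(errs) - base)
    return np.array(importances)

# Black-box stand-in: y depends strongly on x0 and not at all on x1.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, 500)
model = lambda X: 3.0 * X[:, 0]  # plays the role of any fitted model

imp = permutation_importance(model, X, y)
print(imp)  # importance of x0 is large, x1 is ~0
```

Because the function only calls `predict`, the same code works unchanged for a tree ensemble, a neural network, or any other black box, which is the flexibility advantage the author emphasizes.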
In the next part, the author discusses example-based explanations. Example-based explanation methods select particular instances of the dataset to explain the behaviour of machine learning models or to explain the underlying data distribution. He discusses various techniques, starting with counterfactual explanations: a counterfactual explanation describes a causal situation in the form “If X had not occurred, Y would not have occurred.”
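A counterfactual explanation can be sketched as a search for the smallest change to an instance that flips the model's decision. The toy loan model, grid of candidate changes, and function names below are my own assumptions for illustration, not the book's method:

```python
import numpy as np
from itertools import product

def predict(x):
    """Black-box loan model stand-in: approve (1) if the score is positive."""
    income, debt = x
    return 1 if 0.05 * income - 0.1 * debt - 1.0 > 0 else 0

def counterfactual(x, predict, deltas):
    """Smallest (L1) perturbation from a candidate grid that flips the output."""
    original = predict(x)
    best = None
    for d in product(*deltas):
        d = np.array(d)
        if predict(x + d) != original:
            if best is None or np.abs(d).sum() < np.abs(best).sum():
                best = d
    return best

x = np.array([30.0, 5.0])           # income 30k, debt 5k -> currently rejected
grid = [np.arange(0, 21, 1.0),      # candidate income increases
        np.arange(-5, 1, 1.0)]      # candidate debt reductions
d = counterfactual(x, predict, grid)
print(d)  # minimal change that would flip the decision
```

The result reads as an explanation of the kind quoted above: "if your debt had been 1k lower, the loan would have been approved."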
Overall, the book provides a very comprehensive understanding of interpretable machine learning models. It is quite useful for managers and researchers, as it gives a detailed practical and theoretical understanding of various interpretable machine learning models. In my view, the book is well worth reading.
