Abstract
The aim of this article is to assess the impact of Big Data technologies on insurance ratemaking, with a special focus on motor products. The first part shows how statistics and insurance mechanisms adopted the same aggregate viewpoint. It made visible regularities that were invisible at the individual level, further supporting the classificatory approach of insurance and the assumption that all members of a class are identical risks. The second part focuses on the reversal of perspective currently occurring in data analysis with predictive analytics, and on how this conceptually contradicts the collective basis of insurance. The tremendous volume of data and the promise of personalization through accurate individual prediction indeed deeply shake the homogeneity hypothesis behind pooling. The third part attempts to assess the extent of this shift in motor insurance. Onboard devices that collect continuous behavioural driving data could import this new paradigm into these products. An examination of the current state of research on models with telematics data shows, however, that the epistemological leap has, for now, not happened.
This article is part of a special theme on The Personalization of Insurance. A full list of the articles in this special theme is available at: https://journals.sagepub.com/page/bds/collections/personalizationofinsurance
Introduction
In the middle of the eighteenth century, a debate between Bernoulli and d’Alembert illustrates the epistemological shift that accompanied the emergence of probability and statistics. The inoculation of children against smallpox was examined as a potentially revolutionary treatment, although it was known to cause the death of some. Bernoulli published a study demonstrating that, provided the mortality rate caused by the inoculation remains below 11%, it improves the life expectancy of the population as a whole, by three years on average. He therefore concluded that inoculation is scientifically desirable. For d’Alembert, Bernoulli’s so-called demonstration does not stand, due to the high risk of dying for the individual in the short term (Colombo and Diamanti, 2015); d’Alembert still reasoned within the individualistic approach of the previous centuries. Bernoulli, by contrast, inaugurated the statistical thinking that accompanied the emergence of modern states (Desrosières, 2008a; Hacking, 1990) and, with them, insurance mechanisms that manage aleatory events, or events that behave like a roll of dice at the individual level but exhibit some regularity at the collective one (Ewald, 1986).
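The disagreement can be stylized in modern notation (ours, not Bernoulli’s): let $q$ denote the probability of dying from the inoculation itself, $L_{\mathrm{inoc}}$ the expected remaining lifetime of a survivor of inoculation (henceforth immune), and $L_{\mathrm{no}}$ the expected remaining lifetime without inoculation (which discounts the lifelong risk of catching smallpox). Bernoulli reasons on the population average:

\[
\Delta = (1-q)\,L_{\mathrm{inoc}} - L_{\mathrm{no}} > 0 \quad \text{for } q < 0.11,
\]

with $\Delta$ of roughly three years at the inoculation mortality Bernoulli assumed. D’Alembert objects that, for the individual, the certain and immediate exposure to $q$ cannot simply be traded against an average gain deferred over a whole lifetime.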
Current technological developments, combined with the collection of huge amounts of personal data, lead many commentators today to claim that we have reached a new epistemological turn in data analysis. Pentland (2014: 9) thus emphatically contends that: ‘we are discovering that we can begin to explain many things – crashes, revolutions, bubbles – that previously appeared to be random “acts of God”.’ In the treatment of sickness for instance, this new apprehension of risk means that the statistical and collective approach (of vaccination) is now complemented by a focus on the individual through predictive medicine. The aim is to tailor the diagnosis and the treatment to each patient specifically, rather than to measure their efficiency on the aggregate (Herrero et al., 2016; Samerski, 2018). This is akin to the broader phenomenon of personalization, involving the adjustment of products/treatments/services to individual transaction histories in a variety of market settings (McFall and Moor, 2018: 195). McFall and Moor (2018: 205) thus define market personalization as follows: ‘the combinations of new technologies, correlative analysis techniques, and mythologies that define “big data”, notably through the use of transactional and purchase histories to make personalized recommendations and offers.’ This article does not deal with this kind of personalization but rather with the potential use of these techniques for the sake of risk pricing in insurance. We use the terms personalization and individualization interchangeably here, although in the literature ‘personalization’ is usually used to refer to marketing, whereas ‘individualization’ is applied where risk assessment is concerned. ‘Individualization’ would then be the application to the assessment of risk of the technical practices called ‘personalizing’ in marketing.
Applied to risk assessment in insurance, personalization is more difficult to grasp. Insurance mechanisms are intrinsically collective, as they are built on the pooling of risks (Baker, 2002: 6; Lehtonen and Liukko, 2015: 158). In this strand of thought, O’Malley distinguishes between ‘socialized actuarialism,’ which aims at a large pooling with little or no distinction between levels of risk, and a more refined approach that became commonplace with neoliberalism. He dubbed the latter ‘prudentialism’ or ‘privatized actuarialism,’ to refer to ‘a technology of governance that removes the key conception of regulating individuals by collectivist risk management, and throws back upon the individual the responsibility for managing risk’ (O’Malley, 1996: 197, emphasis added). O’Malley maintains that the new approach, with its disciplinary incentives to modify behaviours so as to mitigate risks, places increased responsibility on individuals. Written in 1996, his article does not refer to the changing data analysis techniques that aim at quantifying this individualized risk. These new techniques are often labelled predictive analytics (Siegel, 2016). Yet while insurance has always been about prediction, risk measurement traditionally consisted in the transformation of individual uncertainty concerning the future into something stable, measurable and thus predictable at the collective level (Ericson and Doyle, 2004). Hence the new techniques, while bearing some resemblance to insurance mechanisms, seem particularly challenging for traditional conceptions of insurance, since they claim to predict the individual case rather than the group.
Despite the challenge, the paradigm shift seems in many ways already in action: in health insurance, some insurers have started calculating individual risk scores for the sake of managing costs and federal subsidies across the different Affordable Care Act marketplaces (McFall, 2019). In many countries, telematics devices and smartphone apps are being used to collect continuous behavioural data. Insurers have also started implementing Usage Based Insurance (UBI) products, thus giving themselves the tools, at least conceptually, to quantify the risk based on each specific insured’s behaviour (Meyers and Van Hoyweghen, 2018; Bruneteau et al., 2012). The aim of this article is to assess the extent of the paradigm shift in these motor products. While Meyers and Van Hoyweghen focused on the implied changes in conceptions of fairness, our aim here is rather to understand if and how the UBI products actually serve the calculation of an individualized risk premium. By risk premium we mean the actuarial measure of risk, as it is seen by the insurer via its models. It should be distinguished from the commercial, final premium paid by the insured, which could potentially differ from the former. This article is a theoretical contribution to the literature that proposes to tackle some myths and conceptual views associated with predictive analytics, based on the exploration of existing empirical research. These studies are taken as an indication of current insurance practices. While we did not build models ourselves for the sake of this article, our conclusions are informed by our personal experience as actuaries.
The first part demonstrates how insurance was built on a homogeneity that was artificially created through risk classification. Insurance thus accompanies the construction of statistical tools for the management of aleatory events at the collective level. The second part shows how current Big Data technologies claim to effect a reversal of perspective that is deeply at odds with the core of insurance practices. The last part empirically assesses the state of the art of Big Data in insurance practices and tests the hypothesis of an epistemological leap through a review of research articles on risk measurement with telematics. It shows their limited impact on risk pricing techniques. Whether such a position can be maintained in the long run remains an open question, all the more so as the researchers have demonstrated the relevance of telematics parameters for crash prediction and risk measurement.
The emergence of insurance mechanisms: Building a view on the aggregate
The aim of this part is to offer a theoretical framework derived from a Foucauldian historical epistemology in order to grasp the traditional meaning of ‘risk’ (to be contrasted in the next section with new potential approaches). It proposes to look at insurance mechanisms as a specific kind of rationality that accompanied the development of industrial societies. It aims at showing how ‘risk’ came to be conceptualized as a collective phenomenon by bypassing individual singularities.
The development of statistics during the nineteenth century shows the existence of a regularity, at the collective level, of events that cannot be explained at the individual one (Foucault, 2009: 65–66). For Foucault, a new object of knowledge is taking shape: the population. Statistics thus helped develop a new management of collective phenomena at large; they also gave new tools to cope with uncertainty. By the end of the nineteenth century, in many European societies, accidents were perceived, along Durkheim’s terminology, as ‘social facts’: events that cannot be predicted at the individual level but acquire some predictability at the collective level. Knowledge of aleatory events can be obtained on the aggregate, once the micro level is abandoned (Desrosières, 2014: 169). This dual understanding of knowledge (individual vs. collective) seems to characterize the period. In his 1896 introductory lesson on probability, using particles and the newly established laws of the kinetic theory of gases as a metaphor, Poincaré states the following: You ask me to predict events that will occur in the future; were I, unfortunately, to know the laws of these phenomena, I could only manage with inextricable calculations and would have to give up answering. Yet since I am lucky enough to ignore them, I will answer immediately. And, most extraordinarily, my answer will be correct. (Poincaré, 1912: 3)
This allows the emergence of a level of reality that treats the individual as a case for the understanding of the group, at the heart of insurance mechanisms (Foucault, 2009: 60; see also Ewald, 1986). But this level is actually produced, we argue, by statistics as a practice, and by the process of quantification implied by the new science. Statistical knowledge was built through the collection of data via questionnaires (e.g. censuses) and the quantification of the world (Hacking, 1990), which also imposed a vision of homogeneity among people. Actually, the homogenization occurs twice: once in the choice of what is not asked, and therefore not quantified at all; and once in the averaging of what is measured and collected.
Quantification, confirms Porter, creates standardization and ‘averages away’ the noise of individuals: Inevitably, meanings are lost. Quantification is a powerful agency of standardization because it imposes order on hazy thinking, but this depends on the license it provides to ignore or reconfigure much of what is difficult or obscure. As nineteenth-century statisticians liked to boast, their science averaged away everything contingent, accidental, inexplicable, or personal, and left only large-scale regularities. (Porter 1996: 85, emphasis added)
The Belgian mathematician Quételet is known for being among the first to have applied probabilistic techniques – formerly used in astronomy – to human phenomena, and to have thereby universalized the use of probability calculus (Ewald, 1986: 147; Stigler, 1986: 161). Measuring the size of the torso of soldiers, Quételet noticed that it is distributed along a bell curve; until then this curve, formalized by Gauss, was used in astrophysics to model the error in measuring the position of stars. For Gauss, the true position of the star is the one where the observations peak, hence at the mean. By analogy, Quételet thus concludes that the deviation from the mean is also a form of error: the actual torso should be compared to the ideal torso of an ideal man, ‘the average man’ (Desrosières, 2008c), who represents the population as a whole. By so doing, he reduces the individual measure to its contribution to the average. But when he decides that the mean represents the whole, and interprets the deviation from the mean as error, Quételet actually constructs the homogeneity of the group. This homogeneity is thus both an assumption and a construct. In the context of life insurance, McFall (2011: 676) even demonstrates that ‘the average man’ had to be performed by insurance agents, thus ‘translating averageness from a statistical concept to a commonsense idea.’
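Schematically, and in our own notation rather than Quételet’s, the analogy runs as follows:

\[
\text{Gauss:}\quad x_i = \mu + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2),
\]
\[
\text{Quételet:}\quad t_i = \bar{t} + \varepsilon_i,
\]

where $\mu$ is the true position of the star and $\varepsilon_i$ a measurement error; for Quételet, $\bar{t}$ is the torso of the ‘average man,’ and the individual deviation $\varepsilon_i$ is read, by analogy, as an ‘error’ of nature rather than as genuine heterogeneity between individuals.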
Insurance products were built, and still function, along these lines: they consist in the a priori definition of classes that are supposed to reflect identical risks. Talking about current practices, Paefgen et al. (2013) describe the process as follows: In order to differentiate the risk of insurance policies, actuaries use a set of rate factors to separate policies into groups (i.e., tariff classes). The construction of tariff classes is ultimately a clustering task. Each tariff class corresponds to a certain combination of rate factor categories or intervals in the case of continuous rate factors. For each tariff class, actuaries analyze historical claims data to arrive at a reliable estimate of the corresponding pure premium, that is, the minimum required payment per policy to cover the expected losses from its class. (p. 193) The same classificatory logic underlies experience rating: In most developed countries, insurers have implemented bonus-malus systems (BMS), which modify the premium according to past claims history. One of the main goals of BMS is to reduce adverse selection by including indirectly information that could not be taken into account explicitly, such as respect of the driving code, alcohol use, mileage driven, etc. (Lemaire et al., 2016: 40, emphasis added)
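As an illustration of the class-based computation described by Paefgen et al., here is a minimal sketch in which the pure premium of each tariff class is estimated as its historical losses per unit of exposure. The data, rate factors and figures are hypothetical:

```python
# Minimal sketch of class-based pure premium estimation.
# Rate factors, figures and class definitions are hypothetical.
import pandas as pd

policies = pd.DataFrame({
    "age_band":      ["18-25", "18-25", "26-60", "26-60", "60+"],
    "vehicle_group": ["A", "B", "A", "B", "A"],
    "exposure":      [1.0, 0.5, 1.0, 1.0, 0.8],       # in policy-years
    "claim_amount":  [1200.0, 0.0, 300.0, 0.0, 0.0],  # total paid claims
})

# Each combination of rate factor categories defines a tariff class;
# its pure premium is the expected loss per unit of exposure, estimated
# here by the historical average of the class.
classes = policies.groupby(["age_band", "vehicle_group"]).agg(
    total_claims=("claim_amount", "sum"),
    total_exposure=("exposure", "sum"),
)
classes["pure_premium"] = classes["total_claims"] / classes["total_exposure"]
print(classes)
```

Everything in this computation rests on the homogeneity assumption discussed above: within a class, every policy is priced at the class average.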
Until very recently at least, insurance practice thus belonged to the scientific paradigm inherited from the nineteenth century: it consisted in human quantification, i.e. the collection of a limited amount of information on individuals via questionnaires, the constitution of classes based on these variables, and the assumption of homogeneity within the classes, which allowed for an apprehension of risk on the aggregate. Very little could be said about the individual, but a lot could be deduced from the manually constructed groups.
Big Data and the reversal of perspective
With Big Data, a ‘revolution’ is supposedly taking place (Mayer-Schönberger and Cukier, 2013; Pentland, 2014) that transforms the data we collect (both what and how we collect it), and the manner in which it is treated and used. Researchers have already pointed to the ‘mythology’ involved in the claim that it could ‘generate insights that were previously impossible’ (boyd and Crawford, 2012: 663). While being critical of this grand discourse, boyd and Crawford (2012) argue that in the social sciences ‘big data creates a radical shift in how we think about research (…) It reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality’ (p. 665, emphasis added). This part aims at addressing a similar potential shift in insurance and at showing the conceptual contradiction involved in a field where statistics have for decades accompanied the knowledge and management of risks.
The novelty of the data at stake is often characterized along the ‘three Vs’: volume, velocity and variety (Billot et al., 2017; Kitchin, 2014). In relation to the previous part and to insurance, we would rather insist here on the process of collection itself, which has the three Vs as a consequence; the how is changing, implying changes in the what, as will be shown below. The ‘datafication’ described by Mayer-Schönberger and Cukier (2013) entails that data is generated through online navigation, connected devices and/or bodily sensors. Big Data consists of online traces, ‘bread crumbs’ as Pentland (2014: 8) defines them, that transform human behaviour into something natively numerical, supposedly delivering exhaustive information on the individuals at stake (Kitchin, 2014: 1). Where insurance is concerned, these technologies should, or could, be involved in health, homeowners and automobile products. In all these domains, the collection of data via sensors has become possible: bracelets or wearables that capture bodily information (Gilmore, 2016; Lupton, 2016); home automation sensors for the prevention of fire, leaks and moisture (Kulesa, 2016); telematics devices collecting location, speed and acceleration in vehicles. The way data is collected with these devices is radically different from the questionnaire era. In the digital age, questionnaires are indeed generally perceived as an obsolete, cumbersome and inaccurate process for data collection (Arnoux et al., 2017; Schwartz et al., 2013; Yarkoni, 2010; Youyou et al., 2015), whereas online capture bears ‘the aura of truth, objectivity, and accuracy’ (boyd and Crawford, 2012: 663).
This also drastically changes the nature of what is collected. Tracking the movements of the car or the body, the sensors provide behavioural and continuous data, two characteristics at odds with traditional insurance data. In automobile insurance for instance, underwriting information usually consisted of the driver’s demographic details and the car’s technical characteristics, asked for upfront at the issuance of the policy. The data was static, with the sole exception of the abovementioned bonus-malus system, where an update of claims history is performed at the time of the policy renewal. Moreover, as Ayuso et al. (2019) notice, ‘information about driving habits [was] not considered directly, on the grounds that driving style and intensity could not hitherto be measured objectively’ (p. 736, emphasis added). Behavioural data is now perceived as more trustworthy than demographic, static parameters (Paefgen et al., 2013: 193), a point that challenges the whole perception of risk in insurance. Indeed, boyd and Crawford (2012: 667) rightly remark that ‘all researchers are interpreters of data’. When new data become available and are perceived as trustworthy, the imagination of the phenomena they represent (in our case, the risk of accident) changes accordingly. Moreover, the possibility of collecting real-time information is a further challenge for products whose prices were usually updated once a year (Denuit et al., 2019), not to mention that they are priced upfront, before any such behavioural data are made available by the devices.
Finally, the bypassing of questionnaires also implies the automatization of most of the categorization: ‘if the recorded individual has come into full view, the recording individual has faded into the background, arguably to the point of extinction’ (Fourcade and Healy, 2017a: 11). The categorization, together with the rules that lead to the final classification, has thus become part of the algorithms’ computation (Burrell, 2016). Indeed, deep learning models at least are described by their users as capable, based on a very large quantity of observations, of extracting patterns from the data without human intervention: ‘the key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure’ (LeCun et al., 2015: 436). One of the major consequences of this transformation, from the perspective of this article, is that the work of quantification that conditioned the statistical analysis of the previous era is also being bypassed, both by the manner in which data is collected (without questionnaires) and by the manner in which it is processed. In insurance, the a priori classification that allowed risk measurement might no longer appear technically necessary, or even legitimate.
The hypothesis of homogeneity within a class, paramount for risk measurement, therefore becomes difficult to maintain. In parallel with other domains, this is leading to a personalization of risk, in the form of ‘individual risk scores’ (McFall, 2019; Meyers and Van Hoyweghen, 2018). However paradoxical it may seem, the idea of adjusting the premium to the individual risk is not new; comparing current claims to the 1990s emergence of genotype information, Ewald (2014) shows the surprising similarity of discourses. The notion of ‘individual risk’ actually surfaced with growing computer capacities and neoliberal ideology in the 1980s (Frezal and Barry, 2019; Walters, 1981). In a seminal 1981 address to the Casualty Actuarial Society, Walters thus stated that insurance is about the transfer of the individual’s risk to the insurer, without redistribution between insureds (Walters, 1981: 5).
However, limiting pooling to exactly identical risks, also called chance or probability solidarity, with no subsidies between different risk levels (De Witt and Van Eeghen, 1984; Lehtonen and Liukko, 2015, 2011), was a hypothetical view rather than something thought achievable in practice. Applied to insurance, the Big Data paradigm promises to finally deliver this personalization. For Ewald (2014), we are moving from an era of ‘risk,’ where uncertainty was circumscribed thanks to a group approach, to an era of ‘data,’ focused on the individual: ‘such is the epistemological revolution: one can anticipate movements on the aggregate, but also, within the aggregate, the individual ones’ (p. 11, emphasis added, personal translation).
Rather than understanding movements and regularities on the aggregate, predictive analytics intends to apply to the individual case. Siegel (2016) thus tries to characterize former kinds of predictions (‘forecasting’) and current predictive analytics with Big Data: Predictive analytics is a completely different animal from forecasting. Forecasting makes aggregate predictions on a macroscopic level. (…) Whereas forecasting estimates the total number of ice cream cones to be purchased next month in Nebraska, predictive analytics tells you which individual Nebraskans are most likely to be seen with cone in hand. (p. 16)
Pushed to the extreme, individualizing risk with very high accuracy conveys the imaginary of being able to predict individual claims occurrence. In such a theoretical (and highly improbable) case, the insurer would be able to classify people in a dichotomous manner, separating those who will have accidents from those who won’t. Although the models remain probabilistic, the very high level of accuracy achieved in specific domains seems to lend some credence to the imaginary of perfect prediction in others, one that would almost eliminate uncertainty. While it is a promise that no one really expects to be delivered, reaching ever higher accuracy remains the target of predictive analytics. Applied to insurance however, such theoretically perfect knowledge would also mean its end; insurance was indeed historically built upon the recognition of the irreducible opacity of individuals. In Knight’s definition, there was always a remainder – ‘true uncertainty’ – beyond measurable risks (Lehtonen and Van Hoyweghen, 2014). Big Data technologies promise to lift this opacity by delivering regularities between individuals as look-alikes, without the need to resort to the aggregative viewpoint. But to what extent could such a promise be delivered?
The personalization of risk?
Since 2010, say Cardon et al. (2018), ‘deep neural networks provoke the same disruption in information communities dealing with signal, voice, speech or text’ (p. 3). Likewise, Charpentier et al. (2018) mention numerous applications in credit scoring, fraud detection and targeted marketing (p. 4). As mentioned in the previous part, various devices and the Internet of Things open the way for certain branches of insurance to move from a traditional apprehension of risks, based on classification and averages, to ‘the new paradigm’. Does access to Big Data lead to an apprehension of risks without resorting to any a priori classification? The aim of this part is to appreciate the extent of this shift.
Telematics is the oldest connected technology in insurance and should therefore have the most mature applications. Besides, contrary to health insurance, which is widely regulated against risk individualization (Ewald, 2014; McFall, 2019), automobile insurance regulation, currently at least, gives more freedom to the insurer. This might be due to the repeated promise, by both the industry and public institutions, that these devices, coupled with proper insurance products, could lead to a significant reduction in car accidents and fatalities (Husnjak et al., 2015; Bruneteau et al., 2012; Tselentis et al., 2016: 364). We will therefore focus on telematics and motor insurance, although, conceptually at least, the potential shift is the same in the other domains.
This part thus focuses on the way telematics displaced – or not – risk apprehension and pricing in motor insurance. It is based on a review of predictive analytics articles published on UBI and the use of telematics data over the last decade. It is therefore an exploratory analysis of documents. A lot is said in blogs and on insurers’ sites, which contributes to fueling the ‘promise of personalization’ (e.g. Perret, 2018; Sandquist, 2019); our choice was instead to focus on published quantitative analyses, together with a few secondary articles (Meiring and Myburgh, 2015; Sagberg et al., 2015; Tselentis et al., 2016, 2017). The corpus was obtained by searching for articles on telematics/UBI and ratemaking in academic or actuarial publications. We might add here a notice and a disclaimer: our study shows that the disruption actually did not happen. All the articles considered here indeed isolate new significant variables to be added to existing models, thus contributing to a more granular segmentation; this explains the rather optimistic conclusions reached by the secondary articles. From our perspective, however, there was no change in the epistemology. Besides, our article is not an attempt to explain why insurance practice did not change. Although some hypotheses are evoked in the conclusion, the scope of the article is to show how the new data is used without actually changing existing models.
Given the importance of data analysis for insurance and the existence of telematics products for over 15 years, the number of published studies, which amounts to a few dozen, seems scant. Furthermore, astonishingly few articles have been published in actuarial journals; road safety research is more fruitful, and we draw primarily on these sources. Others have noticed this issue before us, and suggest that, the data being proprietary to insurance companies, researchers’ access remains restricted (Ma et al., 2018; see also Baecke and Bocca, 2017). This would mean that the models exist but are not made public. We will evoke in the conclusion other reasons for the limited number of articles in general, and in the actuarial field in particular. But the main point might be, simply put, that nothing revolutionary has happened yet.
As McFall mentions, pricing based on apps and the Internet of Things is at odds not solely with the conceptual frame of insurance, but also with its infrastructure and working practices (McFall, 2019: 54). It is possible that, despite the promise of a personalized price advanced by the UBI providers (Meyers and Van Hoyweghen, 2018), the actuarial models do not, or do not fully, incorporate the data delivered by the new devices. For Bian et al. (2018), ‘insurers and researchers are still trying to find an appropriate path for UBI’ (p. 21, emphasis added). Besides, the shift from an a priori pricing based on static data to a continuous revision of rates, mentioned in the previous part, might not be solely a conceptual issue, but also a practical one. Here too, as Moor and Lury (2018) suggest for other domains, ‘the technology that facilitates personalized pricing [might be] currently somewhat ahead of its use’ (p. 510).
Since all the studies confirm the predictive power on accidents of the variables provided by the devices, there is, however, a consensus among researchers that a change needs to occur in the near future. As Verbelen et al. (2018) put it: This potentially high dimensional telematics data, collected on the fly, forces pricing actuaries to change their current practice, both from a business as well as a statistical point of view. New statistical models have to be developed to adequately set premiums based on an individual policyholder’s driving habits and style and the current literature on insurance rating does not adequately address this question. (p. 2, emphasis added) Yet an examination of how the telematics variables are actually built shows that the ‘general-purpose learning procedure’ celebrated in the deep learning literature is nowhere to be found; what persists is, rather, the older practice that LeCun et al. (2015) describe: For decades, constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise to design a feature extractor that transformed the raw data (such as the pixel values of an image) into a suitable internal representation or feature vector from which the learning subsystem, often a classifier, could detect or classify patterns in the input. (p. 436) The telematics studies are explicit about this reliance on expert-designed features: The added value of involving industry experts in the development of the predictive model is investigated by augmenting the model with expert-based telematics variables. These are additional features that are not automatically extrapolated from the raw data. Instead, these features are created as a smart combination of metrics from which experts expect a significant impact on accident risk (e.g. night trips during the weekend). (Baecke and Bocca, 2017: 72, emphasis added)
All the variables are humanly created in this manner: besides day and night trips (and the categorization of time slots this necessarily implies), one might mention speeding, measured as the proportion of trips above a certain threshold fixed by the researchers (Lahrmann et al., 2012), or the distribution of trips between urban and rural areas – a classification again imposed on the raw data (Ayuso et al., 2019; Guillen et al., 2019; Verbelen et al., 2018). Further personalization occurs with the attempt to characterize driving styles. The data involves pure telematics measures such as hard braking, acceleration and cornering, counted as events above a threshold (usually pre-defined by the box provider). Additional information is sometimes merged with the telematics database on a trip basis before aggregating the information at the monthly or annual driver level. When trying to model ‘driving style,’ the context of the trip becomes relevant: researchers therefore add information concerning real-time traffic speed on the same road segment, or the percentage driven above speed limits (Ma et al., 2018; Winlaw et al., 2019).
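To make this construction concrete, here is a minimal sketch of how ‘speeding,’ ‘harsh braking’ and ‘night driving’ variables of the kind just described could be derived from raw telematics records. The thresholds, column names and aggregation period are our own illustrative assumptions, not those of any cited study:

```python
# Illustrative derivation of behavioural rating features from raw telematics
# records. Thresholds, column names and the aggregation period are
# hypothetical; real devices and studies differ in sampling rate, units
# and event definitions.
import pandas as pd

records = pd.DataFrame({
    "driver_id": [1, 1, 1, 2, 2],
    "speed_kmh": [95.0, 142.0, 60.0, 50.0, 48.0],
    "accel_ms2": [0.3, -4.2, 0.1, -1.0, 0.2],   # longitudinal acceleration
    "night":     [False, False, True, False, False],
})

SPEED_THRESHOLD = 130.0   # km/h, fixed by the analyst, e.g. a motorway limit
HARSH_BRAKE = -3.0        # m/s^2, deceleration cutoff, often set by the box provider

features = records.groupby("driver_id").agg(
    pct_speeding=("speed_kmh", lambda s: (s > SPEED_THRESHOLD).mean()),
    harsh_brakes=("accel_ms2", lambda a: (a < HARSH_BRAKE).sum()),
    pct_night=("night", "mean"),
)
print(features)  # one behavioural summary row per driver, per period
```

Every line embodies a human choice – which threshold, which event definition, which aggregation period – which is precisely the point made above: the quantification has moved, not disappeared.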
Far from being ‘agnostic data analytics’ (Kitchin, 2014: 4), the studies actually reproduce preconceptions of risky behaviours, which are then tested for significance. In some cases, the work of numerical translation is not immediate; Jin et al. (2018) for instance propose considering the ‘familiarity’ of the driver with their driving routes, quantified as the number of recurring routes taken each month. In the same strand of thought, some studies redefine predefined categories of ‘driving styles’ thanks to telematics. The study of driving styles for the improvement of road safety has indeed a long history, going back to 1949, when descriptive studies were undertaken (Sagberg et al., 2015). Among the relevant traits was ‘aggressiveness,’ which is being redefined with telematics as ‘risky speeding profiles (irregular, instantaneous and abrupt changes in vehicle speed), improper vehicle position maintenance (quick changes in lateral vehicle position) and inconsistent or excessive acceleration and deceleration (harsh take-off and braking)’ (Meiring and Myburgh, 2015: 30657). Adding video information (to detect lane changes and tailgating of other vehicles) at the trip level, Kumtepe et al. (2016) train a classifier to define aggressiveness in line with an external observer’s judgment; again, the measure is not inferred from the raw data but added to it as a subjectively created category. Interestingly, what these examples suggest is that quantification (or the human construction of variables) has not disappeared; but instead of being imposed upfront, in the questions asked and the possible categories of answers, it is built bottom-up, from the data itself.
Most of the time, the researchers recommend adding the new variables to the existing classification (Ayuso et al., 2019; Baecke and Bocca, 2017; Ferreira and Minikel, 2012; Guillen et al., 2019; Paefgen et al., 2013; Verbelen et al., 2018), as the new variables function best in combination with the traditional ones. Sometimes, however (e.g. Ayuso et al., 2019: 737), the hybrid model is seen as temporary. For Weidner et al. (2017), in the transitory period the UBI data determines a discount on the traditional tariff (p. 229), which is where we stand now (Meyers and Van Hoyweghen, 2018). Some also highlight how telematics variables may become a necessary input to existing models, in replacement of other variables that are being removed by regulation:
Insurance companies are facing difficult pricing decisions, as several variables commonly used are challenged by regulators. The EU now forbids the use of gender rating. Territory is being challenged in the US as a substitute for race. Insurers are being pressured to find new variables that predict accidents more accurately and are socially acceptable. (Lemaire et al., 2016: 66, emphasis added)
Taking this path, some researchers thus argue that gender is redundant once telematics devices are implemented (Verbelen et al., 2018). Yet none of them recommends replacing the existing models, at least for the moment. The intent is not to disrupt insurance practices but rather to refine the existing segmentation thanks to new parameters, thus adopting a classificatory logic. As Paefgen et al. (2013: 193) put it: ‘ideally, one might derive a one-dimensional aggregated variable that adds only one further dimension to actuarial tables.’
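What such a hybrid model amounts to in practice can be sketched as follows, assuming a Poisson claim-frequency GLM of the kind routinely used in ratemaking; the data, variable names and coefficients are illustrative, not taken from any of the cited studies:

```python
# Hypothetical hybrid claim-frequency model: telematics features enter as
# additional covariates alongside a traditional rating factor, refining the
# existing segmentation rather than replacing it. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age_band":     rng.choice(["18-25", "26-60", "60+"], n),  # traditional factor
    "pct_speeding": rng.beta(1, 20, n),                        # telematics feature
    "harsh_brakes": rng.poisson(2, n),                         # telematics feature
    "exposure":     rng.uniform(0.5, 1.0, n),                  # policy-years
})
df["n_claims"] = rng.poisson(0.1 * df["exposure"])             # simulated response

model = smf.glm(
    "n_claims ~ C(age_band) + pct_speeding + harsh_brakes",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["exposure"]),  # exposure enters as an offset
).fit()
print(model.summary())
```

The a priori classification (here, the age bands) remains in place; the behavioural variables only add dimensions to it, which is exactly the classificatory logic described above.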
Conclusion
The aim of this article was to assess the impact of Big Data technologies for insurance ratemaking, and the possibility that these would actually achieve the ‘promise of personalization’ announced by some actuaries in the 1980s. We purposefully restricted the meaning of ‘personalization’ to fully-fledged individual risk assessment, without the use of a priori defined classes. From this perspective, reflecting on the potential paradigm shift introduced by predictive analytics in medicine, Wilson and Nicholls (2015) contend that: There are significant challenges in moving from traditional genetics, with its focus on monogenic disorders with significant implications for health of a very small proportion of the population, to the development of genetic profiling approaches which are useful for screening, risk assessment, disease prevention, and health promotion. The idea of personalized medicine as fully individualized medicine has still to be realized, and is likely unrealistic. (p. 17, emphasis added)
There is therefore a tension between imaginaries of personalization and the calculative devices currently used to assess risks. Following Callon and Muniesa (2003), actuarial tools can be defined as ‘collective hybrids,’ made of both humans and machines. Telematics and wearables do not easily integrate with existing calculative devices such as the traditional pricing tools of the actuaries. Hence the performativity, where the personalization of risk is concerned, seems at the moment to come mostly from human imagination. Our claim that no paradigm shift has occurred in insurance is somewhat surprising considering the long history of data analysis in this field, and must be qualified in a couple of ways. First, because it is based on published articles that do not necessarily reflect the actual practices of all insurers. Second, because, despite the fact that telematics as a technology is 15 years old, the field remains dynamic and open to further developments; such an indication can be gathered from our corpus, in which the number of articles increases over time.
There are many possible reasons for the current status quo. Understanding the practical causes of the non-occurrence of the shift was beyond the scope of this article and might well be the object of another study. Some hypotheses can nevertheless be advanced at this stage. The ‘personalization of risk’ obviously challenges the business model of insurers. It also carries a reputational risk: a driver might, for instance, face a rate increase because of ‘aggressiveness’ without ever having filed a claim. Other reasons might be similar to those encountered in medicine, and have to do with existing infrastructure and practices. Actuaries have no doubt tried over the last decade to assimilate the new predictive techniques (Ollivier, 2017). Yet they might be doing so without a fully-fledged abandonment of existing models and infrastructures, either for the sake of maintaining their specific expertise, or because they did not find the new techniques sufficiently convincing.
This article focused on the conceptual challenge, which might in itself be part of the explanation: as we have tried to show here, insurance is based on the pooling of risks, with an underlying assumption of homogeneity. The technique of risk classification reflects this anchoring in a group-based approach. Predictive analytics, claiming to replace it with an individual one, conceptually jeopardizes the very possibility of insurance. Indeed, at the extreme limit of the axis of segmentation (however unrealistic this point might be) would be a situation where an individual insured would be known to be heading toward an accident. More realistically, it would lead to very high rates for the riskiest persons, to the point where insurance would become unaffordable to them.
Should the unrealistic scenario occur and crashes become predictable, however, car accidents would not be uncertain events any longer; they would therefore fall outside the scope of insurance, as they would not be ‘risks’ any more. From this viewpoint, the absence of actuarial models willing to consider the radical end of the spectrum is reassuring. The conceptual resilience of insurance, so to speak, and the slow pace of change simply reaffirm that insurance is, and will remain, about the collective management of uncertain events, which demands imperfect knowledge of individuals.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Acknowledgments
We would like to thank Pierre François, Thierry Cohignac, and three anonymous reviewers for their insights on a previous version of this article.
