Abstract
Firms use A/B testing, a methodology to compare two or more versions of ideas to see which one performs better, to decide on features of their digital products and services. Although A/B testing brings benefits, some A/B testing can lead to digital exploitation (i.e., appropriating resources from users to increase firms’ performance). We explicate why A/B testing can lead to digital exploitation, suggest mitigation options by establishing Institutional Review Boards, explicitly seeking users’ consent, and implementing incentive schemes, and conclude with research directions on this topic.
E-commerce companies increasingly use A/B testing to stimulate purchases. Streaming companies use it to secure subscriptions, gaming companies use it to nudge users to purchase in-game products, and finance companies use it to get users to sign up for their loan services. While A/B testing can accelerate firms’ innovation process, increase performance, and create value for customers, we highlight a pervasive yet underrecognized phenomenon that we term digital exploitation based on A/B testing (henceforth, digital exploitation). By digital exploitation, we refer to cases where firms’ A/B testing identifies measures that appropriate resources from users to increase firms’ performance. In this commentary, we explicate what A/B testing is; outline how it can cause digital exploitation of users; propose that firms can mitigate the negative consequences of A/B testing by establishing Institutional Review Boards, seeking users’ consent, and designing incentive schemes; and conclude with future research directions on this topic.
What Is A/B Testing?
A/B testing is a methodology to compare two or more variations of ideas to identify better performing ones (Kohavi & Thomke, 2017). For instance, YouTube launched an A/B testing feature in 2023 called “Test & Compare,” which allows creators to test three thumbnails per video to see which one elicits more clicks. Similarly, Amazon offered an A/B testing tool called “Manage Your Experiments” in 2021, which enables sellers to A/B test versions of images, titles, and descriptions to identify those that increase sales.
A/B testing is a powerful method for two reasons. First, the underlying principles of A/B testing are based on experiments with random assignment of users to treatment and control groups, which can test causal relationships. Firms can therefore use A/B testing to identify factors that improve performance indicators and optimize their digital products accordingly. Second, A/B testing is a big-data-driven method. Firms can leverage A/B testing to collect data from large numbers of users and draw reliable conclusions based on analyses of such big data. Altogether, the advent of A/B testing can dramatically improve the accuracy of decision making in firms’ innovation activities.
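To make the two ingredients above concrete, the following is a minimal sketch in Python, not a description of any particular firm's system. It assumes a hash-based split as a stand-in for random assignment and a pooled two-proportion z-test as the comparison; the function names, the experiment label, and the conversion counts are hypothetical and chosen only for illustration.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Map a user to a variant with a hash, so the split is stable and unrelated to user traits.

    Assumption: hashing the experiment name together with the user id is used
    here as a simple stand-in for random assignment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of two groups; return (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1,000 users convert on A, 130 of 1,000 on B.
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(assign_variant("user-42", "thumbnail-test"), round(z, 2), round(p, 4))
```

Because assignment does not depend on anything the user does, a difference in conversion rates between the groups can be attributed to the variant itself, and larger samples shrink the standard error, which is why tests on many users yield more reliable conclusions.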
How Does A/B Testing Lead to Digital Exploitation of Users?
A/B testing can lead to digital exploitation of users for two reasons. First, A/B testing facilitates firms’ collection of users’ data. Based on analyses of that data, firms can identify measures that improve their performance at the cost of users’ resources (e.g., money, time) (Bhargava & Velasquez, 2021). For example, through A/B testing, Amazon implemented a change that places sponsored products next to actual products in customers’ baby registries. These sponsored products look just like other products on customers’ actual wishlists, except for a small “Sponsored” tag on the top. This design change has caused users to unwittingly buy unwanted sponsored products that appear in baby registries, causing inconvenience both for the purchasers and those who created the registries. Similarly, tech companies use A/B testing to formulate designs that lead users to increase their screen time to unsustainable levels (Montag et al., 2019).
Once firms identify a measure that can increase performance, they tend to implement it, even if there are potentially detrimental implications for users. This tendency results from two factors: assured returns and competitive pressures. A/B testing provides credible information about the returns that firms can expect from implementing a measure, making it difficult to resist doing so. For instance, in an A/B test in 2013, Electronic Arts found that removing an offer message from the pre-order landing page boosted users’ purchases of SimCity by 43.4%. In addition, competitive pressures compel firms to adopt measures identified through A/B testing to maximize their returns. If a firm does not implement such measures to appropriate users’ resources, these resources will likely be claimed by competitors who implement similar changes. Firms are aware that the more money or screen time users spend on one app, the less they have for others.
Second, users are typically not made aware of their involvement in A/B testing. For instance, Facebook engaged 689,003 users in an A/B test in 2012 without explicitly seeking their consent (Kramer et al., 2014). In this experiment, users’ news feeds were manipulated by removing either all the positive or all the negative posts. Subsequently, the former group of users produced more negative posts and fewer positive posts, whereas the latter group displayed the opposite pattern. More broadly, measures identified through A/B testing can increase users’ addiction to products and services. For instance, optimization of recommendation algorithms through A/B testing can generate feeds that keep users glued to the screen in “infinite scrolling” for hours. As such, users not only participate in A/B testing unwittingly but can also experience negative consequences during and after A/B testing.
Other digital technologies can intensify digital exploitation caused by A/B testing. For example, generative AI (e.g., ChatGPT) can help decision-makers produce more ideas, which can then serve as the options evaluated by A/B testing. Thus, generative AI quickens the idea generation process, while A/B testing accelerates the assessment and selection processes, jointly producing an automated and efficient routine that can intensify digital exploitation. In addition, firms with better data analytics capabilities can better mine insights from A/B testing data to devise sophisticated measures that can intensify digital exploitation.
How Can Firms Mitigate the Negative Consequences of A/B Testing?
We propose three ways by which firms can mitigate the undesirable outcomes of A/B testing. First, firms can establish Institutional Review Boards (IRBs), which should include external members (e.g., researchers, consumer advocates), to monitor A/B testing. Firms can set guidelines to decide which A/B tests are sent for IRB review. For each such test, the IRB should vet the hypothesis, the users who will be engaged, the user data to be collected, and the variations that will be presented. Second, firms should seek users’ consent for their participation and the use of their data before including them in A/B testing. For example, a pop-up window can ask if users wish to participate in an A/B test, with a note like “We invite you to participate in a test of our new recommendation algorithm.” The setup should allow users to easily choose to agree or disagree. For those who wish to know more before deciding, the pop-up window should present a link to terms, conditions, and further information about the objectives and procedure of the A/B test. Third, firms can design incentive schemes, such as offering coupons to users who agree to participate in A/B tests, to compensate them for their time and for unanticipated undesirable effects they might experience during and after A/B testing.
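The three mitigation options can be read as a gate placed in front of enrollment. The following Python sketch is one possible illustration under stated assumptions, not a prescribed implementation: the `Experiment` fields, the rule in `needs_irb_review`, and the coupon value are all hypothetical, and a firm would set its own review guidelines.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    """Hypothetical record of an A/B test; all fields are illustrative assumptions."""
    name: str
    collects_personal_data: bool
    affects_spending_or_screen_time: bool
    irb_approved: bool = False
    coupon_value: float = 0.0
    participants: List[str] = field(default_factory=list)


def needs_irb_review(exp: Experiment) -> bool:
    # Assumed guideline: review any test that collects personal data
    # or that could change how much money or time users spend.
    return exp.collects_personal_data or exp.affects_spending_or_screen_time


def enroll(exp: Experiment, user_id: str, consented: bool) -> Optional[float]:
    """Enroll a user only after IRB clearance (if required) and explicit consent.

    Returns the coupon value granted as compensation, or None if the user
    declined and was therefore left out of the test.
    """
    if needs_irb_review(exp) and not exp.irb_approved:
        raise PermissionError(f"{exp.name}: IRB review required before enrollment")
    if not consented:
        return None  # The user keeps the default experience.
    exp.participants.append(user_id)
    return exp.coupon_value


if __name__ == "__main__":
    test = Experiment(
        name="new-recommendation-algorithm",
        collects_personal_data=True,
        affects_spending_or_screen_time=True,
        irb_approved=True,
        coupon_value=5.0,
    )
    print(enroll(test, "user-1", consented=True))   # coupon for a consenting user
    print(enroll(test, "user-2", consented=False))  # declined, not enrolled
    print(test.participants)
```

Placing the IRB check before the consent check means that users are never asked to join a test that has not yet cleared review.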
What Can Business Scholars Do?
The potential dark sides of A/B testing call for research attention from business and society scholars. First, future research can explore whether there are negative implications of A/B testing for other stakeholders, such as an organization’s employees or its organizational partners. Second, since restricting the usage of A/B testing can impede the growth of the digital economy, research can investigate how to devise prudent policies about A/B testing to balance the trade-offs between digital economy growth and the welfare of users and other stakeholders. Third, researchers can examine how digital exploitation can vary based on users’ demographics (e.g., income, age), product categories (e.g., health, education, games, gambling), and product features (e.g., algorithm, user interface). Fourth, studies can outline how standard-setting agencies (e.g., International Organization for Standardization) can develop standards to guide the appropriate application of A/B testing (Flyverbom et al., 2019). Overall, research on these topics will inform how firms, policymakers, and users can reap the benefits of A/B testing while limiting its negative consequences.
Footnotes
Acknowledgements
We would like to express our gratitude to editors Hari Bapuji, Frank de Bakker, and Simon Pek for their constructive feedback in the review process. We would also like to thank Sharique Hasan, Rembrand Koning, and Phanish Puranam for their much-appreciated feedback on earlier versions of this commentary. Standard disclaimers apply.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Tengjian Zou acknowledges the support by the National Natural Science Foundation of China (72202206, 72091312, 72232009) and the support by the Fundamental Research Funds for the Central Universities.
