Abstract
Using the Internet means encountering algorithmic processes that influence what information a user sees or hears. Existing research has shown that people's algorithm skills vary considerably, that they develop individual theories to explain these processes, and that their online behavior can reflect these understandings. Yet, there is little research on how algorithm skills enable people to use algorithms to their own benefit and to avoid harms algorithms may elicit. To fill this gap in the literature, we explore the extent to which people understand how the online systems and services they use may be influenced by personal data that algorithms know about them, and whether users change their behavior based on this understanding. Analyzing 83 in-depth interviews from five countries about people's experiences with researching and searching for products and services online, we show how being aware of personal data collection helps people understand algorithmic processes. However, this does not necessarily enable users to influence algorithmic output, because options that help users control the level of customization they encounter online are currently limited. Besides the empirical contributions, we discuss the implications of the sample's diversity and of our findings for research designs in studying algorithm skills.
Many of the most popular Internet services (e.g. Google, Amazon, Facebook, Netflix, TikTok, Spotify) rely on algorithms (e.g. search results, recommendations) to personalize and customize users’ Internet experiences (Latzer et al., 2016; Latzer and Festic, 2019; Willson, 2017). An algorithm is a set of instructions that turns an input into an output (Hao, 2019). When we refer to algorithms we mean predictive algorithms, which learn their own instructions from a dataset in order to predict the most successful (e.g. most relevant to a user) output for a given input (O’Neil, 2016). What a user sees and hears online is often the result of algorithmic processes choosing from a large number of available alternatives. While algorithms offer users the convenience of finding relevant information, researchers have pointed out significant issues related to using such systems, ranging from social discrimination and biases to manipulation and privacy threats (Beer, 2009; Introna and Nissenbaum, 2000; Latzer et al., 2016; Tufekci, 2014, 2015). This has motivated a growing number of studies to examine people's algorithm skills in a variety of usage scenarios, focusing on whether people are aware of algorithms online, what understanding they have about them, and how awareness and understanding influence their usage behavior (Bucher, 2017; Cotter and Reisdorf, 2020; Dogruel, 2021; Gruber et al., 2021; Liu and Graham, 2021; Swart, 2021; Ytre-Arne and Moe, 2021). This is relevant, as research on Internet skills has shown that people possess different skills, leading to inequalities in their ability to benefit from Internet use (Hargittai, 2007). We define algorithm skills in this paper as awareness of the possibility that customization can happen and understanding of how and why customization happens. The notion of algorithm skills does not examine awareness and understanding of algorithms separately, but includes their relationship in its analysis.
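The input-to-output view of a predictive algorithm described above can be sketched in a few lines of code. This is a deliberately toy illustration under invented assumptions (the scoring rule, weights, and data are made up and resemble no real platform's system); it shows only how the same query can yield different outputs for different users once personal data enter the computation:

```python
# Toy sketch of an "input -> output" predictive ranking. All names,
# weights, and data are invented for illustration; no real platform
# works exactly like this.

def rank_results(query_terms, candidates, user_history):
    """Score each candidate by overlap with the query, boosted by the
    user's past behavior, and return candidates sorted by score."""
    def score(item):
        # Relevance: how many query terms appear in the item's keywords.
        relevance = len(set(query_terms) & set(item["keywords"]))
        # Personalization: boost keywords the user has clicked before.
        boost = sum(user_history.get(k, 0) for k in item["keywords"])
        return relevance + 0.5 * boost
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"title": "Budget flights", "keywords": ["travel", "flights", "cheap"]},
    {"title": "Luxury hotels", "keywords": ["travel", "hotels", "luxury"]},
]
# Two users issue the same query but have different click histories,
# so they receive differently ordered results.
same_query = ["travel", "flights"]
print(rank_results(same_query, candidates, {"cheap": 3})[0]["title"])   # Budget flights
print(rank_results(same_query, candidates, {"luxury": 3})[0]["title"])  # Luxury hotels
```

The sketch makes concrete why two people searching for the exact same thing can see different orderings: the personal data fed into the scoring step differ, not the query.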
Although extensive research has been carried out on describing people's awareness and understanding of algorithms, little attention has been paid to how algorithm skills enable people to use algorithms to their own benefit while also avoiding potential harms from them. Benefits include, for example, that people can adjust their expectations of algorithms and can assess more critically what content is shown to them. They can also influence algorithms so that the resulting content better meets their needs. Potential harms include one's content being evaluated in suboptimal ways (e.g. on a job search site) that may lower available opportunities (e.g. not being matched to a job or contacted by a recruiter). One way to try to influence algorithmic systems is by manipulating what information the system has about the user (e.g. by deleting cookies or one's search history), because algorithms assign relevance to results based on available personal data (e.g. search queries, search history, login data) (Latzer et al., 2016). For this, users first need to be aware both that personal data are collected online and that algorithms are at work. Second, they need to understand that their personal data may influence the output of algorithms. Accordingly, our research questions are:

RQ1: To what extent are people aware that online services collect personal data?

RQ2: To what extent are people aware of algorithms when they use online services?

RQ3: To what extent do people understand that there is a relationship between what personal data systems glean about them and the algorithmic output they receive?

RQ4: To what extent do people make decisions about their usage behavior based on their understanding of the relationship described in RQ3?
We first provide a brief overview of existing research on people's perceptions of algorithms followed by a discussion of literature on people's knowledge of the collection of personal data by online services. Presenting results from the analysis of a diverse sample of 83 in-depth interviews in five countries about people's experiences with researching and searching for products and services online, we show that people tend to be aware that the systems they use collect information about them and also have a broad idea that algorithmic processes (i.e. customization and personalization of information) take place when they use online services. However, participants vary considerably in their understanding of how the information that systems have about them relates to algorithmic output, which then limits users in developing strategies to manipulate algorithmic outputs to mirror their usage preferences.
Research on algorithm skills
In their analysis of the ranking algorithms of search engines, Introna and Nissenbaum (2000) described over two decades ago how search engines discriminate against website providers who have neither the financial capital nor the skills (specifically: knowledge of the existence and functioning of the ranking algorithms) to be listed among the top results. They concluded that this might lead to a centralization of power (i.e. access to users’ attention) for those with the resources and skills to meet the criteria of the ranking algorithm. Hargittai (2000) made a similar argument about how “portal sites” (i.e. content aggregators that were the precursors of search engines) favored some content over other content, and how those with more skills to understand the system would be better equipped to sidestep the limitations it posed on users. Further studies have researched how algorithms can foster biases and discrimination (see Bandy, 2021 for a review), for example, by showing different prices to different people for the same product or service (Hannak et al., 2014), or by personalizing advertisements so that they contain assumptions about criminal records based on a user's name (Sweeney, 2013).
Given that algorithms cause people to experience online content differently, an investigation into whether users are aware of algorithms and understand how user behavior can affect the content systems show them is warranted (Hamilton et al., 2014; Latzer and Festic, 2019). Existing literature on Internet skills has found that people possess different skills when it comes to, for example, the ability to find information, assess source and message credibility (Flanagin and Metzger, 2000; Metzger et al., 2010; Sundar, 2008), and understand privacy issues (Park, 2013), leading to inequalities in the ability to benefit from Internet use (Hargittai, 2007; see Litt, 2013 for a review). Internet skills are defined as a set of skills that enable people to use the benefits of the Internet effectively and efficiently (DiMaggio et al., 2004; Hargittai, 2002; van Dijk and van Deursen, 2014). Hargittai and Micheli (2019: 113–114) argue that the “awareness of how algorithms influence what people see” is another dimension of Internet skills and state:

Those who understand that algorithms play a role in what content they see can both adjust their expectations and use strategies to find content in a way that sidesteps constraints imposed by platforms (Hargittai, 2000). Those who lack such awareness and understanding are more at the mercy of what sites are made available to them most prominently.
We define algorithm skills as a combination of: (1) awareness that algorithmic customization occurs in a given application; and (2) an understanding of the mechanisms (e.g. collection of personal data to assign relevance to results) and consequences (e.g. discrimination, manipulation, diminishing variety, the creation of biases, threats to data protection and privacy) underlying algorithmic processes. Algorithm skills allow people to use algorithms to their own advantage and to avoid harms they may elicit.
Ytre-Arne and Moe (2021) note that people's awareness of algorithms differs when it comes to asking them about algorithms in general versus focusing on the algorithms of one specific platform (e.g. TikTok). While for the former people develop abstract theories about algorithms, they focus more on operational theories in specific cases. A growing number of studies report on people's algorithm skills in different scenarios such as social media news feeds, recommendations on video platforms like Netflix, and voice assistant interactions (Bucher, 2017; DeVito et al., 2017, 2018; Dogruel, 2021; Eslami et al., 2015; Gruber et al., 2021; Klawitter and Hargittai, 2018; Rader and Gray, 2015; Siles et al., 2019). They show that people's algorithm skills are often based on personal experiences (Bucher, 2017; Cotter and Reisdorf, 2020; DeVito et al., 2018; Eslami et al., 2015; Gruber et al., 2021; Siles and Meléndez-Moran, 2021; Swart, 2021) and that people develop many different theories about how algorithms might affect what they see and hear (DeVito et al., 2017, 2018; Gruber et al., 2021; Lutz et al., 2021; Powers, 2017; Rader and Gray, 2015; Siles et al., 2019).
Comparatively less attention has been paid to the links of algorithm skills and user actions. Based on her research on people's imaginaries of algorithms on social media platforms, Bucher (2018: 117) argues “that algorithms do not just do things to people, people also do things to algorithms.” For example, by clicking consciously (liking certain posts, liking contradictory things) or posting consciously (the time of day, the use of particular words, the inclusion of pictures) users can strategically influence algorithms to their benefit based on their individual perceptions of the system. Similarly, they can attempt to avoid potential harms (e.g. through the use of an incognito browser tab or by consciously omitting certain information).
Rader and Gray (2015) introduce the notion of feedback loops, describing how people's interactions with an algorithmic system influence their usage experience and vice versa. Similarly, Siles and Meléndez-Moran (2021) describe how TikTok users adjust their usage practices (i.e. consciously liking or not liking certain content; time spent watching certain content; what profiles one visits) to influence the results of the TikTok algorithm. The authors use the notion of “passages” to refer to how the link between awareness of algorithms and usage practices keeps changing, describing a form of relationship between users and the algorithm. They conclude by asking for more research on how algorithm awareness “engenders certain actions and behaviors” (p. 24). Based on online discourse materials and interviews with members of a community of YouTubers, Cotter (2022) developed the concept of “practical knowledge” to describe people's ability “to accomplish X, Y, or Z within algorithmically mediated spaces (practice) as guided by the discursive features of one's social world (discourse)” (p. 7).
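The feedback loop described above can be illustrated with a minimal sketch. Everything here is hypothetical (the preference counter and ordering rule are invented for illustration, not drawn from any platform); the point is only that deliberate interactions update the system's model of the user, which in turn reorders what the user sees next:

```python
# Hypothetical sketch of a feedback loop: user actions update the
# system's model, and the model shapes the next feed. Invented logic,
# for illustration only.

preferences = {}  # the system's running model of one user

def record_interaction(topic, liked):
    """Deliberate likes/dislikes update the system's model of the user."""
    preferences[topic] = preferences.get(topic, 0) + (1 if liked else -1)

def next_feed(topics):
    """The next feed is ordered by the preferences the user just shaped."""
    return sorted(topics, key=lambda t: preferences.get(t, 0), reverse=True)

# A user strategically likes cooking content and dislikes sports content...
record_interaction("cooking", liked=True)
record_interaction("cooking", liked=True)
record_interaction("sports", liked=False)
# ...and the next feed reflects that influence.
print(next_feed(["sports", "news", "cooking"]))  # ['cooking', 'news', 'sports']
```

In this toy version, "doing things to algorithms" (Bucher, 2018) amounts to choosing which interactions to feed the model; a user aware of the loop can like strategically, while an unaware user shapes the model without realizing it.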
Another line of work has looked into how awareness of algorithms relates to users’ resistance to using algorithmic services (e.g. Christin, 2017; Kellogg et al., 2020; Siles et al., 2020; Velkova and Kaun, 2021). For example, one study focuses on how people's professional background can influence their algorithmic imaginaries (Christin, 2017). It notes the impact of different predominant imaginaries among journalists versus legal professionals in the US and France on whether people use (journalists) or resist (legal professionals) algorithmic systems. Another study describes how employees aware of algorithmic control through management use different strategies to resist the algorithms (Kellogg et al., 2020). Velkova and Kaun (2021) take a different perspective and reflect on how resistance to algorithms can also lead to active engagement with them. They present the case study of an artist who changed the algorithmic output of the Google Search algorithm based on her algorithm skills. While existing studies provide a first overview of how people develop algorithm skills, what awareness and understanding of algorithms people have, and how these affect their usage behavior in general, less attention has been paid to specific strategies that allow people to use algorithms to their own benefit.
One way for users to influence algorithmic output is to understand that the information systems have about them plays a key role in algorithmic processes. That is because companies populate their algorithms with information about users to customize the content they show them (Freedman, 2020; Friedman et al., 2015; Just and Latzer, 2017; Latzer et al., 2016; Quick, 2020). However, few studies have explicitly researched people's understanding of the relationship between personal data collection and algorithmic output. In the following section, we summarize existing research on people's awareness that their personal data are collected when they use online services.
People's awareness of personal data collection online
Being aware that personal data are collected online allows people to recognize the role of such data in personalization processes. People are often uncertain about such processes (Auxier et al., 2019; Kang et al., 2015; Pleger et al., 2021). Existing research shows that people are not sure about the meaning of concepts such as “data protection” and “data security” (Pleger et al., 2021), about how personal data are collected and how they can be protected (Boerman et al., 2021; Kang et al., 2015), and that they often report a lack of understanding about what companies do with the collected data (Auxier et al., 2019). Additionally, people feel that they have little control over how their personal information is used (Auxier et al., 2019), feeling resigned to the fact that if they want to continue using online services there is nothing they can do against their data being collected (Hargittai and Marwick, 2016; Lutz et al., 2020). With regard to other Internet skills, research shows that people with higher Internet skills are better able to protect their privacy online (e.g. delete cookies or adjust privacy settings; boyd and Hargittai, 2010; Büchi et al., 2017), but the literature has not connected such skills to actions that would nudge algorithms to function in a way that meets a user's preferences.
The literature presents mixed results about how awareness of the collection of personal data online affects people's Internet use. While existing studies do not look at how such awareness influences people's interactions with algorithms—the focus of this study—some work has investigated how awareness of the collection of personal data relates to privacy concerns, showing that people who are aware are also more likely to be concerned about the protection of their privacy (Baruh et al., 2017; Rader, 2014). Yet, one study surveying 600 Japanese social media users found that participants who were more confident in their ability to manage their personal data online were less likely to be concerned about the protection of their privacy (Morimoto, 2020). While a survey among US social media users showed that participants who were more concerned that their personal data are collected were less likely to use social media (Cain and Imre, 2021), a meta-analytical review of research on privacy management and privacy concerns indicated that people with better awareness of privacy risks and understanding of strategies to protect their privacy online are more likely to use the Internet (Baruh et al., 2017). In sum, there are conflicting results about how awareness links to behavior.
Overall, existing studies show that people are often uncertain about the extent to which their personal data are collected and used by online services. This uncertainty could be an obstacle to understanding the relationship between personal data collection and algorithmic output. Without this understanding, users might not be able to develop strategies to influence algorithms through controlling what they share with online services (e.g. by deleting cookies or their search history before proceeding with a particular action). Moreover, the literature suggests that people who are only aware of personal data collection without having an understanding of why their user data are collected are more concerned and might even limit their Internet use (Baruh et al., 2017; Cain and Imre, 2021). On the other hand, people who understand why their data are collected and know strategies to address this in ways that reflect their preferences are more confident in using the Internet. This emphasizes that people who understand the relationship between user characteristics and algorithmic output might not only know strategies to use this understanding to influence algorithms to their own benefit, but are also more encouraged to use online services overall.
Methods
To answer our research questions, we analyze data from a study using semi-structured in-depth interviews with a diverse group of adults.
Data collection
We conducted 83 one-on-one in-person interviews with adults 18+ in summer 2019 in five countries. Recruitment materials referred to a study about online experiences so as not to bias against or toward people who are more or less skilled with technologies. We recruited a diverse sample of respondents through our social networks using snowball sampling. To go beyond the countries most often included in research about algorithms (i.e. the United States), we sampled participants from several countries: the United States (28), Germany (22), Bosnia (13), Hungary (11), and Serbia (9). We conducted the interviews in respondents’ preferred languages, taking advantage of research team members’ knowledge of the local languages. During recruitment, we paid particular attention to having roughly equal numbers of men and women, and a diversity of ages.
Interviews mostly took place in urban areas (57), with a fifth (18) conducted in suburban towns and 13 in rural communities. They averaged around 36 min, ranging from 17 to 68 min. Respondents received US$20 or the equivalent purchasing-parity value in the local currency for participation. All interviews were audio-recorded and transcribed. Interviews conducted in German, Bosnian, Hungarian, and Serbian were translated to English for coding.
At the end of the interviews, we also asked participants to fill out a short survey with core demographic information (year born, gender, education, employment status) and questions about their Internet experiences and Internet skills. The latter included a multiple-choice question about what an Internet cookie is as well as an oft-used Internet skills measure to give a sense of sample composition on these characteristics (Hargittai and Hsieh, 2012). The average age was 40 (min: 19, max: 78); half of the participants were female, half male; just over half had a college education, just over a third had completed some post-secondary education without a degree, and ten percent had no more than a high school degree. Just over half were employed, a quarter were studying, and a sixth were retired. The occupations stated by the participants were diverse and included, for example, a coffee shop owner, a politician, an unemployed person, a forester, a housekeeper, an architect, several retired people, and several students (see Table 1 for details on participants’ sociodemographic background). In terms of their Internet skills, they averaged 3.1 on a 1–5 scale (standard deviation: 1.0), while 41% could choose the correct definition of an Internet cookie from among four options. These indicators suggest a fair bit of sociodemographic diversity as well as varied digital savvy.
Table 1. Sociodemographic background of participants.
Interview protocol
To avoid biasing the results about participants’ awareness of algorithms, we did not ask people explicitly about algorithms; we did not mention the term “algorithm” during the interviews unless the respondent had already brought it up. Instead, we focused on people's experiences with online behaviors where algorithmic processes often take place. Researching products (e.g. on Amazon, eBay, and local equivalents) and travel services (e.g. accommodations, flights) are two of the most popular online activities in the United States and Europe (Eurostat, 2019; National Telecommunications and Information Administration, 2020). Websites addressing such needs rely on algorithmic processes to provide customers with often customized information (e.g. Burges, 2021; Latzer et al., 2016; Morrison, 2020).
To learn about people's algorithm skills when they research products and travel services, we first asked them about their experiences with such websites. Then we followed up with questions on information selection (how participants think a site decides what to show them and why they think the information is listed as it is), customization (whether participants notice differences in search results depending on when and with which device they search for information; and if yes, why that might be the case), and personalization (whether people think that different people may see different results for the same search, and if yes, why). Next, we focused on people's awareness of the personal data collection by online services through asking about data collection (what people think sites know about them), sharing (whether people know that other sites may have access to the collected information), and protection (what people think they can do about sites collecting information about them). Understanding how to protect against data collection can give people strategies to avoid and influence algorithmic processes.
Analysis procedure
We analyzed the data in three steps. First, a research assistant read through all the interviews, assigned the relevant section according to the six overarching topics (information selection, customization, and personalization as well as personal data collection, sharing, and protection), and wrote short summaries of each case. In a second step, we coded all interview sections on each topic in detail to identify differences and similarities between the participants’ answers. Throughout this process, we added inductive codes to the coding scheme whenever new elements appeared in the material and wrote memos based on the analysis. We sorted and summarized the codes to identify themes and to answer our research questions.
Results
In this section, we first report on participants’ awareness that systems collect personal data and their awareness that algorithmic processes occur when they use online services. Then, we describe participants’ understanding of the relationship between systems’ assumptions about personal data and their algorithmic output. Finally, we describe the behavioral implications of different levels of algorithm skills.
To what extent are people aware that online services collect personal data?
In response to our first research question, we found that almost all interviewees were aware that personal data may be collected while they search for products and services online. Participants mentioned different types of such data ranging from specific behavioral information (e.g. search history, usage behavior) to personal information (e.g. location data, login information, email address) as well as their “needs” and “interests.” Some used rather general overarching terms to describe this such as “a lot” or even “everything” when it comes to what information sites and platforms collect.
To what extent are people aware of algorithms when they use online services?
To assess whether respondents understood that algorithms are at work when they use various services, we analyzed whether participants were aware of customization practices by the sites and services they use. The majority of participants were aware that such customization happens. Asked about how sites select content, whether they ever noticed changes in their search results, and whether different people might see different results for the same search, most seemed to be aware that the information they see was somehow preselected for them by the system.
To what extent do people understand that there is a relationship between what personal data systems glean about them and the algorithmic output they receive?
Participants varied considerably in their understanding of how and why customization processes happen. The distinction between awareness and understanding is reflected in the answer of a 35-year-old nurse from Bosnia when asked whether she thinks somebody else could get a different result for the same search:
Yes.
And why would that be the case?
I do not know why that would be the case, but I think yes.
This respondent was aware that customization may happen, but explicitly noted that she does not know why it would happen.
It was noticeable that participants who were unsure and seemed to know little were more likely to think anything was possible (e.g. all data are collected, information is always personalized), rather than assuming that the capacity of technology is limited. Asked whether different people might see different results, a 53-year-old nurse from Serbia replied:
Well, it's possible, there are various kinds of data.
What do you think a site like Google knows about you?
I do not know. Whether it's our Internet that goes over TV and probably has our number, it can probably have all the information.
And do you know whether other websites have access to the same information?
It's that big kind of technology, so it probably does. The technology is so powerful, but I’m not an expert on it.
To this respondent, there is an abstract powerful technology that collects all information about its users, but how it does so or with what implications is elusive to her.
A 56-year-old commercial clerk and interpreter from Germany was aware that different people can see divergent results for the exact same search, but explained this by pointing to variations in people's Internet connections:
Do you think it can happen that someone else gets different results for the exact same search?
Could be, because maybe it's all related to the connections, that one has another connection, that one is faster, that one – I can imagine that well.
By connection do you mean Internet connection?
Yes, exactly, that the one has another Internet connection.
Knowing (i.e. being aware) that personal data are collected does not necessarily mean that participants are also aware of algorithms. This is how a 47-year-old male administrative assistant from the US replied to the question about what sites know about their users: “They know exactly […] where you’re coming from, or what you might have been going through, looking at, and [your search] history.” However, when asked whether someone else who searched for the same products or services on their own computer would get a different result, he seemed unclear about customization processes based on personal data. On the other hand, we found that almost every participant who was aware of algorithms was also aware of the collection of personal data. In other words, awareness of algorithms tends to go hand in hand with awareness of personal data collection, whereas awareness of data collection alone does not guarantee an understanding that customization happens.
Some of the participants who understood how personal data are used to customize content could apply this knowledge across different sites and platforms. A 29-year-old industrial designer from Germany explained: “Your browser always stores data. And depending on when you access something, what this cookie storage looks like, the things that you are shown change.” This understanding helps participants distinguish how and why algorithmic processes take place in different usage scenarios.
In other cases, understanding is highly situational. While some participants seemed to understand that personal data are collected for customization purposes in their answer to one question, they were not sure about it in their next response. For example, a 45-year-old bank employee from Hungary described that search results are personalized based on a user's search history. Yet, when asked whether someone else could get a different result for the same search, she was not sure. Similarly, a 75-year-old retired electrician from Germany understood that personalized ads are shown based on a user's search history, but did not know whether different people would see different results for the same search. A 23-year-old college student from Germany understood how information is customized on one site (i.e. a flight comparison site), but did not know how this worked on another site (i.e. an online clothing store). This shows that although participants are aware that information is customized and some understand how it may be customized, they do not always know in what instances such customization (i.e. algorithmic processes) may happen, indicating gaps in their understanding of algorithms.
Understanding how and why customization happens seems to be related to usage experiences, leading participants to develop theories to explain what they perceive. A 34-year-old economist from Serbia described her experience of how searching for the same information several times influences the ordering of future search results:

[When I listen to the same music video on YouTube several times] it often happens to me that it pops up among the first ones [in future searches], so I think that it has to do with the number of views, that something we look at often is shown to us as a reminder. Like “click me again,” I think.
Thinking about the collection of personal data seemed to prompt some participants’ understanding of how such data are used to customize content. A 48-year-old massage therapist in the US did not show nuanced algorithm skills when asked about how the content he sees is selected. Yet, when asked if sites might share collected information between each other, he started to talk about cookies and how they are used to follow users across sites to personalize content:

For sites to better serve you, they want access to your cookies. If you don’t allow it, then only certain information will be presented to you. If they have all your information, then they can probably give you an abundance of different types of stuff related to what you’re looking into.
Some participants related algorithms immediately to the collection of personal data. For example, the 23-year-old college student from Germany mentioned earlier answered the question about how a site decides what to show a user: “The sites where I buy my clothes probably decide what to show me based on the clothes I’ve already bought and my usage behavior. They probably have my preferences stored.”
Participants mentioned the term “cookie” in connection with algorithms and the collection of personal data. For example, a 28-year-old German female biomedical engineer said in the context of whether search results might change on a flight comparison website: “Either I’m imagining it or it's really like that, but I think it's really like that because of the cookies you’re saving, the price is increasing somehow.” When asked about her privacy protection strategies, a 20-year-old college student from the US also mentioned the term cookie: “There's the cookies settings. I have played with those sometimes, and then potentially there are other privacy settings I just haven’t used.” Therefore, we wondered whether awareness of cookies would suggest a deeper level of algorithm skills. After all, cookies play a central role in algorithmic processes (Freedman, 2020), and a basic understanding of their existence, their function, and how to get rid of them could indicate good algorithm skills.
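The customization loop that participants were gesturing at can be illustrated with a small sketch. The code below is a hypothetical, deliberately simplified simulation (not any real site's implementation): the "site" stores a visitor ID in a cookie, logs what that visitor views, and re-ranks its catalog by that visitor's viewing history.

```python
# Hypothetical illustration of cookie-based customization: a visitor ID stored
# in a cookie links a browser to a server-side profile, which in turn shapes
# how content is ranked for that visitor.
from collections import Counter

COOKIES = {}   # browser-side cookie jar: {site: visitor_id}
VIEW_LOG = {}  # server-side profile store: {visitor_id: Counter of categories}

def visit(site, page_category, catalog):
    """Simulate one visit: read or set the cookie, log the view, and return
    the catalog re-ranked by this visitor's viewing history."""
    visitor_id = COOKIES.setdefault(site, f"uid-{len(COOKIES) + 1}")
    profile = VIEW_LOG.setdefault(visitor_id, Counter())
    profile[page_category] += 1
    # Most-viewed categories float to the top of future results.
    return sorted(catalog, key=lambda c: -profile[c])

catalog = ["shoes", "flights", "books"]
visit("shop.example", "flights", catalog)
visit("shop.example", "flights", catalog)
ranked = visit("shop.example", "shoes", catalog)
# After repeated "flights" views, flights rank first for this visitor.
```

In this sketch, deleting the cookie (here, `COOKIES.clear()`) severs the link between the browser and the stored profile, which is why clearing cookies is one of the few customization controls actually available to users.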
We found that some participants who used the term “cookies” seemed not to know what cookies actually are. Indeed, according to the survey data we collected from all participants, 49 of the 83 participants (59%) could not identify the correct definition of a cookie in a multiple-choice question. Clearly, some people used the term as a buzzword they had heard in relation to privacy protection without fully grasping its meaning or significance. Asked about strategies to protect her privacy, a 39-year-old housekeeper from Germany replied: “Maybe it has to do with cookies, I don’t know what that is all about.” An important take-away here is that a respondent's mention of a term does not necessarily signify a full understanding of the underlying concept or knowledge of how to act on it. The fact that participants mentioned the term “cookies” could be due to the banners and pop-ups, mandated by the EU's General Data Protection Regulation (GDPR), that ask users to consent to cookie settings when they visit websites (Koch, 2022). But recognizing a term certainly does not equate to understanding it.
To what extent do people make decisions about their usage behavior based on their understanding of how the personal data that systems glean about them influence the algorithmic output they receive?
We asked participants: “Is there anything you can do about sites collecting information about you?” and, if yes: “What can you do?” Together with earlier questions that gave us insights into a participant's awareness and understanding of algorithms, responses to these questions enabled us to uncover what actions people may take in an attempt to influence algorithms. This allowed us to analyze how people's understanding of the relationship between personal data and algorithmic output may influence their online actions. Some participants’ responses seemed to indicate that such an influence does exist. For example, a 50-year-old male from the US who works with electrical systems, said: Because I don’t need to [stop sites from collecting personal information]. Because the convenience of them knowing, “Hey, this is what I looked for yesterday. I’m back at it again….” The fact that I can sign in, and they have all that information there for me, already saves me time.
Similarly, a skilled 29-year-old German freelance industrial designer seemed not to be bothered by the fact that personal data are collected, because he appreciated the convenience of personalized information: So, I mean, it gives me the convenience that I can have a lot of information with one account, few passwords. And that's partly to my advantage that things are suggested to me that I find interesting and stuff like that.
Based on their understanding of the relationship between personal data and algorithmic output, these participants weighed the pros (customized content) and cons (collection of personal data and thus potential privacy violations) of use and made a conscious decision to accept that their personal data are collected because of convenience, but also because they saw value in personalized content.
A recurring theme in responses—across different skill levels—was to limit use in order to protect personal data. A 49-year-old science professor from the US had the following rather cynical response when asked whether there was anything he could do to limit sites collecting information about him: “Yeah, I don’t have to search for anything, but other than that, probably not.” A 55-year-old social education worker from Germany explained how she tries to avoid clicking on certain information online so that her data will not be collected: “What I’m already trying to do is not click as much as possible.” Additionally, in almost a quarter of all interviews, participants mentioned that they have no chance to protect their data and feel powerless or not skilled enough to do so. A 24-year-old retail worker in the US shared the following when asked whether she can do anything to protect information being collected about her:
Participant: I’m sure there is, I mean, I personally can’t.
Interviewer: I see. Like you don’t know.
Participant: Yeah. I’m sure there are software engineers who can do all that stuff.
Interestingly, even though some participants seemed to be very aware that their personal data are collected for customization processes and also knew strategies to protect against this, they still ended up feeling powerless. A 21-year-old college student from Serbia noted: “The database [from applications, sites, and platforms] keeps everything, every piece of data is analyzed whether we wipe them out or not. These data go to their database, and they stay there.” Some participants were so used to the fact that their data are constantly collected that they stopped caring about it. A 22-year-old male college student from Bosnia thinks that nothing can be done about personal data being collected and added: “For me, this is now something normal.”
Some participants felt that there was nothing they could do to prevent online services from collecting their personal data except to stop or limit their Internet use, which was not a real option for most. In this sense, they did not differ significantly from participants with lower levels of algorithm skills, as these respondents also reported feeling powerless in the face of personal data collection. Understanding the relationship between personal data and algorithmic output thus has its limitations. On the one hand, it enables participants to think consciously and critically about the implications of their Internet use and—to an extent—adapt their usage accordingly (e.g. by deleting their search history or by actively enjoying the benefits of algorithmic personalization). On the other hand, it does not necessarily give them the means to take full control of the collection and use of their personal data.
Discussion
This study has explored people's algorithm skills and how they enable people to use algorithms to their own benefit while avoiding harms from them. Thus, it contributes to existing literature by unpacking the links between awareness and understanding of algorithms while also considering how such awareness and understanding may influence user actions. Our interview data suggest that awareness of the collection of personal data is linked to understanding algorithms. Skilled participants understood that the information that platforms and sites can collect about them influences the content shown to them. Such skilled users shared the following characteristics. They:
- Had a basic understanding that customized information is shown to them in different scenarios of use based on the collection of personal data.
- Understood that they can adapt their usage behavior to influence the information shown to them.
- Formed opinions about the pros and cons of algorithms.
However, our data also suggest that understanding the relationship between personal data and algorithms might not be enough to make strategic decisions about use, because participants were uncertain about what choices they actually had. Interviewees developed various individual theories to explain how information is selected on different platforms and websites, mirroring others’ findings in the literature (Bucher, 2017; DeVito et al., 2017, 2018; Gruber et al., 2021; Lutz et al., 2021; Powers, 2017; Rader and Gray, 2015; Siles et al., 2019). What users missed was a comprehensive approach that would universally allow them to control the collection of their personal data and thereby influence algorithmic processes across different platforms and websites (Hargittai and Marwick, 2016).
Extending existing literature that has highlighted how users adapt to algorithmic environments in order to resist and use systems to their own benefit (Bucher, 2018; Christin, 2017; Kellogg et al., 2020; Siles and Meléndez-Moran, 2021; Velkova and Kaun, 2021), our research shows that participants often see their only option as severely limiting or stopping use of the Internet if they want to avoid customization and protect their data (Vertesi et al., 2016). Yet, as participants cynically pointed out, that is not a real option, since Internet non-use would bring many disadvantages and challenges to their everyday life, a point that echoes cynicism found elsewhere in the literature (Hargittai and Marwick, 2016; Lutz et al., 2020). The above indicates that the previously studied concept of privacy cynicism, describing “an attitude of uncertainty, powerlessness, mistrust, and resignation toward data handling” (Lutz et al., 2020: 1168), might also apply to algorithms. Users are aware of customization processes and care about them, but only have limited options and information to do something about them.
In addition to the empirical contributions, our study also provides research design advancements for future qualitative and quantitative work studying people's awareness and understanding of algorithms. For qualitative work, we highlight the importance of a larger-than-usual and more diverse sample to get a comprehensive picture of different perspectives. While qualitative research often relies on samples of 25–40 interviewees (Bucher, 2017; DeVito et al., 2018; Klawitter and Hargittai, 2018; Liu and Graham, 2021; Siles and Meléndez-Moran, 2021), we found it helpful to have a larger sample to gather more examples and evidence of algorithm skills (or lack thereof). Given the uncertainty that surrounds algorithmic systems, such expanded data collection can be helpful. The size and diversity of our sample helped highlight the problem of limited shared awareness of the possibility that customization can happen and limited shared understanding of how and why customization happens in algorithmic systems.
Additionally, we recommend prompting participants with different usage scenarios (e.g. shopping for products versus travel planning), since individuals showed varying skills related to different types of activities, platforms, and websites. Participants alternated between general and specific algorithm skills (Ytre-Arne and Moe, 2021). Some had a specific understanding of the algorithms on one platform and applied this to other platforms (e.g. applying their algorithm skills with Google Search to other platforms such as Booking and Amazon). Others seemed to have some general algorithm skills and applied these to specific platforms (e.g. being generally aware that algorithms influence the content of platforms and therefore assuming that this must also be the case when searching for something on Amazon).
The point about different contexts can also be relevant for quantitative work as it highlights the importance of including survey items reflecting different usage scenarios. This will help avoid distorting participants’ algorithm skills (e.g. someone who might know a lot about Google Search might know relatively little about Amazon, thereby appearing skilled when a survey question focuses on the general search platform and less skilled when an item focuses on the big retailer). For knowledge-based measures of algorithm skills, we discourage researchers from relying on the recognition of the term “cookie,” as familiarity with the term cannot necessarily be equated with actual understanding of its meaning. Our research suggests that questions about personal data collection are a good indicator of algorithm skills. Therefore, we recommend measuring algorithm skills with items focusing on personal data collection as well as customization processes, especially to distinguish between those who are only aware of algorithms and those who understand the relationship between personal data and algorithmic output.
A limitation of our work is that although we strove for a diverse sample of participants based on gender, age, and national background, the majority of our participants were highly educated. Research on Internet skills across different countries has shown that education is positively related to Internet skills (see Litt, 2013 for a review), meaning that certain perspectives—perhaps especially less informed ones—might be underrepresented in our data. Another limitation of the work is that our analysis of how users’ skills and actions relate is based on data about statements of intent and decisions (e.g. “I would delete cookies”) rather than actual actions. Thus, one path for future research would be to study people's actions in situ, incorporating the think-aloud technique (Nielsen et al., 2002).
Conclusion
A growing number of studies report on people's algorithm skills in different scenarios of Internet use (Bucher, 2017; DeVito et al., 2017, 2018; Dogruel, 2021; Eslami et al., 2015; Gruber et al., 2021; Hargittai et al., 2020; Klawitter and Hargittai, 2018; Rader and Gray, 2015; Siles et al., 2019). Yet, we know surprisingly little about what algorithm skills actually mean in terms of how skills might help people use digital media to their benefit while avoiding its harms. In this study, we investigated people's understanding of the relationship between systems collecting people's personal data and algorithmic output. Sites and services collect and use available personal data to customize the content they show their users (Freedman, 2020; Friedman et al., 2015; Just and Latzer, 2017; Latzer et al., 2016; Quick, 2020). We found that skilled participants used their understanding of personal data collection to make sense of algorithms online.
In the context of ongoing important discussions about bias, discrimination, and possibilities of manipulation in algorithmic applications (Bandy, 2021; Beer, 2009; Just and Latzer, 2017; Noble, 2018; O’Neil, 2016; Tufekci, 2014, 2015), our study contributes to understanding how users interact with technology in everyday life, and how algorithm skills influence their usage behaviors. Users seem to be aware of customization processes through algorithms—such as being aware of the collection of their personal data (Hargittai and Marwick, 2016)—but lack convenient ways to control the level of customization of the content shown to them by platforms and websites. Instead of a patchwork of different approaches across platforms and websites, regulators should push for uniform policies and technical functionalities that help users take control of their data and the information they receive.
Additionally, our results can inform future quantitative work that measures people's algorithm skills through survey items. Since people may recognize terms (e.g. cookie) without having a deeper understanding of what they mean, instruments need to be careful not to conflate awareness of the possibility that customization can happen with understanding of how and why customization happens.
The overall take-away, then, is that understanding the relationship between personal data and algorithmic output is an important element of algorithm skills. Algorithm skills allow people to use the Internet in an informed way, help them critically reflect on the information displayed to them, and allow them to develop strategies to use algorithms to their own benefit while sidestepping their harms. Yet, our research shows that the latter is limited, because there are currently no uniform and convenient options in place that help users control the level of customization they encounter online across sites and platforms. That said, being more skilled allows users to implement strategies that give them more control over what personal data get collected and potentially utilized for content customization. This means that more-skilled users have more agency than less-skilled users, contributing to digital inequality in how people of different abilities can benefit from their Internet use.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
