Abstract
Privacy has been an important topic within the geospatial science community, particularly driven by the widespread adoption of geospatial technologies such as mobile devices and the vast amount of location data they generate. This has sparked considerable interest in location privacy, a subfield specifically dedicated to the protection of location information. However, the existing literature on location privacy mostly focuses on preserving anonymity and protecting against individual identifiability when using geospatial data. While this is undoubtedly valuable, it may prove insufficient in a landscape characterized by pervasive data collection and analytics. This article argues that the powerful capabilities of algorithms in data-intensive geospatial analytics allow for the profiling of groups of individuals and the prediction of sensitive information that has not been explicitly collected, without necessarily compromising individual identifiability. These practices nonetheless pose severe threats to privacy and can contribute to unequal treatment of certain populations, leading to structural and societal challenges. In response to these challenges, a collective approach should be embraced to address location privacy concerns. Both regulatory and technical practices need to acknowledge the interdependency of privacy, while individuals should cultivate an awareness that privacy protection requires cooperation.
Introduction
Mature information societies are distinguished by the prolific generation of data to provide insight into human behavior (Floridi, 2016). In the past two decades, there has been a remarkable increase in the production of geospatial data referencing the physical locations of human subjects, driven by interactions with mobile devices and the utilization of sensors, global positioning systems (GPS), and wireless communication (Lee and Kang, 2015). This unprecedented growth has pushed geospatial data production past traditional survey data in scale, and the resulting data exhibit the characteristics of “volume, velocity, and variety” that have deeply changed the way geospatial data are analyzed (Kitchin and McArdle, 2016). In response to the expanding geospatial data, new techniques such as artificial intelligence (AI) have emerged, ushering in a new era of data-intensive geospatial analytics. The proliferation of data, combined with transformative techniques, has revolutionized various aspects of everyday life, ranging from personal navigation to social networking (Leszczynski and Crampton, 2016). It has also propelled societal transformation by informing urban planning, optimizing transportation, facilitating environmental management, providing valuable business insights, and enhancing public safety (Shi et al., 2021).
However, it has become evident that data-intensive geospatial analytics can result in substantial adverse consequences, particularly concerning location privacy, which involves the protection of location information. The utilization of the growing amount of geospatial data has gradually shifted social values toward accepting privacy erosion as a trade-off for convenience, efficiency, and national security (Crampton, 2015; Dobson and Herbert, 2021). The extensive sharing of location data encompassing details about people’s whereabouts, residences, and workplaces has become pervasive, yet individuals possess limited knowledge of and control over how these data will be utilized. A notable illustration of the implications is the rise of unregulated surveillance and social sorting within the ever-expanding realm of location-based services and related industries (Lyon, 2003; Maras and O’Brien, 2023; Wright et al., 2010). These practices categorize individuals based on their physical locations or trajectories and can impact critical domains such as law enforcement, employment, and healthcare, perpetuating or even exacerbating existing inequalities.
The literature on location privacy is expanding to address the impacts caused by the emergence of geospatial data and technologies (Armstrong and Ruggles, 2005; Banerjee, 2019; Clarke and Wigan, 2011; Curry, 1997a; De Montjoye et al., 2013; Dobson, 2009; Swanlund and Schuurman, 2016; Taylor, 2016). Considerable efforts have been made to explore the conceptualization of location privacy (Armstrong et al., 2017; Duckham and Kulik, 2006; Keßler and McKenzie, 2018; Kwan et al., 2004), identify instances where location privacy can be compromised (McKenzie et al., 2016; Wernke et al., 2014), and develop methods to counter surveillance and privacy violations that specifically target or involve geographic locations (Jiang et al., 2021; Kamel Boulos et al., 2022; Lin and Xiao, 2023a; Swanlund and Schuurman, 2019). However, a major problem with the existing literature is that it is confined to an individualistic approach to privacy, one that emphasizes individuals’ control over access to location information that can be used to identify them. This emphasis on maintaining anonymity and resisting identification has translated into many existing practices that anonymize personal data containing personally identifiable location information through the removal, aggregation, or masking of such information. For example, the European Union’s (EU) General Data Protection Regulation (GDPR) regulates the processing of personal data so that data subjects cannot be identified. But once data are anonymized and individuals are no longer identifiable, the data cease to be personal data and thus fall outside the scope of the GDPR. Similar provisions can be found in other privacy laws and regulations, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) and the California Consumer Privacy Act of 2018 (CCPA).
While recognizing the value of such an approach to privacy, I argue that it is not sufficient, particularly in the context of prevalent data-intensive geospatial analytics. Data-intensive geospatial analytics relies on algorithmic profiling and predictions as essential components of its capabilities. Algorithmic profiling involves categorizing and grouping individuals based on their shared interests and utilizing the generated group information to inform decision-making (Kammourieh et al., 2017; Mittelstadt, 2017; Taylor, 2017; Vedder, 1999). Algorithmic predictions involve training models to learn the characteristics of individuals, usually from a small subset, who voluntarily share their location information and other auxiliary data, such as social media usage. These models can then be used to infer the location or other personal details of all other individuals, potentially without their knowledge or consent (Hildebrandt et al., 2009; Mühlhoff, 2021, 2023). Both algorithmic profiling and predictions can effectively leverage large sets of anonymized data, especially with advanced AI algorithms such as deep learning and neural networks, without necessarily revealing the identities of individuals. Jurgens (2013) provides an illustrative example where accurate location inference was achieved for nearly all Twitter users by propagating location assignments through the social network, utilizing anonymized data from less than 10% of Twitter users. This instance demonstrates the limitations of current understandings of location privacy that focus on individual identifiability and anonymity: even when no personally identifiable data are involved, the privacy of data subjects remains under substantial threat.
The purpose of this article is to advocate for a collective approach to location privacy in response to the challenges posed by data-intensive geospatial analytics. In the philosophical theories of privacy, collective approaches have long recognized that handling one person’s data can have negative consequences for others (Baruh and Popescu, 2017; Floridi, 2014; Mühlhoff, 2023; Roessler and Mokrosinska, 2015; Taylor et al., 2017). Privacy, from a collective standpoint, is seen as an indivisible collective value and social phenomenon that can only be fully realized through collaborative efforts by all members of society, and it is imperative that a comparable minimum level of privacy be guaranteed for everyone (Regan, 1995). I argue that such a collective approach to privacy should be seen as a valuable addition to the current perspective that considers location privacy as an individual good and focuses solely on individual anonymity. It should be noted that this article critiques the Western approach to privacy, which prioritizes individualism. The rationale for focusing on the Western approach is not solely its prevalence in the location privacy literature but also the value of critically assessing a widely discussed viewpoint. I recognize the significance of other cultural perspectives and their contributions to the discourse on location privacy.
In the remainder of the article, section “Privacy, autonomy, and anonymity” reviews the prevailing approach to location privacy that focuses on individual anonymity. Section “Challenges of data-intensive geospatial analytics” discusses the challenges posed by pervasive data-intensive geospatial analytics, particularly those arising from algorithmic profiling and predictions, to this individual-focused approach. In section “A collective approach to location privacy,” I examine a collective approach to location privacy that highlights the inherent codependency of privacy in regulatory practices and method development and foregrounds the collective efforts of individual members of society in addressing location privacy concerns. Section “Conclusion” concludes the article.
Privacy, autonomy, and anonymity
Privacy finds its roots in ancient Greek philosophical discussions, notably in Aristotle’s distinction between the public realm of the polis, associated with political activity, and the private realm of the oikos, related to family and domestic life. This fundamental distinction between the public and private realms yields an important theory of privacy as control-access, wherein privacy revolves around the control one has over access to oneself (Fried, 1984; Roessler, 2004; Westin, 1967). Godkin (1880) highlights the importance of controlling access to one’s private space and personal affairs, stating that nothing is better worthy of legal protection than private life, or, in other words, the right of every man to keep his affairs to himself, and to decide for himself to what extent they shall be subject to public observation and discussion.
The control access theory extends to personal data, as evidenced by the 1997 Information Infrastructure Task Force (IITF) created under President Clinton, which defines privacy as “an individual’s claim to control the terms under which personal information—information identifiable to the individual—is acquired, disclosed, and used” (Kang, 1998). The US Supreme Court affirms that privacy entails “control over information concerning his or her person,” as noted in its ruling in United States Department of Justice v. Reporters Committee for Freedom of the Press. The GDPR likewise stipulates that “personal data should be processed in a manner that ensures appropriate security and confidentiality of the personal data, including for preventing unauthorized access to or use of personal data.” It is worth noting that the GDPR and several other national privacy laws and regulations, such as India’s Digital Personal Data Protection Act (DPDP), possess extensive extraterritorial reach and govern the processing of personal data across international borders.
Since the 1990s, the notion of location privacy has emerged as scholars began to recognize privacy as a significant ethical concern associated with geospatial technologies, particularly geographic information systems (GIS) (Armstrong and Ruggles, 2005; Crampton, 1995; Curry, 1997a, 1997b, 1999; Elwood and Leszczynski, 2011; Pickles, 1995). The conception of location privacy is closely related to and influenced by the control access theory of privacy. Kwan et al. (2004), for example, introduce the term “geoprivacy,” which is equivalent to location privacy, and define it as the “individual rights to prevent disclosure of the location of one’s home, workplace, daily activities, or trips.” Similarly, Duckham and Kulik (2006) define location privacy as “the claim of individuals to determine for themselves when, how, and to what extent location information about them is communicated to others.” Kerski and Clark (2012) describe location privacy as “the ability of an individual to move in public spaces with a reasonable expectation that their location will not be recorded without permission for later use by a third party.” These conceptions all emphasize the ability of individuals to control access to their identifiable location information and resist identification to facilitate their free actions. Privacy, in these conceptions, is confined to individuals and specifically pertains to location data that can be identified as their own.
The approaches to privacy I have explored so far predominantly stem from a Western perspective. In broad terms, privacy in the Western context tends to be individual-centric, whereas Eastern countries, as well as other cultures such as African ones, have traditionally emphasized community values and, to some extent, attached a negative connotation to privacy (Nakada and Tamura, 2005). Historical influences, like Confucian values emphasizing hierarchy, duty, and respect for authority, have left their mark on many Eastern societies. Some Eastern cultures, notably in China, Japan, and South Korea, and especially those with authoritarian governance, have historically prioritized collectivism and communal harmony over individual rights and privacy, often involving more extensive government control and surveillance. However, it is noteworthy that even in the East, there is a growing acceptance of Western perspectives on privacy, valuing privacy for individuals and interpersonal relationships (Huang et al., 2021; Lin et al., 2013; Ma, 2023). While acknowledging these cultural variations in the understanding of privacy, this article primarily focuses on Western perspectives, as they are more commonly represented and studied.
The value of privacy for individuals
Extensive scholarly interest in the value of privacy for individuals and personal autonomy has emerged alongside the development of the control access theory, with its emphasis on choice and control and, implicitly, the autonomous individual, as well as the rise of liberalism since the 19th century (Etzioni, 2008; Triandis, 2018; Westin, 1967). Personal autonomy entails the practical relationship one has with oneself and the capacity for critical self-reflection on one’s own reasons, goals, values, and projects (Christman, 2003). Having control over access to one’s personal data empowers individuals to establish and maintain boundaries, withdraw from public views, and protect themselves from unwanted scrutiny or surveillance, thereby fostering personal autonomy. As Roessler (2004) emphasizes, such control is essential for an individual to “secure a horizon of expectations regarding what others know about him that is necessary for his autonomy,” as it acts as “a protective shield allowing the individual to act towards all possible unspecified third parties in accordance with his expectations concerning the level of information they each have.” In other words, the ability to safeguard personal data and maintain a private realm enables individuals to freely act, think, and decide without unnecessary interference or coercion from external forces. It is vital for individuals to exercise their autonomy and make independent choices aligned with their values and aspirations.
The focus on the importance of privacy for individuals, especially personal autonomy, has fueled a widespread pursuit of anonymity and resistance against individual identifiability. Anonymity, in its literal sense, refers to the state of being nameless, where one remains unremarked and blends into the undifferentiated crowd in the eyes of others. The association between anonymity and privacy is not new. Westin (1967) describes anonymity as a “state of privacy” that “occurs when the individual is in public places or performing public acts but still seeks, and finds, freedom from identification and surveillance.” Anonymity is also closely tied to personal autonomy, embodying what Georg Simmel called the “phenomenon of the stranger,” whereby individuals can freely express themselves without the constraints imposed by a known or authoritative figure (Simmel, 1950). In the digital age, with pervasive technology and data collection, anonymity has become increasingly important (Doyle and Veranas, 2014). It allows individuals to separate their identities from the data that their daily activities generate and that third parties collect and utilize. Anonymizing data to prevent individual identification keeps personal information protected from unauthorized access by others. This separation of individual identities and data is crucial for individuals to engage in activities, express opinions, or access information without the fear of being personally identified or facing potential consequences.
The quest for anonymity is reflected in many privacy laws, regulations, and protection practices. The EU’s ePrivacy Directive (ePD), for example, states that “the Member States, providers and users concerned, together with the competent Community bodies, should cooperate in [. . .] taking particular account of the objectives of [. . .] using anonymous or pseudonymous data where possible.” Similarly, HIPAA regulates that “a covered entity or business associate may not use or disclose protected health information” except in certain conditions, where “protected health information” encompasses all “individually identifiable health information” regardless of the form or media, be it electronic, paper, or oral. In addition to these laws and regulations, a variety of contemporary privacy protection methods have been developed. Many of these methods are designed to align with a privacy principle called k-anonymity, which ensures that each person in a data set remains anonymous by being indistinguishable from at least k − 1 other individuals (Sweeney, 2002b). Sweeney (2002a) demonstrates two commonly used methods to achieve k-anonymity: generalization and suppression. Generalization involves replacing a value in a data set with a less detailed but semantically consistent value (e.g. replacing birth dates with birth months), while suppression entails withholding the release of a value (e.g. removing birth dates from the data set). In the past decade, another privacy principle called differential privacy has become a gold standard for privacy protection (Dwork and Roth, 2014; Dwork et al., 2006). It achieves privacy by adding calibrated noise to aggregated data, ensuring that the inclusion or alteration of any single individual’s data has only a limited effect on the published results. Both k-anonymity and differential privacy have been integrated into various products, services, and system designs that involve the processing of personal data, serving a crucial role in promoting the anonymity of individuals in the digital age (Abowd, 2018; Ding et al., 2017; Google, 2023; Palantir Technologies Inc., 2021).
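To make these mechanisms concrete, the following minimal sketch (in Python, on hypothetical toy records invented purely for illustration) applies generalization and suppression to quasi-identifiers, checks the resulting k-anonymity condition, and contrasts this with a Laplace-noise perturbation of an aggregate count in the spirit of differential privacy.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; birth_date and zip_code act as quasi-identifiers.
df = pd.DataFrame({
    "birth_date": ["1990-03-14", "1990-07-02", "1990-11-21",
                   "1985-01-30", "1985-06-08"],
    "zip_code": ["43210", "43214", "43215", "10001", "10002"],
    "diagnosis": ["flu", "asthma", "flu", "diabetes", "flu"],  # sensitive value
})

# Generalization: replace values with less detailed but semantically
# consistent ones (full birth dates -> birth years; ZIPs -> 3-digit prefixes).
df["birth_year"] = pd.to_datetime(df["birth_date"]).dt.year
df["zip_prefix"] = df["zip_code"].str[:3]

# Suppression: withhold the detailed values from release entirely.
released = df.drop(columns=["birth_date", "zip_code"])

def is_k_anonymous(frame, quasi_identifiers, k):
    # Every combination of quasi-identifier values must be shared by at least
    # k records, so each person is indistinguishable from k - 1 others.
    return frame.groupby(quasi_identifiers).size().min() >= k

print(is_k_anonymous(released, ["birth_year", "zip_prefix"], k=2))  # True

# Differential privacy instead perturbs aggregate outputs: Laplace noise
# scaled to the query's sensitivity (1 for a count) bounds how much any
# single record can change the published result.
epsilon = 1.0
noisy_count = len(released) + np.random.laplace(scale=1.0 / epsilon)
print(round(noisy_count))
```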
Location privacy and anonymity
Location privacy is closely intertwined with individual interests and personal autonomy, especially considering the substantial evidence that highlights the impact of geosurveillance. Geosurveillance refers to monitoring and surveillance activities that specifically focus on geographic locations, defined by Dobson and Herbert (2021) as “the practice, usually electronic, of monitoring and recording the geometries, topologies, and attributes of places and human and physical entities both stationary and moving.” Informed by Jeremy Bentham’s design of the Panopticon (Bentham, 2011) and philosopher Michel Foucault’s theories on surveillance and disciplinary power (Foucault, 1979), scholars have engaged in extensive discussions concerning the pervasive nature of geosurveillance and its potential to undermine individuals’ control over their identifiable location information, thereby impairing personal autonomy through power dynamics. Dobson and Fisher (2003) illustrate a scenario where “one entity, the master, coercively or surreptitiously monitors and exerts control over the physical location of another individual, the slave.” This control can take various forms, including behavior manipulation through nudging, administering punishment, or other means, which the authors term “geoslavery.” Central to this concept is the potential for an individual in a position of power to encroach upon the personal autonomy of another individual by obtaining access to and exerting control over their identifiable location information through ongoing surveillance.
The literature on location privacy emphasizes the importance of being able to hide within a crowd and retain a certain level of anonymity as a means to resist geosurveillance and uphold personal autonomy (Dobson and Herbert, 2021). Swanlund and Schuurman (2019) discuss three commonly used tactics for resisting geosurveillance and maintaining anonymity: minimization, which reduces the collection and storage of identifiable location information; obfuscation, which introduces random noise to confuse and obscure one’s physical locations; and manipulation, which adds specific noise to displace one’s physical locations. Among these tactics, obfuscation has garnered substantial attention in the literature (Armstrong et al., 1999; Kwan et al., 2004; Leitner and Curtis, 2004; Wieland et al., 2008; Zimmerman and Pavlik, 2008). A specific privacy principle known as spatial k-anonymity has emerged as a variant of k-anonymity for obfuscating geospatial data. Spatial k-anonymity requires that an individual’s geographic location remains indistinguishable from at least k − 1 other locations (Ghinita et al., 2010), ensuring that individual locations in a data set cannot be easily distinguished or linked to specific individuals. One common approach to achieving spatial k-anonymity is to introduce random noise to relocate the original location of an individual within a region containing k − 1 other individual locations (Allshouse et al., 2010; Charleux and Schofield, 2020; Hampton et al., 2010; Hasanzadeh et al., 2020; Kounadi and Leitner, 2016; Seidl et al., 2016; Zhang et al., 2017). Another approach, which is not mentioned in Swanlund and Schuurman (2019), is to perform aggregation by grouping every k individual locations into one (Kounadi and Leitner, 2016; Lin, 2023b; Lin and Xiao, 2023a). By concealing identifiable location information within a cluster of similar locations, these methods eliminate the risks associated with individual identification through location data. In addition to spatial k-anonymity, variants of differential privacy have been developed to ensure the anonymity of individual data subjects in geospatial data. One practical variant is geo-indistinguishability, which aims to add noise for protecting location-based services and data products (Andrés et al., 2013; Lin, 2023a). The US Census Bureau has also devised a mechanism based on differential privacy called the TopDown Algorithm for the privacy protection of geographically aggregated census tables (Abowd, 2018; Abowd et al., 2022). The implementation of these privacy principles and methods ensures the preservation of location indistinguishability, thereby protecting individual anonymity.
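To illustrate how such location obfuscation operates in practice, the sketch below implements the planar Laplace mechanism that underlies geo-indistinguishability, following the construction in Andrés et al. (2013): a direction is drawn uniformly at random, and a displacement radius is drawn from the mechanism’s radial distribution through its inverse cumulative distribution function. This is a simplified rendering that treats coordinates as planar and measures distance in the same units as 1/ε; real deployments add further safeguards such as coordinate projection and truncation.

```python
import numpy as np
from scipy.special import lambertw

rng = np.random.default_rng()

def planar_laplace(x, y, epsilon):
    """Perturb a planar location to provide epsilon-geo-indistinguishability;
    smaller epsilon means stronger privacy and larger expected displacement."""
    theta = rng.uniform(0, 2 * np.pi)  # random direction on the circle
    p = rng.uniform(0, 1)              # probability mass assigned to the radius
    # Inverse CDF of the radial distribution, via the -1 branch of the
    # Lambert W function, as derived by Andres et al. (2013).
    r = -(1.0 / epsilon) * (np.real(lambertw((p - 1) / np.e, k=-1)) + 1)
    return x + r * np.cos(theta), y + r * np.sin(theta)

# Example: with distances in kilometers, epsilon = 1.0 makes locations about
# 1 km apart statistically hard to distinguish from the reported point.
print(planar_laplace(x=10.0, y=20.0, epsilon=1.0))
```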
Challenges of data-intensive geospatial analytics
Data-intensive geospatial analytics is a multidisciplinary field that focuses on the processing, analysis, and visualization of large and complex geospatial data sets. It combines elements of geospatial analysis, big data technologies, and data science to extract meaningful insights and patterns from vast amounts of spatial data. In the context of AI, particularly within the emerging realm of geospatial artificial intelligence (GeoAI), which applies AI techniques to geospatial data and analysis, algorithms exhibit exceptional proficiency in uncovering patterns and trends within extensive geospatial data sets. This capability proves indispensable for profiling individuals and other entities, as well as for predicting forthcoming or unforeseen events and conditions based on historical geospatial data (Janowicz et al., 2019; Li, 2020). This trend toward data-intensive analytics poses substantial challenges to the prevailing privacy protection paradigm that heavily focuses on individual interests, autonomy, and anonymity. Specifically, two aspects of analytics using big data, namely algorithmic profiling and predictions, contribute to these challenges.
Algorithmic profiling
Profiling describes the “systematic and purposeful recording and classification of data related to individuals” (Büchi et al., 2020). Geospatial data have long been utilized in profiling, particularly through the practice of geodemographic analysis, which emerged during geography’s quantitative revolution in the mid-20th century and initially found application primarily within the marketing industry (Curry, 1997b; Goss, 1995; Harris et al., 2005). Combining GIS with demographic data, geodemographic analysis generates detailed profiles of populations in different neighborhoods or regions. These profiles encompass factors such as age, gender, education level, and income, and serve to identify those who are most likely to engage in increased purchasing behavior. Consequently, geodemographic analysis facilitates activities such as direct mail marketing, advertising, credit scoring, business site selection, and retail planning.
While traditional geodemographic analysis predominantly relied on governmental survey data such as census data, voting records, and housing registries (Batey, 1995), the advent of large spatial databases labeled as geospatial big data has revolutionized the field. Geodemographic analysis and geospatial big data are both characterized by their market orientation and the privatization of data and analytics. The use of geospatial big data, which promises to offer “more diverse, continually accessible data sources,” has played a vital role in enhancing the accuracy and effectiveness of geodemographic analysis in comprehending population dynamics and behavior (Dalton and Thatcher, 2015; Webber et al., 2015). AI algorithms have significantly amplified this effect by enabling unprecedentedly effective pattern recognition from the data (De Sabbata and Liu, 2019; Singleton et al., 2020). In addition, the prevalence of big data has facilitated large-scale surveillance of thousands or even millions of people, thereby significantly expanding the scope of geodemographic analysis. This expansion is not limited to specific countries or cities but can extend to global-scale monitoring (Singleton and Spielman, 2014).
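A minimal sketch of the clustering mechanism at the heart of geodemographic analysis, using synthetic neighborhood attributes invented purely for illustration, might look as follows: neighborhoods are grouped into segments based on standardized demographic variables, and every resident subsequently inherits the profile of their neighborhood’s segment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic attributes for 200 neighborhoods (median age, share holding a
# degree, median income in thousands); values invented for illustration.
rng = np.random.default_rng(0)
attributes = np.column_stack([
    rng.normal(38, 8, 200),
    rng.uniform(0.1, 0.6, 200),
    rng.normal(55, 15, 200),
])

# Standardize so no single attribute dominates the distance metric, then
# cluster the neighborhoods into a handful of "geodemographic" segments.
X = StandardScaler().fit_transform(attributes)
segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Each neighborhood now carries a profile label; any decision keyed to a
# label (marketing, credit, policing) reaches all of its residents at once.
print(np.bincount(segments))
```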
The integration of algorithmic profiling and data-intensive geospatial analytics has expanded beyond marketing and has become prevalent in areas such as crime prevention and health monitoring (Longley, 2012; Singleton and Spielman, 2014). However, this rapid development has also given rise to substantial ethical concerns, particularly surrounding the classification of populations or groups as “problematized” and “risky” by algorithms (Cevolini and Esposito, 2020; Crampton, 2007, 2008; Mann and Matzner, 2019). To illustrate, consider a simplified example of a group labeled as “low-income neighborhood with high crime rates.” Being identified as part of this group can influence many decision-making processes and has negative implications for its individual members, such as discrimination in job applications, limited access to financial services, or increased surveillance by law enforcement agencies. As a real-world example, Jefferson (2017) demonstrates how the digital categorization of urban spaces can lead to racialized carceral power dynamics and unequal policing and punishment. The consequences of such categorizations and profiling extend beyond mere labels; they shape the lived experiences and opportunities of individuals. Furthermore, the integration of AI algorithms, including emerging large language models (LLMs), into profiling and categorization processes amplifies these concerns. LLMs, with their deep language understanding and data processing capabilities, can inadvertently reinforce existing biases present in the data they are trained on. As a result, the risk of perpetuating harmful stereotypes and discriminatory practices through algorithmic decision-making becomes increasingly pronounced.
The concerns about algorithmic profiling in data-intensive geospatial analytics are especially relevant in today’s “risk-based society,” characterized by new forms of juridical governance where the focus shifts from actual crimes committed to assessing the potential danger individuals may pose (Crampton, 2007). According to Foucault (2000), “dangerousness” entails society considering individuals based on their potential behavioral tendencies rather than merely their actions or violations of specific laws. The widespread use of algorithmic profiling in geospatial analytics has reinforced the notion of a risk-based society by encouraging the reduction of social identity to measurable characteristics that can be algorithmically classified. It emphasizes the risks and potential dangers associated with specific groups rather than considering the unique circumstances, experiences, and aspirations of individuals (Hildebrandt et al., 2009). This may exacerbate the tendency to view individuals as part of a homogenized and stigmatized group solely based on their location and activities as reflected in the data, and can even give rise to chilling effects that stifle personal expression and curtail individual freedoms (Büchi et al., 2020).
Algorithmic profiling can be seen as an invasion of privacy because it can reveal intimate details of individuals’ lives and undermine their control over access to personal matters (Lyon, 2003; Mittelstadt, 2017; Taylor, 2017). However, the current state of privacy regulations and protections falls short of effectively addressing this concern, largely due to an emphasis on protecting identifiable individuals and ensuring anonymity. Under current frameworks, data must include identifiable information to be legally classified as personal data and thus subject to strong limits on processing and exchange (Van der Sloot, 2018). The capacity of individuals to manage data about themselves ends once these identifiers are permanently removed. This individualistic focus of existing privacy protections, which assumes that privacy violations occur only when identifiability is present, becomes problematic in the context of algorithmic profiling. Profiling often targets large groups of people or even the entire population, rather than specific individuals. Even if individuals attempt to blend in within a group to maintain anonymity, the existence of the group itself cannot be concealed, and the group members remain susceptible to actionable profiling without being personally identified. A relevant contemporary example is the use of LLMs such as ChatGPT, which can employ user input to create profiles and categorize individuals based on the data provided (e.g. employment, health conditions), without necessarily requiring knowledge of the individual’s identity. Such a practice, though receiving attention (Poritz, 2023), remains largely unaddressed in the current legal and technical privacy landscape. Given the expanding magnitude and scale of geospatial data, this threat to privacy has grown far beyond an individual concern into a societal and structural problem with the potential to perpetuate biases, discrimination, and inequality at a systemic level.
Algorithmic predictions
With the rapid advancement of AI, algorithmic predictions have emerged as one of the most crucial applications within the realm of data-intensive geospatial analytics (Finlay, 2014). Unlike profiling, which relies on descriptive analysis using existing data collected about individuals, algorithmic predictions entail a form of predictive analysis concerning inferred information that has not been explicitly recorded. Such practices involve the utilization of data-driven predictive models, including both AI and traditional statistical models, to make inferences about any individual based on available data (Mühlhoff, 2023). In this context, predictive models refer to data processing systems that take as input readily available auxiliary data about an individual (referred to as “features”), such as one’s social media usage, and produce inferences regarding an unknown, typically hard-to-access, aspect of the individual (referred to as “target variables”), such as future behavior (e.g. whereabouts, potential purchases, online preferences) (Chong et al., 2017) or unknown personal attributes (e.g. gender, sexual orientation, political affiliations) (Kuang et al., 2022; Lin and Xiao, 2023b). To achieve this, predictive models employ a process called “pattern matching” to compare an individual’s features with data from the many other individuals on whom the model has been trained. The training data include both the features and target variables of these individuals, who voluntarily share their data, possibly because they are unaware of how their data will be utilized or simply believe that they have nothing to conceal. Mühlhoff (2023) illustrates an example of algorithmic predictions on Facebook, where a model trained on the subset of users who explicitly state their sexual orientation in their profile can be used to predict the sexual orientation of any Facebook user based on their usage data, such as their “likes.” It is important to note that such a prediction may be made even for users who do not provide consent or are unaware of the information being inferred about them.
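To illustrate the mechanism Mühlhoff describes, consider the following sketch on synthetic data (all values are invented for illustration): a classifier is trained only on the minority of users who disclose a sensitive attribute and is then applied to every remaining user on the platform.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic platform data: 10,000 users with 50 binary "features" such as
# likes or check-ins; only 10% of users disclose the target attribute.
features = rng.integers(0, 2, size=(10_000, 50))
target = (features[:, :5].sum(axis=1) > 2).astype(int)  # synthetic ground truth
disclosed = rng.random(10_000) < 0.10                   # the consenting minority

# Train only on users who volunteered the attribute...
model = LogisticRegression(max_iter=1000)
model.fit(features[disclosed], target[disclosed])

# ...then infer it for everyone else, without their knowledge or consent and
# without needing to identify any of them by name.
inferred = model.predict(features[~disclosed])
accuracy = (inferred == target[~disclosed]).mean()
print(f"Inferred the attribute for {(~disclosed).sum()} non-disclosing users; "
      f"accuracy on this synthetic data: {accuracy:.2f}")
```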
Geospatial data are commonly incorporated into algorithmic predictions to make inferences about various aspects of an individual, including their past, present, and future locations, as well as other personal factors such as income level (Lee and Kang, 2015). This ability to draw inferences is underpinned by what is often referred to as the “First Law of Geography,” which suggests that “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970). In the context of geospatial analytics, this means that when an individual’s location data are collected or shared, it becomes easier to make inferences about them based on the behavior or characteristics of nearby locations or individuals. The trend of using geospatial data for inference and predictions has become more pronounced with the emergence of mobility big data, which capture the movement of individuals in physical space as a result of the widespread adoption of mobile devices (Xu et al., 2016). For example, Blumenstock et al. (2015) illustrate that an individual’s historical mobile phone usage can be used to predict his or her socioeconomic status. Similarly, Kuang et al. (2022) showcase the application of machine learning methods to predict demographic characteristics such as the age and gender of mobile users. In addition, the availability of “geotagging” features on social media platforms, which allow users to label their real-time locations, has contributed to the trend. For example, the literature has shown the possibility of utilizing geotagged tweets from a small portion of Twitter users to infer locations for any non-geotagged tweet, even from users who have never used geotags (Jurgens, 2013; Li et al., 2019).
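The network-based inference reported by Jurgens (2013) can be sketched as follows. The toy graph and coordinates are invented for illustration, and the neighbor mean is used here in place of the geometric median that Jurgens’s spatial label propagation actually employs; the structure of the inference is nonetheless the same: locations volunteered by a few users propagate to everyone connected to them.

```python
import networkx as nx

# Toy social network in which only two users disclose a home location.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")])
known = {"a": (40.00, -83.00), "d": (40.10, -83.10)}  # the geotagging minority

locations = dict(known)
for _ in range(5):  # iterate until the assignments stabilize
    for user in G.nodes:
        if user in known:
            continue  # self-reported locations stay fixed
        neighbor_locs = [locations[n] for n in G.neighbors(user) if n in locations]
        if neighbor_locs:
            # Assign the average of the located neighbors (Jurgens uses the
            # geometric median, which is more robust to outliers).
            locations[user] = (
                sum(lat for lat, _ in neighbor_locs) / len(neighbor_locs),
                sum(lon for _, lon in neighbor_locs) / len(neighbor_locs),
            )

print(locations)  # every user in the network now has an inferred location
```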
Algorithmic predictions can be controversial. These practices carry a substantial risk of inferring sensitive information about individuals who may not willingly share such details. This is particularly concerning in cases where the information pertains to personal aspects such as sexual orientation, which can leave individuals vulnerable (Hildebrandt et al., 2009; Mühlhoff, 2021). Consequently, similar to profiling, algorithmic predictions have the potential to amplify and exacerbate the exploitation of individual vulnerabilities through automated algorithms, resulting in unequal treatment in accessing essential economic and social resources such as employment, education, knowledge, healthcare, and law enforcement (O’Neil, 2017). In addition, Yeung (2017) argues that such predictions can even lead to what the author refers to as “hypernudges,” which involve the use of constantly updated big data to predict user preferences and manipulate user behavior, such as suggesting a restaurant for dining or recommending a route to drive. While beneficial in certain situations, hypernudges through the widespread applications of algorithmic predictions also raise ethical concerns about encroaching upon the freedom of individuals to make personal choices without external interference (Lanzing, 2019).
The practice of algorithmic predictions, which involves predicting personal and undisclosed information using readily available data without the knowledge or consent of individuals, can be viewed as a violation of privacy, as it ultimately undermines their control over access to themselves and their private matters (Mühlhoff, 2021, 2023). Modern statistical and machine learning techniques facilitate such predictions by leveraging correlations between features and target variables in data contributed by a subset of users on a digital platform. Consequently, individuals tend to find themselves in a situation where the data provided by a minority of users sets the standard for predicting information about all users on the same platform. However, the current privacy regulations and protection practices have not adequately addressed this issue. Existing approaches primarily focus on protecting anonymity as the primary safeguard, overlooking the fact that the impact of algorithmic predictions extends beyond personal identification. It is essential to recognize that one does not need to be personally identified to experience the consequences of algorithmic predictions. The potential outcomes, such as unequal treatment and hypernudges, can affect individuals irrespective of their explicit identification. Even if an individual deliberately chooses to remain anonymous and refrains from disclosing certain sensitive information, the sensitive information can still be inferred and have an impact on the person. Modern privacy protection requires a shift toward a shared awareness that one’s own data could potentially harm others and should encompass more than just giving control to individual users over the collection and use of their personal data.
A collective approach to location privacy
Privacy approaches can be categorized along a continuum of individualism and collectivism, which determines where the primary responsibility for social action resides. The collectivist approach recognizes the potential negative effects of one’s own data on others, leading to the belief that individuals should not have unrestricted control over disclosing personal data to modern data companies (Baruh and Popescu, 2017; Floridi, 2014; Mühlhoff, 2023; Roessler and Mokrosinska, 2015; Solove, 2023; Taylor et al., 2017). Aaker and Williams (1998) describe collectivist cultures as prioritizing the collective (“us”) over the individual (“me”), emphasizing interdependence among individuals and striving to blur the boundaries between oneself and others. In this context, I argue that a collective approach to location privacy entails recognizing that privacy concerns extend beyond individual interests and encompass broader societal implications. It emphasizes the interdependencies and shared interests of communities in protecting their location privacy. Integrating this collective approach into current practices of location privacy protection, which predominantly focus on individual privacy and anonymity, is crucial, especially in the face of challenges posed by the growing amount of geospatial data where privacy can easily become vulnerable despite individuals being granted control over their personal data.
Building upon the insights of Baruh and Popescu (2017), considering location privacy from a collective perspective can be approached from two directions. The top-down direction focuses on a centralized or regulatory-driven approach that highlights privacy as a collective value rooted in codependency. Conversely, the bottom-up direction empowers individuals in privacy protection, recognizing privacy as a collective social phenomenon requiring cooperation. By combining both directions, a comprehensive collective location privacy protection framework can be established that balances regulatory compliance with individual empowerment.
Top-down: Privacy codependency
Privacy codependency recognizes privacy as a collective value, acknowledging that the level of privacy in a specific context is influenced not only by individual choices but also by the choices made by others. In other words, privacy codependency highlights the fact that the privacy of one person is intricately connected to the privacy of others, and that privacy cannot be fully enjoyed by an individual unless “all persons have a similar minimum level of privacy” (Regan, 1995). Embracing privacy codependency, it becomes crucial to implement practices that address collective location privacy protection, considering both regulatory and technical aspects.
In the regulatory aspect, a substantial body of literature has highlighted the concept of “group privacy,” which emphasizes that algorithmically defined groups have a distinct right to privacy that is not reducible to the privacy of their individual members (Floridi, 2014; Hildebrandt et al., 2009; Kammourieh et al., 2017; Mittelstadt, 2017). Loi and Christen (2020) further introduce the concept of “inferential privacy,” highlighting that group privacy should extend beyond descriptive analysis and consider the impact of predictive models on groups. Mühlhoff (2023) expands on this notion by suggesting that existing regulations should not limit themselves to explicitly recorded personal information but should also encompass inferred information that can possibly create inequality of treatment. Mühlhoff also proposes that the availability of consent as a legal basis should be restricted to situations where the consequences of the consent decision exclusively affect the consenting individual (Mühlhoff, 2023). This body of literature provides a solid foundation for legislators to reconsider the fundamental nature of location privacy and the legal status of group profiles. It also urges legislators and policymakers to address the issue of predictive models and the inferred information that can be obtained about both groups and individuals.
In terms of practical legal strategies to protect group privacy, Mantelero (2017) suggests an approach based on prior risk assessment by data protection authorities, such as the US Federal Trade Commission (FTC), to supervise and regulate the potentially detrimental applications of big data (discrimination, unfair consumer practices, unauthorized predictions) and to protect collective interests. Similarly, Solove (2024) argues for a shift in the focus of privacy laws from the nature of personal data to prioritizing an examination of its use, potential harm, and associated risks. Specifically, Solove suggests that legislative frameworks should be grounded in the contextual use of personal data, with protections calibrated in accordance with the potential harm and risk inherent in those particular uses. Citron and Solove (2022) further elaborate on the typology of privacy harms and the conditions under which privacy harms should be deemed sufficient to warrant regulatory intervention. Specifically, the authors classify privacy harms as physical, economic, reputational, psychological, autonomy, discrimination, and relationship harms, a typology that not only recognizes the multifaceted nature of privacy harms but also serves as a foundation for developing targeted legal frameworks. Citron and Solove also propose an approach to aligning enforcement goals with remedies, which helps address the need for a nuanced and proportionate response to the various forms of privacy harm. An insightful real-world application of this approach can be observed in the EU’s Artificial Intelligence Act, where AI-induced harms are directly addressed, ordered from the impermissible to those requiring careful consideration. Notably, social scoring and real-time facial recognition, which are closely related to geosurveillance and privacy, are considered highly harmful and are outright banned due to their potential adverse effects.
In addition, the concepts of “group” and “group rights” have been explored in the context of implementing group privacy through legislative measures. Pagallo (2017), for example, suggests that a practical definition of a group should revolve around a collection of ontological and epistemological predicates that characterize the group, such as a category marked by predispositions toward specific illnesses or behaviors. This parallels practices in national law, where various anti-discrimination laws strive to ensure equal treatment of groups, defined by factors such as sex, race, language, and religion, in political participation, employment, consumer transactions, and more. A proposal for a new data protection regulation presented by the EU Commission in January 2012 also recognizes a group right to lodge complaints to complement today’s personal data protection. Specifically, it acknowledges that a data subject can be targeted and harmed due to their membership in a given (racial, ethnic, genetic, etc.) data group, and therefore, it makes sense to grant such a group, or “any body, organization, or association which aims to protect data subjects’ rights and interests,” a procedural right to a judicial remedy against the data controllers, processors, or supervisory authorities. This consideration aligns with the notion of a “private right of action,” particularly in cases where demonstrated group-level harm requires enforcement.
In the technical aspect, approaches to algorithmic auditing have been developed and implemented on socio-technical platforms (e.g. Twitter, Facebook) to identify, mitigate, and rectify potential biases, discriminatory outcomes, and ethical concerns arising from algorithmic profiling and predictions. Notably, extensive research has been conducted to examine and audit the predictive capabilities of personal data shared by users on platforms such as Twitter, both for users and non-users (Garcia, 2017; Garcia et al., 2018; Sarigol et al., 2014). These studies provide valuable insights that can inform socio-technical platforms about reevaluating their data sharing policies to prevent mass violations of user privacy. To ensure the continued effectiveness of these efforts, it is imperative that ongoing algorithmic audits be conducted to detect and rectify biases and discriminatory outcomes across socio-technical platforms. These audits must be characterized by their rigor, independence, and regularity, adapting to the ever-evolving landscape of algorithms. Equally vital is the commitment to transparency and accountability, as the results of these audits should be made readily accessible to the public. Such transparency not only bolsters user trust but also serves as a tangible manifestation of the platform’s dedication to ethical and responsible data practices.
Privacy-enhancing technologies (PETs) constitute a diverse array of tools and techniques developed for various digital contexts, many of which today have gone beyond the mere assurance of anonymity provided by techniques such as k-anonymity and spatial k-anonymity. One notable privacy innovation is differential privacy, as previously mentioned, which ensures that the inferences that can be drawn about one individual from another’s data are limited, providing an enhanced level of privacy protection, particularly against algorithmic predictions (Ghosh and Kleinberg, 2017). By adopting differential privacy, it is promising that socio-technical platforms can establish strong safeguards and mitigate the risks associated with location privacy breaches that may occur with predictive models and location information. Another promising PET is federated learning, a decentralized approach to machine learning where the training of predictive models occurs locally on users’ devices (Konečný et al., 2016). Only model updates (not raw data) are communicated to a central server, which reduces the need for central data collection and minimizes the exposure of individual data, thus mitigating privacy risks associated with centralized profiling and prediction systems.
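A stylized sketch of federated averaging conveys the core idea; the data are synthetic and a linear model stands in for the predictive models discussed above. Each client computes an update on data that never leave the device, and the server aggregates only those updates.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])  # the pattern the clients collectively learn

def local_update(global_w, n=50, lr=0.1, steps=10):
    """One client's round: gradient steps on private, device-local data."""
    X = rng.normal(size=(n, 2))                # raw data stays on the device
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    w = global_w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n       # least-squares gradient
        w -= lr * grad
    return w - global_w                        # only this update is shared

global_w = np.zeros(2)
for _ in range(20):                            # 20 communication rounds
    updates = [local_update(global_w) for _ in range(10)]  # 10 clients
    global_w += np.mean(updates, axis=0)       # server averages the updates

print(global_w)  # approaches true_w without centralizing any raw data
```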
These emerging PETs complement the principles of privacy by design (PbD), a comprehensive framework and philosophy for integrating privacy throughout the lifecycle of systems and processes (Schaar, 2010), in achieving collaborative privacy goals. Specifically, PbD establishes the vision for privacy, while PETs provide the practical means to realize that vision. Numerous companies and institutions have already embraced PbD and PETs in pursuit of shared privacy objectives. For instance, Google has been a prominent advocate and practitioner of federated learning, using it in applications like Gboard to enhance text prediction without centralizing individual user data (Hard et al., 2018). These pioneering practices should serve as exemplars to inspire more entities across various sectors to embrace the principles of PbD and the capabilities of PETs in privacy protection. Moving forward, it is imperative that these strategies be extended to the realm of location data, so that collaborative efforts continue to evolve with the expanding digital landscape and reinforce a collective commitment to robust privacy protection in an increasingly interconnected world.
Bottom-up: Privacy cooperation
Recent research has acknowledged that privacy extends beyond its individual scope and is instead a collective social phenomenon that hinges on cooperation among all members of society (Margulis, 2003; Westin, 2003). Privacy management, as described by Petronio (2002), entails a shared responsibility among individuals for safeguarding boundaries. However, a major barrier to cooperation is that privacy is socially constructed, reflecting the diverse values and norms of individuals across cultures; people therefore conceptualize, locate, and practice privacy in varied ways (Dourish and Anderson, 2006; Margulis, 2003; Marwick and Boyd, 2014; Westin, 2003; Zhang and McKenzie, 2023). This inherent variation in the understanding and implementation of privacy can be a challenge in establishing a common foundation for cooperation in matters related to location privacy. Fortunately, there is evidence that privacy can emerge through the negotiation of adaptable boundaries, and cooperation does not require a consensus on a singular set of privacy norms (Marwick and Boyd, 2014). Rather, as Martin (2016) suggests, privacy can be viewed as a social contract: a mutually beneficial agreement within a community regarding the sharing and use of information. In this regard, cooperation involves recognizing that while individuals may have different privacy priorities, their expectations of privacy need not diminish (Martin, 2012). There is a shared interest among people in privacy and the right to privacy. By acknowledging this common ground, cooperation can thrive, even amid diverse conceptualizations and practices of privacy.
Conclusion
In Western societies, privacy is often seen as a point of individual resistance against the growing surveillance conducted by both the state and commercial entities. It is believed that privacy, by promoting individual autonomy, protects the ability to remain anonymous and exercise personal self-determination. However, in a landscape where extensive data collection, particularly regarding location information, has become pervasive, privacy discussions are complicated by the difficulty of defining harm at the individual level resulting from privacy violations, given the potent capabilities of algorithms for profiling and predictions. This article critically examines the individualist focus of existing notions and practices of privacy and the emphasis on anonymity within the context of location privacy. The argument put forth is that, in light of the prevalence of geospatial data analytics, relying solely on an individualistic approach is insufficient. This insufficiency stems from the fact that two crucial components of big data analytics—algorithmic profiling and predictions—do not necessarily require the identification of individuals, yet they still pose threats to privacy. To address this, I argue that a collective approach to privacy should be considered as a complement to the existing individualistic approach. Privacy should be considered a collective value and a collective social phenomenon to facilitate both regulatory and centralized efforts as well as individual actions. This collective approach acknowledges the interdependencies of privacy concerns, the potential impacts on communities, and the necessity for collective action in effectively tackling privacy challenges. By broadening the focus beyond the protection of individual interests and acknowledging the collective implications of privacy, a deepened understanding of the societal effects of contemporary data processing can be gained. This shift in perspective enables a comprehensive understanding of privacy within the context of location data, allowing us to grasp how these processes shape our societies.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
