Abstract

This article presents a novel methodology to examine the tracking infrastructures that extend datafication across a sample of 14 menopause-related applications. The Software Development Kit (SDK) Data Audit is a mixed methodology that explores how personal data are accessed in apps using ChatGPT4o to account for how digital surveillance transpires via SDKs. Our research highlights that not all apps are equal amid ubiquitous datafication, with a disproportionate number of SDK services provided by Google, Meta, and Amazon. Our three key findings include: (1) an empirical approach for auditing SDKs; (2) a means to account for modular SDK infrastructure; and (3) the central role that App Events—micro-data points that map every action we make inside of apps—play in the data-for-service economy that SDKs enable. This work is intended to open up space for more critical research on the tracking infrastructures of datafication within our apps in any domain.

Keywords

App infrastructures, ChatGPT, datafication, digital methods audit, digital surveillance, FemTech, health data, menopause, Software Development Kits (SDKs)

Since the pandemic, FemTech (a portmanteau for “female technology”) has rapidly expanded (Erickson et al., 2022; Mehrnezhad et al., 2022), catalyzed by the promise that these apps and devices might finally close the gendered data gap (Perez, 2019). The FemTech industry, currently valued at US$60 billion (Statista, 2024), provides women with new ways to manage health conditions ranging from maternal and sexual health to menopause. This fast-growing and largely unregulated marketplace raises concerns about who truly benefits, especially as more women turn to these apps and share intimate health data: sensitive information about women’s bodies, health, sex, gender, sexual orientation, close relationships, online searches, reading habits, or even private communications (Citron, 2022a, 2022b). In the United Kingdom, for example, nearly half (49.5%) of the population experience menopause but only 59% of general practitioners have received adequate training to recognize and manage related symptoms (Arnot, 2023). This lack of medical support has driven many women to seek help from these apps, the limited regulatory protection for this kind of intimate health data notwithstanding (Schäfke-Zell, 2022). Our article presents a novel method for auditing the infrastructures of datafication that underlie how a sample of FemTech menopause apps are tracking end-user behavior.

The primary means by which health data are shared between apps and advertisers is through “Software Development Kits” (SDKs). This understudied infrastructure offers manifold services and represents the building blocks of app creation. SDKs are “modular” connectors that come preloaded with an array of tools and code, representing an integral part of the software supply chain in our apps. Because their primary function is to connect apps to third-party services, they contain several APIs (application programming interfaces) that amplify their capacity to profile and track end-users (Pybus and Coté, 2024; Gontovnikas, 2020). Platform companies like Google, Meta, and Amazon provide a disproportionate number of these services, which affords them greater access to end-user data as developers become increasingly dependent on their use. While apps are supposed to adhere to existing privacy regulations, their infrastructure is complex and constituted by myriad actors who provide proprietary software within a data-for-service economic model. Consequently, the back-end of applications is opaque, creating blind spots for regulators and policymakers.

The following discussion examines how apps and third parties work together to capitalize on women’s health data. We propose a novel methodology to audit the tracking infrastructures that break down the datafied body into intimate, discrete, and actionable data points. We argue that the modularity of the SDK plug-and-play service delivery has been overlooked, with little attention devoted to understanding how this infrastructure can intensify digital surveillance. Our mixed methodology, which we term the SDK Data Audit, is attentive to the formation of these modular practices, providing a way to examine and account for how applications process and share user data. We have created three SDK Discovery Tools that delineate Google, Meta, and Amazon’s services, and engaged a large language model (LLM), ChatGPT4o, to audit Android manifest files (which are documents that account for every data point that can be shared between the end-user device, the developer, and the third parties embedded in apps). Our study of menopause apps offers three key findings which advance the study of the mobile infrastructures of datafication, highlighting asymmetries that span well beyond the domain of FemTech applications. We provide: (1) a more granular and expansive approach for auditing mobile app surveillance; (2) a means to assess the modular service infrastructure of SDKs; and (3) evidence of how “App Events,” the micro-data points which map every single action we take inside of apps, should be understood as a fundamental conduit for datafication. We see this work in dialogue with policymakers who want to develop more meaningful ways to augment agency and choice for women who deserve menopause apps that do not track, profile, and effectively compromise their intimate health data. In so doing, we aim to contribute to a safer and more empowering regulatory environment.

Personal and intimate health exposed: FemTech tracking infrastructures

Within scholarship on FemTech, there is growing consensus that there are not enough safeguards for protecting health data, which puts the security, privacy, and safety of women who use these services at risk (Jacobs and Evers, 2023; Mehrnezhad et al., 2022; Scatterday, 2021). Data governance and privacy issues are further exacerbated when considering that intimate data sell for 50 times more than the price of credit data, making it highly lucrative to monetize (Gilman, 2021; Mehrnezhad et al., 2024). These scholars underscore the significant privacy concerns related to FemTech applications and technologies, specifically in how they handle intimate data. Mehrnezhad et al. (2022) argue that the potential risks associated with FemTech technologies outweigh their benefits, especially given the extensive amount of health-related data that can be used to reveal intimate insights about the female body and mind. Erickson et al. (2022) identify several privacy harms stemming mainly from the illegal and unregulated use of health data sold to US workplaces, insurance companies, or data brokers. In addition, privacy risks also emerge when personal and intimate data are inferred from data that have been anonymized or de-identified.

Consequently, feminist legal scholarship has raised alarms about the lack of data protection in the context of unrestrained corporate intimate surveillance enabled by these apps and devices (Citron, 2021; Gilman, 2021; McMillan, 2022). Further regulatory challenges arise because these data are not classified as health data under traditional definitions, creating what Schäfke-Zell (2022) describes as a legal “gray area” in the application of health data protection. App stores then compound these legislative challenges by allowing the majority of FemTech apps to be categorized as “health and fitness” instead of “medical,” further reinforcing the claim that these apps are gathering non-medical health data and leaving users more vulnerable to their monetization strategies (Rosas, 2019; Scatterday, 2021; Shipp and Blasco, 2020).

For app users who share data about their intimate bodies, there are scant resources beyond privacy policies to help women discern meaningful differences in how ostensibly “similar” apps might be accessing and sharing their data. Yet, FemTech app users rated third-party access to their data among their top two privacy concerns (Cao et al., 2024). Since SDKs have a “modular” structure—in that they constitute “structurally independent” units that are highly interconnected (Baldwin and Clark, 2000)—developers can use as many or as few of their services to monetize their apps, affording them different degrees of surveillance (Pybus and Coté, 2024). However, few resources or methods are available to study their modular infrastructure. We aim to fill this gap with a methodology to audit these tracking infrastructures that facilitate datafication in our applications.

The widely cited walkthrough method developed by Light et al. (2018) provides invaluable guidance for inferring surveillance practices in our apps; however, it encounters an opacity challenge in accounting for how SDKs (primarily provided by platforms) are enacted in the back-end. While it is no surprise that all apps access our data, some apps collect and share far more than others—how and through which processes this asymmetry manifests is not immediately apparent. Addressing this problem, we present the SDK Data Audit to move beyond merely identifying trackers toward more comprehensively quantifying and qualifying the monetization services embedded in our apps. We do so by providing tools and an empirical approach that will enable more focused inquiries into how third parties can access intimate data and offer a means to assess which apps have a higher potential to cause privacy harms through intimate surveillance.

Datafication of SDK tracking infrastructures in apps

SDKs have increasingly become an object of study within app studies, which focus on the complex and multi-situated infrastructures of apps (Dieter et al., 2019). This scholarship highlights the relationality between SDKs and apps (Pybus and Coté, 2024; Gerlitz et al., 2019; Van der Vlist and Helmond, 2021), their relationship to datafication (Flensburg and Lai, 2022; Pybus and Coté, 2021; Lomborg et al., 2024), and their role in extending platform power and monopolization (Cohen, 2024; Nieborg et al., 2024; Blanke and Pybus, 2020). This literature has also been informed by discussions about the relationship between infrastructures and platforms, wherein Plantin et al. (2018) have argued that platforms are becoming infrastructuralized. Similarly, SDKs are an integral component of this infrastructuralization, such that it is now almost impossible for developers to make apps without platforms, allowing new dependencies to emerge.

From a political economy perspective, SDKs are also intensifying surveillance- (Zuboff, 2019), data- (West, 2019), and/or platform- (Srnicek, 2017) capitalism by facilitating data extraction and monetization in exchange for their services. Scholarship focusing on the datafied infrastructures of applications also interrogates how these agents facilitate complex end-user data extraction (Flensburg and Lai, 2022; Flensburg and Lomborg, 2023; Pybus and Coté, 2021) by privileging its non-rivalrous qualities. This foundational logic, which underscores the app economy, facilitates the continuous (un)coupling, (re)use, and seamless (re-)integration of end-user data to produce value in this new data-for-service economy (Blanke and Pybus, 2020). Thus, SDKs can represent a conduit for platformization (Poell et al., 2019; Van Dijck et al., 2018), making mobile applications more “platform-ready” (Helmond, 2015; Poell et al., 2019). They achieve this by extending the multi-sided business model (Rieder and Sire, 2014; Van Dijck et al., 2018), which positions developers as new complementors (Van Dijck et al., 2019) for platforms by offering an extensive range of monetization and development services (Pybus and Coté, 2024). Yet despite this rich literature, few methodologies exist for auditing SDK services in apps.

Auditing mobile infrastructures

Auditing techniques have been proposed to enhance transparency in data-driven systems (Mittelstadt et al., 2016; Sandvig et al., 2014). One approach we draw on is “code auditing,” which aims to discern which technical and legal harms might be embedded. However, as Sandvig et al. (2014) caution, this approach alone would be insufficient without a broader perspective attentive to the normative discussions of having “accountability by auditing.” That is, if auditing is to be effective (which for apps would ideally mean facilitating the detection of privacy harms and/or risks), we must have a clearer idea about how we want these technologically advanced systems to behave (Sandvig et al., 2014). The SDK Data Audit addresses this question by developing a methodological intervention attentive to the modularity and behavior of SDKs embedded in apps. In so doing, we compare our analysis of what resides in the back-end with the existing governance tools in the front-end (i.e. privacy policies and Data Safety Agreements) to ask whether they align or whether discrepancies can be observed.

Auditing SDKs comes with several challenges. Apple, for example, is a closed ecosystem that prevents meaningful external oversight or inspection. Consequently, almost all research, including ours, has focused predominantly on the open-source Android ecosystem, wherein the code for any app in the Google Play Store is made accessible in objects known as Android Package (APK) files. Databases like Exodus Privacy (n.d.), which display the third parties and privacy permissions inside most Android applications, are helpful but, as we shall demonstrate, only partially reliable. Other considerations must also be made when choosing how to audit mobile applications. For instance, some scholars interested in these questions have developed mixed methodologies that examine the networked relationship between apps and platforms (Weltevrede and Jansen, 2019), developed tools to make API codes and privacy permissions more visible (Chao et al., 2024), mapped partnerships that emerge from SDKs (Helmond and Van der Vlist, 2021), or categorized the services third parties provide (Pybus and Coté, 2024). Building on this work, we have focused on two key objects: manifest files and SDKs.

Android manifest files, located in APK files, represent a record of every data requisition that will be enacted by an application (Pybus and Coté, 2021). Like a ship’s manifest that names every person on board, an Android manifest should list every way an app and its third parties access data from an end-user’s phone. These files are interesting to explore qualitatively because they contain: (1) all the AdTech SDKs; (2) privacy permissions, which Google labels according to their level of risk as either “normal” or “dangerous”;¹ and (3) metadata tags, which can automatically turn on or off software identifiers like Ad IDs or analytic tracking. “Normal” permissions are described as low risk, so the end-user is never alerted to their presence. “Dangerous” permissions should be listed because they give access to more sensitive data such as location, microphone, or contacts.
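To make the manifest contents described above more concrete, the following minimal Python sketch reads the permissions and metadata tags out of a manifest. It is an illustration only and not part of the SDK Data Audit itself: the sample manifest, the package name `com.example.menopauseapp`, and the function `summarize_manifest` are hypothetical, and the sketch assumes a decompiled, plain-text manifest that uses the standard `android:` attribute namespace.

```python
import xml.etree.ElementTree as ET

# Standard Android attribute namespace used in AndroidManifest.xml files.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME = f"{{{ANDROID_NS}}}name"
VALUE = f"{{{ANDROID_NS}}}value"

# Hypothetical, heavily shortened manifest for illustration only.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.menopauseapp">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
    <application>
        <meta-data
            android:name="google_analytics_default_allow_ad_personalization_signals"
            android:value="true" />
    </application>
</manifest>
"""


def summarize_manifest(xml_text):
    """Return the requested permissions and the metadata tags of a manifest."""
    root = ET.fromstring(xml_text)
    # Each <uses-permission> element names one permission the app requests.
    permissions = [el.get(NAME) for el in root.iter("uses-permission")]
    # Each <meta-data> element is a name/value pair, e.g. an SDK default setting.
    metadata = {el.get(NAME): el.get(VALUE) for el in root.iter("meta-data")}
    return {"permissions": permissions, "metadata": metadata}


summary = summarize_manifest(SAMPLE_MANIFEST)
print(summary["permissions"])
print(summary["metadata"])
```

A listing of this kind only shows what is declared; deciding which entries matter for monetization still requires the qualitative coding described in this article.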

Next, to examine SDKs, Pybus and Coté (2024) offer a taxonomy which we have adopted in our methodology to account for the AdTech services in the menopause apps. This provides three distinct kinds of service clusters (development, app extension, and AdTech services), and we have decided to focus only on those services that enable monetization. These AdTech services break down further into: attribution services (tracking that occurs predominantly outside of apps to see if ad campaigns are working), engagement services (tracking within apps used for behavioral profiling and audience segmentation), and advertising services (delivery of ads by services such as ad networks and exchanges that SDKs provide).

The SDK Data Audit

The manifest file is one of the most valuable documents for auditing Android apps, but it is primarily structured for machine readability rather than human comprehension. We asked whether ChatGPT4o could be used to make these manifests more legible. ChatGPT4o is an LLM that enables the processing of significantly larger datasets. The “GPT” stands for generative pre-trained transformers (Campello de Souza et al., 2023), developed to perform various tasks in real-time, based on prompts or questions. What differentiates OpenAI’s model is its scaled-up capacity to closely mimic human conversation (Roumeliotis and Tselikas, 2023). However, how it communicates raises serious concerns, especially around the accuracy of its information, potential misinformation (such as hallucinations), and the replication of biases, discrimination, and stereotypes in its responses (Lambert and Stevens, 2024). Despite these challenges, and its significant environmental impact (Haque and Li, 2024), ChatGPT4o’s semantic capabilities offer a possible opportunity for code explanations and auditing, opening a potential new space for critical engagement with inaccessible technical objects.

The SDK Data Audit methodology, illustrated in Figure 1, uses four steps to compare and triangulate our findings. This involves: (1) opening an app’s APK file (containing the manifest and SDKs), (2) analyzing manifest files with ChatGPT4o, (3) consulting Exodus Privacy,² and, finally, (4) examining the app’s Data Safety and privacy policy agreements. We selected 14 menopause-related applications from the Google Play Store across the United Kingdom, European Union, United States, and Canada to evaluate our mixed methodology. Our main criteria stipulated that apps appear under the category “menopause” and have a minimum of 10,000 downloads. We then downloaded their APK file from either APKpure or APK Mirror (open-source databases) and used an open-source decompiler tool called ClassyShark³ to open and read these files in order to access their manifests and the third-party SDKs.⁴ Next, we imported the manifest files from ClassyShark into ChatGPT4o and asked the LLM a series of questions informed partly by a growing body of prompt engineering literature (Giray, 2023; Henrickson and Meroño-Peñuela, 2023; Marvin et al., 2024).

Figure 1.

The SDK Data Audit method overview.

To minimize inaccuracies and enhance our results’ repeatability, we used a strategy known as “specifying output structure” (Azure OpenAI, 2024), which directs ChatGPT to “cite” and refer to the source material, which in our case meant the manifest file. We then used this document to discover: (1) the SDKs, (2) the SDK services that were integrated, (3) whether Ad Identifiers and analytic services were automatically enabled, and, finally, (4) the permissions that were inside the app, which we have put aside for further research. To facilitate comparisons, we focused only on third parties used for monetization to align with the “trackers” listed by Exodus Privacy. We then categorized the SDK services our apps were using in accordance with the AdTech services in Pybus and Coté’s (2024) taxonomy. In this step, our emphasis was not on quantifying the number of third parties, but rather on qualifying how these services were being used to access end-user data by identifying their purposes—namely: attribution, engagement, or advertising.
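As an illustration of what a prompt that specifies its output structure might look like, the sketch below assembles one in Python. The wording of the instructions and questions, the `<manifest_file>` markers, and the function name `build_audit_prompt` are all hypothetical; they are not the prompts used in this study, and no call to any LLM service is made. The four questions simply mirror the four discovery targets listed above.

```python
def build_audit_prompt(manifest_text: str) -> str:
    """Assemble a prompt that asks for answers backed by quoted manifest lines."""
    # Hypothetical questions mirroring the four discovery targets in the text.
    questions = [
        "Which third-party SDKs are declared in this manifest?",
        "Which services of each SDK are integrated?",
        "Are any advertising identifiers or analytics services enabled by default?",
        "Which permissions does the app request?",
    ]
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        "You are auditing an AndroidManifest.xml file.\n"
        "Answer only from the manifest between the <manifest_file> markers.\n"
        "For every answer, quote the exact manifest line that supports it.\n"
        "If the manifest does not contain the answer, reply 'not found in manifest'.\n\n"
        f"Questions:\n{numbered}\n\n"
        f"<manifest_file>\n{manifest_text}\n</manifest_file>"
    )


prompt = build_audit_prompt(
    '<uses-permission android:name="android.permission.INTERNET" />'
)
print(prompt)
```

Requiring a quoted source line for each answer gives the researcher something to check against the original file, which is the point of the citation strategy described above.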

Finally, we compared our findings with our apps’ privacy policies and Data Safety Agreements, the two key documents available to end-users. Here, we evaluated differences in how data access was reported and observed in our SDK Data Audit. For this step, our aim was not just to reveal whether end-users were being tracked (this much is obvious) but rather to understand how this tracking infrastructure is mobilized by third parties and, equally, to ascertain differences in how apps datafy, profile, and track their users. As a corollary, when our findings revealed a disproportionate number of SDK services were providing App Event analytics, we supplemented our method with the walkthrough method (Light et al., 2018) to assess what intimate health app events could be tracked in our sample. While it is beyond the scope of this article to create a new metric that demarcates a “threshold” which might determine how little or how much an app accesses our personal data, our research addresses the need for new resources so that end-users, like the women using these apps, can make more meaningful and informed choices based on how their data are leveraged within this data-for-service economy.

SDK Discovery Tools for a modular data-for-service economy

As we applied our methodology and cross-matched results with Exodus Privacy, we immediately noticed what we might call a “translation” issue. Exodus Privacy (n.d.) refers to third-party software found in apps as “trackers,” which they define as “pieces of software meant to collect data about you or what you do.” However, given that both an SDK and its services are software, how do we distinguish between the whole and its parts? This is an especially important consideration because the parts in question access and action data differently, which can intensify value extraction depending on the developer’s in-app monetization strategy. We are using the more technical term for trackers—SDKs—to distinguish between the whole SDK and the sum of its service parts (Pybus and Coté, 2024). This allows us to emphasize why SDKs’ modular infrastructure matters. In short, the more services embedded by the developer, the more significant the app’s data collection capabilities will be. Thus, by distinguishing between an SDK (like Firebase, AdMob, or Google AdManager) and their services (such as Google Analytics, Crashlytics, or Dynamic Links), our methodology critically examines the different types of end-user data which result from embedding their unique services.

Labeling third-party SDK components as “trackers” obfuscates the extensive ways in which each service has the potential to amplify extraction. This is particularly relevant when examining “Super SDKs” (Pybus and Coté, 2024) that belong to major platforms like Google, Meta, and Amazon. Theirs are the most common and offer the most significant number of services, creating new dependencies for developers. Thus, by embracing the complexity of the SDK taxonomy, our more granular approach exposes the underlying relationships between developers, SDK services, and access to end-user data that go well beyond simplistic notions of “tracking.” Instead, we provide a more comprehensive understanding of how datafication occurs inside apps.

The SDK Data Audit demonstrated that many Super SDK services were only partially represented in Exodus Privacy. For example, in all three of Google’s AdTech SDKs in our sample, we found 17 monetization services, while Exodus Privacy named only five trackers. Since the SDKs belonging to platform companies were among the most prominent, we decided to create what we have called “Super SDK Discovery Tools” that: (1) compare the differences between our SDK audit method and the Exodus Privacy database; (2) develop a clearer way to distinguish between SDKs and their services; and (3) facilitate a more consistent approach to code these services in the manifest file. The Super SDK Discovery Tools in Tables 1 (Google), 2 (Meta), and 3 (Amazon) should be read as “living” tools, which will inevitably change and evolve alongside platform service offerings. These are guides we created to streamline our decisions about which modular services to account for in the platform companies we encountered throughout the SDK Data Audit. They also briefly summarize what each of these services does by applying the SDK taxonomy created by Pybus and Coté (2024).

Table 1.

Google Firebase, AdMob, and Ad Manager SDK Discovery Tool.

SDK	Exodus Privacy: Trackers	SDK Discovery Tool: Google SDK services	SDK taxonomy: what do these services do?
Google Firebase	Google Analytics	Analytics	Engagement: event tracking, behavioral profiling, and audience segmentation
	Google Tag Manager	Tag/Events	Engagement: behavioral profiling with bespoke events
	Crashlytics	Crashlytics	Engagement: behavioral profiling
		Screen Views	Engagement: interface tracking
		Analytics: Measurement	Engagement and attribution: links app to Google’s AI “BigQuery” tool (a serverless data warehouse used to generate user profile insights)
		A/B Testing	Engagement: ad personalization
		Remote Configuration	Engagement: ad personalization: work off-line to develop and test profiles.
		Software IDs	Attribution: track identifiers such as app installation IDs (campaign tracking) or user IDs (out of app tracking).
		Ad ID Tracking	Attribution: tracking ad campaigns
		Dynamic Links	Attribution: tracking app installations
		Cloud Messaging	Engagement and advertising: sends personalized ads or messages
Google AdMob	AdMob	Tracking: Ad ID	Attribution: tracking ad campaigns
Google AdMob	AdMob	Display Ad Manager	Advertising: mobile ad network used to customize ads, enable and tailor in-app ads, and monitor ads.
Google Ad Manager	Google Ads	Tracking: Ad ID	Attribution: tracking ad campaigns
		Google Sign-In	Attribution: tracking for profiling
		Display Ad Manager	Advertising: mobile ad network used to customize ads, enable and tailor in-app ads, and monitor ads.
		Ad Creation and Integration	Advertising: personalized ads
3 SDKs	5 Trackers	17 SDK services

Table 2.

Meta Facebook SDK Discovery Tool.

SDK	Exodus Privacy: trackers	SDK Discovery Tool: Facebook SDK services	SDK taxonomy: what do these services do?
Facebook	Login	Login	Engagement and attribution: tracking and profiling
	Share	Custom Tabs	Engagement: enables sharing of content
	Events	App Events	Engagement: behavioral profiling
	Notifications (part of Social Graph)	Social Graph	Engagement: behavioral profiling and custom audience design
	Places	Place	Engagement: tracking location
	Ads	Facebook Audience Network (FAN)	Advertising: create and target ads
		App Links	Attribution: deep linking for campaign tracking
		Facebook Ad ID	Attribution: tracking ad campaigns
		Initialiser	Engagement and advertising: configures SDK to access to the Social Graph API, app events tracking and profiling, and access to the FAN
	Flipper (deprecated)		Engagement and development: software debugging
1 SDK	7 Trackers	9 SDK services

Table 3.

Amazon SDK Discovery Tool.

SDK	Exodus Privacy: Trackers	SDK Discovery Tool: Amazon SDK services	SDK taxonomy: what do these services do?
Amazon	Amazon Ads	Amazon Ads	Advertising: ad network used to customize ads, enable and tailor in-app ads, and monitor ads.
		In app purchasing (IAP)	Engagement: behavioral profiling via in-app purchasing
		Ad ID Tracking	Attribution: tracking for campaigns and profiling
1 SDK	1 Tracker	3 SDK services
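One way to keep a Discovery Tool "living" is to hold it as structured data so that its totals can be recomputed whenever services are added or removed. The sketch below transcribes Table 1 (Google) into Python and tallies it; the variable names and the tuple layout are our own illustrative assumptions, and the purpose labels simply follow the taxonomy column of the table.

```python
from collections import Counter

# Rows transcribed from Table 1: (SDK, service, taxonomy purposes).
GOOGLE_SERVICES = [
    ("Google Firebase", "Analytics", ("engagement",)),
    ("Google Firebase", "Tag/Events", ("engagement",)),
    ("Google Firebase", "Crashlytics", ("engagement",)),
    ("Google Firebase", "Screen Views", ("engagement",)),
    ("Google Firebase", "Analytics: Measurement", ("engagement", "attribution")),
    ("Google Firebase", "A/B Testing", ("engagement",)),
    ("Google Firebase", "Remote Configuration", ("engagement",)),
    ("Google Firebase", "Software IDs", ("attribution",)),
    ("Google Firebase", "Ad ID Tracking", ("attribution",)),
    ("Google Firebase", "Dynamic Links", ("attribution",)),
    ("Google Firebase", "Cloud Messaging", ("engagement", "advertising")),
    ("Google AdMob", "Tracking: Ad ID", ("attribution",)),
    ("Google AdMob", "Display Ad Manager", ("advertising",)),
    ("Google Ad Manager", "Tracking: Ad ID", ("attribution",)),
    ("Google Ad Manager", "Google Sign-In", ("attribution",)),
    ("Google Ad Manager", "Display Ad Manager", ("advertising",)),
    ("Google Ad Manager", "Ad Creation and Integration", ("advertising",)),
]

# Tracker names listed for the same SDKs in the Exodus Privacy column of Table 1.
EXODUS_TRACKERS = [
    "Google Analytics",
    "Google Tag Manager",
    "Crashlytics",
    "AdMob",
    "Google Ads",
]

sdk_count = len({sdk for sdk, _, _ in GOOGLE_SERVICES})
service_count = len(GOOGLE_SERVICES)
tracker_count = len(EXODUS_TRACKERS)

# A service with two purposes is counted once under each purpose.
purpose_counts = Counter(p for _, _, purposes in GOOGLE_SERVICES for p in purposes)

print(f"{sdk_count} SDKs, {tracker_count} trackers, {service_count} SDK services")
print(dict(purpose_counts))
```

The same layout could hold Tables 2 and 3, which would allow the tracker and service totals to be compared across platforms in one pass.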

The services that we have included in our Discovery Tools have all been verified and cross-referenced with the developer tools provided by Google (Firebase, n.d.; Google, n.d.-a), Meta (Meta, n.d.-a), and Amazon (Amazon Developers, n.d.). In some instances, services appeared ambiguous or unclear and we could not immediately discern if they were used for monetization; these have been excluded. Other services, which seemed more like “features,” have also been excluded. For example, apps like “Balance” and “Health and Her” were both using Google Analytics and had this code in their manifest files:

<meta-data name="google_analytics_default_allow_ad_personalization_signals" value="true"> (Health and Her manifest file, our emphasis).

We decided against using “Ad Personalization” as its own service because it appears more like a feature of Google Firebase Analytics. However, since this feature is turned on (“value = true”), Health and Her is named in our results as an app with its Advertising Identifiers automatically enabled. To sum up, the SDK auditing tools represent our decisions on what counts as an Ad Tech service. Others may disagree, highlighting an important research challenge: What counts as an SDK? What counts as a service? And, how do we create better ways to agree on these distinctions? Thus, these Discovery Tools should be considered a starting point for more critically engaged SDK scholarship which would focus on making platform services that access end-user data more legible and accessible.

Findings from the SDK Data Audit: how menopause apps leverage the data-for-service economy

The SDK Data Audit reveals three key findings that demonstrate the value of our methodology and highlight the need for further research into the infrastructures of datafication that reside within mobile applications. These can be summarized as follows: (1) the methodology provides a more granular and explainable approach to analyze SDK services, surpassing the capabilities of resources like Exodus Privacy; (2) a means to examine SDKs’ modular infrastructure, which intensifies datafication within mobile applications; and (3) insight into the overlooked importance of “App Events,” the cornerstone for the data-for-service economy, constituting foundational infrastructure for the analytic services that SDKs provide.

1. SDK Data Audit versus Exodus Privacy: Our SDK analysis (Figure 2) demonstrates discrepancies between the APK files we examined in ClassyShark, the manifest files we examined with the Discovery Tools and ChatGPT4o, and finally with Exodus Privacy. These differences primarily arose from how we accounted for SDK services, especially within Super SDKs. Auditing the manifest files with ChatGPT proved successful⁵ and closely aligned with what we observed in the original APK files. This outcome was intentionally orchestrated. The prompts for ChatGPT4o were meticulously fine-tuned until the LLM reflected what we saw in the original APK file. Thus, if something were missing, we would prompt ChatGPT4o to double-check its work, or we developed specific prompts to alleviate any inconsistencies we were observing. For example, while Google Firebase Analytics consistently appeared in our results, other services like Firebase’s “Events” and “A/B testing” required specific prompt adjustments for ChatGPT4o to register them. By the end of our experimentation, the LLM gave us fairly consistent and repeatable results with the prompts we developed. Choices were also made about which SDK services to include in the Discovery Tools. For instance, ChatGPT4o identified 15 Google Firebase services inside Omena’s manifest file, but we included only the 9 we could verify via Firebase’s developer website. These decisions highlight more general challenges for auditing SDKs and using LLMs, a burgeoning area for further study.

Figure 2.

SDK services inside menopause apps.

Our final observation here was that not all the SDKs were listed in the manifest files, despite this being the expected requirement (Google, n.d.-b). For example, Mixpanel, an event analytic service for tracking and profiling users, was found in Her Spirit, Hormona, MenoLife, Omena, and Peppy Health. However, only Her Spirit correctly declared this in their manifest file. Similarly, Sentry, an analytic reporting service, was only identified in the manifest of two of the four apps using this service in our sample. These may be legacy SDKs, but there could have also been an oversight by developers who forgot to account for them. Regardless, third-party SDKs cannot be captured if they do not exist in the manifest file that ChatGPT4o is examining. We set these findings aside for further research.
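The gap between what sits in an APK and what is declared in its manifest can be expressed as a simple set difference. The Python sketch below does this for the Mixpanel observation reported above; it covers only that one SDK, the data structures and the function `undeclared_sdks` are illustrative assumptions, and the apps in question contain other SDKs that are not represented here.

```python
# SDKs observed in the decompiled APK, restricted to the Mixpanel finding.
FOUND_IN_APK = {
    "Her Spirit": {"Mixpanel"},
    "Hormona": {"Mixpanel"},
    "MenoLife": {"Mixpanel"},
    "Omena": {"Mixpanel"},
    "Peppy Health": {"Mixpanel"},
}

# SDKs declared in each manifest; only Her Spirit declared Mixpanel.
DECLARED_IN_MANIFEST = {
    "Her Spirit": {"Mixpanel"},
}


def undeclared_sdks(found, declared):
    """Map each app to the SDKs present in its APK but absent from its manifest."""
    gaps = {}
    for app, sdks in found.items():
        missing = sdks - declared.get(app, set())
        if missing:
            gaps[app] = sorted(missing)
    return gaps


gaps = undeclared_sdks(FOUND_IN_APK, DECLARED_IN_MANIFEST)
for app, sdks in sorted(gaps.items()):
    print(f"{app}: in APK but not in manifest -> {', '.join(sdks)}")
```

Any SDK that appears in such a gap list is invisible to a manifest-only audit, which is why the APK inspection step remains necessary alongside the LLM analysis.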

2. Discovering the modularity of SDK services inside our apps: The SDK Data Audit reveals variations in the number of SDK services developers integrate when building apps, offering unique insight into the technical implementation of monetization strategies. Figure 3 shows significant divergences between the menopause apps we audited. For example, MenoLife and Omena draw upon almost every single Super SDK service from Google and Meta, using 22 of their 23 AdTech services. Conversely, Flo, Her Spirit, and FemmHealth have embedded only one to three services each. From this vantage point, we argue that privacy risks and harms can grow with the number of embedded services, each of which can intensify the capture and use of personal data. And yet, the different degrees of integration remain hidden from the end-user. Instead, the only information we are given when deciding which app is safe to download consists of vague declarations that it “may” have third parties or has Google Firebase installed.

Figure 3. Super SDK services in menopause apps.

A closer look at the kinds of monetization services these apps have embedded reveals that seven apps use Firebase’s A/B testing and Remote Configuration services (Google, n.d.-c). This combination enables streamlined “marketing experiments” by allowing developers to test app features, campaigns, and targeted messages to profile and engage end-users. The remote configuration features enable these experiments to run even when the app is not actively in use. In addition, 10 apps employ Firebase’s Measurement service, which connects them to Google’s advanced AI-powered cloud tool, BigQuery. This serverless data warehouse equips developers with the ability to track user behavior in real time, providing insights about their interactions with App Events (discussed in depth below). Consequently, the scope and scale of these modular platform services streamline how apps datafy end-user behavior, and this audit offers more nuanced insights into how this surveillance is conducted. Reducing this complex infrastructure to mere “trackers” obscures this more granular perspective, which can more fully demonstrate how much more intensively an app like MenoLife leverages Firebase’s services than an app like Her Spirit.

Finally, our method reveals that Ad IDs (Advertising Identifiers used to track and serve ads to end-users) are yet another modular service developers can activate or disable. These allow apps to track end-users across different devices to measure which ad campaigns they have encountered, and which have been successful; we can call these attribution services. While they do not directly access sensitive user data, they still track location and other private information. Apple’s App Tracking Transparency offers users more control over this dataveillance affordance with the clear opportunity to turn these IDs off, whereas to date, Android does not. Our results, in Figure 3, reveal that eight apps (with orange arrows on top) have automatically enabled tracking Ad IDs with Google, and one app (with the blue arrow), Omena, has automatically enabled Ad IDs with Meta without clearly indicating that they are in use.

These findings raise concerns about how apps represent tracking that is enabled without explicit consent. While end-users should ideally have a choice, the approach varies among apps. For instance, of the six remaining apps in Figure 3, two (Her Spirit and Femilog) have chosen not to include Ad IDs. In contrast, the other four apps (Calm, Flo, Hormona, and Peppy Health) include them but have not enabled them by default, thereby allowing end-users to opt out of tracking when the app is downloaded. This difference highlights the varying degrees of user agency across the apps. Arguably, having these tracking features more clearly explained and represented is even more critical in FemTech apps, primarily because the intimate data they generate are often used for targeted advertising in a wholly opaque manner. These findings also point to the need for more meaningful modes of consent given the manifold ways in which the modularity of SDK services can access end-user data.
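The three Ad ID states described above (not included, included but off by default, and automatically enabled) can be sketched as a simple manifest check. The mapping used here is our own assumption rather than the criteria applied in the audit: we treat the declaration of Android's `com.google.android.gms.permission.AD_ID` permission as "included", and a Firebase `google_analytics_adid_collection_enabled` meta-data flag set to `false` as "off by default". The three manifest fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
AD_ID_PERMISSION = "com.google.android.gms.permission.AD_ID"
ADID_FLAG = "google_analytics_adid_collection_enabled"


def classify_ad_id(manifest_xml: str) -> str:
    """Classify an app's Ad ID stance from its manifest (assumed mapping)."""
    root = ET.fromstring(manifest_xml)
    permissions = {
        el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")
    }
    if AD_ID_PERMISSION not in permissions:
        return "not included"
    for meta in root.iter("meta-data"):
        if meta.get(ANDROID_NS + "name") == ADID_FLAG:
            # An explicit "false" means collection stays off unless re-enabled.
            if meta.get(ANDROID_NS + "value", "").lower() == "false":
                return "included, off by default"
    return "enabled by default"


HEADER = '<manifest xmlns:android="http://schemas.android.com/apk/res/android">'
PERMISSION = f'<uses-permission android:name="{AD_ID_PERMISSION}" />'
FLAG_OFF = f'<meta-data android:name="{ADID_FLAG}" android:value="false" />'

# Hypothetical manifests, one per state.
no_ad_id = f"{HEADER}<application /></manifest>"
ad_id_off = f"{HEADER}{PERMISSION}<application>{FLAG_OFF}</application></manifest>"
ad_id_on = f"{HEADER}{PERMISSION}<application /></manifest>"

for label, xml in [("A", no_ad_id), ("B", ad_id_off), ("C", ad_id_on)]:
    print(label, classify_ad_id(xml))
```

A check of this kind only reports what the manifest declares; it cannot show whether an identifier is actually transmitted at runtime.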

3. App Events: Supercharging the data-for-service economy: Finally, the SDK Data Audit reveals the prominent role of App Events in datafying how FemTech end-users interact with these apps. Here, our findings show that each app in our sample uses some kind of analytics service. Google Analytics is the most popular, used by 13 out of 14 apps (Figure 4). App Events are provided by companies that offer a range of analytic tools to analyze every “action performed by users within an app [that] a developer or marketer chooses to measure” (Adjust, n.d.). These can include anything from what users click on and when they log in to what they look at or purchase. They provide deep insights into behavior and preferences and promise to facilitate data-driven decision-making that can drive in-app optimization for monetization and engagement strategies (Field Drive, n.d.). App Events, therefore, represent an increasingly lucrative site of investment. Services like Firebase Analytics (which includes Google Analytics) provide 500 different pre-made events built into their dashboard for developers to leverage. These might include actions like: “search,” “select content,” “select an item,” “click,” or “notification open.”

Figure 4. Summary of app events analytics services.

App Events can also be customized to meet developers’ more specific needs. These can be used in combination with pre-made services to maximize insights. While there are limits to how many App Events can be simultaneously deployed, the Facebook SDK still allows developers to use up to 1000 App Events (pre-made and/or customized) at a time (Meta, n.d.-b). These can be swapped in and out and paired with other analytics companies like AppsFlyer or AI tools like Google’s BigQuery to leverage in-app behavior (Meta, n.d.-b; AppsFlyer, n.d.). From our analysis, we see different companies that provide App Events analytics for: (1) granular behavioral tracking (e.g. Adjust, Mixpanel, Snowplow, FullStory, Apptentive, Customer.io, Pubnub, or Amplitude); (2) granular attribution tracking (Braze); (3) both behavior and attribution tracking (Google, Meta, AppsFlyer); (4) more customizable events (Google, Meta, Mixpanel, AppsFlyer, Apptentive, Customer.io); (5) segmentation or profiling based on the convergence of demographic and behavioral data (Google, Facebook, Mixpanel, or Braze); and (6) custom reporting tools such as dashboards for tailored insights about how these services profile end-users (provided by all). These are then leveraged and transformed into strategies to drive more engagement, attribution, and personalized marketing within most apps.
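To make the distinction between pre-made and customized App Events more concrete, the following Python sketch models an event as a named action with parameters. It is a hedged illustration only: the `AppEvent` class, its field names, the `symptom_logged` event, and the parameter values are hypothetical and do not reproduce any SDK's actual API or wire format. The pre-made names are snake_case renderings of the examples quoted above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative subset of pre-made event names, taken from the examples above.
PRE_MADE_EVENTS = {"search", "select_content", "select_item", "click", "notification_open"}


@dataclass
class AppEvent:
    """Hypothetical in-app event: a named action plus free-form parameters."""

    name: str
    params: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def kind(self) -> str:
        """Pre-made if the name is in the vendor's catalogue, otherwise custom."""
        return "pre-made" if self.name in PRE_MADE_EVENTS else "custom"


# A reading action and a symptom entry, both expressed as events (values invented).
events = [
    AppEvent("select_content", {"content_type": "article", "topic": "sleep"}),
    AppEvent("symptom_logged", {"symptom": "night_sweats", "severity": 3}),
]

for event in events:
    print(event.kind, event.name, event.params)
```

The point of the sketch is that a custom event can carry whatever parameters a developer chooses, which is why a symptom entry can travel through the same analytics channel as an ordinary click.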

Discussion

Returning to our menopause sample, one of our original concerns centered around how these apps share intimate health data. Shipp and Blasco’s (2020) study revealed that “third parties . . . receive period-related data in the form of App Events” (p. 502). Given that App Events are used to personalize app experiences and drive “precision targeting” for advertisers (Field Drive, n.d.), we can conclude that one of their most critical functions is tracking and designing end-user engagement strategies. Within menopause apps, we infer that engagement would likely mean how long women might spend on these apps, the articles they click on, and/or the kinds of intimate health data they reveal about themselves. Figure 5 summarizes 14 kinds of intimate health data we observed when deploying the walkthrough method (Light et al., 2018) to qualitatively assess what data women might upload about themselves when they use these apps. Our results show that some apps ask users to provide data about their sleep patterns, symptoms, moods, or diet information, including their weight. Other apps, which are also for period tracking, ask women about when they had their last period, details about their sex drive, or to disclose intimate details about their bodies. Apps like Balance also provide articles about menopause and its myriad effects on the mind and body. These points of contact are all potential App Events that can be used for behavioral profiling.

Figure 5. Types of intimate health data captured by app events.

One of the limitations of the SDK Data Audit is that static analysis of the back-end cannot tell us which App Event data are actively being shared with third parties. Instead, it brings to our attention what services apps use and shows whether they have set their permissions to access this data automatically. The Google Play Store does provide a “Data Safety Agreement” which, working in tandem with privacy policies, is meant to give users a snapshot of collected data and who it is shared with, so that people can make an informed decision before they install an app (Google, n.d.-a). However, these documents are often inaccurate or sometimes simply missing (Kollnig, 2021; Story et al., 2019). Revisiting Figure 5, the four apps with green arrows (Peppy Health, Health and Her, Flo, and Calm) have indicated they are sharing “app activities” or App Events with third parties. Conversely, the apps with the red arrows (Hormona, MenoLife, Mimosa, and Clue) only disclose that these data are being shared in their privacy policies, but not in their Data Safety Agreements. Ultimately, this is confusing and misleading. Moreover, there is no description of what these App Events or app activity might be, what is being shared, or how this might be used for behavioral profiling and advertising. MenoLife, for example, promises only to share its users’ email addresses in its Data Safety Agreement. However, the SDK Data Audit reveals that it has embedded almost every behavioral profiling service from both Google and Meta, and its privacy agreement states that “usage data” (App Event data) are being shared with both platforms, in addition to an advertising platform called Rakuten (which we did not observe in its SDKs).

Our methodology remains attentive to both the front-end and back-end, allowing us to capture how datafication unfolds within mobile applications. We identify a crucial disconnect between how SDKs are declared and represented as trackers. We draw attention to their modular infrastructure and the extensive range of third-party services apps can embed. The SDK Data Audit uncovers a broad spectrum of app tracking behaviors, with some apps engaging in more comprehensive tracking and profiling of end-users than others. These critical differences should be more accurately represented to give real meaning to the rather vague notion of end-user consent. We also demonstrate that datafication within apps is an iterative, socio-technical process shaped both by user interactions with the interface and by the modular SDK infrastructures provided by platforms and third parties. In this instance, by gaining a clearer understanding of what might constitute an “intimate health data App Event,” the SDK Data Audit throws into stark relief the number of apps using these services and their role in supporting the data-for-service economy. This economic model, which maximizes monetization strategies and behavioral tracking, is facilitated by developers who (un)knowingly provide different degrees of access to myriad actors. Indeed, App Events are likely why 17% of UK women report receiving “distressing” targeted ads directly based on their in-app activities in FemTech applications (ICO, 2023).

Conclusion

In conclusion, our SDK Data Audit contributes a novel approach for auditing datafication infrastructures. This method combines both a front- and back-end approach to provide a more granular way to examine what kinds of personal data are being accessed by SDKs, lending greater insight into how digital surveillance operates within our apps. Our methodology highlights that not all apps are equal amid ubiquitous datafication. This can be observed by paying attention to their modular infrastructure to audit how extensively third parties can monetize personal data. Our findings show that analytic tracking, facilitated mainly by App Events, is a significant apparatus for data capture. Looking back at Figure 5, this suggests that women are being asked to disclose personal information about their health that does not appear to be adequately protected. Eight of the apps in our study indicated that they were sharing “app interactions” or App Event data with third parties. Finding better ways to prevent this kind of data sharing and represent these differences is paramount, especially if we are to provide more meaningful agency rather than resignation when it comes to deciding which app to download.

Based on this case study, we recommend expanding options for end-users to opt out of automatic Ad identifiers and analytics tracking in Android, which were enabled in over 57% of the menopause apps we examined. Given the sensitive nature of intimate health data potentially made accessible through FemTech apps, we suggest an additional opt-out mechanism to prevent developers from converting this data into App Events via SDK services. While some apps did disclose sharing this data with third parties, users—especially women using these apps—deserve more than vague declarations on how their data “may” be accessed and shared. Whether anonymized or not, if health data are monetized within an app, users should have control over how this occurs and by whom it is used, particularly considering potential privacy risks, discrimination, and unexamined harms that may arise. Finally, although we have focused on menopause applications, the SDK Data Audit is ultimately designed to audit the tracking infrastructure of apps across any domain. Therefore, we hope this work will contribute to ongoing struggles for more agency over our personal data and open up space for more critical research on the tracking infrastructures of datafication in our apps.

Footnotes

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Social Science and Humanities Research Council, Canada and National Research Centre on Privacy, Harm Reduction and Adversarial Influence Online, Bristol UK.

ORCID iDs

Jennifer Pybus

Mina Mir

Notes

Author biographies

Jennifer Pybus is a globally recognized scholar whose interdisciplinary research intersects digital and algorithmic cultures and explores the capture and processing of personal data. Her work focuses on the political economy of social media platforms, display ad economies, and the rise of third parties embedded in the mobile ecosystem which are facilitating algorithmic profiling, monetization, polarization, and bias. Her research contributes to an emerging field, mapping out datafication, a process that is rendering our social, cultural, and political lives into productive data for machine learning and algorithmic decision-making.

Mina Mir is a doctoral candidate at the Department of Politics at York University. Her work focuses on data privacy, racial and gendered discrimination, and surveillant biometric technologies, especially computer vision and facial recognition technology. Her research traces the historical lineages of technologies of visual representation to analyze their intersection with modes of social classification, dispossession, and extractivism.

References

Adjust (n.d.) What are app events? Available at: https://www.adjust.com/glossary/events/ (accessed 11 November 2024).

Amazon Developers (n.d.) Developer portal master. Available at: https://developer.amazon.com/en-US/home.html (accessed 12 November 2024).

AppsFlyer (n.d.) In-app events. AppsFlyer. Available at: https://www.appsflyer.com/glossary/in-app-events/ (accessed 11 November 2024).

Arnot (2023) Menopausal women often turn to doctors who know little about the symptoms – here’s what needs to change. The Conversation, 14 September. Available at: http://theconversation.com/menopausal-women-often-turn-to-doctors-who-know-little-about-the-symptoms-heres-what-needs-to-change-207450 (accessed 12 January 2024).

Azure OpenAI (2024) Prompt engineering techniques with Azure OpenAI. Available at: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering (accessed 5 July 2024).

Baldwin and Clark (2000) Design Rules: The Power of Modularity. Cambridge, MA: MIT Press.

Blanke and Pybus (2020) The material conditions of platforms: Monopolization through decentralization. Social Media + Society 6(4): 1–13.

Campello de Souza, Neto, Serrano de, et al. (2023) ChatGPT, the cognitive mediation networks theory and the emergence of sophotechnic thinking. SSRN. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4405254

Cao, Laabadli, Mathis, et al. (2024) “I deleted it after the overturn of Roe v. Wade”: understanding women’s privacy concerns toward period-tracking apps in the post Roe v. Wade era. In: Proceedings of the CHI conference on human factors in computing systems, Honolulu, HI, 11–16 May, pp. 1–22. New York: ACM.

Chao, Van Geenen, Gerlitz, et al. (2024) Digital methods for sensory media research: toolmaking as a critical technical practice. Convergence 30(1): 236–263.

Citron (2021) A new compact for sexual privacy. William & Mary Law Review 62(6): 1763–1840.

Citron (2022a) The Fight for Privacy: Protecting Dignity, Identity, and Love in the Digital Age. New York: W. W. Norton & Company.

Citron (2022b) The end of Roe means we need a new civil right to privacy. Slate, 27 June. Available at: https://slate.com/technology/2022/06/end-roe-civil-right-intimate-privacy-data.html (accessed 8 November 2024).

Cohen (2024) Platforms, data infrastructures, and infrastructure stacks. SSRN Scholarly Paper. Available at: https://papers.ssrn.com/abstract=4693056

Dieter, Gerlitz, Helmond, et al. (2019) Multi-situated app studies: methods and propositions. Social Media + Society 5(2): 1–15.

Erickson, Yuzon and Bonaci (2022) What you do not expect when you are expecting: privacy analysis of FemTech. IEEE Transactions on Technology and Society 3(2): 121–131.

Exodus Privacy (n.d.) What we do. Available at: https://exodus-privacy.eu.org/en/page/what/ (accessed 5 July 2024).

Field Drive (n.d.) Event data analytics: understanding and improving your events. Available at: https://www.fielddrive.com/blog/event-data-analytics-improving-events (accessed 11 November 2024).

Firebase (n.d.) Google’s mobile and web app development platform. Firebase. Available at: https://firebase.google.com/ (accessed 11 November 2024).

Flensburg and Lai (2022) Datafied mobile markets: measuring control over apps, data accesses, and third party services. Mobile Media & Communication 10(1): 136–155.

Flensburg and Lomborg (2023) Datafication research: mapping the field for a future agenda. New Media & Society 25(6): 1451–1469.

Gerlitz, Helmond, Nieborg, et al. (2019) Apps and infrastructures – a research agenda. Computational Culture. Available at: http://computationalculture.net/apps-and-infrastructures-a-research-agenda/

Gilman (2021) Periods for profit and the rise of menstrual surveillance. Columbia Journal of Gender and Law 41(1): 100–113.

Giray (2023) Prompt engineering with ChatGPT: a guide for academic writers. Annals of Biomedical Engineering 51(12): 2629–2633.

Gontovnikas (2020) What is an SDK? Available at: https://auth0.com/blog/what-is-an-sdk/ (accessed 18 May 2023).

Google (n.d.-a) Google for developers – from AI and cloud to mobile and web. Available at: https://developers.google.com/ (accessed 12 November 2024).

Google (n.d.-b) Provide information for Google Play’s data safety section. Available at: https://support.google.com/googleplay/android-developer/answer/10787469?hl=en (accessed 8 February 2024).

Google (n.d.-c) Firebase A/B testing. Available at: https://firebase.google.com/docs/ab-testing (accessed 7 November 2024).

Haque MdA (2024) Exploring ChatGPT and its impact on society. AI and Ethics. Epub ahead of print 21 February. DOI: 10.1007/s43681-024-00435-4.

Helmond (2015) The platformization of the web: making web data platform ready. Social Media + Society 1(2): 1–11.

Helmond and Van der Vlist (2021) Platform and app histories: assessing source availability in web archives and app repositories. In: Gomes, Demidova, Winters, et al. (eds) The Past Web: Exploring Web Archives. Berlin: Springer, pp. 203–214.

Henrickson and Meroño-Peñuela (2023) Prompting meaning: a hermeneutic approach to optimizing prompt engineering with ChatGPT. AI & Society. Epub ahead of print 4 September 2023. DOI: 10.1007/s00146-023-01752-8.

ICO (2023) ICO to review period and fertility tracking apps as poll shows more than half of women are concerned over data security. Information Commissioner’s Office, 7 September. Available at: https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2023/09/ico-to-review-period-and-fertility-tracking-apps/ (accessed 12 January 2024).

Jacobs and Evers (2023) Ethical perspectives on FemTech: moving from concerns to capability-sensitive designs. Bioethics 37(5): 430–439.

Kollnig (2021) Tracking in apps’ privacy policies. Cryptography and Security. Epub ahead of print 26 November. DOI: 10.48550/arXiv.2111.07860.

Lambert and Stevens (2024) ChatGPT and generative AI technology: a mixed bag of concerns and new opportunities. Computers in the Schools 41: 559–583.

Light, Burgess and Duguay (2018) The walkthrough method: an approach to the study of apps. New Media & Society 20(3): 881–900.

Lomborg, Sick, Flensburg, et al. (2024) Monitoring infrastructural power: methodological challenges in studying mobile infrastructures for datafication. Internet Policy Review 13(2): 1–28.

Marvin, Hellen, Jjingo, et al. (2024) Prompt engineering in large language models. In: Jacob, Piramuthu and Falkowski-Gilski (eds) Data Intelligence and Cognitive Informatics. Singapore: Springer, pp. 387–402.

McMillan (2022) Monitoring female fertility through “FemTech”: the need for a whole-system approach to regulation. Medical Law Review 30(3): 410–433.

Mehrnezhad, Shipp, Almeida, et al. (2022) Vision: too little too late? Do the risks of FemTech already outweigh the benefits? In: Proceedings of the 2022 European symposium on usable security, Karlsruhe, 29–30 September, pp. 145–150. New York: ACM.

Mehrnezhad, Van Der Merwe and Catt (2024) Mind the FemTech gap: regulation failings and exploitative systems. Frontiers in the Internet of Things 3: 1296599.

Meta (n.d.-a) Meta developer documentation. Available at: https://developers.facebook.com/docs/ (accessed 5 July 2024).

Meta (n.d.-b) Overview – Meta App events documentation. Available at: https://developers.facebook.com/docs/app-events/overview/ (accessed 11 November 2024).

Mittelstadt, Allo, Taddeo, et al. (2016) The ethics of algorithms: mapping the debate. Big Data & Society 3(2): 1–21.

Nieborg, Poell, Caplan, et al. (2024) Introduction to the special issue on locating and theorizing platform power. Internet Policy Review 13(2): 1–17.

Perez (2019) Invisible Women: Data Bias in a World Designed for Men. New York: Abrams Press.

Plantin, Lagoze, Edwards, et al. (2018) Infrastructure studies meet platform studies in the age of Google and Facebook. New Media & Society 20(1): 293–310.

Poell, Nieborg and Van Dijck (2019) Platformization. Internet Policy Review 8(4): 1–13.

Pybus and Coté (2021) Did you give permission? Datafication in the mobile ecosystem. Information, Communication & Society 25(11): 1650–1668.

Pybus and Coté (2024) Super SDKs: Tracking personal data and platform monopolies in the mobile. Big Data & Society 11(1): 1–17.

Pybus (2024) SDK Data Audit. Available at: https://osf.io/w96tq (accessed 12 November 2024).

Rieder and Sire (2014) Conflicts of interest and incentives to bias: a microeconomic critique of Google’s tangled position on the Web. New Media & Society 16(2): 195–211.

Rosas (2019) The future is FemTech: privacy and data security issues surrounding FemTech applications. Hastings Business Law Journal 15(2): 319–341.

Roumeliotis and Tselikas (2023) ChatGPT and Open-AI models: a preliminary review. Future Internet 15: 192.

Sandvig, Hamilton, Karahalios, et al. (2014) Auditing algorithms: research methods for detecting discrimination on Internet platforms. Data and Discrimination: Converting Critical Concerns into Productive Inquiry 22(2014): 1–23.

Scatterday (2021) This is no ovary-action: FemTech apps need stronger regulations to protect data and advance public health goals. North Carolina Journal of Law & Technology 23(3): 636–668.

Schäfke-Zell (2022) Revisiting the definition of health data in the age of digitalized health care. International Data Privacy Law 12(1): 33–43.

Shipp and Blasco (2020) How private is your period? A systematic analysis of menstrual app privacy policies. Proceedings on Privacy Enhancing Technologies 4: 491–510.

Srnicek (2017) Platform Capitalism. Cambridge: Polity Press.

Statista (2024) Global FemTech market size 2030. Available at: https://www.statista.com/statistics/1333181/global-femtech-market-size/ (accessed 12 January 2024).

Story, Zimmeck, Ravichander, et al. (2019) Natural language processing for mobile app privacy compliance. CEUR Workshop Proceedings 2335: 24–32.

Van der Vlist and Helmond (2021) How partners mediate platform power: mapping business and data partnerships in the social media ecosystem. Big Data & Society 8(1): 1–16.

Van Dijck, Nieborg and Poell (2019) Reframing platform power. Internet Policy Review 8(2): 1–10.

Van Dijck, Poell and De Waal (2018) The Platform Society: Public Values in a Connective World. Oxford: Oxford University Press.

Weltevrede and Jansen (2019) Infrastructures of intimate data: mapping the inbound and outbound data flows of dating apps. Computational Culture. Available at: http://computationalculture.net/infrastructures-of-intimate-data-mapping-the-inbound-and-outbound-data-flows-of-dating-apps/

West (2019) Data capitalism: redefining the logics of surveillance and privacy. Business & Society 58(1): 20–41.

Zuboff (2019) The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. London: Profile Books.

Tracking menopause: An SDK Data Audit for intimate infrastructures of datafication with ChatGPT4o
