Abstract
This article examines the possibilities and pitfalls of using Big Data to address sexual and reproductive health concerns as related to the Sustainable Development Goals (SDGs), paying particular attention to contextual difference in development settings. The global datafication of sexual and reproductive life has taken place at great speed. However, evidential deficiencies and a lack of critical engagement with the specific issues around working with sexual and reproductive health Big Data in development contexts are apparent. Informed by critical data studies, and framed by a political economy perspective which calls attention to power structures, we seek to deepen understanding of the role that Big Data around sexual and reproductive health in Low and Middle-Income Countries can play in addressing the SDGs, and of the challenges involved. First, we explore the ways in which sexual datafication processes produce Big Data. We then consider how such Big Data could directly contribute to addressing the SDGs beyond simply monitoring and evaluating. Next, we unpick how the sensitive and stigmatised nature of sexual and reproductive health can have ramifications in data-driven contexts where significant power asymmetries exist. By doing so, we provide a more nuanced articulation of the challenges of datafication by contextualising the stigma around sexual and reproductive health in a datafied context. We argue that whilst Big Data in relation to sexual and reproductive health shows potential to support the SDGs, there are specificities that must be considered to ensure that the push for data-driven approaches does no harm.
Introduction
Datafication processes, whereby complex human feelings, relationships and actions are transformed into digital data which often involves quantification (Lupton, 2016), and the resultant Big Data that is generated, provide opportunities for addressing the Sustainable Development Goals (SDGs) (Hassani et al., 2021; Wu et al., 2020). The datafication of sexual and reproductive life has taken place at great speed, and data is now being collected in a variety of contexts and forms such as via apps and Internet searches and from social media, electronic health records and online transactions. This provides a wide range of ‘alternative’ data sources that can be used to complement traditional sexual and reproductive health datasets. However, the role of datafication and Big Data in addressing sexual and reproductive health concerns in relation to the SDGs has been less well considered compared to other topics.
The importance of data has increased globally and data is often the key input into systems that are used to explain, manage, regulate and predict the world we live in; however, data are not neutral (Kitchin and Lauriault, 2014). Discourses around Big Data are contradictory, with some suggesting much potential for Big Data to address key global challenges, whereas others argue that critical concerns around data quality, privacy, bias and other issues must not be side-lined by such hype (Car et al., 2019; Kitchin, 2016; Scott, 2021). We contend that data related to sexual and reproductive health differ from other data for a range of complex reasons, as will be explored below. Additionally, these specificities, particularly in relation to Low and Middle-Income Countries (LMIC), have been rendered invisible. In this paper, we are among the first to argue that the nuances of sexual and reproductive health data in LMIC require critical consideration to ensure that data-driven approaches are beneficial, appropriately resourced and do no harm. By doing so, we borrow from and extend Blumenstock’s call for attention to context: ‘In the rush to find technological solutions to complex global problems there’s a danger of researchers and others being distracted by the technology and losing track of the key hardships and constraints that are unique to each local context. Designing data-enabled applications that work in the real world will require a slower approach that pays much more attention to the people behind the numbers’ (2018: 70) (our emphasis added).
As Blumenstock notes, technology should not detract from social considerations and people must be brought centre stage. We seek to add a further layer to this, whereby the context of the data itself must be considered, as there are significant consequences and risks when working with data around sensitive topics as we will demonstrate below.
The central question addressed in this paper is: what are the opportunities and challenges of working with Big Data in relation to sexual and reproductive health (SRH) in LMIC? We adopt a critical data studies approach which argues that data are not neutral, objective or ‘raw’, but are instead contextual and situated (Dalton and Thatcher, 2014). The paper is organised as follows. First, we explore the ways in which sexual datafication processes are generating Big Data that could contribute to improving sexual and reproductive health around the world. Second, we consider how this Big Data could contribute towards addressing the SDGs, focusing specifically on LMIC. Following this, borrowing from political economy approaches and calling attention to power differentials, we critically consider the challenges of working with SRH Big Data. We begin by detailing issues around infrastructure, skills and resources in LMIC, where sexual and reproductive health may not be considered a development priority. We then present issues around inequalities, bias, discrimination, privacy and data quality, arguing that Big Data sets may exclude certain groups, leading to biased conclusions for programme and policy development. Finally, we draw attention to how the increased visibility that vast volumes of data bring could compromise the identity and safety of marginalised communities. It is thus crucial that significant efforts are made to ensure that technological developments are not used to cause harm.
Big Data for development
A data revolution driven by the increased volume of real-time data is taking place across LMIC as well as in more industrialised nations. Data are the basic inputs into processes that create information and knowledge, with knowledge providing the basis for understanding and explaining the world. Such understanding can be used to direct actions, inform policy development and exert influence over others (Kitchin, 2021). Consequently, whoever has access to large volumes of high-quality data has a competitive and power advantage over those who do not. Once it is understood that knowledge, which creates influence and power, depends on data, the attraction of Big Data becomes clear. The use of Big Data in development contexts is now well considered in a range of domains such as transport, energy systems, health, aid distribution and disaster contexts (Agrewal and Prabakaran, 2020; Iliashenko et al., 2021; Heeks and Shekhar, 2019; Qadir et al., 2016; Sarker et al., 2020; Zhang et al., 2018). Big Data is an umbrella term referring to vast quantities of digital data that is continually being generated (UN Global Pulse, 2013). Big Data for development is thought to differ from the way the term Big Data is commonly used; however, the reasons behind this are beyond the scope of this paper (see Ali et al., 2016). For this paper, we draw on the UN Global Pulse (2012) definition where Big Data for Development generally shares some, or all, of the following features: (1) Digitally generated – that is, the data are created digitally (as opposed to being digitised manually) and can be stored using a series of ones and zeros, and thus can be manipulated by computers. (2) Passively produced – a by-product of our daily lives or interaction with digital services. (3) Automatically collected – that is, there is a system in place that extracts and stores the relevant data as it is generated. 
(4) Geographically or temporally trackable – for example, mobile phone location data or call duration time. (5) Continuously analysed – that is, information is relevant to human well-being and development and can be analysed in real-time. Here, ‘real-time’ does not always mean immediately and could be understood as information which is produced and made available in a relatively short and relevant period of time, and information which is made available within a timeframe that allows action to be taken in response, that is, creating a feedback loop. Importantly, it is the intrinsic time dimensionality of the data, and that of the feedback loop that jointly define its characteristic as real-time. (pg 15).
Such data can be split into user content generated through active engagement, and device generated data which can be passively collected (van Heerden, 2020). User generated content includes data generated from utilising social media, including information from users’ posts as well as other data such as location and network information; from smartphone messaging apps; and through e-commerce and search engine data. Passively generated data, also known as exhaust data, are produced as a by-product of human-computer interactions such as phone usage, or by bodily worn devices which record individuals’ activities or bodily processes. These types of data possess a shared characteristic: they are ‘organic’, meaning that they are by-products of processes and were not collected or sampled with the explicit intention of drawing conclusions, unlike data from traditional collection instruments such as censuses or surveys (Laney, 2001).
What can Big Data do for the sustainable development goals?
The Sustainable Development Goals are a collection of 17 interlinked global goals, which aim to eradicate poverty, establish socioeconomic inclusion, improve health and protect the environment. Issues around SRH cut across multiple SDGs, particularly Goal 3: Ensure healthy lives and promote well-being for all at all ages and Goal 5: Achieve gender equality and empower all women and girls (UN, 2016). Big Data have the potential to provide timely information on a large scale, adding value by generating knowledge that can inform interventions from planning to implementation, as well as evaluations of development programmes and the monitoring of SDGs indicators (Letouzé, 2015; Lopes and Bailure, 2018; MacFeely, 2021; Silber, 2018). It is argued that Big Data can generate insights around health, well-being and quality of life that are missed by traditional data sources; for example, Big Data can address gaps where some population groups are excluded from traditional data sources (Silber, 2018). Additionally, Big Data is said to ‘represent information about people’s behaviour instead of information about their beliefs’ (Letouzé, 2015: 4, our emphasis added). Social desirability bias, where respondents provide data in a way that they think will be viewed favourably, is a concern in SRH research and evidence gathering, due to the stigmatised and taboo nature of the topic. Respondents may deliberately answer questions inaccurately, either by underreporting stigmatised activities or by over-reporting normative ones (Kelly et al., 2013; Rao et al., 2017). For example, Cullen (2020), using data from Nigeria and Rwanda, argues that standard survey methods may significantly underestimate the prevalence of intimate partner violence (IPV). Additionally, the rates recorded vary by method: the most common method, face-to-face interviewing, shows the lowest IPV rates, and anonymous methods document the highest. 
Accordingly, efforts to address the difficulties of gathering accurate, complete and reliable SRH data are welcome and the attraction to Big Data is evident.
Whilst there is much fanfare about the promissory potential of Big Data, it is important to consider the digital landscape in LMIC to examine the feasibility and scope for collecting Big Data around any topic, not just in relation to SRH. There were approximately 5.27 billion unique mobile phone users (75% of these using a smartphone), about 4.72 billion Internet users (more than 60% of the world’s total population) and approximately 4.33 billion active social media users (more than 55% of the global population) in April 2021 (Kemp, 2021). These numbers are increasing, with data suggesting that 332 million new users came online over a 12 month period, with the number of social media users increasing by 13.7% over the same period (Kemp, 2021). However, the distribution of Internet users is uneven, with developing and least developed countries as well as those in rural areas having reduced digital access (ITU, 2020). In LMIC, mobile phones are the primary means of internet access and nearly half of women access the internet this way (Rowntree, 2019). Thus, mobile phone data or data generated through mobile devices constitutes a valuable data source in development contexts, since it is the only digital technology used by most people in low-income groups (Kshetri, 2014). Additionally, many LMIC are beginning to digitise systems, for example, the District Health Information Software 2 (DHIS2) system is creating vast swathes of digital data for processing and analysis. That said, lack of affordability and the high cost of connectivity relative to income prevents access, and digital services in many least developed countries (LDCs) remain prohibitively expensive (ITU, 2021). Additionally, infrastructure issues around electricity, signal coverage and bandwidth negatively impact access to, and the use of, digital systems in LMIC (Houngbonon et al., 2021). 
Other factors such as low digital literacy, limited content in local languages and gender-based social norms, under which women are less likely to be online than men, play an important role in explaining limited digital integration (Bastion and Mukku, 2020; James, 2019). Finally, a lack of in-country capacity for data processing, poor data cultures and skills shortages for analysing Big Data present challenges for integrating Big Data solutions into LMIC (Kalema and Mokgadi, 2017; Kshetri, 2014; Young et al., 2021). Therefore, whilst Big Data-driven approaches in LMIC are feasible, there are several generic access challenges which may hinder their uptake and success.
Sexual datafication, Big Data and SRH
The datafication of sexual and reproductive life has rapidly occurred due to the increased digital mediatisation of all domains of daily life. Datafication refers to the conversion of qualitative aspects of life into quantified data (Ruckenstein and Schüll, 2017). By exploring the ways in which sexual and reproductive life is digitally mediated, we can begin to see the vast opportunities for Big Data pertaining to SRH to be collected either actively, via user generation, or passively, as an exhaust from other activities. For example, online spaces enable those on the fringes of society, such as those belonging to minority sexual groups, those with certain fantasies, or sex workers and clients, to network, share experiences and advice and seek information (Hawkins and Watson, 2017; McDermott et al., 2015; Milrod and Monto, 2020; Noack-Lundberg et al., 2020; Carter et al., 2021; Randall and McKee, 2017). Commercial sex in its many forms is advertised, organised, performed and paid for via the digital realm (Hammond, 2015; Jones, 2015; Kingston et al., 2020; Sanders et al., 2018; Sanders et al., 2020). Digital technologies such as online spaces and text messaging enable sex workers to engage in protective practices (Bernier et al., 2021). The digital sphere, however, also opens opportunities for sexual abuse, exploitation and harassment, paedophilia and trafficking (Holt et al., 2010; Kloess et al., 2017; Machimbarrena et al., 2018; Mandau, 2020; McGlyn, 2017; Ringrose et al., 2021).
Online retailers make up the largest share of the sex toy market at 60%, and in some developing countries, online purchasing presents the only option due to legal concerns (Grand View Research, 2021). Pornhub, the world’s largest provider of online porn, reports that 130 million people a day visited the site and that, in 2020, mobile devices made up 84% of all its traffic worldwide, with 80% coming from smartphones and tablet and laptop traffic seeing reductions (Pornhub, 2021). Whilst the USA, Japan and the UK are the top three traffic providers on Pornhub, several LMIC are on the top 20 list including Mexico, Brazil, Colombia and the Philippines (Pornhub, 2021). A variety of digital dating opportunities in the form of apps, online adverts or matchmaking websites, catering for general interests as well as speciality preferences and in which users input their likes, dislikes and backgrounds, have become popular (Reynolds, 2015). People use social media to share information about sexual and reproductive health via text and other media, such as images or videos, from public health campaigns to documenting birth stories and announcing pregnancy loss (Anbalibi and Forte, 2018; Gabarron and Wynn, 2016; Jones et al., 2019; Sanders, 2019). Additionally, people use online spaces, digital assistants and chatbots for locating general sex-based information as well as health information around contraception, abortion and accessing sexual and reproductive healthcare services (Courtenay and Baraister, 2021; Jerman et al., 2018; Mitchell et al., 2014; Nadarzynski et al., 2021; Patterson et al., 2019; Wilson et al., 2017). Fertility support for both men and women via online forums or social media is also reported (Grunberg et al., 2018; Stenström and Pargman, 2021).
There are now multiple systems for booking and managing health appointments, as well as digital online consultations, the use of which accelerated during the COVID-19 pandemic (Kempton et al., 2020; Nadarzynski et al., 2017; Zhao et al., 2017). Health management information systems (HMIS) are designed to manage healthcare data and include systems that collect, store, manage and transmit a patient’s electronic medical records (EMR) or systems supporting healthcare policy decisions. DHIS2, an open source HMIS used in 73 LMIC, covers approximately 2.4 billion people. DHIS2 has specific data work packages that support HIV and reproductive, maternal, newborn, child and adolescent health, providing vast quantities of data potentially covering approximately a third of the world’s population. Prescription, insurance and pharmacy services all have online systems providing further information and valuable data to those with access (Aldughayfiq and Sampalli; Geissler J, 2021; Goundrey-Smith, 2018). The purchase and provision of condoms, the pill, HIV and other STI testing kits and home pregnancy tests can all be digitally mediated through an assemblage of state and private providers (Ahmed-Little et al., 2016; Pai et al., 2021). Further self-care options include tracking apps around menstrual health and fertility (Lupton, 2015). Whilst not exhaustive, this list provides some insight into the volume and breadth of digital data captured globally pertaining to sexual and reproductive health. These transactions and interactions generate substantial volumes of both user generated and passively collected data which, when processed and analysed, could reveal insights about people’s sexual practices and behaviours, sexual and reproductive health and bodies. This extensive volume of data, once processed, could provide opportunities for a range of stakeholders (e.g. commercial organisations, healthcare providers, researchers, NGOs and government or multinational organisations) to address SRH issues and work towards addressing the SDGs. In some cases, this work has begun, as will be explored below.
The role of Big Data in sexual and reproductive health has been less well considered compared to other health domains, where it has been extensively argued that Big Data, such as hospital records, patients’ medical records and digital data produced by devices that are part of the ‘internet of things’, can support the healthcare system in both developed countries and LMIC (Dash et al., 2019; Wyber et al., 2015). However, there is an emergent body of work exploring the use of Big Data in HIV (Qiao et al., 2021). For example, research has explored using Internet search data (Young and Zhang, 2018), the analysis of social media content (Cai et al., 2020; Stevens et al., 2020; van Heerden, 2020; Young, 2015; Young et al., 2021), using electronic health records (Yang et al., 2021), the application of machine learning techniques to datasets (Weissman et al., 2021), web scraping (Rennie et al., 2020) and using mobile phone data to understand HIV flows (Valdano et al., 2021). Other areas beyond HIV include using social media data to explore gender-based violence (Carlyle et al., 2019; XueJia et al., 2019), analysing data from a wide range of apps related to SRH such as menstruation, fertility and pregnancy (Barassi, 2017; Hamper, 2020; Lupton, 2015; Starling et al., 2018; Tatsumi et al., 2020), and the use of electronic health records (Simons and Kohn, 2019). Big Data has been used to predict post-partum depression and a range of reproductive and gynaecological molecular and cellular characterisations, physiological and physio-pathological insights and clinical outcomes (Khamisy-Farah et al., 2021; Moreira et al., 2019).
The discussion above is not intended to be a definitive guide of all data trails pertaining to SRH and further work mapping and categorising such data could be beneficial in working to address the challenges explored below. However, this snapshot evidences an increasing interest in the way that Big Datasets and associated techniques can be applied to address issues around sexual and reproductive health. Nevertheless, there remain two deficiencies. First, the work is dispersed, and each study focuses on an individual topic be that HIV, post-partum disorders or GBV; there is no literature bringing together this body of work and addressing the collective challenges and concerns of working with data around sexual and reproductive health. Second, much work around Big Data and SRH is based in Higher Income Countries (see Van Herrden and Young, 2020); thus, the specificities and challenges faced in LMIC have been less well considered in relation to SRH. The following section will explore in what ways Big Data pertaining to SRH can work towards achieving the Sustainable Development Goals paying particular attention to the SRH dynamics in LMIC contexts.
Opportunities
There has been much hype around Big Data (see Wyber et al., 2015). It has been argued that Big Data promises a ‘data deluge’, a shift from data-scarce to data-rich studies of ‘detailed, interrelated, timely and low cost data – that can provide much more sophisticated, wider scale, finer grained understandings of societies and the world we live in’ (Kitchin, 2013: 263). There is little disagreement that Big Data has the potential to support the SDGs; the scale of this benefit is, however, contested due to the challenges this presents. Big Data can assist in working towards the SDGs in two ways: first, by providing supplementary data to support monitoring targets related to SRH, and second, by providing data and evidence to enable better programming, services and initiatives to support SRH and outcomes as they relate to the SDGs. We discuss both of these below.
Measuring and monitoring
Big Data can be used to support the measurement and monitoring of the SDGs indicators related to SRH by combining Big Data sets from alternative data sources with traditional data sets to obtain newer insights. For example, if multiple data sources related to the same entity (e.g. individuals, households, communities, groups, etc.) are combined, information on behaviours, patterns and relationships on different dimensions of a phenomenon can be generated (Abreu Lopes, and Ballur, 2018; Silber, 2018). Since one of the important principles of the SDGs is to ‘leave no one behind’, data disaggregation plays an important role (Martinez, 2017). Disaggregation can be done by gender (e.g. in goal 5), ethnicity (e.g. in goal 10), income group or other relevant classifications. For example, in terms of SRH, the target could be finding the number of women diagnosed with a specific STD and belonging to a particular ethnic group living below the poverty line, providing crucial disaggregated information for monitoring and evaluation. SDGs indicators can be classified into three tiers (IAEG-SDGs, 2020). In Tier I, the methods to produce the indicators exist and the data is available. However, many SDGs indicators remain unavailable even at a national level, with Tier II and Tier III indicators being more problematic. The methodology for computing Tier II indicators exists, and is internationally established, however, the data is not regularly produced by countries. As a result, there are areas or regions for which the data are not fully available. Tier III indicators lack a methodology or international standards for their computation, a topic which was the focus of the 51st session of the UN Statistical Commission which took place in 2020. Big Data approaches can support the development of indicators in the last two tiers, thus addressing persistent data gaps.
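The kind of disaggregation described above can be illustrated with a minimal Python sketch. The records, the field names (`sex`, `ethnic_group`, `below_poverty_line`, `sti_diagnosis`) and the `disaggregate` helper are all hypothetical and invented purely for illustration; they are not drawn from any dataset discussed in this paper.

```python
from collections import Counter

# Hypothetical, made-up records for illustration only (not real data).
records = [
    {"sex": "female", "ethnic_group": "A", "below_poverty_line": True,  "sti_diagnosis": True},
    {"sex": "female", "ethnic_group": "A", "below_poverty_line": True,  "sti_diagnosis": True},
    {"sex": "female", "ethnic_group": "A", "below_poverty_line": False, "sti_diagnosis": True},
    {"sex": "female", "ethnic_group": "B", "below_poverty_line": True,  "sti_diagnosis": False},
    {"sex": "male",   "ethnic_group": "A", "below_poverty_line": True,  "sti_diagnosis": True},
    {"sex": "female", "ethnic_group": "B", "below_poverty_line": True,  "sti_diagnosis": True},
]


def disaggregate(rows, keys, condition):
    """Count the rows that satisfy `condition`, grouped by the values of `keys`."""
    counts = Counter()
    for row in rows:
        if condition(row):
            counts[tuple(row[k] for k in keys)] += 1
    return dict(counts)


# Number of people with a diagnosis, broken down by sex, ethnic group and poverty status.
by_group = disaggregate(
    records,
    keys=("sex", "ethnic_group", "below_poverty_line"),
    condition=lambda row: row["sti_diagnosis"],
)

# The single cell from the example in the text: women in group A below the poverty line.
women_a_poor = by_group.get(("female", "A", True), 0)
print(women_a_poor)
```

Small cells such as this one are exactly where disaggregation is most informative and also where individuals are most easily re-identified, so any real analysis of this kind would need safeguards that this sketch does not attempt.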
Traditional data sources for the measurement and evaluation of the SDGs, such as via large-scale national sample surveys on sexual reproductive health may be present, however, some groups in the population can be excluded. Consequently, Big Data can also fill gaps where information is not available. For example, mobile phone spending can be used as a proxy variable for income level in areas or groups in the population where this data is not available via other data sources. Traditional data sources may not be able to provide timely information since data may not be available every year, for example. Hence, information between the data collection periods may be needed for monitoring and evaluating the SDGs. Understanding temporal dynamics is crucial to monitoring SDGs indicators over time and improving quality of life. In this context, Big Data can be used to predict indicators, as well as develop and evaluate theories of change. It becomes possible to investigate whether the target phenomena are influenced by individual or systemic factors. For example, these can be helpful to predict change in social norms, which cannot be directly observed in traditional indicators. In addition, relationships found in Big Data can predict health or social indicators in geographical areas where traditional surveys have not been carried out, but correlated geospatial data is available (Abreu Lopes, and Ballur, 2018). Monitoring however, is insufficient to improve the SDGs, ‘the measurement approach alone does not cover the whole spectrum of ways in and channels through which Big Data as an entirely new ecosystem could impact—contribute to or hamper—human progress as called for and measured by the SDGs’ (Letouzé, 2015: np). 
Thus, whilst insights derived from Big Data can be used to measure some SDGs indicators, the true potential for Big Data lies in its ability to assist progress towards achieving specific targets and thus contributing directly to the achievement of the SDGs (Perera-Gomez and Lokanathan et al., 2017), as will be explored below.
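The idea of predicting an indicator from a correlated proxy in areas without survey coverage can be sketched as follows. This is a minimal illustration under strong assumptions: a single proxy (for instance mobile phone spending), a simple linear relationship fitted by ordinary least squares, and made-up numbers. The function `fit_line`, the variable names and the area labels are hypothetical, and nothing here reproduces the method of any study cited above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values for areas where both the proxy and the surveyed indicator exist.
surveyed_proxy = [1.0, 2.0, 3.0, 4.0]
surveyed_indicator = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(surveyed_proxy, surveyed_indicator)

# Areas with proxy data only: the indicator is estimated, not observed.
unsurveyed_proxy = {"area_X": 5.0, "area_Y": 2.5}
predicted = {
    area: intercept + slope * value
    for area, value in unsurveyed_proxy.items()
}
print(predicted)
```

Such estimates inherit any bias in who generates the proxy data, so a fitted relationship of this kind may not transfer to groups that are under-represented among mobile phone users.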
Directly contributing to the SDGs
This section will tease out some of the benefits specifically in relation to SRH that go beyond monitoring SDG indicators. Big Data can contribute to SRH programming, services and initiatives in a number of ways. Big Data could contribute to outcomes measured by the SDGs via non-policy actions, that is, by people using ‘insights and suggestions derived from Big Data, such as Google Maps estimates, algorithmic recommendations of when to see a doctor, etc. These are largely unrelated to policies but remain in the realm of “applications” – ways in which Big Data helps “do stuff, concrete tasks, more effectively”’ (Letouzé, 2015: np). For example, in the United States, algorithms run against Electronic Health Records have been used to identify patients at increased risk of HIV and to alert healthcare providers about patients who may benefit from PrEP, with the aim of improving PrEP prescribing and thus preventing new HIV infections (Krakower et al., 2019). Preventing new HIV infections in this way could contribute towards progress on SDG indicator 3.3.1, Number of new HIV infections per 1000 uninfected population, by sex, age and key populations. Real-time insights into population SRH could also enable targeted interventions for vulnerable groups to be developed, and adapted quickly whilst in operation, to achieve maximum benefit and reduce costs. Social marketing public health campaigns are often used to increase testing for STIs. Digital campaigns can draw on data from web-based ad click-throughs and connect those click-throughs to outcomes, for example, HIV/STI tests ordered online (Gilbert et al., 2019). Analysing real-time data in this context enables STI programme planners to understand the effectiveness of digital campaigns and to address deficiencies in targeted communications, improving the impact and efficiency of such approaches, thus reducing costs. 
mHealth, the use of mobile technologies and multimedia to fulfil health goals and provide support to healthcare delivery tasks (Nurmi, 2013), provides opportunities as passive and user generated data collected from these systems could be analysed to target specific groups. Interestingly, SRH mHealth initiatives in LMIC may help overcome barriers to accessing SRH, particularly those that are rooted in the social contexts surrounding sexuality such as regarding provider prejudice, discrimination, stigmatisation, fear of refusal, lack of privacy and confidentiality (Biddlecom et al., 2007). Barriers in accessing services in traditional ways create data gaps around certain groups, however, data from mHealth systems could plug such gaps enabling greater insights and more targeted programming. It is beyond the scope of this paper to cover every opportunity that Big Data presents for supporting the SDGs via a direct contribution; however, the narrative above and earlier in the paper demonstrates such potential. An exercise thoroughly mapping different domains where Big Data could contribute to SRH-related SDGs targets would be a useful follow-on endeavour to push this field forward.
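Two of the quantities mentioned in this section reduce to simple arithmetic, sketched below in Python. The first follows a simplified reading of the wording of indicator 3.3.1 (new HIV infections per 1000 uninfected population); the official computation may differ in detail. The second is a plain click-through-to-outcome ratio of the kind a digital campaign might track. The function names and all numbers are hypothetical and chosen only to make the arithmetic visible.

```python
def new_infections_per_1000(new_infections, uninfected_population):
    """New infections per 1000 uninfected people over the reporting period.

    Simplified reading of indicator 3.3.1; an assumption, not the official method.
    """
    if uninfected_population <= 0:
        raise ValueError("uninfected_population must be positive")
    return 1000 * new_infections / uninfected_population


def click_to_order_rate(clicks, tests_ordered):
    """Share of ad click-throughs that led to a test being ordered online."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    return tests_ordered / clicks


# Made-up figures for illustration only.
incidence = new_infections_per_1000(new_infections=50, uninfected_population=100_000)
conversion = click_to_order_rate(clicks=2_000, tests_ordered=150)
print(incidence, conversion)
```

Both measures can be disaggregated by sex, age or key population in the same way as any other count, provided the underlying data carry those attributes.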
Challenges
Despite this potential, we should remain cautious of the limitations of Big Data and below we call attention to the challenges around working with SRH Big Data. Critique around Big Data and health has focused on generic issues around self-selection bias, data quality, duplication of respondents and coverage issues (Abreu and Ballur, 2018; Hilbert, 2016; Maaroof, 2015). Furthermore, problems related to privacy and confidentiality have also been considered (Shlomo and Goldstain, 2015). However, through our discussion, we argue that SRH Big Data is not data like any other, in fact, there are a range of specific issues and challenges that arise due to the taboo and stigmatised nature of the topic that require reflection and considered actions to mitigate potential harms.
Political economy approaches can help explore such issues. Taking Collinson's definition, a political economy approach is ‘concerned with the interaction of political and economic processes within a society: the distribution of power and wealth between different groups and individuals, and the processes that create, sustain and transform these relationships over time’ (2003: 3). Thus, the analysis below calls attention to power structures, highlighting potential winners and losers in the rapid datafication of SRH. Taylor and Broeders (2015) argue that the use of new communication technologies by those in LMIC is resulting in a shift in power from the state to corporations who gather, process and analyse this burgeoning digital data. As the volume of data increases, populations who were previously less surveyed become more visible. As a result, unregulated actors gain greater power to monitor and surveil, giving corporations a new type of power and greater influence over the lives of individuals. With this power comes the potential for misuse and the exclusion of smaller state actors with local knowledge and a contextual understanding of the vast volume of data. They, and others (Hayes, 2017), warn that a lack of accountability may risk experimentation by big tech companies, which may be attracted to LMIC because they offer the opportunity to test solutions in a real world context. Additionally, Big Data analytics may in fact fail to address the structural conditions which frame challenges in LMIC (Taylor and Broeders, 2015) such as HIV, poverty, inequalities, and gender-based violence. Distributed governance creates a power to represent data subjects whereby individuals lose the ability to control how they are represented as data flows around increasingly large and complex systems of stakeholders across geographical boundaries. 
Within this complex system of power imbalances, data itself becomes the driver for the collection and processing of data, and function creep occurs (see Hayes, 2017). Corporations thus accumulate increasing volumes of data, enhancing their power to profile and render visible, for counting, sorting, monitoring and intervention, those who were previously unseen or those who wish to remain at the periphery of surveillance (Taylor and Broeders, 2015).
Capacity and resources
There are two interrelated issues in relation to capacity and resources. In some instances, care related to SRH in LMIC is chronically underfunded, with significant unmet need remaining (Pathak and Tariq, 2018). Whilst HIV has seen a relatively large amount of funding, this fails to meet the demand, and SRH is often not a funding priority (Schäferhoff et al., 2019). Stigma related to SRH arises at multiple levels, including governmental, societal and individual. Such stigma, rooted in, and perpetuated by, patriarchal desires to control women’s decision-making and bodies and moral ideas around ‘good’ and ‘bad’ sexual behaviour and desires, has the potential to influence policy, funding and programming. Despite the growth of donor aid from 2002 to 2017 in some east African countries, resulting in a period of improved indicators (Kibira et al., 2021), UNAIDS (2020) reports that increases in resources for HIV in LMIC stalled in 2017, with funding decreasing by 7% between 2017 and 2019. Furthermore, financial resources from donor aid are not evenly distributed, with HIV receiving the most funding support and other aspects such as maternal health, abortion and family planning receiving a much smaller allocation in comparison (Schäferhoff et al., 2019). Harnessing the potential of SRH Big Data requires increasing investment by donors and governments, alongside more effective approaches that strengthen foundational data systems and governance frameworks and support local knowledge and capacity development, all of which may meet challenges due to the stigma of the topic. In countries where laws prevent access to abortion or criminalise homosexuality, financial (and social) support for Big Data initiatives that recognise and address the needs of women and other minority groups may not be forthcoming. For example, the Global Gag Rule, issued by the Trump administration, sought to curtail access to abortion.
It sought to ban foreign NGOs that received US Government family planning assistance from using funds from any source to provide abortion services, counselling, or referrals, or to advocate for the liberalisation of their country’s abortion laws, including, for the first time, HIV funding through the President’s Emergency Plan for AIDS Relief (PEPFAR) (Priyanka, 2019). Whilst the Global Gag Rule has been rescinded, policies and funding requirements could affect the ability to use Big Data initiatives in SRH contexts. The growing involvement of a complex web of stakeholders in Big Data analytics increases the number of actors who have power over citizens, with many tech firms coming from higher income countries. Some of these may have a particular ideology to embed, creating risks. For example, reports from the US reveal the use of mobile data (normally used in marketing of consumer goods or services) alongside mobile geo-fencing, to target women attending abortion clinics with anti-abortion advertisements (Coutts, 2016). This creates risks for women who may be seeking sensitive health care at a time of vulnerability. Such surveillance enables the sharing of names and addresses of women seeking abortion care, and those who provide it, with anti-choice groups (Coutts, 2016). How long before such tactics, driven by big tech companies from higher income states, become embedded in LMIC? SRH will also have to compete with other donor priorities such as climate change, COVID-19 and its impacts, and increased support for economically productive sectors. Additionally, the British reduction in Official Development Assistance (ODA) has the potential to impact funding for SRH.
Thus, Big Data initiatives including leveraging capacity or developing infrastructure specifically around SRH in LMIC, where there are competing priorities and many of the SDGs remain unmet, alongside the context whereby the rights of women and other marginalised groups are rarely prioritised, may meet social, political and financial resistance limiting their effectiveness.
Privacy and risks
Issues of privacy and safety around Big Data have been gaining recognition (Maaroof, 2015). In parallel, issues around data protection in LMIC have increasingly been recognised. For example, the rapid withdrawal of Allied forces from Afghanistan saw the swift resurgence of the Taliban. With this came fears around the compromised safety of data that could be used to help identify Afghans who had supported coalition forces, with grave consequences (Hu, 2021). The theft of over half a million records concerning missing people and their families, detainees and other people receiving services from the Red Cross and Red Crescent Movement drew worldwide attention and condemnation (ICRC, 2022), demonstrating that no data is considered out of bounds. It is not only data breaches that put privacy at risk: there are reports that data collected about refugees may be used for other purposes such as counter-terrorism or migration management (see Hayes, 2017). Such instances emphasise the risks of increased visibility that come with increasing datafication. Institutional frameworks to protect privacy may not be in place. This has important consequences, especially in a development context. Citizens may lack understanding of the multitude of ways that they become entangled with data as they live their daily lives and how this is appropriated by others to create advantages (Smith, 2018), and even when the datafication of services is recognised, distrust remains (Steedman et al., 2020). Furthermore, individuals might be unaware of consenting to data collection, and access to welfare may be contingent on agreeing to digital data collection and the resultant surveillance (Cukier and Mayer-Schoenberger, 2013; Holloway et al., 2021).
Consideration of legal frameworks and ethics must be in place to protect data sharing processes (Maaroof, 2015). As the International Development world becomes more data-driven, the generation of profiles and the use of complex targeting and eligibility assessments to identify and provide better provisions for service users necessitate increased data collection around individuals, groups or spaces (Hayes, 2017). Thus, the volume of data being collected increases visibility and enhances opportunities for datasets to be linked, presenting the risk of identification (see Shlomo and Goldstein, 2015). Visibility, identification and other privacy-related issues in the field of SRH add an additional layer of complexity. Sexual and reproductive health is considered a significantly private domain of health and wider life. Some HIV positive people report being shunned by local communities or experiencing high levels of violence. Homosexuality or involvement in sex work also brings stigma, violence and criminal sanctions or worse, and in Somalia and Nigeria the death penalty stands in some regions for homosexuality (Jjuuko and Tabengwa, 2018). Details about fertility or pregnancy-related concerns are private, and abortion remains criminalised in several localities (Jain, 2019; Larrea et al., 2021). Thus, the use of Big Data in relation to SRH issues presents very specific and critical concerns around privacy. In Kenya, concerns were raised around the use of biometrics in HIV research, highlighting the risk of function creep, whereby data gathered for health purposes could be used by the police to target key populations for arrest (KELIN, 2018). Thus, it is essential that the correct protections are put in place to protect Big Data related to SRH, and that protective processes are applied at all stages, from collecting and storing data to processing and decision-making, across all stakeholders.
Additionally, wider work around digital literacies is vital to ensure that citizens understand how their data will be collected and used, as well as their data rights when engaging with all digital systems but especially in SRH contexts. Community involvement in Big Data initiatives is essential to safely develop systems. Furthermore, with the risk of increased visibility, questions remain about the ways in which those from marginalised communities may seek to protect themselves and disengage from digital systems, or seek to minimise their digital trails. The impact such unintended consequences have on the ability to receive welfare, health care and other forms of support requires urgent attention.
Bias and discrimination
Bias and discrimination in alternative data sources and algorithms have received particular interest. Banks (2018) has written in depth about how Big Data systems have developed in sophistication, and are used as forces for control, manipulation, and punishment, constricting poor and working-class people’s opportunities, resulting in discrimination. Issues around race, gender and disability (Obermeyer et al., 2019; Packin, 2021) are reported, and Benjamin (2019) demonstrates that technology is developed within a racist context and generates information that, when (mis)used, exacerbates inequities for those already marginalised. Biased algorithms learn from biased data; thus, the ways in which data may be biased are a fundamental issue to consider in relation to SRH. Bias can enter at many points: access to technologies that generate Big Data can be curtailed due to economic reasons, infrastructure, and other social divisions such as gender or disability. Resistance at local levels, or lack of funding from donor organisations to support capacity building and infrastructure as well as live projects, as explained above, may mean the Global North dominates in the Big Data projects that are conducted. Colonial history offers important insights into how external labelling, analysis, measurement, and systems can result in inhumane and callous effects (Mawere and Van Stam, 2020). The Western domination of digital platforms and data processing creates systems and algorithms that are removed from their context and are designed and implemented by those who understand the technology but lack awareness of the cultural context. Thus, when using data to drive decision-making, planners and decision makers should consider who has access to these technologies, who is absent from the data and the impacts of such exclusions (UN Women, 2018).
It is essential that local actors and social scientists who can provide contextual knowledge play a pivotal role in teams which are currently dominated by data scientists (Taylor and Broeders, 2015).
Digital data availability is heterogeneous across borders; for example, countries characterised by larger mobile phone and Internet penetration rates will generate data that is more directly produced by people. In contrast, in countries where there are large aid communities, data will be more programme-related. There are also differences across socio-demographic and geographical characteristics and differences within borders themselves (Maaroof, 2015; Hoi-Wai, 2014). Hoi-Wai (2014) stresses that it is fundamental to investigate the biases generated when Big Data influences policies. In particular, attention should be given to those countries that produce a smaller amount of data or have less capacity in Big Data, to avoid compounding place-based digital divides. The digital divide can also operate across other lines, particularly those related to other forms of exclusion. Those belonging to minority groups, for example LGBT+ communities and sex workers, may experience digital exclusion. For example, there may be reticence to post on social media about sexual orientation, or individuals may be excluded from healthcare facilities due to stigma, resulting in their electronic health records being absent from datasets which are used for decision-making purposes (Ganesh et al., 2016; Kim et al., 2018). This leads to biased data due to the under-representation of those groups. Furthermore, those that are free to post on social media about their LGBT+ or sex work status do not represent all views or positions and may be in a privileged position compared to those who seek to maintain a hidden identity, or those who experience other intersectional stigmas and inequalities around class, race or disability, for example. Others may seek to deliberately self-exclude for other reasons. For example, Heeks (2021) reports that refugees and migrants may be unwilling to use digital systems for fear of repercussions.
These communities are vulnerable to sexual violence and face a range of challenges related to sexual and reproductive health (Amiri et al., 2020; Krause, 2020). Despite limited evidence around digital self-exclusion, it becomes apparent that self-exclusion from digital systems may lead to planning and programming decisions being made using data where the most vulnerable are absent, creating further exclusions which may contribute to worsening SRH outcomes. Thus, whilst some members of marginalised communities may benefit from digital technologies and become included in Big Data datasets to aid appropriate programming, those who are often the most marginalised of the marginalised continue to remain excluded from datasets and thus interventions, widening inequalities.
Big data has been the object of social critique (see e.g. Boyd and Crawford, 2012). Some of these debates relate to LGBT+ lives. For instance, we refer to Jen Jack Gieseking’s call for a ‘queer feminist approach to the scale of big data’. Here, it is argued that the characteristics of big data undermine the importance of communities that are ‘small’ due to histories of discrimination and violence (Gieseking, 2018). Other work, for example McGlotten (2016), details the challenges of big data related to non-white LGBT+ people. The binary construction of data is a widespread practice in data collection strategies in both Big Data and traditional survey contexts. This leads to a simplification of how sexuality might be identified (see Guyan, 2022). Big Data sources, such as Facebook, may offer users a range of gender options, which work towards inclusivity. However, allowing users to choose only one identity option is insufficient to account for the multiple and overlapping ‘experiences of self’ that may characterise queer identity (Ruberg and Ruelos, 2020).
Despite claims that online data reflects people’s beliefs more than data collected via surveys and similar instruments, online spaces are heavily curated (Tiggemann and Anderberg, 2021); thus, queries about the accuracy of data collected should remain at the fore. Exhaust data, such as GPS data, may not be curated in the way that online social media posts are. On social media, for example, some individuals may seek to deliberately hide or present a different identity due to stigma, violence or fear of criminal sanction. Tech-related violence impacts the data generated, limits access and participation, and prevents freedom of online expression (Shephard, 2016), creating inherent biases in data. The increased visibility that comes with inclusion in datasets presents challenges for those from minority sexual groups, with some members maintaining two social media accounts, one straight and one queer (Ganesh et al., 2016; see also Shephard, 2016). Thus, datasets generated from social media data should be used with caution, and the inherent biases around access and the way in which the digital data has been curated should remain at the fore.
Conclusion
In this paper, we have highlighted the opportunities and challenges of working with Big Data in the context of sexual and reproductive health to address the SDGs. Whilst there are clear opportunities for Big Data to contribute to the SDGs beyond simply monitoring and measuring, the field of SRH presents nuances that must be considered if Big Data is to be utilised to safely realise such promissory opportunities. Big Data solutions and approaches cannot be understood purely from the technical domain, and it is inadequate to apply techniques and processes across multiple areas without considering the specificities of the topic in local contexts. A political economy approach highlights the power asymmetries, demonstrating the risks to vulnerable communities. Thus, we need collaborative and multi-disciplinary approaches that challenge such asymmetries, where technical insights are combined with local knowledge and contextual familiarity. As we have argued, Big Data in the context of SRH is not data like any other. This paper is not intended to be a conclusive document for the safe, ethical and effective use of Big Data in SRH; in fact, this is just the beginning. It is intended to stimulate dialogue and provoke thought among those seeking to draw on Big Data in SRH contexts. As Big Data matures and grows as a field in development contexts, more will become known about the usefulness and impacts of such approaches. In the meantime, however, planners, policy makers, programme managers, technology providers and researchers seeking to use SRH Big Data in development contexts should be aware of the ways that SRH data differs from other data. Working in partnership with communities, they should seek to address such issues in their work to help improve well-being while mitigating negative consequences.
We believe that failing to take heed of the issues outlined above would be negligent, unethical and could lead to catastrophic consequences for the most marginalised.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Natalie Hammond, Manchester Metropolitan University. Natalie is a sociologist and senior lecturer in the Department of Social Care and Social Work. Her research focuses on gender, sexual and reproductive health and well-being and digital technologies.
Angelo Moretti is an Assistant Professor in Statistics at Utrecht University in the Department of Methodology and Statistics. Angelo is a survey statistician and an elected member of the International Statistical Institute (ISI). He has conducted research in small area estimation under a wide range of approaches, such as multivariate generalised mixed-models and survey calibration. His research also focuses on mean squared error estimation based on bootstrap approaches, and data integration methods (statistical matching and probabilistic record linkage). He is also interested in applications related to social exclusion, crime and public attitudes indicators.
