Abstract
Africa has a colonial past that has made it a linguistic melting pot, where language is not only important for communication but is inextricably related to cultural identity. More than 2000 languages are still used and spoken in Africa. Linguistic diversity, coupled with cultural diversity, may affect the process of obtaining informed consent in data-intensive research. We explore some of the challenges and opportunities that multilingualism presents for informed consent in data-intensive research. In multilingual contexts, as in most African countries, language is exceptionally central, and translation has potential cultural, social, historical, functional and scientific importance. However, it is recognised that terminological and translation activities may not always be cost-effective or feasible. We consider alternative mechanisms for harmonising data-related terminology and concepts in multilingual contexts, such as iconography, graphic elicitation and other multimedia formats of information sharing. The inclusion of visual or multimedia explanations in informed consent forms can improve comprehension, enhance information transfer and learning, reduce potential vulnerabilities associated with low literacy levels or the inability to interpret the technical language of data-intensive research, build trust with participants and their communities, and promote the autonomy of potential participants. We recognise that the inclusion of visual or multimedia content to facilitate information transfer is only one component of the informed consent process for data-intensive research. Research ethics committees (RECs) should be mindful of other key considerations and challenges of informed consent for data-intensive research in sub-Saharan Africa (SSA), and further research is needed to explore whether these alternative forms of consent are ethical and effective in multilingual contexts.
Introduction
Data-intensive research is an incipient and dynamic research field. It involves data resources whose storage requirements, computational intensity or complexity exceed what is currently typical of the research field (Computer Research Association, 2015). It is research in which the capture, curation and analysis of large volumes of data are central to the scientific question (Resnik et al., 2017). It is driven by rapid advances in technology that bring enormous benefits, such as increased access to information and services through internet-based platforms (Tenopir et al., 2020). However, these developments also raise ethical and legal challenges, including the privacy of data donors, data transfer and sharing, data access and data exploitation (Braunack-Mayer et al., 2023; Kabanda et al., 2023; Tanweer, 2022). In data-intensive research, large volumes of information are collected, processed and shared, sometimes without the knowledge of data subjects (Laurijssen et al., 2022; Smit et al., 2023; Williams and Pigeot, 2017).
Research ethics dictates that informed consent must be sought when research involves human participants or their associated biological samples and data (Karim et al., 2018; Knifed et al., 2008). Informed consent is based on five cardinal principles, namely competence, voluntariness, disclosure, understanding and authorisation (Beauchamp and Childress, 2019). Language is central to informed consent, and the use of complex language requires consideration of readability and participant literacy levels (Falagas et al., 2009; Karim et al., 2018; Moodley et al., 2005; Rowe and Moodley, 2013).
Researchers in sub-Saharan Africa (SSA) face a myriad of challenges when dealing with informed consent in data-intensive research. These include data illiteracy (Moyo and Bangani, 2023), the use of technical terms and the multiplicity of languages (Lema et al., 2009). Transparency and accountability are significant ethical considerations, especially when data is transferred or re-used. The linguistic diversity of SSA is a major barrier to informed consent (Alexander, 2007; Blommaert, 2007). Researchers are often required to translate complex technical terms and data-related concepts into local languages to facilitate informed consent processes. This can be complicated or even unfeasible (Busisiwe et al., 2023), particularly given the number of languages used or spoken in different countries. For example, by 2022 there were over 2000 languages in Africa, with Nigeria accounting for 520 languages, followed by Cameroon (277 languages), the Democratic Republic of Congo (214 languages), Ghana (83 languages), Kenya (68 languages), South Africa (31 languages) and Zimbabwe (23 languages; Statista, 2022).
Producing translations that accurately capture the intended meaning of data-related concepts is challenging. African language lexicons often lack up-to-date equivalents for scientific terms, and this is compounded by the fact that many researchers lack vernacular linguistic competence, having completed all their science education in a foreign language. Furthermore, hybrid languages are emerging in urban communities (Beck, 2010; Falagas et al., 2009; Kießling and Mous, 2004; Mc Laughlin, 2009; Prah, 2010). Potential research participants may also have low levels of data literacy or inadequate awareness of potential risks and benefits, or they may come from communities with limited experience of research participation (Kripalani et al., 2021; Sørensen, 2022). Sub-Saharan Africa hosts multiple languages and cultures in varied geographical settings, resulting in cultural and contextual differences, including paternalism in healthcare, that can affect empowered participation in conversations on data-related issues (Akpa-Inyang and Chima, 2021; De Roubaix, 2017; Lenharo, 2023; Norman, 2015; Vila Ortiz et al., 2023). In addition, limited resources may hinder the development of communication materials and investment in comprehensive educational initiatives for research participants and in capacity-building for researchers. Without these, data-related concepts may be difficult to convey (Addissie et al., 2016; Minnies et al., 2008; Morrow et al., 2015). Collectively, these factors oblige researchers to go beyond ordinary efforts to communicate effectively with research participants.
Successful research depends on the use of easily comprehensible language (Kadam, 2017). Terms such as big data, data sources, data subjects, data privacy and data protection are used across different academic fields. However, their definitions are often unclear and are not universally accepted (Andreotta et al., 2022). This may further complicate the informed consent process, which is itself contextual and may be influenced by multilingualism, culture and economic status (Akpa-Inyang and Chima, 2021; Lakes et al., 2012). It is therefore important for researchers, RECs and research communities to develop mechanisms for enhancing participants’ understanding of the subject. In the next section, we define and contextualise key terms used in data-intensive research, namely personal data, data privacy and big data, to emphasise the importance of harmonising terminology.
Common terminologies used in data-intensive research
Personal data refers to any information relating to a person that can be used to identify them (Haddadi et al., 2015; Karim et al., 2018; Lupton, 2018). Data laws define the concept as any information relating to an identified or identifiable person. However, what the concept covers varies and must be interpreted within specific cultural contexts (Kainja, 2023; Makulilo, 2012; Ncube, 2016; Roos, 2006; Staunton et al., 2020; Wanekeya, 2023). Examples of personal data include names, photos, email addresses, bank details, medical information and computer Internet Protocol (IP) addresses. Because the ways in which people are identified differ across cultures and contexts, so does the operationalisation of these terms. The collection of personal data introduces the risk of a personal data breach, defined as a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data (Alunge, 2020). For example, sharing patients’ personal data with unauthorised third parties can lead to insurance fraud and other deceptive activities (Republic of Kenya, 2019; Nkomo, 2019; Pool et al., 2024).
Two other significant terms are data privacy and big data. Data privacy is the protection of personal data from unauthorised access, use, disclosure or destruction (Barker et al., 2009; De Capitani Di Vimercati et al., 2012; Jain et al., 2016). It involves collecting, processing and storing personal data in compliance with relevant laws, and implementing appropriate security measures to prevent data breaches. To this end, several African countries have introduced laws that make institutions responsible for ensuring the privacy and security of data (Alunge, 2020; Andreotta et al., 2022; Brand et al., 2022; Lenharo, 2023; Makulilo, 2012; Ncube, 2016; Wanekeya, 2023).
The term big data illustrates the link between data concepts and data-intensive research. Big data refers to extremely large and complex datasets that are difficult to process and analyse using traditional data processing methods. It allows researchers to do things that were not possible before, such as discovering new facts, relationships, indicators and pointers that would otherwise not have been identified. This capability gives rise to data-intensive research, which uses data whose scale exceeds the storage and computational capacity typical of the research field (Bydon et al., 2020; Kitchin and McArdle, 2016; Ward and Barker, 2013; Ylijoki and Porras, 2016). For example, the eyeball-scanning crypto project Worldcoin uses iris scans to create digital identities in some countries. Worldcoin was stopped from collecting data in Kenya over concerns about the security and storage of iris scans (Roth, 2023). However, the case was dropped and the company was able to continue with the project, which raises further ethical concerns.
Additionally, big data refers to continually growing datasets containing personal information that are large, complex and difficult to store and analyse (Nelson, 2015). The complexity of such data is further described by the ‘5 Vs’, namely volume, velocity, variety, veracity and value. Volume refers to the large size of the data, while velocity is the speed at which new data are created, stored and moved around. Variety refers to the diverse range of data types and sources, such as structured data from databases, unstructured data from social media or semi-structured data from sensors. Veracity relates to the accuracy of the data, which depends on their source and quality. Value is what is realised after big data are processed, analysed and deployed (Andreotta et al., 2022; Favaretto et al., 2020). The 5 Vs provide a clearer picture of the magnitude and variety of data and of how they are created, analysed and used to make decisions. For example, enormous volumes (volume) of mobility data of different types (variety) were collected rapidly (velocity) and accurately (veracity) and used (value) to enforce COVID-19 protocols (Merrill et al., 2022). However, it is recognised that potential future uses of data may be unknown at the time of collection, and that these uncertainties are difficult to deal with ethically (Andreotta et al., 2022; Ferretti et al., 2021).
Given the complexity of these definitions, we argue that visual or multimedia content could simplify informed consent processes in multilingual contexts. The inclusion of visual icons in informed consent forms could improve comprehension, particularly in data-intensive research. We therefore propose combining text with icons. The following section discusses multilingualism in informed consent processes in SSA.
Multilingualism and informed consent in sub-Saharan Africa
The Declaration of Helsinki (World Medical Association, 2013) requires that research participants be adequately informed. It states that ‘special attention should be given to the specific information needs of individual potential subjects as well as to the methods used to deliver the information’ (World Medical Association, 2013). However, what constitutes adequate information, and what ‘special attention’ to ‘specific information needs’ entails, are neither defined nor explored further, which leaves them open to broad interpretation. Adoption and implementation of these guidelines have also not been uniform across jurisdictions, because interpretation is contextual and may be influenced by language and culture (Rossi and Palmirani, 2020b).
The General Data Protection Regulation (GDPR; European Parliament, 2016) is the gold standard of data privacy regulation, and many countries’ data regulations are modelled on it. The GDPR requires that requests for consent be presented in an intelligible and easily accessible form, using clear and plain language (Art. 7(2)), and that consent comply with the principles of data protection (Danezis et al., 2015; Hildebrandt and Tielemans, 2013). In addition, the GDPR (Art. 12(7)) and its associated guidance refer to ‘icons’, ‘cartoons, infographics, flowcharts’ and ‘comics/cartoons, pictograms, and animations’ as means of providing information ‘in an easily visible, intelligible, and legible manner’. These provisions encourage the use of particular design elements when creating informed consent forms to ensure transparency, which encompasses the ‘quality, accessibility, and comprehensibility of the information’ (Edwards et al., 2019). In Europe, the development of new data protection iconography is becoming more popular because of the challenges faced in understanding data-related terminology and concepts (Rossi and Lenzini, 2021; Rossi and Palmirani, 2020a).
In contrast, data protection laws in Africa are silent on the use of iconography. For instance, although the South African Protection of Personal Information Act (POPIA) is modelled on the GDPR, it contains no ‘transparency by design’ requirements to ensure that informed consent forms are clear to their intended audience (Eiband et al., 2018; Felzmann et al., 2020; Lnenicka and Nikiforova, 2021). Similarly, the data protection laws of Zimbabwe (Ncube, 2016), Ghana and Kenya (Republic of Kenya, 2019) are elaborate but make no reference to the design of informed consent forms. Fortunately, the Declaration of Helsinki provides a mandate to rethink and redesign consent forms to bridge multilingual challenges (Rossi and Palmirani, 2020b). The Declaration states that researchers can only obtain voluntary informed consent ‘after ensuring that the potential subject has understood the information’ (World Medical Association, 2013). It is thus advisable to present information in a ‘language’ that minimises possible misunderstandings or misinterpretations and makes the content of the consent form clear. ‘Language’ in this sense could also include visuals or pictographs, since an image can often convey the same message regardless of the viewer’s language (Rossi and Palmirani, 2020b). However, even where uniform and appropriate language or visuals are intended, there may be inherent differences in cultural meaning or interpretation. This highlights the importance of engaging with research communities to ensure that visuals and images are used appropriately in different research contexts.
The discourse on language emphasises the translation of terminology into indigenous languages in multilingual contexts. Use of correct terminology may contribute to effective scientific and technological transfer and to the assimilation of knowledge and skills among researchers and participants. In multilingual contexts, language takes precedence over other forms of communication such as facial expressions and gestures, and translation therefore has potential cultural, social, historical, functional and scientific importance. However, it is recognised that terminological and translation activities may not always be feasible (Alberts, 2010). In addition, while providing research information in indigenous languages is necessary for inclusivity, translated informed consent forms are of little use to participants who cannot read. Consequently, it is imperative to consider other mechanisms for harmonising data-related terms and concepts, such as visual representations.
An example of successful visual representation and iconography in multilingual contexts is road signs and signals, which are regarded as a universal language understood by most road users regardless of their preferred language of communication. They are the product of an international harmonisation effort that culminated in the 1968 Vienna Convention on Road Signs and Signals, at which consensus was reached on the current system of road signs, including their sizes, colours and icons. Road signs and signals demonstrate the value of universally understood symbols or icons, and they are commonplace in SSA. They are the main method of communicating with road users and controlling traffic, and they complement the efforts of traffic police to ensure safety. We recognise that adherence to traffic laws may vary across SSA contexts, but this does not detract from the inherent value of universally recognisable symbols or icons.
Iconography, which may be defined as ‘the traditional or conventional images or symbols associated with a subject’ and ‘pictorial material relating to or illustrating a subject’ (Merriam-Webster, n.d.), was central to the development of road signs (Dewar and Pronin, 2023; Economic Commission for Europe-Inland Transport Committee, 1968; Krampen, 1983). We believe that iconography is a potential mechanism for harmonising data-intensive research terminology. Icons can communicate meaning in a standardised manner, independently of textual literacy, and can thus overcome the language barriers encountered in Africa’s varied linguistic and cultural settings. They are also an important tool for performing interactive tasks such as initiating actions and obtaining information.
While it is important to consider potential cultural differences in the meanings of symbols and the use of colours, standardised icons that explain important terminology in data-intensive research could enhance understanding during informed consent (Efroni et al., 2019). Privacy icons have been used to improve the informed consent process by helping users understand how their data will be handled. For example, the Platform for Privacy Preferences Project (P3P) uses a machine-readable syntax that helps users understand privacy statements and warns them when a website’s practices conflict with their preferences (Efroni et al., 2019). Moreover, websites and applications offering high-quality icon packs, such as Flaticon.com (Freepik Company S.L., 2023), provide icons for data-intensive research terminology that can easily be incorporated into informed consent forms. Travel, hotel and office icon packs are among other high-quality icon packs that are easily understood across languages and cultures (LottieFiles, 2021). Such icons are used in SSA, and their efficacy has been studied: Soares (2015) empirically validated the efficacy of standardised icons used on mobile devices by users in a culturally diverse SSA region. Additionally, an empirical study of icon recognition showed that conceptual models designed to match individual usability can enhance recognition (Ashe et al., 2018).
Graphic elicitation is another possible mechanism to complement data-intensive research terminology. It is distinctive in that the icons, symbols, images or pictures are produced in collaboration with members of the community to whom the communication is targeted. Graphic elicitation involves presenting participants with visual stimuli such as diagrams, icons or drawings. Working with an artist, participants can edit the visual stimuli until they capture the idea that they wish to express. Through an iterative process, often involving focus group discussions, participants can produce icons or figures that communicate their shared understanding of concepts and terminology (Kingori, 2015).
As an example, Kingori (2015) used graphic elicitation over an extended period to study the decision-making of potential participants in biomedical research in West and East Africa, and illustrated complex and largely abstract moral dilemmas faced by Kenyan fieldworkers. The final product was a set of drawings that elucidated the participants’ shared understanding of complex abstract terms. True to the adage that ‘a picture is worth a thousand words’, these drawings were a potent communication tool. Graphic illustrations can therefore be viewed as a language developed by the relevant stakeholders. This is distinct from the traditional informed consent approach, in which text written by the researchers is presented and explained to prospective participants. The process used to develop the illustrations, and the effort taken to make them clear, suggest that text and visuals can complement each other and thereby enhance informed decision making (Kingori, 2015).
A contextual example from SSA that supports the effectiveness of multimedia in simplifying complex information for patient or participant populations is Speaking Books®. Speaking Books® originated in SSA in 2003, when the South African Depression and Anxiety Group (SADAG), while working to combat teen suicide in South Africa, encountered the challenge of distributing health information to low-literacy communities and set out to develop an affordable way of reaching them. Speaking Books® now cover more than 100 health, social and literacy topics in over 40 languages, and have been distributed in 35 countries (Speaking Books, n.d.). They combine culturally relevant artwork with straightforward, easy-to-understand text. Each page has a corresponding button that plays an audio recording of the text in the local language, often voiced by a local celebrity. Speaking Books® are a user-driven solution to healthcare information dissemination. There is also a clinical trial version of Speaking Books® that introduces participants to concepts associated with clinical trials, such as informed consent; it may therefore be a valuable tool to supplement informed consent processes for data-intensive research.
Pictures have also been used as supplementary visual communication tools in genomic research projects to help researchers obtain informed consent from the San community (Bedeker et al., 2019). The ‘Biobanking and Me’ speaking book was written to address concepts and terminology associated with genetic research in South Africa, and two bilingual versions (English-Afrikaans and English-isiXhosa) were developed and assessed. The tool improved understanding of basic genetic concepts by drawing on the communities’ cultural beliefs and on the visual storytelling of San rock paintings to explain the abstract nature of genes and inheritance in the context of genomic research. After using the speaking book, participants demonstrated a significant overall knowledge gain, and the majority reacted positively to its artwork, bilingual audio and text (Bedeker et al., 2019). This illustrates how such tools may facilitate participant autonomy and build trust among potential research participants and their communities.
Importantly, our current view is that visual content should not replace written content in informed consent forms. Instead, it should be used to facilitate information transfer, reduce potential vulnerabilities associated with low literacy levels or the inability to interpret the technical language of data-intensive research, build trust with participants and their communities, and promote the autonomy of potential participants during the consent process (Andreotta et al., 2022). We recognise that including both visual and written content in informed consent forms requires validation to ensure that both formats are consistent, appropriate and accurate, and that this may place an additional burden on the development of informed consent forms. Some research on the use of videos has been conducted, but further research is required (Hendricks et al., 2018). We also recognise that the inclusion of visual or multimedia content to facilitate information transfer is only one component of the informed consent process. We therefore recommend that RECs deliberate on data-intensive research protocols and informed consent forms while remaining mindful of other challenges to informed consent, such as low education levels, language barriers and diverse cultures.
It is important to recognise the nuances of socio-cultural influences on consent processes in African countries, where a person may be defined through their community and their decision-making may be influenced by family, friends, spiritual leaders and clans. As such, RECs should assess the process of obtaining informed consent as well as the informed consent forms themselves (Kingori, 2015). Another challenge is assessing the value of informed consent when obtaining it may be impractical, for example, for social media data and mobility data. In such circumstances, data subjects participate unknowingly, which denies them the opportunity to engage in the research process and may lead to vulnerability and negative research outcomes (Ferretti et al., 2021). The success of such research depends on public acceptance, since the public are the data donors. This calls for broad consensus, or a social licence, within communities about what is acceptable. Given the multiplicity of languages and the resulting challenges, iconography and other non-textual mechanisms may form part of the solution, and RECs have a significant role to play here.
In SSA, anonymisation is one means of ensuring privacy and confidentiality, and the relevant laws regulate the processing of data to ensure privacy, confidentiality and transparent handling. Where data have been de-identified or anonymised, RECs should therefore evaluate the possibility of the data being linked back to communities or individuals, and whether measures are in place to mitigate the risks of possible social harms. Research ethics committees should also assess proposed data analytical processes for transparency and for possible biases that could result in discrimination, stigmatisation or marginalisation based on race, gender, ethnicity, sociocultural beliefs and physical and emotional attributes (European Union Agency for Fundamental Rights, 2022).
For RECs to discharge their responsibilities competently, their multidisciplinary membership should be strengthened by recruiting or co-opting data experts and by training members to build capacity for reviewing data-intensive research.
Conclusion
There are numerous challenges to informed consent for data-intensive research, and these are compounded in the multilingual contexts of SSA. We have highlighted the importance of visual content such as icons, of participant-engaged development of visual content through graphic elicitation, and of other multimedia as options for harmonising terminology in data-intensive research and improving information transfer during informed consent processes. Empirical research is needed to develop visual and other multimedia tools to facilitate informed consent processes and to explore whether these alternative forms of consent are ethical and effective in multilingual contexts.
Footnotes
Acknowledgements
The authors would like to thank all members of the Research for Ethical Data Science in sub-Saharan Africa (REDSSA) Consortium of Bioethicists for the stimulating discussions during Consortium meetings that led to the conceptualisation of this manuscript.
List of abbreviations
GDPR: European Union’s General Data Protection Regulation
POPIA: South African Protection of Personal Information Act
REC(s): Research Ethics Committee(s)
SSA: Sub-Saharan Africa
Authors’ contributions
LO, MB, GRC, WJ, SS, TB and KM conceived the manuscript. All authors contributed to the draft manuscript and read and approved the final manuscript.
Availability of data and materials
Not applicable.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
All articles in Research Ethics are published as open access. There are no submission charges and no Article Processing Charges as these are fully funded by institutions through Knowledge Unlatched, resulting in no direct charge to authors.
Research reported in this publication was supported by the US National Institute of Mental Health of the US National Institutes of Health under award number U01MH127704. The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health.
Ethics approval and consent to participate
Not applicable for this conceptual paper.
Consent for publication
Not applicable.
