Abstract
There is increasing criticism of the use of big data and algorithms in public governance. Studies have revealed that algorithms may reinforce existing biases and defy scrutiny by the public officials using them and the citizens subject to algorithmic decisions and services. In response, scholars have called for more algorithmic transparency and regulation. These are useful but ex post solutions, in which the development of algorithms remains a rather autonomous process. This paper argues that co-design of algorithms with relevant stakeholders from government and society is another means of achieving responsible and accountable algorithms, one that is largely overlooked in the literature. We present a case study of the development of an algorithmic tool to estimate the populations of refugee camps in order to manage the delivery of emergency supplies. The case study demonstrates how, in different stages of the tool's development (data selection and pre-processing, training of the algorithm, and post-processing and adoption), inclusion of knowledge from the field led to changes to the algorithm. Co-design supported the responsibility of the algorithm in the selection of big data sources and in preventing the reinforcement of biases. It contributed to the accountability of the algorithm by making the estimations transparent and explicable to its users, who were able to use the tool for fitting purposes and exercised discretion in interpreting the results. It remains unclear whether this ultimately led to better servicing of refugee camps.
Introduction
Big data and new approaches to analyze these data have gained a prominent role in public sector decision-making and public service delivery (Desouza and Jacob, 2017; Klievink et al., 2017; Mergel et al., 2016). This trend has been captured under the label of “algorithmic governance” (Danaher et al., 2017; Gritsenko and Wood, 2020; Williamson, 2014). This concept expresses that algorithms fundamentally change the ways in which governments function. Big data analytics involves large-scale and real-time exploration of patterns and correlations in big data as a basis for government action at present (nowcasting) and in the future (forecasting) (Van der Voort et al., 2019). Algorithms can do so based on pre-defined rules and by self-improving through machine learning (Flach, 2012; Burrell, 2016).
Algorithmic governance has been praised for achieving higher levels of efficiency and effectiveness (Milakovich, 2012) and for providing public officials with better decision support (O’Malley, 2014). However, it is also subject to criticism. One part of this criticism focuses on algorithmic responsibility (Mittelstadt et al., 2016; Meijer and Grimmelikhuijsen, 2020). Algorithmic models may use data sources and indicators that are ill-suited for the purposes of public governance. This can lead to incorrect and discriminatory decisions (Lepri et al., 2018). For example, some algorithms in the fields of law enforcement (Ferguson, 2017) and healthcare (Obermeyer et al., 2019) contain problematic racial biases. Machine learning replicates and amplifies biases present in the historical datasets used to train algorithms. These biases are not always recognized, as algorithmic governance is cloaked in the neutral language of technology (Andrews, 2019). Algorithmic responsibility requires adequate weighing of the ethical dilemmas involved in the organizational use of algorithms, based on knowledge about their (possible) impacts (Meijer and Grimmelikhuijsen, 2020).
In addition, algorithmic governance is criticized for its lack of accountability. The ways in which algorithms function often defy comprehension by the public officials using them and the citizens subject to algorithmic decisions and services (Giest and Grimmelikhuijsen, 2020). In fact, even those who have designed and trained the machine learning models may no longer be able to interpret and explain the obtained results (Burrell, 2016). Such algorithms are referred to as “black boxes” (Pasquale, 2015). As a result, the discretion of public officials working with algorithmic tools is limited. Public sector decision-makers, like all human beings, generally tend to favor the results presented by machines, even if these are mistaken, a phenomenon known as automation bias (Bullock et al., 2020; Goddard et al., 2012; Yeung, 2017). In other cases, decisions and the delivery of services based on algorithmic models are fully automated and no longer subject to professional discretion. This absence of accountability undermines the legitimacy of algorithmic governance (Lepri et al., 2018; Meijer and Grimmelikhuijsen, 2020). Citizens and other societal actors have a democratic right to information on government decisions and processes which affect their interests (Piotrowski and Rosenbloom, 2002).
In response to these issues, research has called for more transparency and regulation of algorithms (Busuioc, 2020; Giest and Grimmelikhuijsen, 2020; Yeung and Lodge, 2019). These are useful but ex post solutions, in which the development of algorithms remains a rather autonomous process that tends to outpace governance. This paper explores another way to improve algorithmic governance: co-design of algorithms by and for the actors who are involved with the governance issue. Co-design is defined as “a design-led process, involving creative and participatory principles and tools, to engage different kinds of people and knowledge in public problem solving” (Blomkamp, 2018: 731). Earlier research has suggested that co-design by a multi-disciplinary team of researchers, practitioners, policymakers, and citizens would enhance the fairness, transparency, and accountability of algorithms (Aizenberg and Van den Hoven, 2020; Lepri et al., 2018; Whitman et al., 2018). However, this has not yet been assessed in empirical cases of algorithms developed for public sector use, in which specific public values are at stake (Andrews, 2019). Based on a case study of the development of an algorithm for public service delivery, we assess how co-design can ensure responsibility and accountability of algorithmic governance.
This paper presents a case study of the development of an algorithm designed to address an important governance issue: the management of refugee camps in Nigeria. Populations of refugee camps fluctuate, depending on conditions in the region (such as political, economic, and climate conditions) and the policy responses of host countries. In Nigeria, 2.7 million internally displaced persons live in refugee camps in the northeastern part of the country. Many have fled violence by the Islamist extremist group Boko Haram and conflict-induced food shortages (UNHCR, 2020). Timely and accurate information on the populations of refugee camps is important for organizations tasked with camp management to efficiently allocate budget, capacity, and resources to aid refugees. Satellite images and an algorithm for object detection provide a new alternative for estimating the populations of refugee camps based on the numbers of tents. However, ethical concerns pertaining to such advanced ways of migration tracking have been raised (Dijstelbloem, 2017). Therefore, migration management is a critical case for studying the effects of co-design on the responsibility and accountability of algorithms used in public governance.
In this paper, we first discuss current theory on co-design and its applicability to the development of algorithms. Then, we discuss recent applications of big data analytics in migration management and issues with the responsibility and accountability of algorithms used in this field. In the methodology section, we present our case along with the data and method used to assess co-design of an algorithmic tool for public service delivery in refugee camps. In the results section, we present how co-design was implemented and what changes this led to in the design of the algorithm across three phases of development: (1) data selection and pre-processing, (2) training and machine learning, and (3) post-processing. We also reflect upon instances in which co-design could be strengthened and the conditions required for this. In the conclusions section, we discuss the main contributions of this study to theory on co-design and algorithmic governance in general, and to the use of algorithms to manage migration in particular.
Co-design and algorithmic governance
Co-design is a method for creatively engaging citizens and stakeholders to develop new solutions to complex problems (Blomkamp, 2018). The principles and practices of co-design resonate with collaborative governance practices in which non-governmental actors and citizens are co-creators of public services (Ansell, 2016; Bryson et al., 2020; Hartley et al., 2013). In co-design, actors who have an interest in addressing an issue have an active role in designing new policies and public services. They collaborate in gaining a better understanding of the issue at hand and iteratively developing and testing new solutions. Co-design refers to a process, set of principles and practical tools (Blomkamp, 2018): the process entails iterative stages of design thinking, oriented towards innovation. The principles include that actors are creative and experts in their own professional and lived experience. The practical tools include creative and tangible methods for telling, enacting, and making.
Co-design has gained popularity in public governance for several reasons. First, researchers expect co-design to lead to better solutions to problems which government actors are dealing with. Methods and principles for generative experimenting would stimulate innovation. Second, they expect co-design to foster cooperation and trust between the actors involved. Third, they expect co-designed policies and services to be more responsive to the needs of intended beneficiaries (Blomkamp, 2018; Hermus et al., 2020). Hence, co-design contributes not only to effectiveness of policy and administrative changes, but also to other public values such as responsiveness and legitimacy (Lewis et al., 2020; McGann et al., 2018).
These hypothesized effects of co-design have not yet been studied in relation to the development of algorithmic governance solutions. Big data analytics, or data science, has developed as a specialism of its own (Kettl, 2018). Governments often outsource IT and analysis tasks rather than building in-house expertise, and the data analysts developing the algorithm act as relatively autonomous agents (Goldfinch, 2007; Van der Voort et al., 2019). They develop and train the algorithmic model to achieve the best “fit” with the training data and pay less attention to the ethical and political aspects of the design. This is considered problematic if algorithms are to contribute to other public values (Andrews, 2019; Boyd and Crawford, 2012; Chatfield and Reddick, 2018; Mergel et al., 2016). By involving the perspectives of a variety of actors, co-design could integrate checks and balances on the implications of the algorithm's operations for public values such as fairness, privacy, and social justice into the design (Aizenberg and Van den Hoven, 2020; Mittelstadt et al., 2016; Whitman et al., 2018). This would ensure responsible algorithmization (Meijer and Grimmelikhuijsen, 2020). Furthermore, the involvement of stakeholders in the design of the algorithm and its implementation in governance processes opens the algorithm up to scrutiny. This would enhance algorithmic accountability.
Despite the qualities of machine learning, the development of an algorithm requires human decisions in several stages. At these moments, co-design could contribute to algorithmic responsibility and accountability. Bruha (2000) distinguishes three stages (and several sub-stages) of algorithm development: (1) collection and preprocessing of data; (2) data mining; and (3) postprocessing. The first stage is concerned with selecting relevant data sources and preparing data for analysis. Big data sources often contain missing values, inconsistent data, and noise, and pose problems such as data sets being too large or containing too many variables to process. In the preprocessing stage, the researcher develops a strategy for selecting and ordering relevant data. The data mining stage involves choosing the appropriate method of analysis—the algorithm—and training the algorithm to extract the right information from the data. In this stage, there are also several choices to be made by the developer. In the third stage, after the algorithm has been trained to extract the right information from training data, new data can be introduced to the algorithm to extract new information. When new information is extracted from the data, postprocessing is needed, for example to document and visualize the data in a way that supports interpretation and use.
Several design methodologies with a focus on citizens and stakeholders have emerged, including value-sensitive design (Friedman and Hendry, 2019), user- and human-centered design (Baumer, 2017), and participatory design (Schuler and Namioka, 1993). These are increasingly used in the design of algorithms and AI, recognizing that data-driven solutions need to be developed with attention to their societal context (Aizenberg and Van den Hoven, 2020; Whitman et al., 2018; Zhu et al., 2018). However, few studies have analyzed the hypothesized effects of co-design in the development of algorithmic solutions for governance, where public values are at stake (Andrews, 2019; Boyd and Crawford, 2012; Chatfield and Reddick, 2018; Mergel et al., 2016). Notable exceptions are the works by Ruijer et al. (2017) and Safarov et al. (2017), who have shown that re-use of open government data by third parties in hackathons and living labs results in better utilization of open government data and more responsiveness to the needs of citizens. In addition to these studies, this case study will explore to what extent and how co-design of algorithms can enhance the responsibility and accountability of algorithms used in public service delivery. In the next section, we first present the context of this case study: algorithmic governance of refugee migration.
Algorithmic governance of refugee migration
The management of refugee camps is among the most pressing governance issues of our time. Almost five million people live in managed refugee camps (UNHCR, 2018: 60). This includes refugees who have fled their home countries and persons who are internally displaced. A well-known example is Moria, a refugee camp on the Greek island of Lesbos which burned down and was rebuilt in the summer of 2020. This camp provides shelter to 13,000 refugees who fled countries such as Afghanistan and Syria. In addition, about one million refugees worldwide live in self-settled camps. For example, during the European refugee crisis, unofficial camps were set up in Idomeni, Greece and “the Jungle” in Calais, France. In 2012, the UNHCR estimated the average population of refugee camps to be 11,500. Some refugee camps are the size of major cities. For example, Kutupalong in Cox's Bazaar, Bangladesh is home to nearly 900,000 Rohingya who fled Myanmar. Bidi Bidi in Uganda has a population of 285,000, and the Dadaab camp complex in Kenya has 235,000 inhabitants.
Populations of refugee camps change as a result of conditions in the refugees’ countries of origin (such as war, political oppression, and drought) and the policy responses of host countries and the international community to refugee migration. However, governmental and non-governmental organizations lack reliable data and estimates on refugee migration (Willekens et al., 2016). Traditionally, the populations of refugee camps are measured based on census data collected at the sites. However, censuses are burdensome, data quickly become outdated, and the sharing of data between actors is limited. This lack of timely and accurate data makes it difficult for government organizations and NGOs tasked with camp management to anticipate what quantities of supplies need to be delivered to the camps. As a result, camps may run out of essential supplies, or face overstocks that strain already overburdened supply chains.
New sources of big data have become available which can help to estimate the size of refugee camps in real time. The work by Salah et al. (2019) in the “Data for Refugees” challenge demonstrates that mobile phone data can support coordinated efforts to help refugees. In this project, a telecom provider gave the researchers anonymized mobile call detail records of Syrian nationals in Turkey. These were used to estimate the numbers and locations of Syrian refugees in Turkey. Curry et al. (2019) argue that social media data and metadata from refugees can also be used to map migration. Refugees actively use social media in migration decision-making in their host countries and in transit locations (Dekker et al., 2018). For example, Mazzoli et al. (2020) used geolocated Twitter data to detect migration flows resulting from the crisis in Venezuela. The International Organization for Migration (IOM) and the European Commission have launched the “Big Data for Migration” alliance, committed to making more reliable sources of migration data available to inform policymaking (IOM & European Commission, 2018). The United Nations’ Humanitarian Data Exchange facilitates the sharing of datasets (OCHA, 2020).
Satellite data offer specific opportunities for migration management. The amount of available satellite imagery has increased substantially over the past few years, as have its applications in public governance. Satellite imagery has, for example, been used to classify land use in difficult-to-reach areas (Wieland and Pittore, 2014; Han et al., 2017) and can already be used for real-time object detection of vehicles (Van Etten, 2018). There have also been earlier applications using satellite data to estimate the populations of refugee camps. The United Nations Institute for Training and Research (UNITAR) maps refugee camps within its Satellite Analysis and Applied Research programme (UNOSAT) (UNITAR, 2020). Statistics Netherlands (CBS) and IT company CGI used satellite images and social media data in a study to map migration flows commissioned by the European Space Agency (ESA) (Statistics Netherlands, 2018).
Several studies report on the effectiveness of this method for estimating the sizes of refugee camps. Bjorgo (2000) extracted the area of five refugee camps from satellite images and found a significant relationship between camp area and population. Giada et al. (2003) compared four methods to extract information from high-resolution satellite imagery of the Lukole refugee camp in Tanzania. These methods required postprocessing steps that needed to be decided manually. For example, a decision needed to be made on which detection areas were too small (i.e. included too few pixels) to be a tent. Wang et al. (2015) successfully implemented an automated method to detect the number of tents in two refugee camps. Their methodology was useful especially for white tents, which contrast strongly with the soil. This study builds upon these previous studies by using a method to extract information from satellite imagery that imposes no requirements related to contrast differences and requires no manual decisions during the postprocessing of new data.
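The kind of manual postprocessing decision described by Giada et al. (2003) can be illustrated with a short sketch. The resolution and the minimum tent footprint below are illustrative assumptions on our part, not values taken from the original studies.

```python
# Hypothetical sketch of a manual postprocessing threshold: discarding
# detections that cover too few pixels to plausibly be a tent.
# METERS_PER_PIXEL and MIN_TENT_AREA_M2 are assumed values.

METERS_PER_PIXEL = 0.10
MIN_TENT_AREA_M2 = 4.0  # assumed smallest plausible tent footprint

def filter_detections(boxes):
    """Keep only bounding boxes (x1, y1, x2, y2, in pixels) whose
    ground area exceeds the minimum plausible tent footprint."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        area_m2 = (x2 - x1) * (y2 - y1) * METERS_PER_PIXEL ** 2
        if area_m2 >= MIN_TENT_AREA_M2:
            kept.append((x1, y1, x2, y2))
    return kept

# A 50x50-pixel detection covers 25 m2 and is kept; a 10x10-pixel
# detection covers only 1 m2 and is discarded.
print(filter_detections([(0, 0, 50, 50), (100, 100, 110, 110)]))
# → [(0, 0, 50, 50)]
```

The point of the automated method adopted in this study is precisely that such hand-tuned thresholds are no longer needed for new data.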
This trend of datafication of migration management (Broeders and Dijstelbloem, 2015) faces important criticism related to responsibility and accountability of algorithms. Algorithmic tools to monitor human mobility allow for states to predict new migration flows with greater precision. On the one hand, this can be used to estimate or foresee arrivals and prepare more efficiently for large influxes of people. States can prepare and adapt their reception conditions, thus complying with their legal obligations (Beduschi, 2021). On the other hand, large-scale monitoring of mobility can contribute to further securitization of migration. States may be inclined to put measures in place to avert international migration or prevent the arrival of migrants and asylum-seekers, even when this is lawful under international human rights law (Beduschi, 2021).
Criticism also concerns proportionality. Mazzoli et al. (2020) were able to study features of migration beyond what is recorded by government organizations based on geolocated Twitter data. They collected information on migration routes, settlement areas, and spatial integration in cities. The use and combination of detailed big data sources risks greater privacy infringements for already vulnerable and under-resourced populations (Eubanks, 2018). According to Dijstelbloem (2017), technologies to monitor human mobility are in the end political tools which need to be guided by ethical principles, solid legislation, periodical evaluations, and the checks and balances of experts and political and public debate. This case study assesses whether and how co-design of an algorithm to estimate the size of refugee camps by different stakeholders can improve the responsibility and accountability of algorithms in several stages of the design.
Data and method
This research was part of a humanitarian action challenge aiming to stimulate cooperation between business and NGOs in developing innovative technological solutions for peace, justice, and humanitarian action. This challenge brought together three actors: the IOM, the leading inter-governmental organization in migration management; Elva foundation, a non-profit organization which supports informed humanitarian responses and stabilization efforts in conflict-affected regions; and Notilyze, a company which specializes in big data analysis. In total, the collaboration consisted of six people. The focus on developing an algorithmic tool was suggested by IOM. This organization initiated several collaborative pilot projects for improved capacity and resource planning in refugee camps.
The partners chose a co-design approach as they had experienced that many data-driven initiatives founder because of a mismatch between the algorithmic tool and the needs and expectations of its users and stakeholders. Earlier studies have found that humanitarian aid workers require IT tools to be easy to use and understand (Kuchai et al., 2020). The partners therefore implemented the process, principles, and practical tools of co-design (Blomkamp, 2018) from the onset of the project: the design process of the algorithm went through several iterative cycles of analyzing issues relating to the delivery of emergency supplies, designing an algorithmic prototype, and testing and refining this prototype based on the needs of the involved actors. The principles maintained in this process were to take into account the needs and expertise of different stakeholders and to aim for an innovative solution. Stakeholders were asked to bring tacit knowledge and lived experiences into the design process. In two workshops and during bilateral meetings between partners, practical tools of co-design were used to imagine the desired solution. These included creative mapping of ideas and testing and discussion of the prototype. The consortium did not implement a specific design methodology such as value-sensitive design (Aizenberg and Van den Hoven, 2020; Friedman and Hendry, 2019; Zhu et al., 2018) or participatory design (Schuler and Namioka, 1993). Just as the implementation of these methodologies varies in practice, the co-design approach implemented in this study evolved during the project, as reported in more detail in the next section.
In contrast to co-design principles, the data subjects in this case study—local actors managing the camps and refugees living in the camps—were not part of the consortium. The partners experienced practical and ethical barriers to involving them. Local camp managers were very busy managing day-to-day issues in the camps, and the NGO was only able to reach out to them incidentally during the project. Furthermore, ethical questions arose, as studies suggest that co-design can overburden refugees and the actors supporting them, while the outcomes of participation for their personal situation remain uncertain (Dekker et al., 2021). For the case of the Nigerian refugee camps specifically, the NGO voiced concerns about freedom of participation and expression in a conflict-affected area where Boko Haram is influential and denounces Western influences. As will be discussed in the conclusions section, we consider the lack of involvement of the data subjects and intended beneficiaries in co-design of the tool a major limitation, which was only partly resolved by their interests being indirectly represented through IOM and the NGO.
Academic experts in data science and public governance who co-author this paper were external advisors to the project who provided feedback on an incidental basis. They acted as sparring partners to representatives of the consortium to prevent “groupthink”: premature concurrence between the partners which prevents creativity and innovation (Janis, 1982). At several moments during the project, they helped the partners to reflect on decisions in development and implementation of the algorithmic tool. The lead data analyst at Notilyze met with the academic expert on data science once or twice a month. Involvement of the public governance expert was more incidental and increased towards the final stages of the project when questions on implementation and use became prominent.
We took an action research approach to researching and developing the algorithm (Huxham, 2003; Stringer, 1996). This paper draws on actions and intentions of each of the actors collaborating in the project as logged in personal accounts and project outputs including project documentation and the prototype itself. This study reports on the full duration of the project between October 2018 and March 2020. This includes two three-day workshops with the partners (a kick-off event on 1–3 October 2018 and a follow-up workshop on 14–16 November 2018), several meetings between the partners and exchanges through phone and email. This action research approach helped to gain insight into the design process of an algorithm used for public service delivery and the details of the algorithm. These aspects of algorithmic governance are usually difficult to access for researchers (Kitchin, 2017).
The analysis is structured along the three stages of algorithm development as distinguished by Bruha (2000). For each stage we report how the process, principles, and practices of co-design were applied and how this led to changes in the algorithm. In the discussion of these results, we reflect on how co-design contributed to responsibility and accountability of the algorithm based on accounts of the involved partners. The three expectations raised in co-design literature were used as theoretical lenses in the analysis: we consider how co-design affected the development of the algorithm, cooperation and trust between the actors involved in the process and responsiveness to the needs of intended beneficiaries. We also reflect on instances where co-design could be strengthened.
Results
Data selection and pre-processing
At the start of the project, Elva and Notilyze drafted an initial plan to develop a forecasting algorithm which made better use of the camp censuses to estimate and predict the supplies needed in refugee camps. Data from camp censuses was available through IOM for refugee camps in Nigeria and Ethiopia. During the kick-off event, the plan was discussed and refined by the three partners in several creative workshops. These workshops focused on exchanging knowledge on the issues that were experienced by camp management. Actors specified their goals by reflecting on the “what” and the “why” of the project and by exploring the possibilities of working with the census data. Creative principles and several practical tools of co-design were applied (Figure 1). Navigating between the technical expertise of the data analysts and the practical goals from IOM and the NGO, the consortium decided that the needs of camp management would be leading in the project, rather than pragmatic considerations such as availability of data. This decision was inspired by exchanges of the NGO with several camp managers in which they specified their needs. These exchanges fostered responsiveness towards the users and intended beneficiaries of the algorithmic tool.

Co-design tools for creative mapping of processes and ideas used during the workshops in October and November 2018.
In January 2019, a first prototype of the algorithmic tool was developed, estimating needed supplies in refugee camps in Nigeria and Ethiopia based on several variables included in the census data. This tool structured the existing census data and calculated the amount of required supplies based on international guidelines for refugee camps (e.g. the UNHCR WASH guidelines). The tool was also able to predict needs for the next three months. Based on internal testing and discussion of the prototype, and after consultation with the data science expert, the consortium decided that a tool that would work independently of census data was desirable.
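The rule-based logic of this first prototype can be illustrated with a minimal sketch. The 15 litres of water per person per day figure follows the Sphere emergency minimum standard; the latrine ratio, function names, and structure are our own illustrative assumptions and do not reproduce the actual tool.

```python
# Illustrative sketch of a rule-based supply calculation: required
# supplies are derived from the census population using humanitarian
# guideline figures. All names and the latrine ratio are assumptions.

WATER_L_PER_PERSON_PER_DAY = 15  # Sphere emergency minimum standard
PERSONS_PER_LATRINE = 20         # commonly cited WASH planning ratio

def daily_water_need_litres(population: int) -> int:
    return population * WATER_L_PER_PERSON_PER_DAY

def latrines_needed(population: int) -> int:
    # Round up: a partially filled ratio still needs one more latrine.
    return -(-population // PERSONS_PER_LATRINE)

# For an average-sized camp in the Nigerian census data (2,700 people):
print(daily_water_need_litres(2700))  # → 40500 litres per day
print(latrines_needed(2700))          # → 135 latrines
```

Because the calculation is a deterministic function of the population estimate, improving that estimate (the goal of the later satellite-based tool) directly improves the supply planning.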
The partners started to explore alternative big data sources which could be used to estimate the sizes of refugee camps. NGO Elva advised the analysts at Notilyze on the choice of data sources. Mobile phone data and social media data were considered, but these were neither available nor deemed desirable for this project. Values relating to algorithmic responsibility, such as privacy, were discussed and taken into account. In choosing aggregate satellite data over individual Wi-Fi or social media data, a concession was made in the accuracy of the data in order to safeguard privacy. Specifically for Nigeria, the consortium considered including big data sources on terrorism, such as the Armed Conflict Location & Event Data Project Database and the Global Terrorism Database, as well as data on natural disasters. However, Elva pointed out that these data sources depend on specific causes of migration and are less generally applicable. Eventually, the partners decided on the use of satellite imagery. A vast amount of global satellite imagery is available nowadays, often open source. An object detection algorithm to detect tents would be able to indicate the size of the refugee camp and the needed supplies based on data such as the total camp area and the number and sizes of tents (Figure 2).

Satellite images of a refugee camp on March, June, and December 2015, indicating growth of the camp.
Along with the decision to use satellite data, the consortium shifted its focus towards refugee camps in Nigeria, where census data on a larger number of camps was available to train the algorithm. IOM was able to provide census data for 300 Nigerian refugee camps with an average of 2,700 refugees per camp. These censuses took place on a monthly basis between 1 June 2015 and 20 October 2018. As none of the partners possessed satellite imagery of Nigeria, they explored open sources of satellite imagery. Satellite imagery from Google Earth was available for 110 of the 300 Nigerian camps. Images were taken on a monthly basis, with slight differences in frequency per location. The algorithm compared images taken at different points in time to observe changes in the camp populations.
Aside from the limited availability of satellite imagery, limitations also existed for object detection based on these images. In some camps, refugees live in types of housing which could not be visibly distinguished from regular buildings or homes. In other cases, the borders of the camp were unclear. For example, in the case of Figure 3, the IOM census data indicated the presence of three refugee camps. On the left part of the image, in blue, two structured camps are visible. The third camp, at the yellow marker, consists of tents in between other buildings. It is unclear whether these belong to a third camp and what the borders of this camp are. This creates issues with manually labeling tents in the data used to train the algorithm. If the borders of a camp are already unclear to the human eye, it is even more difficult for a computer model to distinguish them. The data analysts and NGO Elva decided to exclude these camps from the training data set, as they lacked local knowledge on the borders of the camps. In a subset of 56 camps, tents could be visibly located on the satellite images (Figure 3). This subset of refugee camps was used to develop the algorithm.

Refugee camp with unclear borders.
After this selection process, the data needed to be pre-processed. This involved preparing the satellite images for analysis. For example, the images needed to be resized so that they had the same zoom levels. Here, the expertise of the data analysts was most important. Their analysis showed that a resolution of 0.10 meters per pixel gave the best results for training the algorithm. They also suggested that contrast stretching of the images would enhance the performance of the algorithm. This procedure made the tents more recognizable to the algorithm (Figure 4).

Image slice before and after contrast stretching.
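As a minimal illustration of this pre-processing step (a sketch, not the analysts' actual implementation), percentile-based contrast stretching can be expressed in a few lines of Python; the percentile cut-offs and the synthetic image below are assumptions for the example:

```python
import numpy as np

def contrast_stretch(image, lower_pct=2, upper_pct=98):
    """Linearly rescale pixel intensities so that the chosen
    percentiles map to the full 0-255 range."""
    lo, hi = np.percentile(image, (lower_pct, upper_pct))
    stretched = (image.astype(float) - lo) / (hi - lo)
    return np.clip(stretched * 255, 0, 255).astype(np.uint8)

# A dull synthetic image slice: intensities clustered between 100 and 140
slice_ = np.random.default_rng(0).integers(100, 140, size=(64, 64), dtype=np.uint8)
out = contrast_stretch(slice_)
print(out.min(), out.max())  # intensities now span the full range
```

Pixels at or below the lower percentile map to black and those at or above the upper percentile to white, spreading the intermediate intensities over the full range and making bright tents stand out against their surroundings.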
Lastly, it was necessary to pre-define the borders of each camp and to make sure that the whole camp was visible in one image, including some margin for growth. This was done by the data analysts through visual inspection, using the geographic location of the camp shared by NGO Elva. Several satellite images of each camp at different moments during the period of study were included in the dataset: up to 10 for each camp. The data analysts selected the image closest to each census date to train the algorithm.
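The matching of images to census dates described above amounts to a simple nearest-date selection, sketched below; the dates are hypothetical, not taken from the project's data:

```python
from datetime import date

def closest_image(census_date, image_dates):
    """Select the image whose acquisition date is nearest to the census date."""
    return min(image_dates, key=lambda d: abs((d - census_date).days))

# Hypothetical census date and available image dates for one camp
census = date(2017, 3, 15)
images = [date(2017, 1, 4), date(2017, 3, 2), date(2017, 6, 20)]
print(closest_image(census, images))  # 2017-03-02
```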
Training and machine learning
In the stage of training and machine learning, the goal of the project and the roles of the different partners were clear, and the partners only collaborated on an incidental basis. After consulting the academic advisor on data science to discuss specific design choices, the modeling phase was finalized and the data analysts proceeded with one of the most recent methods for image recognition and object extraction from images: Faster R-CNN (Faster Region-based Convolutional Neural Network), developed by Ren et al. (2016). A neural network is a machine learning algorithm that loosely mimics how the human brain works. The neural network can be trained to make new observations based on the features of an object that it receives as input.
In this case, the algorithm was trained to recognize tents based on a training set consisting of 38 satellite images. The remaining images were kept for testing the algorithm after development. Tents on the training set of satellite images were labelled manually by the data analysts after visual inspection. This was not an easy task, as satellite images do not contain any information on the nature and function of buildings. The expertise of the NGO was of help: Elva advised labelling the smaller tents and buildings in the refugee camps, as these are likely to be used to house refugees. Their input helped increase the accuracy of the algorithm, enhancing algorithmic responsibility.
Training the algorithm took several machine learning iterations in which the algorithm taught itself to recognize tents, which are essentially sets of white pixels grouped in a rectangle. Each tent had a slightly different pixel composition, depending on the angle of the sun, the arrangement of the tents, and the spacing between them. Therefore, the algorithm produced a probability (p) of an object on the image being a tent, ranging from 0 to 1 (Figure 5). In this stage, the expertise of the data analysts in training the algorithm helped produce increasingly better estimations. In addition, decision rules were formulated on what rate of false positive or false negative classifications was deemed acceptable. IOM and Elva provided important input on whether over- or under-estimating the quantity of needed supplies would be more problematic. After discussion between the partners on values of algorithmic responsibility such as accuracy and fairness, an object was labelled as a tent when its detection probability was higher than 0.5.

Detection of a tent on a satellite image.
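The agreed decision rule (p > 0.5) can be illustrated with a small sketch; the bounding boxes and probabilities below are invented for the example:

```python
# Each detection: (bounding box in pixel coordinates, probability p of being a tent)
detections = [
    ((10, 10, 14, 16), 0.92),
    ((40, 22, 44, 27), 0.61),
    ((70, 70, 75, 74), 0.35),  # below the threshold: not counted as a tent
]

THRESHOLD = 0.5  # decision rule agreed between the partners

tents = [box for box, p in detections if p > THRESHOLD]
print(len(tents))  # 2
```

Raising the threshold reduces false positives at the cost of more missed tents; lowering it does the reverse. This is why the partners' input on the relative harm of over- versus under-estimation mattered for setting the value.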
The estimations of the algorithm improved throughout several training iterations. After several rounds, one of the analysts noticed that the model was overfitting: the algorithm started to wrongly recognize Google Earth buttons on the satellite images as white tents (Figure 6). The algorithm's focus on this specific bias in the training data negatively impacted its performance on new data. In response, the data analyst manually corrected these labels from (false) "positives" to "negatives" so that the model would learn from its mistakes in further training iterations. This prevented a bias in the training data from compromising the accuracy of the algorithm.

Overfitting of bias in the model.
After pre-processing and training, the results of the detection algorithm were validated against the manually labelled test data. The algorithm correctly detected 36,687 of the 46,395 tents in the test dataset, amounting to 79% of the total. It also produced 12,431 false positives: objects which the analysts had not labelled as tents (Figure 7). However, the manual labels might not coincide fully with the actual purposes of the tents and buildings, and it proved difficult to assess the exact performance of the algorithm, even after incorporating the input of the NGO.

Detection of tents in a refugee camp, showing true positives (green); false positives and double detections (orange); and false negatives (red).
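The counts reported above translate directly into the standard detection metrics of recall and precision; the arithmetic below merely restates the validation figures:

```python
true_positives = 36_687   # tents correctly detected by the algorithm
total_tents = 46_395      # manually labelled tents in the test dataset
false_positives = 12_431  # detections not labelled as tents by the analysts

false_negatives = total_tents - true_positives  # tents the algorithm missed

recall = true_positives / total_tents                            # share of tents found
precision = true_positives / (true_positives + false_positives)  # share of detections that are tents
print(f"recall={recall:.0%}, precision={precision:.0%}")  # recall=79%, precision=75%
```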
After validation, the analysts published the code of the object detection algorithm in the open source repository GitHub.1 This established transparency and made the algorithm available for re-use and improvement by actors outside the consortium.
Post-processing
In the third and final stage of development, the algorithm was used to make estimations on the total dataset of 56 refugee camps. The tent detections were used as input for a linear regression model to estimate the number of refugees in a camp. The choice of a linear regression model was made after discussion with the partners, which raised awareness of the interpretability of both the model and its results. Values relating to algorithmic accountability informed this choice. Linear regression was the most straightforward method to generate an estimate of the population of refugee camps. The performance of the algorithm was expressed as the difference between the estimated and actual number of people in the camps according to the IOM censuses.
Different variables were considered as the independent variable for the regression, all of them incorporating information from the object detection model in a slightly different way. Examples of such variables are the number of tents, the total tent area, and derivatives of those. The data analysts assessed how well these different variables estimated the number of refugees in a camp. Although a derivative of the total surface covered with tents performed best, the analysts chose to work with the second-best predictor, the raw total surface area. This choice was made after discussion with the NGO and the governance expert to ensure model interpretability for the actors who manage the camps and would be using the algorithmic tool. The obtained estimator reads
The analysts found that all predictions for camps included in the IOM censuses fell within this interval. Yet, the confidence interval is quite large, and the populations predicted by the algorithm deviated considerably from the actual observations. Input from IOM and Elva helped explain this. They explained that in many refugee camps, refugees also live in other types of shelter or without shelter. In their censuses, IOM made rough estimates of the percentages of refugees living without shelter. This accounted for part of the differences between the estimated and the actual numbers of refugees. NGO Elva added that the assumption that tents are fully occupied is a reasonable one in this region, as resources are scarce and a large number of refugees are seeking shelter. The data analysts identified a further source of deviation: in some cases, the dates of the census and the satellite image were further apart, or the exact census date was unknown, resulting in time intervals in which the camp population may have changed. The number of days between the date of the Google Earth image and the date of the camp census was therefore added as an explanatory variable to account for changes in the size of the camp between both dates. These suggestions, based on different types of expertise, were used to further improve the algorithm.
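The kind of regression model described in this stage, with the raw total tent area and the image-census date difference as predictors, can be sketched as follows. All numbers are invented for illustration; the project's actual data and fitted coefficients are not reproduced here:

```python
import numpy as np

# Hypothetical per-camp values (illustration only)
tent_area = np.array([1200.0, 3400.0, 5100.0, 800.0, 2600.0])  # total tent surface (m^2)
days_gap = np.array([5.0, 30.0, 12.0, 60.0, 2.0])              # days between image and census
population = np.array([480.0, 1350.0, 2050.0, 330.0, 1020.0])  # census population

# Ordinary least squares with an intercept column
X = np.column_stack([tent_area, days_gap, np.ones_like(tent_area)])
coef, *_ = np.linalg.lstsq(X, population, rcond=None)
area_coef, days_coef, intercept = coef

def estimate_population(area_m2, days_between):
    """Estimate a camp population from tent area and the date difference."""
    return area_coef * area_m2 + days_coef * days_between + intercept

print(round(estimate_population(3000.0, 10.0)))
```

Keeping the model linear in a directly interpretable quantity such as tent area is what allows camp managers to trace how an estimate was produced, in line with the accountability considerations above.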
Discussion
We presented how an intergovernmental organization (IOM), an NGO (Elva) and a data analysis company (Notilyze B.V.) collaborated to design an algorithm to estimate the sizes of Nigerian refugee camps in order to better manage the delivery of emergency supplies. In different stages of design—data selection and preprocessing of the data; training and machine learning of the algorithm; and postprocessing of the algorithm—input from the different actors was integrated in the algorithm through the process, principles, and tools of co-design. For each stage, we now reflect on how these three aspects of co-design were implemented, and how this influenced algorithmic responsibility (including values such as accuracy, privacy, and fairness) and accountability (including values such as transparency and interpretability). We also reflect upon instances in which co-design could be strengthened and the conditions required for this.
In the stage of data selection and pre-processing, the consortium created a plan and discussed the goals of the project in several workshops and meetings using co-design tools. A first prototype resulting from the first design iteration, which included only census data, was deemed insufficient by the partners. It did not offer much more information than was already available through the censuses. In a second design iteration, the partners decided to explore the use of satellite data for better estimations of the populations of refugee camps. Expertise from the NGO and the data analysts contributed to selecting and preparing the data for analysis, while considering values such as effectiveness, privacy, and fairness. The partners experienced an inherent tension between accuracy and privacy in the choice of data sources. The use of fine-grained Wi-Fi or social media data was considered, but the partners deemed this too privacy-invasive. Instead, they chose data at an aggregate level which were already publicly available. Although all data sources on refugee mobility are inherently privacy-encroaching, a balance was sought in the amount of surveillance needed to service refugee camps.
In the stage of training and machine learning of the algorithm, the goal of the project and the roles of the different partners were clear, and the partners collaborated on an incidental basis. In line with the principles of co-design, the partners each contributed to the design based on their own expertise and experience. In this project, the data analysts recognized an obvious example of overfitting of the algorithm during training. However, it is often unclear whether biases are present, and overfitting can go unnoticed. Input from practitioners involved in co-design can help to signal overfitting based on more subtle biases in the training data. Furthermore, by providing representative training sets, they can help to ensure a longer training time without the risk of overfitting. Unfortunately, the limited availability of open source satellite imagery remains an important limitation for pilot projects such as this one.
In the stage of post-processing, input from the partners led to the choice of a simple predictor of the camp population based on the detected tents, so that the actors who would be using the algorithmic tool would understand how outcomes were generated. Only when users understand how a tool functions can they use it for fitting purposes and exercise their discretion in interpreting the results. In addition, insight into the development of the tool helped users bear in mind that the estimations are not perfect, preventing automation bias. Collaboration in co-design led to an understanding of the algorithm and how its outcomes were generated, contributing to accountability within the scope of this project.
In addition, disclosure of the source code of the object detection algorithm created transparency beyond the partnership. Others can review the code, make use of it and continue to improve it. However, as Kroll et al. (2017) argue, disclosure of the source code is only a first step towards algorithmic accountability. Not all people—especially not those in a vulnerable situation—will be able to scrutinize algorithmic tools used in public governance. In cases such as the Dutch SyRi algorithm to detect social welfare fraud and predictive policing algorithms, privacy watchdogs, human rights organizations, and researchers have scrutinized algorithms and started court cases on behalf of others (Meijer and Wessels, 2019; Van Bekkum and Borgesius, 2021).
At several moments during the development of this algorithm, the partners experienced that co-design fell short. In pre-processing of the data, the partners lacked local knowledge from the actors managing the refugee camps in Nigeria. The NGO consulted with camp management, but they were not able to invest much time in this project. With local knowledge, the data analysts would have been better able to define the borders of each camp and to identify the types of shelters housing refugees when pre-processing the satellite images. Also in the phase of training and machine learning, additional input from local partners involved in camp management would have helped in formulating decision rules on acceptable rates of false positive or false negative classifications, and thereby on over- or under-estimating the population of the camps and the resources needed. In the stage of validating and post-processing, quite a large share of unexplained variance remained in the model, making it less useful for actors involved in the delivery of goods and services. This unexplained variance might decrease when accounting for the percentages of people living in other types of shelter or without shelter. Here again, input from local partners would have helped. Yet, the stakeholders experienced that this imperfect model still added value to current camp management. They intended to use it as a tool complementing, rather than replacing, decision-making by experienced practitioners with a thorough understanding of local dynamics and sensitivities. Eventually, the model based on satellite data was combined with census data from IOM to arrive at better estimations.
Inaccuracy of the model could also be a source of information in itself: if the model structurally underestimates the number of people in a camp based on the observed tents, this would indicate a shortage of tents and supplies that can be addressed. A more extensive evaluation of the tool has yet to be completed. At the start of the project, IOM, Elva, and Notilyze agreed that the model would be evaluated by implementing it for two camps. However, by the time the model was ready to be tested in March 2020, other pressing matters arose for camp management, IOM, and the NGO. Their focus and resources shifted towards other tasks and the implementation has been postponed.
Conclusion
This paper assessed the effects of co-design as a method to ensure responsibility and accountability of algorithmic governance. Calls for ethically and socially responsible AI often fail to provide solutions beyond stressing the importance of transparency, interpretability, and fairness (cf. Aizenberg and Van den Hoven, 2020). Co-design has been suggested as an alternative to ex post solutions such as transparency and regulation of algorithms (Lepri et al., 2018). The implementation and effects of co-design have however not yet been studied in the context of the design of algorithms for public governance where attention for public values is essential.
Our case study of co-design of an algorithm to support supply management of Nigerian refugee camps demonstrates that co-design supported responsible algorithmization in the choice of big data sources and through preventing reinforcement of biases (Meijer and Grimmelikhuijsen, 2020). Accuracy and privacy were considered in the design, but inherent tensions remain in designing surveillance of populations. Furthermore, co-design helped in creating an algorithm that was understandable to its users. By being part of the design process from the onset, it was clear to those who would be using the tool what data sources the algorithm uses as input and what indicator it uses to estimate the number of refugees in a camp. They were able to decide on suitable purposes for the tool and use their discretion in interpretation of the results. This enhanced algorithmic accountability.
Interpretability, along with fairness, accountability, and transparency of algorithms, is currently an active research area (cf. Rudin et al., 2021). Many computer scientists are either trying to pry open black-box models or striving to design inherently interpretable and accurate machine learning algorithms. Our findings support the hypothesized effects of co-design (Blomkamp, 2018) and align with other studies on value-sensitive design (Aizenberg and Van den Hoven, 2020; Zhu et al., 2018), human-centered design (Baumer, 2017), and participatory design (Whitman et al., 2018) of algorithms and AI: in the development of this algorithm, co-design with stakeholders led to a more accurate and more responsible tool. Furthermore, at different moments in the design of the algorithm, the partners were responsive towards the intended users and beneficiaries (camp managers and refugees), although these groups were not directly represented in the consortium. Finally, co-design fostered cooperation and trust between the partners, which prevented a mismatch between the functionalities of the tool and the purposes for which it would be used in practice.
It is yet unclear whether this eventually leads to better servicing of refugee camps. First, implementation and scaling of this pilot project proved difficult. Embedding the tool in the routines of the organizations collaborating in camp management has suffered delays due to budgetary shortfalls and the COVID-19 pandemic. Earlier research pointed out that scaling public and social innovation requires prioritization and a profound change in beliefs and routines within the organizations involved (Westley and Antadze, 2010). Second, access to better data and algorithmic support does not always convince governments and their partners to change their actions. Earlier studies provide examples in which evidence from satellite imagery did not result in better aid for affected populations (Hasian, 2016; Raymond et al., 2013).
The management of refugee migration presented a critical case in which responsible and accountable algorithmic governance is essential. Our findings may have broader relevance for algorithmic governance in other domains of the public sector where responsibility and accountability of algorithms are at stake. This includes, for example, the governance of public security and public health, where unfair and opaque algorithms have been used (Ferguson, 2017; Obermeyer et al., 2019). In the development of algorithms for these domains, too, co-design could provide stakeholders with opportunities to steer towards more responsible and accountable algorithms, in addition to setting standards and enforcing regulation.
An important limitation of our study was that the intended beneficiaries of the tool—refugees living in the camps—were not part of the consortium and the co-design process. Including affected communities in the design of algorithms has been suggested in earlier studies into big data analytics for humanitarian aid (Mulder et al., 2016). Through their lived experiences, they can offer new perspectives for design. Co-design by intended beneficiaries ensures legitimacy, as it helps develop an algorithm which serves their needs (Meijer and Grimmelikhuijsen, 2020). Including these groups also supports accountability by making the algorithm accessible and understandable to those who are subject to it (Kroll et al., 2017). Lastly, co-design can also be emancipatory: refugees can reclaim the technology in their search for safe passage and shelter. In the case of this study, for example, knowledge of the availability of space in refugee camps could support migration decision-making. However, as in other studies in which refugees are usually data subjects rather than data experts (Masso and Kasapoglu, 2020), we experienced barriers to involving this group. It proved difficult, and arguably unethical, to ask vulnerable populations to invest their time in a demanding long-term pilot project with as yet uncertain outcomes for their personal situation. In this project, their absence was only partly compensated for through input from the NGO and IOM and through consultations with local camp managers. Future research should focus on developing flexible arrangements for involvement in co-design under difficult circumstances and in rapidly changing conditions.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study received funding from the City of The Hague Humanitarian Action Challenge under grant agreement number 18.238 and Grand Challenges Canada under grant agreement number R-HGC-POC-2007-34904.
