Abstract
The social services sector, composed of a constellation of programs meeting critical human needs, lacks the resources and infrastructure to implement data science tools. As the use of data science continues to expand, it has been accompanied by rising interest in, and commitment to, using these tools for social good. This commentary examines overlooked and under-researched limitations of data science applications in the social sector: the volume, quality, and context of the data currently available in social service systems require unique considerations. We explore how the prevalence of small data in social service contexts can lead to extrapolation and how, if this is not properly considered, data science can negatively impact the organizations data scientists are trying to assist. We conclude by proposing three ways data scientists interested in working within the social services sector can enhance their contributions to the field: refining and leveraging available data, improving collaborations, and respecting data limitations.
Introduction
Data Science (DS) is broadly situated at the intersection of applied mathematics, computer science, and business analytics. The multidisciplinary nature of DS means its scale and scope are fluid, with increasing applications across all sectors of society. One such deployment is DS for Social Good, in which data science strategies are applied to social problems. DS for Social Good tackles an array of societal challenges such as algorithmic fairness (Dwork et al., 2012), human trafficking (Borrelli and Caltagirone, 2020), child welfare (Lanier et al., 2020), and homelessness (Purao et al., 2019). This growing body of literature reveals the potential of DS to inform and support decision-makers working to tackle important societal issues. Despite this potential, significant limitations exist related to the applicability of DS methods in social welfare contexts. Ethical dilemmas and considerations in DS applications for social contexts are well-documented across domains such as child welfare, cash assistance, criminal justice, and anti-human trafficking (Brown et al., 2019; Goldkind et al., 2018; Keddell, 2019; Konrad et al., 2023; Rudin, 2019; Završnik, 2019). In many social contexts, the available data are small data. Although the term can refer to absolute size, small data more broadly describes the overall scale of the data, that is, the breadth and depth of what the data represent and how they can be used (Borgman, 2017). Small data are also characterized by being generated to answer a specific set of questions (Kitchin and Lauriault, 2015). In the context of social services, data collection often occurs within smaller communities and is driven by a set of pre-defined research questions. However, there are ample opportunities for the inverse, where existing data can generate new research questions (Borgman, 2017).
DS methods support opportunities to explore data in new ways, generating new information and research directions (Santiago and Smith, 2020). However, applying DS methods to small data in social contexts raises important considerations and limitations, which we discuss below.
Data Science for social service applications
As the accessibility of DS has increased, so has the practice of using data-driven approaches to tackle social issues such as those found in the social services domain. Social service is an umbrella term used in the United States to describe the safety net programs provided to children, families, and individual citizens. Sometimes referred to as social protection programs, these offer social goods such as housing, meals, refugee resettlement, and services to migrants, as well as child protective services to ensure the safety of children and their families. Social service programs may be operated by government, non-governmental organizations (NGOs), or grassroots associations. These programs typically have limited resources, face a diverse set of operational challenges, and are unable to afford formally trained DS expertise. Furthermore, social services involve a multitude of stakeholders, such as public or private funders, government contractors, clients, and individual donors, who are interested in the success and evaluation of these programs. DS can be leveraged to understand processes, identify bottlenecks, and provide insights for improved utilization of limited resources. Social services are traditionally knowledge-driven: decisions are supported by policy familiarity, funding requirements, and practitioners’ expertise. In contrast, DS methods are data-driven: data are used to uncover patterns and information for knowledge discovery (Bopp et al., 2017). Data-driven methods have complemented the knowledge- and mission-driven work of social services for many years and, with careful application, have supported and improved the decision-making processes of these organizations.
DS expands the ability to make sense of data and complements traditional analytical methods to improve the information gained (Goldkind and McNutt, 2019). A major motivation for the use of DS in social services is the potential for operational and programmatic improvements, as demonstrated in fields such as healthcare (Stephen et al., 2019). Yet caution is warranted in using DS in the social sector, particularly the risk of oversimplifying its inherent complexity. Existing DS methods may mute the nuances and unique characteristics of the data, such as aspects of a client's individual circumstances (Brown et al., 2019). Aggregating data at the community level rather than the individual level can reduce the comprehensibility of insights for these programs and the clients they serve. As in any sector, data and DS methodologies are conduits for knowledge discovery, providing situational understanding rather than instantaneous solutions. For example, while a dashboard or predictive model may yield insights about clients, these insights alone do not directly translate into improvements in the quality or quantity of the services provided. Rather, the validity and utility of insights gleaned from DS must be carefully considered before deployment. Because DS methods are data-driven, their outcomes are only as good as the data used; scrutinizing the reliability of the data before deployment is therefore important. Changes to decision-making processes (data-driven or otherwise) have the potential to be problematic and should be examined closely.
The robustness of quantitative results obtained in social services depends heavily on data context, structure, and quality. The use of DS in the social context requires collaboration between data scientists and social services stakeholders to consider the potential unintended consequences of DS applications. Such critical conversations lead to DS applications that are responsible, equitable, and, importantly, appropriate in real-world contexts. Because the social services sector provides assistance and resources to marginalized populations and precarious communities, it is vital for both social services professionals and data scientists to consider the effectiveness and ethics of DS use, its related implications, and possible unintended consequences.
The dangers of extrapolation in small data
DS can be an invaluable tool for creating and supporting more effective and efficient practices for social services. However, even the most advanced methods may struggle to detect and assess the individual and human characteristics of social service data. Because nuanced information, such as dimensions of a client's social interactions, feelings, and life experiences, is difficult to collect and quantify, some factors that affect social service programs are never operationalized as data. The inability to capture this type of information limits the inferences that can be made using DS methods. Through careful consideration of the characteristics of small data, overgeneralization and extrapolation can be mitigated and informative analyses produced for decision-makers.
The performance and depth of insights gleaned from DS methods are linked to both the size and quality of data. Data quality can be characterized by how clean, objective, and consistent data are (Kitchin and Lauriault, 2015). Small data require a higher level of quality than big data, because with few observations, individual errors and gaps carry more weight and can amplify erroneous conclusions. For example, in the author's experience working with an NGO intercepting human trafficking, the completeness of collected data was found to be inconsistent (i.e., substantial amounts of data were missing or incomplete). Although missing data are secondary to an organization's humanitarian mission, for the data scientist, understanding why fields are left blank provides invaluable information and shapes which conclusions are possible. While research interest in improving DS applications on small data is growing (Chahal et al., 2021), we caution against the use of these methods in the social services context, as there are critical considerations regarding how these data should be managed and utilized. For example, algorithmic decision-making in child welfare systems has been criticized as an inappropriate use of DS (Ho and Burke, 2022).
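As a minimal sketch of the kind of completeness check described above, the Python snippet below computes the share of blank values per field, both overall and separately for each value of a grouping field, so that systematic gaps (for example, one intake site that rarely records a field) become visible before any modeling. The records and field names (`intake_site`, `age`, `referral_source`) are hypothetical placeholders, not drawn from any real dataset, and the rule for what counts as blank is an assumption to adapt to each organization's conventions.

```python
from collections import defaultdict

# Hypothetical example records; field names are placeholders, not a real schema.
records = [
    {"intake_site": "A", "age": 19, "referral_source": "outreach"},
    {"intake_site": "A", "age": None, "referral_source": ""},
    {"intake_site": "B", "age": 22, "referral_source": "self"},
    {"intake_site": "B", "referral_source": "self"},  # "age" absent entirely
]


def is_blank(value):
    """Assumed rule: None, empty, or whitespace-only strings count as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missingness_profile(rows, fields):
    """Fraction of rows in which each field is blank or absent."""
    n = len(rows)
    if n == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(is_blank(r.get(f)) for r in rows) / n for f in fields}


def missingness_by_group(rows, fields, group_field):
    """Missingness profile computed separately for each value of group_field."""
    groups = defaultdict(list)
    for r in rows:
        groups[r.get(group_field)].append(r)
    return {g: missingness_profile(rs, fields) for g, rs in groups.items()}


if __name__ == "__main__":
    fields = ["age", "referral_source"]
    print(missingness_profile(records, fields))
    print(missingness_by_group(records, fields, "intake_site"))
```

Rates that differ sharply across groups suggest that blanks are not random, which is exactly the kind of information worth discussing with the staff who collect the data.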
Analyses of small data are prone to extrapolation. Failing to recognize this may result in overgeneralized findings being used to make decisions that negatively impact those the programs are designed to serve. In addition to extrapolation, DS experts should carefully consider any algorithmic bias that may be produced, as well as the short- and long-term tractability of models and results.
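One simple safeguard against this kind of extrapolation is to flag any prediction requested outside the range of inputs the model was fit on. The Python sketch below fits an ordinary least-squares line to a handful of points and returns each prediction together with an in-range flag. The data are invented for illustration, and a one-dimensional range check is only a first line of defense: with several inputs, a query can fall within every individual range yet still lie far from any observed combination.

```python
from dataclasses import dataclass


@dataclass
class LineModel:
    slope: float
    intercept: float
    x_min: float  # smallest input seen during fitting
    x_max: float  # largest input seen during fitting


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("inputs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LineModel(slope, mean_y - slope * mean_x, min(xs), max(xs))


def predict(model, x):
    """Return (prediction, within_observed_range)."""
    y = model.slope * x + model.intercept
    return y, model.x_min <= x <= model.x_max


if __name__ == "__main__":
    # Invented example: four observations covering inputs 1 to 4 only.
    model = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
    print(predict(model, 3))   # (6.0, True): inside the observed range
    print(predict(model, 10))  # (20.0, False): extrapolation, flagged
```

The flag does not make an out-of-range prediction wrong; it signals that the number rests on an assumption the data cannot check, which should be communicated to the people acting on it.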
As data scientists, we need to be careful not to generalize findings before considering their implications. For these reasons, decisions related to operational and program changes cannot rely solely on the results of a DS model; they need stakeholder guidance and input early and throughout the modeling process. Through careful selection of DS methods, we can reduce the unintended consequences of applying these methods in the social services domain.
Considerations moving forward
There are multiple ways to use and apply small data appropriately: predictively, prescriptively, and descriptively. Each offers a contribution to social services. While these data strategies offer exciting possibilities, they must be weighed against ethical considerations and possible unintended consequences. It is important for practitioners to remember that the simplest solutions are sometimes the best. DS applications offer unique insights but need to be tempered by the socio-cultural context and societal structures that often affect those in social services.
It is vital for social services professionals in these contexts to develop a basic understanding of the methods being used. It is equally important for data scientists working on these problems to have a firm understanding of the domain context. This collaborative partnership of domain experts, social scientists, and data scientists ensures the appropriate use of methodologies for social services.
Refining and leveraging available data
The volume, quality, and storage of data in social services generally reflect the size and resources of each organization. The barriers to effective data management and data use are significant; however, striving for improvement across the field of social services is a worthwhile goal. Collecting more data is an obvious request from many data scientists, but in a resource-constrained environment such as social services, this is typically difficult for budgetary reasons (Abazajian, 2022).
Despite the challenges of increasing the volume of data collected, existing data can be leveraged. One way is to refine and improve data documentation. In its most basic sense, data documentation describes information such as variable units. However, providing more detailed information about why a data point is collected can benefit analysis (Rasmussen and Blank, 2007). Although some data documentation likely exists in an organization, the elements that are most useful to data scientists are often not considered. For example, social services programs frequently refine their programmatic models and change their service delivery systems, data collection processes, or definitions. Carefully documenting these changes and when they occur can improve analysis and create opportunities to explore new research questions and directions, as previously discussed. Existing data can be further improved through ongoing feedback loops between the data generated, those collecting the data, and those doing the data work (e.g., those in administrative, clerical, and data entry positions) (Møller et al., 2020). In addition, while organizations may collect similar data, the resulting datasets are often disparate, and nuances in the data and their collection processes may introduce bias when comparing data across organizations. Improving the documentation process and the elements it captures could help mitigate this bias and allow more equitable comparisons across organizations.
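A minimal sketch of documenting programmatic changes with effective dates, assuming a simple change log kept alongside the data: each record can then be tagged with the definition version in force when it was collected, so analyses compare like with like. The version labels and change descriptions below are hypothetical examples, and the log is assumed to be kept sorted by effective date.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical change log, sorted by effective date: (effective_date, version, note).
CHANGE_LOG = [
    (date(2019, 1, 1), "v1", "Original intake form"),
    (date(2020, 7, 1), "v2", "'Household' redefined to include non-relatives"),
    (date(2022, 3, 15), "v3", "Referral source made a required field"),
]


def definition_version(record_date, change_log=CHANGE_LOG):
    """Return the version in force on record_date, or None if it predates the log."""
    effective_dates = [entry[0] for entry in change_log]
    # bisect_right counts the entries whose effective date is <= record_date.
    i = bisect_right(effective_dates, record_date)
    return change_log[i - 1][1] if i > 0 else None


if __name__ == "__main__":
    for d in [date(2018, 12, 31), date(2020, 6, 30), date(2020, 7, 1), date(2023, 5, 2)]:
        print(d, definition_version(d))
```

Grouping or stratifying results by this version label makes it visible when an apparent trend coincides with a change in definitions rather than a change in the population served.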
Improving collaborations
Sufficient data to support actionable, prescriptive insights may take time to accumulate. However, this does not mean data scientists cannot play an early role. There is considerable value in continued partnership between social service professionals, social scientists, and data scientists. Organizations often collect only the data specified by their funder(s) and current goals, without awareness of details that could be relevant for future analysis. Programmatic delivery models evolve over time, and building commensurate data analytics capacity from the beginning by involving data professionals can improve the overall data life cycle and ensuing analysis. As in any DS application, considering the domain context is crucial in social service applications. Working alongside social service practitioners and social science researchers enhances the integration of domain knowledge and theory, allowing for improved data and domain-specific insights.
Understanding of the population of interest can be informed locally as well as on a larger scale through data-sharing and collaboration agreements. DS researchers could leverage collaborations within and between organizations to enhance and help operationalize existing, practitioner-collected data. Provided that privacy is protected, pooling data with other agencies serving the same population increases both the volume of data and the information gained. However, the analysis must be interpreted carefully, because variations in definitions and collection practices across agencies can affect the results. As discussed earlier, data documentation can be a catalyst for improving an organization's analysis. It can also improve analysis within collaborations, as it helps alleviate some of the concerns around data variation and privacy.
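The interpretation issue raised above can be made explicit at the pooling step: each agency's local codes are mapped to a shared definition through a documented crosswalk, codes without an agreed mapping are reported rather than silently dropped or guessed, and the source agency is retained on every pooled record. The Python sketch below illustrates this under stated assumptions; the agency names, codes, and `household_type` field are hypothetical, and privacy protections (such as de-identification and data-sharing agreements) are assumed to have been handled before this step.

```python
# Hypothetical crosswalk from each agency's local codes to a shared definition.
CROSSWALK = {
    "agency_a": {"FAM": "family", "IND": "individual"},
    "agency_b": {"family_household": "family", "single_adult": "individual"},
}


def pool_records(records_by_agency, crosswalk=CROSSWALK, field="household_type"):
    """Harmonize one field across agencies.

    Returns (pooled, unmapped): pooled records keep their other fields and gain
    an "agency" label so agency-level differences can still be examined;
    unmapped lists (agency, code) pairs with no agreed translation, which need
    follow-up with the agency rather than a guess.
    """
    pooled, unmapped = [], []
    for agency, records in records_by_agency.items():
        mapping = crosswalk.get(agency, {})
        for record in records:
            code = record.get(field)
            if code in mapping:
                pooled.append({**record, "agency": agency, field: mapping[code]})
            else:
                unmapped.append((agency, code))
    return pooled, unmapped


if __name__ == "__main__":
    data = {
        "agency_a": [{"household_type": "FAM"}, {"household_type": "IND"}],
        "agency_b": [{"household_type": "single_adult"}, {"household_type": "couple"}],
    }
    pooled, unmapped = pool_records(data)
    print(pooled)
    print(unmapped)  # [('agency_b', 'couple')]
```

Keeping the agency label on pooled records means that a difference in outcomes can still be checked against a difference in how each agency defines or records the underlying concept.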
Collaborations play an integral role within the social services sector and can occur in multiple ways to address existing needs. Together, data scientists, social scientists, and domain experts can (i) support advocacy for greater investments in data infrastructure in the social services sector; (ii) build more accessible data-sharing tools so that this sector can pool data resources; and (iii) implement cross-training so that social science and humanities researchers become more conversant in computational thinking and data science researchers become more aware of the possibilities of complementing their methods with stories and qualitative data and methods.
Respecting data limitations
Regardless of the quality and volume of data, it must be recognized that data are limited in what they can reveal. Patterns may be weak, and caution is needed when drawing conclusions. Limitations of the analysis (such as bias or weak signals) should be made known to social service practitioners so they can decide how best to proceed within their organization's context. It is therefore vital for social services professionals and data scientists to consider the appropriate use of DS, its related implications, and the possibility of unintended consequences.
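One way to make weak signals and uncertainty visible to practitioners is to report an interval alongside any estimate. The sketch below computes a percentile bootstrap confidence interval for a mean using only the Python standard library. The sample values are invented, and the percentile bootstrap is itself an approximation that can be unreliable for very small samples, so the interval is best presented as a rough indication of uncertainty rather than a guarantee.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Returns (point_estimate, lower, upper). A fixed seed makes results reproducible.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = boot_means[int((alpha / 2) * n_resamples)]
    upper = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(values), lower, upper


if __name__ == "__main__":
    # Invented example: days until placement for seven clients.
    days = [3, 5, 8, 12, 30, 4, 6]
    estimate, lower, upper = bootstrap_ci(days)
    print(f"mean {estimate:.1f} days, 95% interval roughly {lower:.1f} to {upper:.1f}")
```

A wide interval from a small sample is itself a useful finding to share: it tells practitioners how much weight an apparent pattern can bear.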
The many complexities of social services, such as variability across service delivery models, client populations, and governing policies and regulations, can limit the interpretation of results. Because these elements influence results, analyses should consider how they interact, both within a program and between the program and other entities working in this space. For example, while data from a local provider may yield insights into a particular program aspect, the same data may be limited in capturing the larger systemic influences on a program, such as policy, program perceptions, and agency behaviors. Model selection should be decided case by case and should critically examine the relevant social elements and objectives. Data scientists should be equipped to offer alternative, social-centered approaches when limitations are present, such as qualitative methods, human-centered design, and participatory groups.
Conclusions
There is no doubt that DS can be an invaluable asset for the social sector; however, its applicability must be carefully considered to avoid unintended consequences. This brief overview presents insights on ways in which practitioners and data scientists can partner to improve the use of DS on small data, with a focus on social services. We offer three ways data scientists interested in working within the social good sector can enhance their contributions to the field: refining and leveraging available data, improving collaborations, and respecting data limitations. Although current data for social services may be limited, working to refine and leverage available data can not only improve current analysis efforts but also lay the groundwork for improved analysis in the future. The data-driven work of data scientists complements the knowledge- and mission-driven work of social service practitioners. Improving collaborations between stakeholders makes the most of the data and of these different approaches. It is important for data scientists not to fall for the lure of applying the most state-of-the-art methods but rather to focus on finding the solutions that best meet the objective of the inquiry. The prevalence of small data in the social sector may not change; however, data scientists can help change how the data are utilized and support the efforts of social service organizations.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The two authors from Worcester Polytechnic Institute were supported by the National Science Foundation under grant no. CMMI-1935602.
