Abstract
The spread of modern digital technologies, such as online social media platforms, digital marketplaces, smartphones, and wearables, is increasingly shifting social, political, economic, cultural, and physiological processes into the digital space. Social actors using these technologies (directly and indirectly) leave a multitude of digital traces in many areas of life that add up to an enormous amount of data about human behavior and attitudes. This new data type, which we refer to as “digital behavioral data” (DBD), encompasses digital observations of human and algorithmic behavior, which are, among others, recorded by online platforms (e.g., Google, Facebook, or the World Wide Web) or sensors (e.g., smartphones, RFID sensors, satellites, or street view cameras). However, studying these social phenomena requires data that meets specific quality standards. While data quality frameworks—such as the Total Survey Error framework—have a long-standing tradition in survey research, the scientific use of DBD introduces several entirely new challenges related to data quality. For example, most DBD are not generated for research purposes but are a side product of our daily activities. Hence, the data generation process is not based on elaborate research designs, which in turn may have profound implications for the validity of the conclusions drawn from the analysis of DBD. Furthermore, many forms of DBD lack well-established data models, measurement (error) theories, quality standards, and evaluation criteria. Therefore, this special issue addresses (i) the conceptualization of DBD quality, methodological innovations for its (ii) assessment and (iii) improvement, as well as their sophisticated empirical application.
Introduction
The spread of modern digital technologies, such as online social media platforms, digital marketplaces, smartphones, and wearables, is increasingly shifting social, political, economic, cultural, and physiological processes into the digital space. Social actors using these technologies (directly and indirectly) leave a multitude of digital traces in many areas of life that add up to an enormous amount of data about human behavior and attitudes. This new data type, which we refer to as “digital behavioral data” (DBD), encompasses digital observations of human and algorithmic behavior, which are, among others, recorded by online platforms (like Google, Facebook, or the World Wide Web) or sensors (like smartphones, RFID sensors, satellites, or street view cameras) (Wagner et al., 2025). By including “algorithmic behavior,” we acknowledge that in some (or many) instances, this observable behavior is due to non-human agents, e.g., bots. This is in line with the term “machine behavior”, introduced by Rahwan et al. (2019, p. 477), referring to intelligent machines as a class of actors with particular behavioral patterns and ecology. The term DBD is closely related, but not identical, to the term “Big Data” and shares most of its characteristics, such as velocity, volume, value, variety, and veracity (Fröhling et al., 2023; Kohne et al., 2021), and, as such, most of its challenges in terms of analyzability.
DBD presents new methodological opportunities for understanding social and political processes and transformations, from global networking and political polarization to analyzing interaction patterns in digital environments. It also enables us to study the major changes in the private and public spheres that digitization is bringing about, such as the effects of social media and AI on democracies, social cohesion, or individual well-being. In addition to studying new social phenomena, DBD can also help to enhance our understanding of classic social phenomena such as social movements and collective action (Tufekci, 2017) or health behavior and mental well-being (Chancellor & De Choudhury, 2020) by providing new types of data.
However, studying these social phenomena relies on data that satisfies specific quality requirements. In survey research, there is a long tradition of data quality frameworks that mainly focus on intrinsic requirements of survey data (cf. the Total Survey Error framework by Groves et al., 2009, or the broader Total Survey Quality framework by Biemer, 2010). Intrinsic data quality properties refer to inherent attributes of the data itself and are independent of context or usage, e.g., accuracy, validity, or reliability. Due to the heterogeneity of DBD, multiple error frameworks exist that focus on inherent attributes of specific DBD sources, such as the Total Error Framework (TEF) by Amaya et al. (2020), the Total Error Framework for Found Data by Biemer and Amaya (2020), the Total Error Framework for Digital Traces of Human Behavior on Online Platforms (TED-ON) by Sen et al. (2021), or, more specifically, the Total Twitter Error framework (Hsieh & Murphy, 2017). Bosch and Revilla (2022) introduced an error framework for metered data (in the following, we will use the term “web tracking data”; see also the contribution to this volume by Adam et al., 2024). In contrast, the extrinsic perspective evaluates data quality based on context-specific criteria, primarily focusing on the data’s fitness for use, that is, how well the data meet the needs of users or research objectives. For instance, the FAIR criteria, referring to the usability of data in terms of findability, accessibility, interoperability, and reusability (Wilkinson et al., 2016), relate to extrinsic data quality properties.
The special issue at hand aims to provide deeper insights into current research activities with respect to conceptualizing and empirically investigating data quality issues within digital behavioral data. Specifically, our call invited contributions focusing on the conceptualization of DBD quality, methodological innovations for its assessment and improvement, as well as their sophisticated empirical application. This special issue comprises nine valuable contributions addressing various aspects of DBD data quality. While most of the following contributions adopt an intrinsic perspective, there are a few exceptions, e.g., Dahlke et al. (2023) or Yu et al. (2024), which also include an extrinsic data quality perspective.
The first group of papers focuses primarily on conceptual frameworks for assessing DBD. Daikeler et al. (2024) systematically review 58 existing data quality frameworks to assess their applicability to modern digital social science data. Schneck and Przepiorka (2024) introduce a new comprehensive framework, the Total Error Framework for Digital Behavioral Data (TEF-DBD), employing meta-dominance analysis to identify and quantify error sources within DBD.
A second set of contributions explores methodological innovations and their sophisticated empirical applications. Antoun and Wenz (2024) conduct an accelerometer-based study, which provides high-resolution, passive data on physical activity, offering advantages over traditional self-reported measures; however, nonparticipation bias remains a critical challenge affecting data quality. While that paper addresses the representation arm of the TSE framework, the paper by Cernat et al. (2024) focuses on the measurement side. Specifically, the authors assess the measurement quality of digital trace and survey data on smartphone usage behaviors using the Multitrait-Multimethod (MTMM) model. They challenge the assumption of digital trace data’s inherent superiority over self-reported measures, demonstrating that quality varies significantly across methods. Wenz et al. (2024) likewise focus on passively collected digital behavioral data (DBD) from smartphones; their study evaluates the alignment between self-reported smartphone use and DBD across three key dimensions: amount of use, variety of use, and activities of use.
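The MTMM logic underlying the contribution by Cernat et al. (2024) can be sketched in a stylized confirmatory factor-analytic form (our notation; the authors may parameterize the model differently): each observed measure of a trait is split into a trait component, a method component, and random error,

\[
y_{tm} = \lambda_{tm}\, T_t + \gamma_{tm}\, M_m + e_{tm},
\qquad
q_{tm}^{2} = \frac{\lambda_{tm}^{2}\,\operatorname{Var}(T_t)}{\operatorname{Var}(y_{tm})},
\]

where \(y_{tm}\) is the observed measure of trait \(t\) (e.g., amount of smartphone use) obtained with method \(m\) (e.g., digital trace vs. self-report), \(T_t\) is the latent trait, \(M_m\) the method factor, and \(q_{tm}^{2}\) the share of observed variance attributable to the trait, a common operationalization of measurement quality. Comparing \(q_{tm}^{2}\) across methods is what allows such studies to test, rather than assume, the superiority of digital traces over self-reports.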
Adam et al. (2024) evaluate the quality challenges of collecting web tracking data, e.g., on individual media consumption. Their contribution highlights issues related to sampling, validity, device diversity, long-tail consumption, transparency, and privacy. They introduce WebTrack, an academic solution enabling enhanced content-level analysis that significantly improves data quality by capturing a broader spectrum of digital behaviors (the software is now maintained and further developed by GESIS; for an introduction to the service, see Mangold et al., 2023, and https://gesis.org/webtracking).
The paper by Dahlke et al. (2023) assesses the quality of web-scraped data, which is often collected under the assumption that all web content is equally accessible. The study challenges that assumption by systematically examining biases in the accessibility of web content collected from URL-logged browsing data. In a similar vein, Grigoropoulou and Small (2024) investigate machine-generated data from private companies, which present new opportunities for social science research but raise concerns about accuracy and reliability. Their study evaluates the quality of business location records from SafeGraph, a widely used private-sector dataset, focusing on the classification and accuracy of financial institutions. The findings reveal significant classification errors, including mislabeled businesses, unidentified closures, and duplicate records, which systematically affect the dataset’s validity.
Finally, the paper by Yu et al. (2024) is concerned with the intrinsic and extrinsic quality of datasets used for training machine learning models in hateful communication detection, which is a critical yet underexplored issue. This systematic review evaluates datasets developed over the past decade, focusing on their inclusivity, representational accuracy, and the biases embedded in their curation. The study reveals that existing datasets disproportionately focus on specific target identities while underrepresenting others, such as individuals with disabilities or older adults. Additionally, the review highlights mismatches between conceptualized target groups in the dataset documentation and the actual data contents.
Together, these contributions advance the understanding of data quality issues in digital behavioral data, offering conceptual frameworks and practical methodologies to enhance data validity and reliability in empirical social science research. However, the purpose of this editorial is not only to briefly introduce the contributions to this special issue. We also aim to contribute to the research by classifying DBD studies concerning data quality. In doing so, we refer to a classification schema by Wagner et al. (2025) based on the dichotomous dimensions “data collection modus” (user- or platform-based) and “data generation process” (designed vs. found data). Using this 2 × 2 classification schema can help categorize DBD studies and identify related data quality properties as well as related data quality issues. By distinguishing categories of research design and mode of collecting DBD, we attempt to link data quality dimensions to the respective categories.
The remainder of the editorial is structured as follows: we briefly recapitulate data quality in the (survey-based) social sciences. We then focus on an in-depth introduction to DBD, focusing on introducing the classification schema mentioned above. The editorial concludes with a broader and integrated perspective on the quality of DBD.
Data Quality in the Social Sciences
Daikeler et al. (2024) provide a methodological overview of data quality frameworks relevant to the social sciences, examining existing frameworks to assess their applicability to modern digital social science data. They show that most of these frameworks revolve around either an intrinsic or an extrinsic perspective on data quality.
The Total Survey Error (TSE) framework is a well-known example of an intrinsic perspective: it is concerned with the correctness of the data and identifies specific error sources. Following the TSE approach, we can distinguish two broad error sources, “measurement error(s)” and “representation error(s)”. While these two error sources were introduced in the seminal work on the TSE by Groves et al. (2009), Biemer et al. (2014, p. 387) add a third error type, “modelling error”, which captures “the error arising from fitting models for various purposes such as imputation, derivation of new variables, adjusting data values or estimates to conform to benchmarks, and so on”. The measurement side comprises validity-related, measurement, and processing errors, indicating how well the survey questions measure the constructs of interest. The representation side comprises coverage, sampling, nonresponse, and adjustment errors, indicating how well estimates generalize to the target population (Lyberg et al., 2018, p. 154).
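In more formal terms, the TSE literature often summarizes these error sources through a mean-squared-error decomposition of a survey estimate \(\hat{\theta}\); the following is a stylized rendering in the spirit of Groves et al. (2009), not a formula taken verbatim from the cited works:

\[
\operatorname{MSE}(\hat{\theta}) = \operatorname{Bias}(\hat{\theta})^{2} + \operatorname{Var}(\hat{\theta}),
\qquad
\operatorname{Bias}(\hat{\theta}) = \underbrace{B_{\text{val}} + B_{\text{meas}} + B_{\text{proc}}}_{\text{measurement side}} + \underbrace{B_{\text{cov}} + B_{\text{samp}} + B_{\text{nr}} + B_{\text{adj}}}_{\text{representation side}},
\]

where each \(B\) denotes the systematic error contributed by the respective source (validity, measurement, processing; coverage, sampling, nonresponse, adjustment). Each source can also contribute to the variance term; sampling error, for instance, is primarily a variance component in probability samples.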
An extrinsic perspective, on the other hand, focuses on the usability of data in concrete research contexts, i.e., its “fitness for use” (or fitness for purpose), and contrasts with the intrinsic perspective, which emphasizes the internal properties of the data itself. Other dimensions related to the extrinsic perspective are the legal and ethical dimensions, including open-science-related aspects such as reuse and replicability of the data (Daikeler et al., 2024). A prominent implementation is the set of FAIR principles introduced by Wilkinson et al. (2016); the acronym refers to the requirements that data be findable, accessible, interoperable, and reusable. Data adhering to the FAIR principles are of high scientific value, and adherence is a necessary, though not sufficient, condition for reproducible (and replicable) research (Schoch et al., 2024). The section “An Integrated DBD Quality Perspective” will extend the discussion of extrinsic (or contextual) data quality (Batini & Scannapieco, 2006).
Finally, we also want to refer to a recent position paper by Birkenmaier et al. (2024), titled “Defining and Evaluating Data Quality for the Social Sciences”, which also links existing data quality approaches, such as the TSE mentioned above, to more recent methodological developments in the (computational) social sciences. They highlight the challenges of diverse methods, varying data types (surveys, social media, web tracking, and linked data), and the lack of universal standards. In response, they propose a comprehensive and unified framework that integrates two major traditions in data quality evaluation: error-focused frameworks (like the Total Survey Error approach) and dimension-focused frameworks, which emphasize accuracy, timeliness, and usability. Based on input from multiple domain experts, they emphasize the importance of “fitness for use,” meaning data quality should be evaluated based on its intended use case. The proposed framework centers around defining the purpose of data usage clearly, specifying intrinsic quality requirements (such as accuracy and reliability), and addressing extrinsic requirements (like accessibility, documentation, and interoperability). The framework offers practical guidance for researchers and data curators by mapping these requirements to concrete metrics and indicators.
An In-Depth Introduction to DBD
In recent years, the social sciences have increasingly recognized the potential of DBD to address new substantive research questions in many research fields (Box-Steffensmeier et al., 2022). However, the scientific use of DBD is also associated with entirely new challenges. In contrast to survey data, most DBD are not generated for research purposes but are a side product of our daily activities (also called “readymade data” by Salganik, 2018). Hence, the data generation process is not based on elaborate research designs, which in turn may have profound implications for the validity of the conclusions drawn from the analysis of DBD. With respect to the aforementioned distinction between the intrinsic and extrinsic perspective, this mostly refers to the intrinsic perspective.
Since the 1960s, researchers have distinguished between unobtrusive and obtrusive research methods (Webb et al., 1966). Unobtrusive (also, nonreactive or observational) methods allow social scientists to study human behavior without direct interaction, avoiding disruptions like surveys or interviews. Because researchers do not influence data generation, these methods offer higher ecological validity but limit inferential power. In contrast, obtrusive methods, such as experiments, involve researcher-designed interventions in controlled (e.g., lab experiments) or semi-controlled environments (e.g., online field experiments). While large-scale online field experiments enhance ecological validity, they may reduce internal validity, as other factors can confound treatment effects. This distinction can also be applied to digital behavioral data: we can differentiate between data that is gathered with obtrusive methods (so-called “designed data”) and data that is collected with unobtrusive methods (so-called “found data” or “organic data”) (Strohmaier & Wagner, 2014). Digital behavioral data qualifies as “found digital behavioral data” when collected from social media or web platforms without controlling the data generation process. Typical examples are the “traces” that humans leave on Facebook or Google as a byproduct of their interactions with online platforms. Digital behavioral data can be considered “designed digital behavioral data” when the researchers control the data generation process – e.g., by randomly sampling participants from well-defined populations, measuring and blocking confounders, or randomly assigning participants to treatments.
For both, found and designed DBD, researchers can adopt a user-centered or a platform-centered data collection approach. User-centered data collections require users to participate in collecting their data, while platform-centered data collections typically do not require support from the data owner. User-centered data collections, often utilizing surveys to recruit participants, can link digital behavioral data with individual-level information on demographics and variables like party identification, political trust, or evaluations of other societal groups (Stier et al., 2019). To conduct user-centered data collections, research software like web tracking browser plugins (Adam et al., 2024), mobile apps (Kreuter et al., 2020; for recent developments see also Lux et al., 2025), or data donations (Boeschoten et al., 2022) is being developed at various academic institutions.
Table 1
Two Dimensions of Digital Behavioral Data and Their Relationship to the Contributions in this Special Issue
[Table not reproduced here: a 2 × 2 classification crossing the data generation process (designed vs. found) with the data collection modus (user- vs. platform-based), locating the contributions to this special issue within the four resulting cells.]
Note. This table is based on Wagner et al. (2025) and has been modified to reflect the purpose of this editorial.
For example, platform-based DBD is often used to study collective or platform-specific phenomena such as civic engagement, political polarization, and dynamic social processes (e.g., the spread of information over time). Controlling confounders in found platform-based data collections is difficult since such data are typically generated under the influence of platform algorithms, and a change in an algorithm can have dramatic and instant effects on the signals in the collected data (Lazer, 2015; Wagner et al., 2021) (see also the later section “An Integrated DBD Quality Perspective” for a refined discussion of conclusion validity and causal inference in particular). Further, improving the sample quality of found data is typically not possible since the sampling frame (i.e., the platform population) is not well-defined, and all risks and challenges associated with nonprobability samples apply when using this kind of data (for a recent overview, see Freese & Jin, 2025). The opacity and instability of data access software (e.g., APIs and data owner download options) and of the research software used to record data (e.g., web tracking software) also limit the quality of DBD.
User-centered data collections using a probability-based sampling design allow for inferences about a given target population since the inclusion probabilities are known for every selection step. Typically, participants are invited to participate in a survey and then, after survey completion, to participate in a DBD collection. Hence, participants must consent twice, increasing the likelihood of nonresponse bias (Antoun & Wenz, 2024), since participation in these data collections is associated with factors such as privacy concerns, tech-savviness, education, age, or the type of data requested (Beuthner et al., 2023; Elevelt et al., 2019; Silber et al., 2022). However, for cost-related reasons, many user-centered DBD collections rely on nonprobability samples, limiting the extent to which findings can be generalized. Some studies propose statistical weighting approaches to mitigate this issue (e.g., propensity scores; see Antoun & Wenz, 2024). However, while we have sound statistical theory for probability-based samples, more research is needed on variance estimation and advanced weighting techniques, especially concerning the collection of weighting variables that are correlated with key survey variables and the data-generating process (Cornesse et al., 2020).
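To illustrate the logic of such propensity-based weighting, the following minimal sketch estimates inclusion propensities for a nonprobability DBD sample against a probability-based reference sample and derives pseudo-weights. The variable names and the combined-sample setup are our own illustrative assumptions, and the sketch deliberately ignores the reference sample's design weights (which a production implementation would pass to the model as sample weights):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Illustrative covariates assumed to drive participation in DBD collections
# (cf. privacy concerns, tech-savviness, education, and age mentioned above).
COVARIATES = ["age", "education", "privacy_concern", "tech_savviness"]

def pseudo_weights(dbd: pd.DataFrame, ref: pd.DataFrame) -> pd.Series:
    """Inverse-propensity pseudo-weights for a nonprobability DBD sample.

    dbd: nonprobability sample of DBD participants.
    ref: probability-based reference sample covering the target population.
    """
    combined = pd.concat([dbd[COVARIATES], ref[COVARIATES]], ignore_index=True)
    membership = [1] * len(dbd) + [0] * len(ref)  # 1 = DBD sample, 0 = reference
    model = LogisticRegression(max_iter=1000).fit(combined, membership)
    p = model.predict_proba(dbd[COVARIATES])[:, 1]
    weights = (1 - p) / p  # units overrepresented in the DBD sample are down-weighted
    return pd.Series(weights / weights.mean(), index=dbd.index)  # normalize to mean 1
```

Such pseudo-weights can only correct for selection on the covariates actually observed, which is precisely why the collection of weighting variables correlated with key survey variables and the data-generating process matters (Cornesse et al., 2020).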
An Integrated DBD Quality Perspective
While the preceding discussion has mostly focused on intrinsic data quality—that is, whether the data meet objective measurement- and representation-related standards regardless of their intended use—we conclude by offering initial reflections on how further quality dimensions might be integrated to develop a more comprehensive understanding of DBD quality in social science research. We have already introduced the concept of extrinsic (or contextual) data quality (Batini & Scannapieco, 2006), which represents the task-dependent element of overall data quality. It refers to the fitness for purpose of data and, thus, their adequacy for addressing specific analytic tasks such as parameter estimation, effect identification, prediction, classification, or pattern detection. Relevant dimensions of extrinsic data quality and related concepts from official statistics include accessibility, value-added, relevancy, timeliness, completeness, and an appropriate amount of data (e.g., Batini & Scannapieco, 2006; Eurostat, 2019; Karr et al., 2006; Wang & Strong, 1996).
This expanded perspective shifts the focus toward the extent to which the data support attaining the intended research objectives. For example, consider a scenario where the available DBD meets all established intrinsic quality criteria, yet lacks information on confounders essential for causal effect identification (e.g., Pearl, 2009). In such a case, the high intrinsic quality is insufficient to ensure valid causal inferences if confounding bias in the causal effect estimates derived from these data cannot be eliminated.
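Pearl's (2009) notation makes the problem concrete: if a set of confounders \(Z\) satisfies the backdoor criterion, the causal effect of \(X\) on \(Y\) is identified by the adjustment formula

\[
P\bigl(y \mid do(x)\bigr) = \sum_{z} P\bigl(y \mid x, z\bigr)\, P(z),
\]

but if the DBD simply do not record \(Z\), no degree of intrinsic quality in the observed variables allows this quantity to be recovered.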
It follows that both intrinsic and extrinsic data quality are essential components of what we term conclusion quality. This comprehensive quality concept encompasses the entire research process and builds on the principles of statistical conclusion validity (e.g., Shadish et al., 2002). It reflects the degree to which conclusions drawn from the results of data analysis are valid and contribute to achieving the research objectives. Conclusion quality is a multidimensional construct shaped by various factors, including theoretical reasoning, estimand identification (Lundberg et al., 2021), study design, intrinsic and extrinsic data quality, analytical decisions (e.g., model specification, estimation strategy; for survey data, see West et al., 2016, 2017), interpretation of results, and integration with domain knowledge.
From a DBD perspective, high conclusion quality can only be achieved when the data meet established intrinsic quality standards and are fit for purpose—that is, they contain the necessary information to complete the underlying research tasks effectively. In the case of designed DBD, fitness for purpose can be actively ensured through deliberate design choices during the study planning phase, although the feasibility of doing so depends on the specific research tasks and the level of control over the data-generation process. For example, establishing data fitness is relatively straightforward when utilizing a self-programmed tracking app to collect health-related behavioral data from respondents’ smartphones alongside a large-scale survey (e.g., Antoun & Wenz, 2024 in this volume; Munzert et al., 2021; Thornton et al., 2021), while implementing rigorous study designs on external social media platforms typically requires collaboration with platform providers (e.g., González-Bailón et al., 2023; Nyhan et al., 2023).
In contrast, found DBD are typically unstructured and not fit for purpose in their raw form. This necessitates a posteriori optimization strategies such as data pruning, data fusion, and missing data handling to align them with the research objectives (Leitgöb & Keusch, forthcoming). The required adjustments depend on the data at hand and on the specific analytic tasks needed to address the research objectives. For example, eliminating confounding and endogenous selection bias is essential for causal inference (e.g., Elwert & Winship, 2014; Pearl, 2009), while prediction and classification tasks demand the inclusion of variables with high predictive power. For descriptive purposes, the focus is on achieving a representative depiction of the target population. Thus, the data-related aspect of conclusion quality depends on the extent to which available DBD with given intrinsic data quality can be refined to achieve fitness for purpose. Given the considerable researcher degrees of freedom (e.g., Simmons et al., 2011) in selecting the data from the universe of available sources and making them fit for purpose, we argue that transparent and well-reasoned documentation of these decisions constitutes another element of conclusion quality, enhancing the traceability and replicability of DBD-based research.
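As a minimal illustration of such a posteriori optimization, the following sketch combines pruning, deduplication, and a placeholder missing-data step for found DBD; the column names are hypothetical, and the single median fill merely marks where a principled strategy (e.g., multiple imputation) would go:

```python
import pandas as pd

def prune_found_dbd(raw: pd.DataFrame) -> pd.DataFrame:
    """Toy a posteriori optimization of found DBD (illustrative only)."""
    df = raw.copy()
    # Pruning: drop exact duplicates, e.g., introduced by repeated API pagination.
    df = df.drop_duplicates(subset=["user_id", "timestamp", "event"])
    # Records without a timestamp cannot be placed in time and are unusable
    # for analyses of dynamic social processes.
    df = df[df["timestamp"].notna()]
    # Placeholder for principled missing-data handling (e.g., multiple imputation).
    df["duration_sec"] = df["duration_sec"].fillna(df["duration_sec"].median())
    return df
```

Which of these steps are defensible depends on the research objective, which is why we argue above for transparent documentation of all such decisions.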
Obviously, the degree of control over the data generation process differs between found and designed DBD, with substantial implications for conclusion quality. While the prospective nature of designed DBD allows for the proactive implementation of features to ensure high intrinsic quality and data fitness, found DBD can only be optimized retrospectively, if at all. Accordingly, designed DBD are generally expected to support higher conclusion quality. However, because found DBD are generated without scientific intervention in real-world settings, conclusions based on these data may yield higher ecological validity. In research practice, scholars must balance these quality dimensions to best align the use of DBD with their specific research objectives.
Among the dimensions of extrinsic data quality, accessibility and timeliness have become particularly salient for DBD in recent years. This is mainly attributable to the 2018 Cambridge Analytica scandal, which involved the unauthorized commercial exploitation of Facebook data of up to 87 million users. The scandal, referred to by Bruns (2019) as the APIcalypse, “shifted the lid on the very severe privacy issue in the digital world” (Trezza, 2023, p. 1) and ushered in what Freelon (2018) terms the post-API age, characterized by radical restrictions on access to social media data for research purposes. More recently, Twitter (now X) placed its API behind a paywall in March 2023, replacing free academic access with a tiered pricing model (Developers [@XDevelopers], 2023). These developments have significantly curtailed the availability of large-scale social media data and the promptness of accessing and analyzing such data for research purposes. Such limitations affect conclusion quality, with considerable real-world consequences. For example, selective or outdated training data may introduce algorithmic bias in automated decision-making systems, reinforcing discrimination against marginalized groups underrepresented in digital spaces (e.g., Ferrara, 2023).
Harvesting DBD can also raise unresolved legal and ethical questions. This is particularly true for web scraping conducted outside the official channels offered by platform providers via APIs, which potentially intersects with legal frameworks, including privacy and data protection laws, intellectual property rights, trespassing laws, and computer misuse regulations (e.g., Brown et al., 2024; Krotov et al., 2020; Trezza, 2023). From an ethical perspective, key questions include “whether informed consent is necessary when dealing with ‘found data’, and […] at what stage computational research becomes human subjects research requiring particular ethical protection” (Brown et al., 2024, p. 12; for further discussion, see e.g., Boyd & Crawford, 2012; Breuer et al., 2025; Metcalf & Crawford, 2016; Zook et al., 2017; Zwitter, 2014). Importantly, these issues are not limited to (scraped) social media or web data but apply broadly to DBD as a whole. Since highly individualized and granular behavior can be recorded at scale in the form of DBD, considering the consequences of DBD-based measurements and mis-measurements is crucial (Wagner et al., 2021). Consequently, we consider data processing and usage quality as a further essential component of a comprehensive quality framework for DBD, covering the extent to which all stages of data collection, processing, analysis, archiving, and sharing comply with all applicable legal regulations, adhere to established ethical research standards, and are fully documented to ensure transparency and reproducibility.
Thus far, the discussion has adopted a task-centered perspective, which assumes that DBD at hand are either selected or generated specifically to optimize the completion of given research tasks. As a final point, we propose complementing this view with a data-centered perspective on DBD quality. This perspective is typically rooted in data archiving infrastructures, which aim to provide comprehensive and broadly applicable DBD capable of supporting the empirical investigation of a wide range of research questions. Compared to the task-centered approach, which focuses primarily on data relevance for individual, particular research objectives, the data-centered perspective additionally emphasizes a higher-order form of data relevance. The latter aims to ensure that stored DBD are broadly usable across diverse social science domains and analytical contexts. Moreover, because these DBD are intended for multiple use by different researchers, particular attention is directed toward documentation, metadata provision, and user support—core criteria of the data quality framework proposed by Karr et al. (2006) for official statistics and elements of the FAIR principles (Wilkinson et al., 2016). Accordingly, high-quality archived DBD include detailed information about their generation process, rich metadata (e.g., timestamps, geolocation, processing duration), and are accompanied by tools that support efficient data access and handling.
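As a concrete, purely illustrative sketch of what such metadata provision could look like, consider a minimal record schema; all field names are our own assumptions rather than an established standard:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class DBDRecordMetadata:
    """Minimal metadata accompanying an archived DBD record (illustrative)."""
    record_id: str
    collected_at: datetime                 # timestamp of the behavioral observation
    collection_modus: str                  # "user-based" or "platform-based"
    generation_process: str                # "designed" or "found"
    instrument: str                        # e.g., tracking app, API endpoint, scraper
    geolocation: Optional[Tuple[float, float]] = None   # (lat, lon), if consented
    processing_steps: List[str] = field(default_factory=list)  # provenance trail
    license: str = "unspecified"           # reuse conditions (FAIR: reusable)
```

Schemas of this kind make the generation process, provenance, and reuse conditions machine-readable, supporting the findability and reusability that the FAIR principles demand.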
Summing up, we propose integrating the core concept of intrinsic data quality into a comprehensive DBD quality framework with additional dimensions such as extrinsic data quality and data processing and usage quality. In doing so, we seek to address the apparent disconnection between the data quality discussion and the specific research objectives associated with the analysis of DBD. Moreover, this broader conceptualization enhances the compatibility of our understanding of DBD quality with established multidimensional data quality frameworks in the field of statistics (e.g., Eurostat, 2019; Karr et al., 2006) and the FAIR principles (Wilkinson et al., 2016). The resulting tasks are to refine and expand these initial considerations and integrate them into a structured, coherent, and comprehensive DBD quality framework that enables the development of valid quality indicators for all dimensions.
Conclusion
While digital behavioral data (DBD) can be of high scientific value when investigating social life, this value hinges on a multidimensional notion of data quality. The nine contributions in this special issue show that we must look beyond intrinsic error frameworks (e.g., measurement and representation) toward an integrated view that incorporates extrinsic, task-specific fitness for use as well as data processing and usage quality (legal, ethical, transparency, and reproducibility). Our 2 × 2 classification of DBD by data generation process (designed vs. found) and collection modus (user- vs. platform-based) can help to locate typical error sources, feasible remedies, and the limits of inference. Building on this, we add conclusion quality as another important quality dimension: valid, well-documented conclusions that emerge from aligning theory, design, intrinsic and extrinsic data quality, and analytic decisions.
DBD-based research must therefore (1) deliberately design studies to ensure fitness-for-purpose whenever control is possible; (2) transparently optimize found data ex post and report the resulting researcher degrees of freedom; (3) confront shrinking accessibility and timeliness in the post-API age; and (4) meet stringent legal and ethical standards. Further research is needed to consolidate existing frameworks into a coherent, indicator-ready DBD quality framework, including developing robust methods for weighting and causal inference with nonprobability and passively collected data, and investing in FAIR DBD archiving infrastructures.
Acknowledgements
The authors used ChatGPT-4o (OpenAI, accessed July 2025) to assist with language refinement during manuscript preparation. All content was reviewed and approved by the authors.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
