Abstract
Conducting organizational research via online surveys and experiments offers a host of advantages over traditional forms of data collection when it comes to sampling for more advanced study designs, while also ensuring data quality. To draw attention to these advantages and encourage researchers to fully leverage them, the present paper is structured into two parts. First, organized around commonly used research designs, we showcase select organizational psychology (OP) and organizational behavior (OB) research and explain how the Internet makes it feasible to conduct research not only with larger and more representative samples, but also with more complex research designs than circumstances usually allow in offline settings. Subsequently, because online data collections often come with data quality concerns, the second section synthesizes the methodological literature to outline three improvement areas and several accompanying strategies for bolstering data quality.
Introduction
These days, most research in the fields of organizational psychology (OP) and organizational behavior (OB) utilizes the Internet in one way or another. To give just two obvious examples: OP/OB scholars routinely use online crowdsourcing platforms to recruit organization members for their surveys (Porter et al., 2019) and they can also draw on a global population that uses phones, tablets, or laptops to respond to much more advanced experimental stimuli and measures than would be possible with paper-and-pencil tests (Askitas & Zimmermann, 2015; Clement, 2020; Gosling & Mason, 2015).
Nevertheless, there is still serious skepticism over online-recruited samples and the reliability of data collected online—to the extent that some journal editors and reviewers are more likely to reject studies that only use online surveys and experiments (Landers & Behrend, 2015; Walter et al., 2019; Zack et al., 2019). Some of the criticism is warranted when considering the segments of the general population that participate in online panels like those on the Amazon Mechanical Turk platform (Shaw & Hargittai, 2021), as well as how professional some of the panelists have become in filling in surveys (Hargittai & Shaw, 2020). In fact, some people form dedicated communities to exchange advice on how to complete studies (Sheehan, 2018) while others have even programmed machine bots to do their work for them (Storozuk et al., 2020). So, it seems that the time is ripe to shift from simply regarding online research as a way to conveniently duplicate offline research, and instead become more ambitious about such data collections. Here, we do not mean that completely different methods of data generation should be employed (for that, see, e.g., overviews on Big Data approaches to organizational research, Wenzel & Van Quaquebeke, 2018) – though that is certainly a valuable avenue – but rather that the “bread-and-butter” methods of OB/OP scholars (experimental and survey research) need to be seriously updated in the way they are deployed online.
Needless to say, other researchers have begun to review this literature. However, despite a large body of methodological studies on different aspects of online surveys and experiments, the recent reviews are fairly limited in the scope of topics they cover. For example, Porter and colleagues (2019) and Aguinis and colleagues (2021) specifically considered issues of recruiting samples online using Amazon’s MTurk and similar online panels. Another example is the review by Göritz (2006), which only covers the topic of incentives in online surveys and experiments. While highly valuable, the focus on such specific subtopics limits researchers’ understanding of the broader implications of conducting organizational surveys and experiments online. Meanwhile, broader reviews of the online aspects of surveys and experiments are more than fifteen years old (i.e., the latest relevant reviews are Eaton & Struthers, 2002; Stanton & Rogelberg, 2001). Many of these cover issues around programming or specifications for hard- and software—issues that are usually addressed automatically by any querying platform these days or that have largely been rendered moot by increases in processing power across both computers and smartphones. Lastly, there are more contemporary reviews that focus on novel ways of collecting data online (cf. Wenzel & Van Quaquebeke, 2018, on Big Data); however, these do not speak as strongly to the main research methodologies used in OP/OB research (experimental and survey research). It therefore seems timely to provide an updated overview on how to fully leverage the methodological opportunities that the online environment offers for conducting organizational research, including issues of recruiting and data quality assurance.
In short, the current paper generally strives to improve OP/OB scholars’ understanding of how they can leverage online surveys and experiments to their best effect. To achieve this, we first briefly sketch the core challenges faced by the fields of OP/OB: issues that have arguably always been around, but that may have become more pressing as increasingly complex theorizing outpaces the appropriately sized and representative samples that circumstances typically allow. Against this background, we showcase examples of common and more ambitious research designs and illustrate for each how the Internet enables scholars to overcome traditional sample feasibility and representativeness limits. Subsequently, we synthesize extant online methodological research and best practices into three areas (1. improve response rate, 2. improve response quality, 3. respect the participant) to improve data quality. For each area, we offer concrete recommendations for successfully exploiting the unique online methodological opportunities. In sum, with the present paper, we hope to enable more informed decisions on the Internet-related aspects of organizational research among future researchers, Institutional Review Boards, reviewers, and editors.
General Challenges Faced by OP/OB researchers
OP/OB scholars study the thoughts, feelings, and behaviors of individual members (e.g., employees or supervisors) and groups of members (e.g., work teams) of work-organizations. Even in traditional work arrangements, the underlying dynamics can be fairly complex. Additionally, as work becomes globally more intertwined and diverse, organizational contexts more dynamic, and concepts of work more fluid, OP/OB theories are called upon to account for these new realities (and populations).
As such, convenient (i.e., accessible) samples and methods of query may not suffice to appropriately test OP/OB theorizing (Bell et al., 2010; Molina-Azorín et al., 2019; Spector, 2019). For instance, it makes little sense to blindly apply insights about autocratic leadership obtained in a Western context to sub-Saharan Africa; to assume that all stakeholders within an organization view a topic similarly; to think that innovation in S&P500 companies mirrors innovation in start-ups; to believe that regular employees’ motivational structures can be mapped onto gig-economy workers, or that the work-family interface operates similarly for local and expatriate workers.
In effect, then, OP/OB scholars find themselves challenged on three fronts: First, they need to account for the real-world complexity in their designs; second, to simultaneously collect samples that are more representative of various (harder-to-reach) populations in said realities; and third, to reduce sampling bias and the related response biases in order to alleviate doubts about the robustness of their results and interpretations (Zack et al., 2019). In response to these challenges, we provide an updated overview of how the Internet allows for a) more complex survey and experimental study designs while b) simultaneously improving sample representativeness and c) ensuring higher data quality.
Designing and Recruiting for More Ambitious Surveys and Experiments in Organizational Online Research
While simpler (e.g., single-source, cross-sectional) research designs may suffice for some research questions (Joshi, 2016), studying theoretically important organizational problems often demands more complex designs. And whether study designs are simple or not, it is pivotal to obtain samples that reflect the populations under investigation (Kulik, 2011). Fortunately, the ubiquity of the Internet across various devices makes it feasible to conduct more complex research designs and allows researchers to push these studies to target populations and recruit participants based on desired qualifications and inclusion profiles, e.g., individuals with specific work habits, medical conditions, or cultural backgrounds (Hewson et al., 2003; Hewson, 2017). Collecting larger samples for more complex study designs from wider populations additionally allows researchers to mitigate power concerns that are prevalent in much of OP/OB research (Tenney et al., 2021) and may also help reduce non-representativeness bias (Chandler et al., 2014).
In what follows, we will discuss online study designs first for various forms of surveys and then for experiments. In each subsection, we illustrate examples that we identified via Google Scholar using the following keywords: “online survey,” “web-based survey,” “Internet-based survey,” “online experiment,” “web-based experiment,” “Internet-based experiment,” and “online data collection.” We filtered our query-based search based on two inclusion criteria: First, we only relay examples that recruited participants on the Internet AND collected their responses to test hypotheses using online surveys or experiments. Second, we focus on examples that investigated organizational topics.
Cross-Sectional, Single Source Online Surveys
Despite the many drawbacks of cross-sectional, single-source data, such simple online surveys still represent the majority of conducted organizational research (Spector, 2019). While not overcoming all methodological and conceptual concerns, there are ways to make such data collection more compelling for theory testing with more representative samples—namely by increasing the sample size and heterogeneity, or by targeting specific, niche samples.
Recruit for Large and Heterogeneous Samples
Naturally, one big advantage of online surveys is their recruiting convenience, especially for cross-sectional, single-source study designs. Unlike traditional recruitment, commercial vendors or crowdsourcing platforms such as Qualtrics, Survey Monkey, Prolific Academic, StudyResponse, and Witmart give researchers access to large sample sizes from a global population of employees with relatively little cost and effort (Buhrmester et al., 2011; Callegaro et al., 2014; Porter et al., 2019), thereby significantly increasing their studies’ statistical power (Brock et al., 2012; Casler et al., 2013; Gosling et al., 2004; Gosling & Mason, 2015; Luce et al., 2007). These samples are usually quite heterogeneous, but can also be limited to highly specific groups (for a recent review of different panels, see Porter et al., 2019).
While getting access to senior executives or other highly paid professions is unlikely on any of these crowdsourced platforms, researchers can still quickly recruit participants who work in actual jobs and some managerial positions (Schyns et al., 2018; Scott et al., 2014; Taylor et al., 2019). Such individuals can answer questions regarding their jobs and the perceptions of upper echelon superiors. Moreover, the respondents may work in diverse jobs, including blue collar work, which is often inaccessible for regular paper-and-pencil research despite representing a large portion of the workforce. Indeed, some organizational theories may specifically speak about these working populations (Sessions et al., 2020) or develop theory about them (e.g., gig-workers, Ashford et al., 2018; Petriglieri et al., 2019) and thus need research access to them.
Self-selection may be an issue, of course. For instance, well-paid or very busy people are unlikely to volunteer for work on crowdsourced platforms (Baker et al., 2010). Likewise, while it is becoming less common, some people may still be offline, whether intentionally or due to circumstance (Gosling et al., 2004). Indeed, it needs to be recognized that the size of a sample does not protect against systematic selection bias (Wenzel & Van Quaquebeke, 2018). That said, there are a variety of online channels beyond crowdsourced panels (e.g., Facebook ads, Instagram ads, Google ads, YouTube ads) that still allow one to reach massive and more varied populations than any single-channel recruiting would. Notably, with the help of different URL-parameters attached to the study link, researchers can track participants’ entrance points to the study to account (or control) for the different targeted channels.
Employing all of the above strategies can not only significantly increase sample size (and thus statistical power), but also allow researchers to move beyond WEIRD populations (Henrich et al., 2010; Peer et al., 2017) and recruit from vocational and cultural populations that are otherwise harder to reach from within the ivory towers of a metropolitan university. In this way, scholars can improve the external validity and generalizability of their findings (Griffiths et al., 2014; Miner et al., 2012).
Recruit for Niche and Underrepresented Samples and Sensitive Topics
Conducting one’s surveys or experiments online not only allows for recruiting larger sample sizes, but also for identifying and targeting niche (e.g., employee/customer) populations and topics. Indeed, one can just think of groups such as humanitarian aid workers, activist groups, or independent remote workers - all of whom flock to their special interest online forums. As such, researchers can collect unique samples by posting adverts in online classifieds and forums (e.g., Craigslist, Backpage, Quora, Adoos, Reddit) (Antoun et al., 2015; Vogel et al., 2016) or specific pages/groups on social networking sites (e.g., Facebook, LinkedIn, Twitter) to recruit niche populations in different cultures (Autio et al., 2013; Sessions et al., 2020). Additionally, snowball sampling often works well when, for instance, attempting to capture niche, but highly networked samples (e.g., entrepreneurs, expats – or their spouses, or creative people) or communities of practice (e.g., cleaners, coders, or HR professionals) (Baltar & Brunet, 2012; Bhutta, 2012; Browne, 2005; Dusek et al., 2015). For example, Arafa and colleagues (2021) used an online snowball approach to recruit a single-source sample of 1,629 Egyptian government staff, particularly in the public health sector and across four governorates, in order to investigate COVID-19’s psychological impact. Additionally, researchers may look for and use publicly available contact lists. For example, for their study on intergroup leadership across expats and locals, Salem et al. (2018) used contact details provided by the United Nations and by conferences on humanitarian aid operations to identify 524 humanitarian aid field workers - a very niche population. Likewise, Ipsen and colleagues (2021) compiled the email contact information of participants from disability-related conferences, as well as relevant organizations, groups, and/or service providers, to send direct email requests with survey links. After additional screening, this approach resulted in a sample of 1,374 adults with disabilities in rural areas of the USA. Reaching such underrepresented or hard-to-reach populations helps researchers refine and extend organizational theories because they can test the applicability of extant theories in rarely explored contexts that are qualitatively different from the most common ones (Fisher & Aguinis, 2017).
Reaching such specific populations can likewise be achieved via specifications on the above-mentioned crowdsourced platforms. For example, DeCelles and colleagues (2020) invited a large number of MTurk users, who were then reduced to a specific panel of 300 social movement activists after participants answered two filtering questions. And for their study of how anger frames affect social movement mobilization, DeCelles and colleagues (2020) used the Qualtrics panel to recruit a sample of 94 New York finance industry professionals who supported the Occupy Wall Street movement. (Interestingly, before using Qualtrics, the researchers first approached many finance professionals in-person, but could only recruit nine people for their paper-and-pencil survey.) Meanwhile, Heretick and Learn (2020) recruited 180 adults through Prolific Academic for an online survey about bystander responses to coercive sexual harassment in interactions between professors and students. It is the sheer size of these crowdsourced platforms that allows scholars to filter out the sub-populations needed for theorizing.
A more targeted sampling approach is also well suited for studies on sensitive topics, which often report great challenges in recruiting research participants (Ramo & Prochaska, 2012). For instance, Brewer and colleagues (2020) turned to social media as a recruitment platform and found 242 registered nurses in seven days, using their responses to examine how organizations respond to instances of bullying in the context of hospitals. Another example is burned-out employees. While organizational scholarship does theorize about burned-out employees (Halbesleben & Buckley, 2004; Macik-Frey et al., 2007), the actual samples used in much organizational research are subclinical. This means that respective participants still partake in regular work life even if they are more stressed. Truly burned-out samples are missing, and thus, theories on employee burnout may be critically misspecified. Yet, with a bit of ambition, such people may be found in certain forums or reached via special interest mailing lists.
Multi-Source and Multilevel Online Surveys
A multi-source survey is a research design where the ratings for one or multiple variables in the study are matched from different stakeholders (for instance, a focal employee with ratings from superiors, colleagues, or even clients/beneficiaries in the organization) (Magalhães et al., 2019). By a slightly different logic, multilevel surveys query individuals who are nested within higher-level units or clusters (e.g., teams, organizations, and so on), with the aim of examining theoretical relationships between constructs across levels (Molina-Azorín et al., 2019). Collecting perspectives from multiple sources can triangulate perceptions; as such, it not only addresses concerns about response bias, but more importantly, can also open up new theoretical endeavors (Schyns & Day, 2010; van Gils et al., 2010). Meanwhile, a multilevel structure allows one to examine how higher-order units (i.e., aggregates) may exhibit dynamics that are independent from, or even interact with, the individual level (González-Romá & Hernández, 2017).
Unfortunately, both research designs remain less practiced (relative to simple cross-sectional studies) in the organizational research domain because of the various practical hurdles for their operationalization (Bell et al., 2010; Molina-Azorín et al., 2019; see also, Aguinis et al., 2013; Mathieu et al., 2012). Particularly for multilevel designs, a widely accepted rule of thumb suggests that at least 30 higher-level entities and at least 30 individuals in each higher-level entity are required to have appropriate statistical power (Hox, 2002; Scherbaum & Ferreter, 2009). And an even more challenging 50/20 rule applies for testing cross-level interaction effects, meaning that data should encompass at least 50 higher-level entities with 20 individuals per entity (Hox, 2002).
Yet, the proper use of the Internet’s opportunities can significantly reduce feasibility constraints, particularly concerning the size and matching of respondents. Consider, for instance, a study by Audenaert and colleagues (2019), who investigated employment relationships across job positions in a public organization with various autonomous agencies. The authors used two online surveys that were linked via an embedded code (see later section on “matching via URL-parameter”), but with a time-lag of three months to minimize common source bias. In this way, they could easily match the participants across both surveys and it allowed them to collect data on 936 employees within 82 job functions (i.e., each job included 2 to 39 employees) (Audenaert et al., 2019). And while it is generally more advantageous to know one’s population at the start of sampling (and have their details), these matching codes can also be auto-generated within the study and attached to the URL—useful when applying a snowball sampling technique where the survey will be forwarded to other matched respondents. This makes the whole process not only very feasible, but also less prone to error because responses are automatically linked via code. For example, Taylor and colleagues (2019) examined abusive supervision by collecting a matched sample from 500 full-time employees and their supervisors. Likewise, Martin and colleagues (2016) recruited a multi-source (i.e., matched leader-follower) sample of 229 active-duty soldiers in the US Army after they asked their personal contacts at West Point U.S. Military Academy to forward their online survey to other soldiers in leader and non-leader positions. A similar approach was chosen by Korman and colleagues (2021), who targeted managers for a study on burnout. In their study, managers not only supplied their own responses, but also forwarded an automated personalized survey link to their significant others who then reported their perceptions about the manager’s health. Note that even language barriers are now surmountable given that survey wording can dynamically shift based on a language selection question (Delice et al., 2019). A case in point is a study by Lu and colleagues (2018), who collected multi-source, multi-wave data from Chinese employees and their expatriate managers in 48 teams across China (Lu et al., 2018). That said, any snowball sampling for multi-source data should be taken with a grain of salt, as participants are likely to approach only others with whom they have a better relationship (Marcus et al., 2017).
Longitudinal Online Surveys
A longitudinal survey refers to a data collection design that requires the same participants to offer their observations over a meaningful span of time, i.e., with at least three repeated measurements (Joshi, 2016). The theoretical implication of longitudinal data collection is that researchers can study change in constructs and the role of their temporal characteristics, based on findings that have better external validity and a stronger indication of causality (Wang et al., 2017).
Online surveys render longitudinal data collection more affordable than they used to be because they eliminate most of the costs related to logistics, preparation, and administration of the questionnaire (Lučić et al., 2018; Spitz et al., 2013). Indeed, online surveys allow one to recruit a sufficiently large sample at the outset, thus ensuring that the study retains enough respondents across multiple waves (Reips, 2013). This is a critical advantage because attrition is the main challenge in longitudinal surveys (Lučić et al., 2018). For example, Spurk and colleagues (2020) used an online longitudinal survey in three waves to collect data from 1,477 participants over nine months, finding diverse developmental relationships between career adaptability and proactive career behaviors (Spurk et al., 2020). To test for attrition bias, the authors reported dropout rates for career adaptability (i.e., 31.72% from T1 to T2 and 23.65% from T2 to T3) and proactive career behaviors (i.e., 28.71% from T1 to T2 and 23.93% from T2 to T3). They subsequently performed a multinomial logistic regression analysis, with variables at T1 as predictors of a self-constructed multi-categorical variable reflecting (non)participation across waves (1 = participated only at T1, 2 = participated only at T1 and T2, and 3 = participated at all waves). Their analysis showed that attrition bias was only a minor concern for their online longitudinal survey.
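For readers who wish to run a comparable attrition check on their own online panel data, the following is a minimal Python sketch, with hypothetical column names (e.g., completed_t2, career_adaptability_t1), of how the participation-pattern variable can be constructed and regressed on T1 variables via multinomial logistic regression.

```python
# Minimal sketch (hypothetical column names) of an attrition-bias check:
# predict a participation-pattern category from T1 variables with multinomial logistic regression.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("wave_data.csv")  # one row per T1 respondent, Boolean completion flags per wave

# 1 = T1 only, 2 = T1 and T2 only, 3 = all three waves
df["participation"] = 1
df.loc[df["completed_t2"] & ~df["completed_t3"], "participation"] = 2
df.loc[df["completed_t2"] & df["completed_t3"], "participation"] = 3

# T1 predictors must be numerically coded
predictors = sm.add_constant(df[["career_adaptability_t1", "proactive_behavior_t1", "age"]])
model = sm.MNLogit(df["participation"], predictors).fit()
print(model.summary())
```

Significant T1 predictors of the participation categories would signal selective attrition; largely null effects, as reported by Spurk et al. (2020), suggest that attrition bias is only a minor concern.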
As longitudinal designs entail going beyond a one-shot participation, the perceived anonymity of online surveys becomes an additionally beneficial feature for respondents (Porter et al., 2019). Indeed, with offline methods generally seen as more intrusive and more likely to jeopardize anonymity (Singh et al., 2009), online surveys are instrumental for studying sensitive topics (e.g., abusive supervision) and collecting data from people with unconventional sexual orientations, disabilities, or simply remote geographic locations. For example, Lindsey and colleagues (2017) examined why and for whom the ethnic representation of managers affects interpersonal mistreatment in the workplace. Notwithstanding the sensitivity of measuring ethnicity-related topics, they managed to recruit 330 employees (who worked at least 30 hours per week) at T1 and still retained 284 participants at T2 (i.e., 75% retention rate) in the online survey (i.e., T1 measured ethnic representation and dissimilarity; T2, interpersonal mistreatment).
Online ESM Surveys
The experience sampling method (ESM) is a variant of a longitudinal survey design. It captures the momentary behaviors, experiences, thoughts, and attitudes of a person in his or her natural everyday life at several designated time intervals, ranging from a few hours to a day (Niemann & Schenk, 2014; Riediger, 2009). Instant access to emails, web browsers, and user-friendly survey apps such as Expimetrics—all through the convenience of smartphones—has reduced the costs and simplified the logistics of ESM (Beal, 2015; Arnold & Rohn, 2019). Hence, online ESM surveys can now easily achieve larger sample sizes and greater geographic dispersions without causing excessive discomfort (Niemann & Schenk, 2014; Zirkel et al., 2015). And that data can often be matched with additional data coming from smart devices, such as movements, location, or even health markers (Wenzel & Van Quaquebeke, 2018) without the need for extra hardware that would be both costly and burdensome to respondents. Theoretically, online real-time reports are considered more accurate than memory-based reports and are therefore referred to as the “gold standard” for measuring constructs such as affect (Lucas et al., 2021; Schwarz, 2011). Therefore, the rich data obtained via online ESM surveys can open up new perspectives on the interplay among experiences, behaviors, events, and contextual characteristics within and between individuals. A case in point is the investigation of abusive supervision and self-blame by Troester and Van Quaquebeke (2020), who recruited an ESM sample of 275 participants via MTurk for ten working days, twice a day. With these data, the authors could theorize on the link between employees’ perception of abusive supervision, the cognitive interpretation of the same, subsequent emotional appraisal, and the resulting behavioral response. Moreover, according to Fisher and To (2012), the study of “process” constructs (such as job satisfaction, goals, recovery, moods, emotions, and intrinsic motivation) can also benefit greatly from using online ESM data collection. For example, Kim and colleagues (2021) recruited full-time employees by posting the study’s invitation and web link on several online social networking and community websites (e.g., professional group pages in Facebook, volunteer sections in Craigslist, intranets) in the United States and South Korea. Their final U.S. sample consisted of 779 day-level data points in 10 workdays, i.e., a compliance rate of 79%. Similarly, the final South Korean sample offered 1,024 day-level data points in 5 workdays, i.e., a compliance rate of 92%. The authors first emailed an online survey link at 8:30 a.m. (T1) to assess morning fatigue and sleep quality, and they followed up with a second online study link at 6 p.m. (T2) to measure daily microbreaks, work engagement, and fatigue. This method allowed them to confirm when and how microbreaks serve as an effective energy management strategy in the workplace (Kim et al., 2021).
Online Experiments
The experimental design is the gold standard of organizational research methods, as it allows one to test for causal arguments (Antonakis et al., 2010). However, despite calls for the use of experimental designs (e.g., studies that triangulate survey findings with experiments to ensure internal validity), organizational research has largely relied on cross-sectional or longitudinal data (Antonakis et al., 2019; Thau et al., 2014). Some have argued that experiments are underused for several reasons: the potentially low statistical power of offline experimental designs, experimenter effects, the difficulty of enabling/scheduling actual interaction, and, particularly important for organizational researchers, non-random student samples (Reips, 2002; Reips & Krantz, 2010). All these issues can, however, be addressed by running experiments online.
Thanks to the Internet’s enormous recruiting capabilities, as already outlined above, online experiments can reach demographically diverse samples well beyond the scope of laboratory environments (Finley & Penningroth, 2015; Paolacci et al., 2010). For example, van Dijke and colleagues (2019) recruited an ESM sample of 338 full-time workers—reflecting regular working hours and a variety of companies—via a professional Dutch research agency, which collected five daily measures per participant. In doing so, the authors found that experimentally induced nostalgia in the treatment group increased intrinsic motivation throughout the working day, especially among participants who experienced low interactional justice. Moreover, because it is much easier to recruit larger samples online even for experiments (Finley & Penningroth, 2015; Jun et al., 2018), researchers can significantly increase the number of treatment groups without reducing statistical power and thus come closer to real-world complexities (Awad et al., 2018; Dimmery et al., 2019).
Another advantage of online experiments is the ability to run experiments around the clock with a high degree of standardization and even provide automated responses, which relieves researchers from dealing with scheduling difficulties and other operational time constraints (Reips, 2002). In that sense, online setups also limit experimenter effects due to the absence of any personal contact (Hewson, 2017).
For cases that require group interactions, online studies also allow researchers to randomly match participants, thus circumventing non-random scheduling (e.g., where friends come together to the lab). Real interactions can, for instance, be facilitated via platforms such as oTree or Lioness. If the paradigm requires more control, then scholars can preprogram dynamic fake interactions that render experimental interactions more realistic. To this end, implementing a waiting room among the actual participants can further increase realism (Reh et al., 2018). Researchers can also easily incorporate precise timings in stimulus displays, track trajectories of participants’ mouse movements (Kochari, 2019), and use graphics and animations as online experimental materials (Hewson et al., 2003; Horton et al., 2011).
Additionally, online experiments are especially suited for investigating sensitive topics (including those that may present ethical dilemmas) because particular populations are more likely to respond to an intervention only under the protection of anonymity (Reips, 2010; Reips & Krantz, 2010). Consider, for instance, research on abusive supervision (Schyns et al., 2018; Taylor et al., 2019; Troester & Van Quaquebeke, 2020) or workplace discrimination. Bailey and colleagues (2013), for instance, conducted an online field experiment to examine potential hiring discrimination based on sexual orientation. The authors submitted over 4,600 resumes, which varied randomly with regard to gender and implied sexual orientation, finding no evidence for discrimination against gay men or lesbians by employers across four U.S. cities (Bailey et al., 2013). In a different domain, Gross and colleagues (2014) advertised their depression intervention study on Google AdWords and, by doing so, were able to recruit a multilingual sample of 26,194 participants with depressive symptoms, of whom 3,828 were English speakers, 7,477 were Spanish speakers, 5,395 were Chinese speakers, and 9,494 were Russian speakers, which allowed them to test the efficacy of their Mood Screener intervention across cultures.
Being More Ambitious about Data Quality in Online OP/OB Surveys and Experiments
Most research comparing the integrity of data from offline and online approaches suggests that there is little difference between them in terms of psychometric properties and respondents’ demographic characteristics (Dillman et al., 2014; Griffiths et al., 2014; Luce et al., 2007; Porter & Whitcomb, 2007; Reips, 2013), often producing comparable responses for behavioral research such as is conducted in the fields of OP/OB (Berinsky et al., 2012; Mason & Suri, 2012).
Nevertheless, some scholars have expressed concerns and skepticism regarding online data quality. The main points of critique mostly pertain to a wide variety of response biases; that is, when participants respond inaccurately or falsely to questions. For example, respondents’ study involvement is usually rather low in online settings because of inattention (Fleming & Bowden, 2009): They sometimes answer questions about issues for which they have no real insight (Brühlmann et al., 2020), and some have even developed scripts or bots to automatically fill in surveys (Griffin et al., 2021; Prince et al., 2012). In fairness, these are reasonable concerns that researchers need to address (Brewer et al., 2018).
The core question—which we will answer below—is thus how the unique opportunities of online data collection can be leveraged not only for more ambitious study designs and more adequate samples, but also to ensure better data quality. To this end, we first studied several reviews of online surveys and experiments that highlighted a wide range of challenges that researchers face in conducting online data collection. While Stanton and Rogelberg (2001) and Thompson and Surface (2007) were the last to provide a methodological overview of collecting data online with a specific focus on organizational research, other studies have reviewed a plethora of challenges and solutions related to online data quality, e.g., response rate issues such as incentives, data quality issues such as careless responding, and online research ethics issues such as consent and debriefing (Aguinis et al., 2021; Benfield & Szlemko, 2006; Birnbaum, 2004; Fan & Yan, 2010; Gosling & Mason, 2015; Hewson et al., 2003; Kraut et al., 2004; Porter et al., 2019).
Building on the approach used in prior reviews (e.g., Stanton & Rogelberg, 2001), we used a query-based search strategy with a Boolean phrase (i.e., Ti: Web survey/experiment* OR online survey/experiment* OR internet survey/experiment*) AND (Ti: response rate* OR nonresponse* OR response quality* OR online research ethical issues*) to identify relevant papers. We subsequently conducted a review of the reference lists, which supplemented the initial literature search and resulted in a large number of papers (i.e., more than 300 research articles, book chapters, conference proceedings, and reports). Subsequently, we filtered our query-based search based on two criteria. First, to be eligible for inclusion in the review, we focused on the studies that either showed empirically validated findings or reported widely practiced strategies: for example, the papers that used experiments or similar techniques to evaluate the effectiveness of one or more strategies that improved response rate or the quality of online surveys and experiments. Second, we only considered studies if they offered a clear conclusion about a given strategy. For example, when we found generic solutions to a challenge (i.e., solutions that would equally apply to offline research) or when the underlying support for a given strategy was too vague or inconclusive to formulate a clear recommendation, we refrained from including them in the literature review and best recommendations.
In total, the search procedure resulted in a sample of 151 papers. These were published between 2002 and 2021 in 75 journals that spanned from method journals to content journals in OP/OB, as well as the fields of business, management, social and natural sciences (see Appendix 1 for the list of research method and OP/OB journals). Moreover, we identified and reviewed 20 book chapters and 8 conference proceedings. Taken together, we qualitatively synthesized the findings into three broader areas of improvement that researchers must consider when designing their online surveys and experiments in order to ensure higher online data quality.
Improve Response Rate …
The first area of improvement pertains to taking several precautions to boost response rate in online surveys and experiments. The response rate is a key indicator of data quality (Ball, 2019; Coşkun & Dirlik, 2022) and is defined as “the ratio of people who have accessed a study divided by the total number of people solicited to take part in this study” (Göritz, 2005, p. 2). Notably, a good response rate is key to being able to interpret the obtained results. It ensures that a sound recruitment strategy will also translate into sound results that are less riddled by various forms of self-selection bias that put serious bounds on the interpretations. Such efforts to improve response rate are ideally combined with efforts to simultaneously reduce dropout rate, which refers to the number of respondents who opened the online survey or experiment but did not finish it (Nayak & Narayan, 2019).
Naturally, some of the strategies that have long been used in offline studies have also been shown to work in online studies. For example, researchers generally need to frame the study title, introduction, and questions in a way that is appealing to the audience, but also does not simultaneously create a demand-effect (Fan & Yan, 2010; Shapiro-Luft & Cappella, 2013; Shropshire et al., 2009). Moreover, one can garner a higher response rate by providing academic contact information and including institutional logos in the email invitation and the online study’s first page (Göritz & Luthe, 2013; Walston et al., 2006). Unsurprisingly, long questionnaires (i.e., 30–45 min) generally deter people from partaking in a study – and this holds online as well. Further, study length negatively affects completion rate (Deutskens et al., 2004), with longer studies generating more “don’t know” answers (Galesic & Bosnjak, 2009) and shorter answers to open-ended questions (Krosnick et al., 2002). So, researchers are well advised to keep their studies reasonably short and then provide assurances about the study length in their invitations (i.e., max 10–20 min). At the same time, they may want to refrain from divulging a longer length to avoid negatively affecting dropout and completion rates (Marcus et al., 2007).
Beyond such general themes, other issues are even more pronounced online than offline. The following strategies are specifically applicable to improving response rates and mitigating dropout in organizational online surveys and experiments involving otherwise serious participants.
By Optimizing Invitations via SMS and Email
Generally, researchers have to find ways for their study invitations to actually reach the participants. While we mentioned some general recruitment strategies above (i.e., via platforms, Google AdWords, etc.), we also want to highlight that targeted studies—those directed at specific respondents by way of their email—require some consideration.
Here, research has shown, for instance, that SMS pre-notifications to inform respondents of an upcoming online study increase study response (Sammut et al., 2021). Such pre-notifications also help to inform participants that they may have to double check their spam folders because the study invitations were flagged as mass email by spam-blocking tools (de Bruijne & Wijnant, 2014; Mavletova & Couper, 2014). Alternatively, the inviting email address may need to be white-listed via the IT department of the participating organization, or invitation emails need to be sent in smaller batches across longer periods of time.
By Using Online Reminders
Following the initial invitation, researchers should send additional reminders in a timely manner (Lewis & Hess, 2017). The optimal timing seems to be two days after the initial invitation (Van Mol, 2017; Bosnjak et al., 2008; Fan & Yan, 2010). Moreover, if the initial invitation was personalized, then online data collection suites have an option to remind only those who have not yet completed the study, thus reducing annoyance for those who already did. A variant of this is to offer an additional calendar file (e.g., .ics) for download so that, particularly in ESM studies, participants are reminded via the calendar of their own device.
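As an illustration of the calendar-file idea, the following is a minimal Python sketch (study name, dates, and reminder time are placeholders) that generates a downloadable .ics file with one reminder event per ESM study day.

```python
# Minimal sketch (placeholder study name, dates, and reminder time) of generating a
# downloadable .ics calendar file with one reminder event per ESM study day.
from datetime import date, timedelta

def esm_reminder_ics(start: date, days: int, hour: int = 12) -> str:
    events = []
    for i in range(days):
        stamp = (start + timedelta(days=i)).strftime("%Y%m%d") + f"T{hour:02d}0000"
        events.append(
            "BEGIN:VEVENT\n"
            f"UID:esm-day-{i}@example-study.org\n"
            f"DTSTAMP:{stamp}\nDTSTART:{stamp}\n"
            "SUMMARY:Daily survey reminder\n"
            "DESCRIPTION:Please complete today's short questionnaire.\n"
            "END:VEVENT"
        )
    return ("BEGIN:VCALENDAR\nVERSION:2.0\nPRODID:-//ESM study//EN\n"
            + "\n".join(events) + "\nEND:VCALENDAR\n")

with open("esm_reminders.ics", "w") as f:
    f.write(esm_reminder_ics(date(2024, 3, 4), days=10))  # ten consecutive reminder days
```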
By Personalizing Study (Invitations)
It also helps the response rate if researchers personally address respondents in the invitation letter (Sammut et al., 2021), even more so when one can highlight a prior relationship or involve some well-known employees from the same organization (Joinson & Reips, 2007; Sánchez-Fernández et al., 2012) – at least when anonymity is not a major concern. Here, online studies offer much more flexibility and speed to personalize invitations because emails can be sent using mail merge functions (e.g., Outlook) that employ an underlying database (e.g., .csv, Salesforce) or by feeding the information into the study platform, most of which then act as the mailer (Sauermann & Roach, 2013). That very information can subsequently be used within the study to directly address the respondent (e.g., “Thank you, Claire, you have now finished more than 50% of the study”) to ensure continued personal involvement.
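For illustration, personalization of this kind can also be scripted outside of commercial suites; the following is a minimal Python sketch (placeholder addresses, SMTP server, and CSV columns) of a mail-merge style invitation in which each row of a database supplies a name and a unique study link.

```python
# Minimal sketch (placeholder addresses and SMTP server) of a mail-merge style
# personalized invitation: each CSV row supplies a name and a unique survey link.
import csv
import smtplib
from email.message import EmailMessage

with open("invitees.csv", newline="") as f, smtplib.SMTP("smtp.example.org") as server:
    for row in csv.DictReader(f):  # expected columns: email, first_name, survey_link
        msg = EmailMessage()
        msg["Subject"] = "Invitation to our leadership study"
        msg["From"] = "research-team@example.org"
        msg["To"] = row["email"]
        msg.set_content(
            f"Dear {row['first_name']},\n\n"
            f"We would appreciate your participation: {row['survey_link']}\n"
        )
        server.send_message(msg)
```

Sending such invitations in smaller batches, as suggested above, also reduces the risk of being flagged as mass email.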
By Using Forced Answering at the Start or with Non-response Options
Forced answering is an effective high-hurdle technique (Faran & Zanbar, 2019), whereby respondents with no interest—who would otherwise have a negative impact on data quality—are encouraged to drop out at the start of the online study. This means respondents can only proceed to the next question after having responded to all or some of the prior items (de Leeuw et al., 2015; Questback, 2013). Note that when researchers believe that respondents may legitimately have no answer or prefer to avoid personally sensitive questions, they should still use forced answering but provide non-response options (e.g., “prefer not to answer”, “no opinion”, “don’t know”) to minimize both missing responses and potential dropouts (Valentijn et al., 2015).
By Using Simple Language
People exhibit an even lower tolerance for complicated texts when online versus offline. Thus, offering writing instructions and response options in simple language to minimize non-response rate is even more important in online studies (Christian et al., 2007; Smyth et al., 2009; Toepoel & Couper, 2011).
By Using Multi-Device (Responsive) Layouts
Surveys and experiments should generally be designed for the users of various devices, e.g., smartphones, tablets, etc., with a so-called “responsive layout” (de Leeuw & Toepoel, 2018). Such responsive design ensures that more people can access the study (e.g., many populations in Africa or Asia have better access to a smartphone than a workstation computer; Toepoel & Lugtig, 2015) and view the material in a sensible format. Having to scroll too much might spur frustration that leads people to drop out, especially when having to scroll horizontally (de Leeuw et al., 2018).
By Avoiding Offering Final Study Results and Instead Giving Personalized Feedback
Sharing the final study results with respondents has no effect on completion rate for online studies and, oddly, seems to lower the response rate (Recklitis et al., 2009; Göritz, 2010). Thus, researchers should not try to incentivize respondents with general study results and other nonmaterial incentives (Kalleitner et al., 2020).
That said, when topic salience is high, personalized feedback at the end of the study can increase response and completion rate (Marcus et al., 2007; Ye et al., 2017; Kühne & Kroh, 2018). Fortunately, online studies can provide this feedback automatically. Indeed, any answer that lends itself to computational functions (e.g., sum, mean, max) can be used to derive a new value within the study. For instance, typical Likert-scale values may be aggregated into scores for feedback (e.g., personality type, number of correct answers, or benchmark comparisons). Moreover, computed values (e.g., “TransformativeLeadership = MEAN(A,B,C)”) in combination with Display Logics (e.g., “IF TransformativeLeadership > 5”) can serve to provide respondents with personalized, albeit pre-formulated feedback (e.g., “Your leadership style is …”).
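To illustrate the underlying logic (the scale name, items, and cutoffs below are purely illustrative), the following Python sketch aggregates Likert items into a score and selects a pre-formulated feedback text, mirroring the computed-value and display-logic features of most survey platforms.

```python
# Minimal sketch of the computed-value + display-logic pattern:
# aggregate Likert items into a score and choose a pre-formulated feedback text.
def leadership_feedback(item_scores: list[float]) -> str:
    score = sum(item_scores) / len(item_scores)   # e.g., MEAN(A, B, C)
    if score > 5:                                  # display logic: IF TransformativeLeadership > 5
        return f"Your score is {score:.1f}: your style leans strongly transformational."
    elif score > 3:
        return f"Your score is {score:.1f}: your style shows some transformational elements."
    return f"Your score is {score:.1f}: your style currently leans more transactional."

print(leadership_feedback([6, 5, 7]))  # -> "Your score is 6.0: ..."
```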
By Choosing Unconditional Prepaid Personal Cash Incentives over Other Forms of Rewards
Material (monetary) incentives increase response rates by 19% and completion rates by 27% compared to offering no material incentive at all (Göritz, 2010). Notably, donation incentives—which promise to make a monetary donation to a cause in return for participation—have no significant effect on either (Knowles & Stahlmann-Brown, 2021; Göritz, 2005) and, if used alone, can actually lower the response rate (Göritz & Neumann, 2016; Pedersen & Nielsen, 2014). Hence, researchers are advised to go for personal incentives.
When doing so, researchers need to decide whether they will offer material incentives a) unconditionally prior to taking the study, b) conditionally only upon completion, or c) based upon attention performance. Cash incentives provide the strongest direct and positive effect on the response rate and completion rate (Becker et al., 2019; Spreen et al., 2020). Some research suggests that these should be prepaid/unconditional, meaning that all potential respondents receive the cash incentive regardless of actual completion (Becker & Glauser, 2018). This is supposedly because “beyond their monetary value, they are perceived as unconditional tokens of appreciation, which increases trust” (Veen et al., 2016, p. 3; see also Meuleman et al., 2018). Normally, delivering prepaid cash is not a major problem for online organizational studies, but researchers can also hand out electronic gift certificates (e.g., Amazon gift certificates) via email when sampling with mailing addresses is not feasible (Kraut et al., 2004).
By Using Lotteries Only for Specific Populations
Another common form of incentivizing is to offer lotteries (also known as prize draws) to respondents after they complete the online study (Coryn et al., 2020; Göritz & Luthe, 2013). These incentives are frequently used because they are easy to implement for a few winners, and costs can be considerably lowered when recruiting a larger sample of respondents. However, their effect on response and completion is generally small (Becker et al., 2019; Göritz, 2010; Sánchez-Fernández et al., 2010). Oddly enough, when respondents are female, have a lower income, or lack topic saliency or prior survey experience, lottery incentives (e.g., ten 25-euro or five 20-euro gift certificates) seem to improve response and retention rates in online longitudinal studies (Heerwegh, 2006; Sammut et al., 2021; Su et al., 2008).
Improve Response Quality …
Response quality refers to whether the study responses are a good representation of what participants felt or perceived in the moment of query regarding the subject matter at hand (Cornesse & Blom, 2020). This includes, among others, not cheating or being careless about responding, inputting data in the required format, and adequately responding to the provided stimulus or target referent. In the following, we list the various measures that researchers can implement to ensure higher response quality.
By Detecting Cheating and Careless Responding
Respondents do not always pay attention to instructions and the content of response options (Ward & Meade, 2018). Experience shows that such careless responding behavior is fairly common in samples from online crowdsourcing platforms (Chmielewski & Kucker, 2020; Brühlmann et al., 2020), which can undermine the quality of online data (Jones et al., 2015). Additionally, cheating in online surveys and experiments may occur, thereby introducing a systematic response bias. For example, incentives may motivate respondents to participate multiple times, click as quickly as possible through a study, or encourage them to uncover the “right” responses from professional panel participant communities on the Internet. Thus, it is important to take actions that mitigate such threats to data quality.
In this vein, we advise that researchers code careless responses in the database without taking explicit action against the respective careless respondents. Doing so allows for a later analysis between those who were inattentive vs. those who were not, which could potentially reveal, e.g., a systematic link between careless response behavior and respondent demographics or experimental conditions (Newman et al., 2021). At the same time, it also avoids the risk that reminders evoke false response behavior (Ward & Meade, 2018). As such, researchers should not immediately correct invalid responses because this would give respondents the confidence that any critical plausibility breaches or omissions will be flagged with a chance for correction. Note that email verifications and the like are still recommended.
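As a rough illustration of such coding (column names and cutoffs below are hypothetical), the following Python sketch flags, rather than removes, respondents who completed the study implausibly fast or straight-lined long runs of identical answers, so that flagged and unflagged respondents can be compared in later analyses.

```python
# Minimal sketch of flagging (rather than deleting) potentially careless responses,
# using two common indicators: implausibly fast completion and long identical-answer runs.
import pandas as pd

def longest_identical_run(answers: list) -> int:
    longest = run = 1
    for prev, cur in zip(answers, answers[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

df = pd.read_csv("responses.csv")                 # hypothetical export with item columns q1..q20
items = [c for c in df.columns if c.startswith("q")]

df["flag_speed"] = df["duration_seconds"] < 120   # illustrative cutoff for total completion time
df["flag_straightline"] = df[items].apply(lambda r: longest_identical_run(list(r)) >= 10, axis=1)
df["careless_flag"] = df["flag_speed"] | df["flag_straightline"]

# keep everyone in the dataset; the flags allow later comparisons across demographics or conditions
df.to_csv("responses_flagged.csv", index=False)
```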
Another route to take would be personalized studies, i.e. where participants receive a unique login that is mapped to their email address (Sammut et al., 2021). Panel providers such as MTurk provide a variant of this, giving panelists their own unique IDs that are linked to the incentive payout (although it cannot be ruled out that people are registered with multiple emails within the panel). Should researchers opt for this approach, our recommendation is to use automatic login to avoid the errors and hassle that accompany manual logins. An additional reason to employ such recruiting is that a researcher may have a database, for instance, with performance data that needs to be matched onto responses, but preferably in a depersonalized way. An automatic login is a very convenient method because the personalization is not immediately obvious (and can also be hidden from the researcher and not written into the final dataset) when the individual login credentials (or token) are embedded directly in the unique URL parameters (Nestler et al., 2015; Thompson & Surface, 2007).
By Providing Immediate Answer Validation
Some items require a certain format that participants may not always adhere to when quickly answering the items. A typical validation relates to the recording of dates: The provision of a defined answer format by means of a smaller box for the month and a larger one for the year, as well as placing symbolic instructions (e.g., MM/YYYY) within the answer boxes, elicits more consistent responses (e.g., years in four digits) (Christian et al., 2007). Online, researchers can also design their studies to immediately validate responses against a specified format (Conrad et al., 2017). For instance, they can immediately check if inserted values represent integers or letters, if certain symbols are present (e.g., checking the @-sign for an email), and if a minimum (or maximum) threshold of character count is met (or exceeded). Additionally, researchers can use more advanced tools such as RegExr (i.e., a logical syntax that can be plugged into webforms) to validate input against predefined restrictions. Implementing such measures not only assists participants in quickly identifying the right input format, but it also helps to reduce errors in later analyses when certain values are not recognized or considered missing by the software.
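The sketch below illustrates this kind of format validation with regular expressions in Python (the patterns are illustrative and analogous to what one would paste into a platform's validation field), checking an MM/YYYY date and the presence of an @-sign and domain in an email address.

```python
# Minimal sketch of format validation with regular expressions, analogous to the
# patterns one can enter into a survey platform's validation settings.
import re

MONTH_YEAR = re.compile(r"(0[1-9]|1[0-2])/\d{4}")     # MM/YYYY with a four-digit year
EMAIL      = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # crude check for an @-sign and domain

def validate(value: str, pattern: re.Pattern) -> bool:
    return bool(pattern.fullmatch(value))

print(validate("03/2021", MONTH_YEAR))        # True
print(validate("3/21", MONTH_YEAR))           # False -> prompt respondent to correct the format
print(validate("claire@example.org", EMAIL))  # True
```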
By Making the Interaction Feel More Real and Personal
Dynamic pre-filling can infuse the study with more realistic-seeming interactions and increase data quality in terms of time and accuracy (Dolnicar et al., 2013). Any available information (e.g., previous answer, computed value, information from a database, such as from a different respondent) can serve as a dynamic insertion to a given item or question, and thus help to personalize or manipulate the study experience for the respondent. Researchers should explore the functionality of their chosen software platform to use pre-filling in several ways.
Moreover, in interactive online studies such as repeated public goods experiments, which involve behaviors like cooperation and punishment (Arechar et al., 2018), respondents can actually chat with each other, with a research confederate, or even with a well-programmed chat bot that can take over various conversational tasks at scale using software platforms such as SMARTRIQS (Kim et al., 2019; Molnar, 2019). If a true chat functionality cannot be programmed as an applet within one web page, then a chat can be simulated through responses dispersed over multiple pages. Whatever participants enter into a text field will then be displayed on the next page, which lends credibility to the (bogus) responses from other respondents shown there as well. Obviously, if such a design is implemented, researchers should consider a forced delay between the two pages so that responses by the other (bogus) respondents seem more realistic (e.g., Reh et al., 2018).
By Minimizing Matching Error via URL-parameter
Survey designs that require data matching, such as multi-source or longitudinal designs, are especially prone to errors in the matching process. Online studies can take some of the headache out of this by linking studies via URL-parameters.
URL parameters are values added after the base URL. They start with a “?” followed by the name of the parameter, an equal sign, and the parameter’s value, e.g., ?department=23 (all subsequent parameters can be added with a leading ampersand “&”). Any information that is known before a URL is sent out can be used as a URL parameter (for more information, see Litman et al., 2017). For example, “https://www.mysurvey.com?department=23&level=3” would pass on the parameter “department” with the value “23” and “level” with the value “3”. Via the study’s landing page, the parameter values can then be retrieved and written into the database, and later inserted into a placeholder within the study. Furthermore, if the respective information was retrieved, an item within the study could then say “Dear [firstname], what impact did your leader, [bossname], have on the development of the relationship with your client, [clientname]?”
Using URL parameters offers helpful capabilities beyond the personalization described above. For instance, to match responses across longitudinal study waves, researchers can generate a unique identifier code that is passed as a URL parameter along with each invite link. Likewise, such parameters can easily be used for multi-source snowball samples. For example, to match leaders and followers without error, leaders may be targeted as focal respondents (via whatever recruiting technique) and encouraged to send a corresponding link to at least three of their subordinates or significant other (e.g., Korman et al., 2021). The link to this follower-survey can comprise a referral token such as the response ID that was generated within the leader-survey, indicating which leader the link came from (e.g., https://www.mysurvey.com?leader=xyz).
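For illustration (the survey domain and parameter names are placeholders), the following Python sketch shows how such links can be constructed and how the parameters can be read back out on the study's side.

```python
# Minimal sketch of building and reading the URL parameters described above
# (survey domain and parameter names are placeholders).
from urllib.parse import urlencode, urlparse, parse_qs

BASE = "https://www.mysurvey.com"

# Build a follower link that carries the leader's response ID as a referral token
follower_link = f"{BASE}?{urlencode({'leader': 'xyz', 'wave': 2, 'department': 23})}"
print(follower_link)  # https://www.mysurvey.com?leader=xyz&wave=2&department=23

# On the study's landing page, the same parameters can be retrieved and stored
params = parse_qs(urlparse(follower_link).query)
print(params["leader"][0], params["department"][0])  # 'xyz' '23'
```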
By Lowering Response Frustration with Display Logics
Researchers should always aim to reduce burden by avoiding irrelevant requests. Thus, they should design online studies to dynamically display sets of questions based on certain criteria, such as a respondent profile, an intended manipulation, or prior responses that specify distinct survey routes (Molnar, 2019; Nayak & Narayan, 2019). For instance, questions about staffing numbers should only be shown to respondents with reliable organizational knowledge (e.g., work in HR), while items about supervising teams should be hidden from frontline staff.
Additionally, a planned missing data design can reduce participant burden and thus result in higher validity and reduced rates of unplanned missing data. For instance, a three-form design facilitates data for 33% more survey questions than can be answered by any one respondent, or reduces completion time by the same fraction (Graham, 2012). In such a design, the items are divided into four sets: X, A, B, and C. Set X is administered to everyone, and each respondent additionally answers two of the remaining three item sets (i.e., AB, CA, or BC). Note that the implementation of such a design requires a carefully crafted item logic and distribution so that responding feels natural (Jia et al., 2014; Jorgensen et al., 2014). Such a planned missing data design results in data that are missing completely at random (MCAR), which can subsequently be handled with contemporary missing data techniques (e.g., FIML for SEM in Mplus; Flatau Harrison et al., 2018; Lüdtke et al., 2017).
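As a simple illustration of the assignment logic (item names below are placeholders), the following Python sketch gives every respondent the common set X plus two of the three remaining sets, chosen at random.

```python
# Minimal sketch of the three-form planned missing design: everyone sees set X,
# plus two of the three remaining sets (AB, BC, or CA), assigned at random.
import random

ITEM_SETS = {
    "X": ["x1", "x2", "x3"],   # core items shown to everyone
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
}
FORMS = [("A", "B"), ("B", "C"), ("C", "A")]

def assign_form(respondent_id: str) -> list[str]:
    chosen = random.choice(FORMS)   # alternatively, rotate forms to balance group sizes
    shown = ITEM_SETS["X"] + ITEM_SETS[chosen[0]] + ITEM_SETS[chosen[1]]
    print(f"{respondent_id}: form X{''.join(chosen)} -> {shown}")
    return shown

assign_form("r-001")
```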
By Using Pictures, Audio, and Video Only Very Selectively
Online surveys and experiments offer unique opportunities to use multimedia features such as pictures, audio, and video for garnering interesting data. However, if designed carelessly, multimedia features can degrade data quality by introducing response bias (Chen & Tseng, 2017; Mahon-Haft & Dillman, 2010). We therefore recommend the following strategies as a check against the careless use of multimedia features.
By refraining from using pictures for mere aesthetic reasons. Contrary to common belief, the inclusion of pictures does not help keep respondents interested or reduce dropout (Barbulescu & Cernat, 2012). Moreover, researchers should be aware of the systematic influence of pictures on respondents’ behavior: Because pictures prime specific memories, they can affect how respondents construct their answers based on the retrieval of those memories (Couper et al., 2007; Trübner, 2020). For these reasons, researchers should refrain from using pictures that serve purely decorative purposes. By contrast, pictures used deliberately as stimuli or as responses can be a rich source of data. For example, a study design could require participants to produce and transmit images of their work environment or office desk (e.g., online ESM surveys, Arnold & Rohn, 2019), take part in picture-rating exercises (e.g., Wilson et al., 2018), test a construct via pictures (e.g., Hou et al., 2020), or reflect on a topic specifically revolving around pictures (e.g., Rennung et al., 2016).
By using audio and video selectively and only with highlighted instructions. When audio is included in online studies, researchers should highlight instructions and symbols to nudge respondents into turning their audio up to a sufficient volume (Beier & Schulz, 2015). Given the concerns about sound presentation, ranging from background noise to poor speaker quality, an effective way to increase control over sound delivery online is to ask participants to wear headphones (Woods et al., 2017). Audio recordings are, for instance, useful for examining the impact of vocal characteristics (e.g., gender, tone) on respondents’ behavior (Couper, 2007), an aspect that is often overlooked, for example, in leadership research. Case in point: Lavan and colleagues (2019) used audio recordings in their online experiments to investigate how people perceive identity from familiar and unfamiliar voices. Likewise, audio recordings can be used on the respondent side. For instance, where respondents are not comfortable writing (e.g., because of low literacy in the sample), app-based studies or kiosk-type tablets (with symbolic instructions) can capture voice recordings and thereby still generate rich insights (Revilla et al., 2020).
With respect to videos, issues such as slow Internet speeds, data caps, computer configurations, or workplace policies may prevent or demotivate respondents from watching video clips completely and thus cause nonresponse bias (Dixon & Tucker, 2010). Therefore, researchers are generally advised to refrain from using videos to minimize indiscernible distortions in the sample (Shapiro-Luft & Cappella, 2013). Yet, if videos are considered essential for the online data collection, forced watching should be activated so that respondents cannot proceed to subsequent screens until the entire video has been watched. For example, video data collection may be necessary in online experiments that involve preferential looking tasks, whereby two stimuli are presented side-by-side and the experimenter records fixation durations for each stimulus (Ismail et al., 2021; Semmelmann et al., 2017). A more recent development is the option of capturing body language, head movements, and eye gaze via the built-in webcams on participants’ computers (Aghajanzadeh et al., 2020). With the spread of HTML5 technology, Internet browsers now support webcam activation without the need to install additional plug-ins.
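Many survey platforms offer such a forced-watching option; where it has to be implemented by hand, a minimal sketch in browser JavaScript might look as follows, assuming a hypothetical HTML5 video element and continue button.

```javascript
// A minimal sketch of "forced watching": the continue button stays disabled
// until the HTML5 video's "ended" event fires. Element IDs are hypothetical.

const video = document.getElementById("stimulusVideo");
const nextButton = document.getElementById("nextButton");

nextButton.disabled = true; // block navigation until the clip has finished

// Discourage skipping ahead by resetting any attempt to seek forward
let lastTime = 0;
video.addEventListener("timeupdate", () => {
  if (video.currentTime - lastTime > 1) {
    video.currentTime = lastTime; // jump back if the respondent tried to skip
  } else {
    lastTime = video.currentTime;
  }
});

video.addEventListener("ended", () => {
  nextButton.disabled = false; // allow the respondent to proceed
});
```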
By Using JavaScript (or Dedicated Software Solutions) to Tackle Experimental Challenges of Stimulus Presentation and Timing
Online studies pose challenges for particular types of experiments, such as reaction-time and positioning designs (for an overview, see Kohavi et al., 2020; Woods et al., 2015). For example, screen positions within a browser can be hard to interpret because researchers usually cannot know whether, and to what size, a browser window has been rescaled. Likewise, reaction times cannot be reliably assessed across digital devices by default means (Sauter et al., 2020). To ensure accuracy and test-retest reliability in presentation and response recording, researchers should use specialty scripts, which require a working knowledge of programming languages such as JavaScript (Anwyl-Irvine et al., 2020; Garaizar & Reips, 2019). Researchers who cannot program and code should instead turn to dedicated online research software providers. Currently available solutions include Gorilla.sc (Anwyl-Irvine et al., 2020), jsPsych (de Leeuw, 2015), Millisecond’s Inquisit, LIONESS (Giamattei et al., 2020), Psychstudio, ScriptingRT (Schubert et al., 2013), Tatool (von Bastian et al., 2013), QRTEngine, WebExp (Keller et al., 2009), and SMARTRIQS (Molnar, 2019), all of which are built to offer precise timing and a full interface for task design and experiment administration with little to no programming required.
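For researchers who do script their own tasks, the following minimal sketch illustrates the basic idea of recording a reaction time in the browser with performance.now(). It is only a rough illustration (dedicated packages such as jsPsych handle presentation timing, trial structure, and logging far more rigorously), and the element ID is hypothetical.

```javascript
// A minimal sketch of browser-based reaction-time measurement using
// performance.now(). The stimulus element is assumed to exist and to start
// hidden; this is not a substitute for a dedicated experiment library.

const stimulus = document.getElementById("stimulus");
stimulus.style.visibility = "hidden"; // hide until onset
let stimulusOnset = null;

// Present the stimulus after a random foreperiod and record its onset time
function showStimulus() {
  const delay = 1000 + Math.random() * 2000; // 1-3 s foreperiod
  setTimeout(() => {
    stimulus.style.visibility = "visible";
    stimulusOnset = performance.now();
  }, delay);
}

// Record the reaction time on the first keypress after stimulus onset
document.addEventListener("keydown", () => {
  if (stimulusOnset !== null) {
    const reactionTimeMs = performance.now() - stimulusOnset;
    console.log(`Reaction time: ${reactionTimeMs.toFixed(1)} ms`);
    stimulusOnset = null;
  }
});

showStimulus();
```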
Respect the Participant…
Naturally, an integral part of doing research is to minimize any negative effects on people and organizations. Doing research online entails taking participants’ digital security, privacy, and dignity seriously so that they feel respected and safe to participate truthfully in the study, as well as motivated to take part in future studies (James & Busher, 2015). Generally, tools such as the Data Ethics Canvas, which help researchers identify and manage such issues (Data Ethics Canvas, 2021), can be a helpful resource to this end. More specifically, we recommend that scholars consider the following strategies.
By Following the Established Legal Framework
Needless to say, researchers must adhere to the legal frameworks of the countries in which their research is undertaken, which govern privacy well beyond data security. This is especially critical for online research, which often spans different countries. As it stands, the EU’s General Data Protection Regulation (GDPR) is one of the strictest frameworks in the world and offers guidance as to what is expected (more information can be obtained here: https://www.eugdpr.org). It is too comprehensive to cover here, but adhering to it is imperative for doing research with organizations and individuals within Europe, and it is a useful guide to what other jurisdictions may mandate in the future.
By Securing Data Transmission and Storage
Researchers should always ensure that research data transmission and storage involve several layers of security. First, encrypt data during transmission by ensuring that the survey system uses a secure transmission protocol (as indicated by “https://”) (Benfield & Szlemko, 2006; McInroy, 2016). Second, protect access to the research database, related file directories, and other storage entities. For instance, using longer (but easy-to-remember) passphrases alongside two-factor authentication is superior to using an 8-character password (Keith et al., 2007). Third, manage file privileges (e.g., open, edit, share) differentially across collaborators (e.g., study lead, co-authors, research assistants) so that each has only the minimum necessary level of access (Denissen et al., 2010; Reips, 2002). Naturally, all data exported from the database into a datasheet on a local hard drive should be protected at all times (e.g., via password encryption).
As part of the above, researchers should take the necessary precautions to secure the server against hacking, electrical faults, and theft (Barchard & Williams, 2008; Wilt et al., 2012). Particularly when the server is self-maintained, this includes frequently installing security updates, keeping remote backups (preferably on other media or servers in different locations), and securing the building where the server is housed. Luckily, most of these issues are handled by the major software providers themselves (Callegaro et al., 2015).
By Minimizing Privacy Risks
Although online studies can offer respondents anonymity and confidentiality, the threat of respondent identifiability remains a serious ethical concern (Roberts & Allen, 2015; Whelan, 2007). For example, combinations of variables (e.g., sex, age, and ethnicity) can expose individual respondents’ identities relatively easily if the collected data come from employees of a single company and/or contain identifying information such as IP addresses (Barchard & Williams, 2008). Researchers should mitigate these risks by capturing identifying information via a separate survey or web page that is technically decoupled from the main study. The overtly identifiable information and the main study responses can then, if warranted (e.g., when required for statistical analysis of controls), be linked through a meaningless but unique identifier that connects the two data sets (e.g., Leung & Unal, 2013). In addition, researchers may want to obfuscate the data by restricting to a few researchers the knowledge of, for instance, the meaning behind otherwise meaningless variable names and values. Benfield and Szlemko (2006) also suggest that IP addresses should either be deleted early in data analysis or not recorded at all. Ultimately, researchers who do want to record IP addresses should say so explicitly in the consent request (Granello & Wheaton, 2004; Saari & Scherbaum, 2020).
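As a rough illustration of such linking, the sketch below generates a meaningless but unique identifier in the browser and appends it to both the (technically separate) identifying survey and the main study. The URLs are placeholders, and crypto.randomUUID() requires a current browser in a secure (https) context.

```javascript
// A minimal sketch of linking an identifying survey and the main study via a
// meaningless but unique identifier. URLs are hypothetical; crypto.randomUUID()
// is available in current browsers in secure contexts.

const linkId = crypto.randomUUID(); // e.g. "0b9e5a3c-...", carries no meaning

// Pass the identifier to the separate survey that collects identifying data ...
const contactSurvey = new URL("https://www.mysurvey.com/contact");
contactSurvey.searchParams.set("link", linkId);

// ... and pass the same identifier to the main study
const mainStudy = new URL("https://www.mysurvey.com/main");
mainStudy.searchParams.set("link", linkId);

console.log(contactSurvey.toString());
console.log(mainStudy.toString());
```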
By Enabling Informed Consent Opportunity
Obtaining informed consent and conducting debriefings can be challenging online. Yet, similar to issues of privacy and data security, we recommend that researchers take these steps seriously, particularly before engaging participants in online experiments.
Researchers should prepare a consent request in understandable language that clearly asks respondents for their digital permission upfront (e.g., using animated videos when soliciting informed consent; McInroy, 2017). In this light, researchers should brief respondents about a) the content of the research, b) the identity and contact options of the researchers, c) the extent of confidentiality and anonymity of the responses, d) the average time needed to complete the study, and e) the risks involved in participating, such as being asked to disclose uncomfortable or potentially embarrassing information (Kunz et al., 2020; Sue & Ritter, 2012). Given that people are increasingly accustomed to using computer software, online accounts (e.g., Facebook, Gmail), and some forms of online payment without signatures, it has become customary for consent to be given via a button labeled “I consent” after respondents have read the introductory information (Rowbotham et al., 2013).
By Allowing a “Quit the Study” Option
Some also argue that researchers should equip each screen with a “Quit the study” button that takes participants to a page with the researchers’ contact information and other debriefing information, so that respondents who drop out of the study still receive the debriefing (Barchard & Williams, 2008; Gupta, 2017). Alternatively, researchers can program a window with the debriefing information to pop up when respondents close the study’s browser window. We want to caution, however, that the latter may not always work because of pop-up blockers in modern browsers.
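A minimal sketch of the former approach in browser JavaScript is shown below; the element ID and debriefing URL are placeholders, and, as noted, a pop-up on window close is less reliable because modern browsers restrict what may happen when a window is closed.

```javascript
// A minimal sketch of a "Quit the study" button that redirects respondents to
// a debriefing page. The element ID and URL are hypothetical.

const quitButton = document.getElementById("quitButton");

quitButton.addEventListener("click", () => {
  // Optionally flag the response as withdrawn before leaving, then redirect
  window.location.href = "https://www.mysurvey.com/debriefing";
});
```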
By Debriefing Respondents
Debriefing is an essential research ethics procedure wherein respondents are informed about the research they took part in and given control over their data privacy (Wang & Kitsis, 2013). Yet it remains a major ethical concern in online studies, especially because some respondents drop out prematurely (Benfield & Szlemko, 2006; Hilbig & Thielmann, 2021; Pittenger, 2003; Vitak et al., 2016). An effective strategy is to provide the debriefing information via a link rather than in the body of an email (Benbunan-Fich, 2017; McCambridge et al., 2012), as this increases the likelihood that recipients will actually read it. Technically, researchers can program a trigger that automatically emails the debriefing as soon as the online study’s window is closed, assuming that participants were asked to provide a valid email address (e.g., Zong et al., 2018). Otherwise, a text at the end of the survey or experiment must suffice.
Limitation of the Covered Literature
We must make one limiting remark regarding the presented synthesis. Conducting studies online is naturally interwoven with the state and adoption of technology within the sampled population. The last decade alone has seen a tremendous expansion of Internet technology and connected devices. Some studies on screen resolutions (Callegaro, 2010; Erens et al., 2019), for instance, might already be outdated despite only being published a few years ago. The possibilities of tomorrow are already on our doorstep: Early studies are, for example, experimenting with online remote eye-tracking, facial expressions, and heart rate monitoring (Chandler & Shapiro, 2016; Goodman & Paolacci, 2017). It may only be a few years before we see a revival of the classic interview format, led not by humans, but by artificial intelligence (AI)-enabled assistants such as Alexa, Siri, Google Assistant, or Cortana in virtual reality (VR). Thus, the present paper can and should only be considered a snapshot in time.
Conclusion
As noted at the outset of this paper, much research in the fields of OP and OB takes the Internet for granted, often more or less duplicating offline research strategies, without a clear understanding of the unique opportunities and challenges of conducting surveys and experiments online. In the present paper, we showed that the development and spread of the Internet across the world and into organizations has increased the feasibility of conducting, and recruiting for, more ambitious research designs. That said, certain data quality challenges remain that should be actively tackled. To this end, we presented guidelines intended to assist researchers, reviewers, and editors.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Author biographies
Appendix 1
Journals of Articles Used in Literature Search for Best Practices.
