Abstract
The authors causally identify the effects of intense survey participation on key labor market outcomes by randomly excluding individuals willing to sign up for a high-intensity survey with a focus on job search and well-being. Using administrative data, they find that, on average, survey participation had no effect on labor market outcomes during the year after signing up. They also demonstrate that an alternative selection-on-observables approach would yield misleading results. These findings underscore the value of experiments in examining effects of survey participation.
In the 1920s and the 1930s, the US National Research Council conducted several experiments on workplace productivity at the Hawthorne plant of Western Electric (see, e.g., Levitt and List 2011). According to the traditional account, and to the surprise of the researchers, productivity changes were observed not only in the experimental group whose working conditions had been altered but also in the control group of workers whose conditions remained unchanged. This finding seemed to reveal that the mere awareness of being observed can lead to changes in behavior, a phenomenon later termed the “Hawthorne effect.” 1 Such participant reactivity effects are a threat to the internal validity of study results: The information gathered is biased by the fact that the participants were surveyed.
A related threat can arise in panel surveys, which are of paramount importance for investigating dynamic processes and estimating causal effects. Over time, repeated participation in surveys can result in changes in either actual behavior or reporting behavior (Chadi 2013; Bach 2021; Eckman and Bach 2021; Cernat and Keusch 2022). In both cases, survey participation has an impact on observed outcomes during later waves of a panel study, a phenomenon called “panel conditioning.” By means of a field experiment, we study Hawthorne effects and thus changes-in-behavior panel conditioning in job seekers who took part in a high-intensity survey.
Our contribution to the literature is at least threefold: First, we answer the question of whether participation in an intense monthly app-based survey affects subsequent labor market outcomes of job seekers, such as employment transitions (e.g., terminating a job, taking up a new job) and job quality (e.g., earnings). Much of the existing literature is related to the areas of voting, retirement savings, and health. It often confirms that participation indeed has a (context-dependent) impact on behavior (for overviews see, e.g., Bach 2021; Cernat and Keusch 2022).
Second, we assess the need for an experimental design to identify the real-life effects of survey participation. One issue to consider here is that generating a control group of randomly excluded people who are actually willing to participate in the survey prolongs the recruitment phase and requires additional resources. We therefore conduct a separate analysis employing a lower-cost “selection-on-observables” approach instead. The additional analysis compares the treatment group to individuals who were invited to take part in the survey but did not respond to the invitation, the so-called no-sign-up group. We check whether controlling for a vast range of observable characteristics allows us to identify the same effects of study participation we find in the experimental data. While being less costly than the experiment, the selection-on-observables approach comes at the risk of confounding from unobserved differences between participants and non-participants.
Third, our analysis focuses on participation in a high-intensity panel survey involving detailed monthly measurements. Existing research on reactivity effects often stems from surveys that are relatively non-invasive. More demanding surveys, such as those requiring frequent, detailed measurements, are more likely to suffer from reactivity and other data quality issues (Gochmann, Ohly, and Kotte 2022; Eisele et al. 2023). By studying this type of survey, we contribute to a better understanding of reactivity effects in more demanding contexts. While yearly panel surveys are an indispensable source of information for economics research, surveys with higher-than-yearly frequency are becoming popular, too. Aside from the German Job Search Panel we study here, the Dutch Longitudinal Internet Studies for the Social Sciences effectively compartmentalizes its extensive yearly core study questions into shorter monthly modules, alongside additional custom surveys from various researchers (for an example using the data, see Achard et al. 2025). Some of the other traditional yearly household panel surveys launched higher frequency spin-off surveys during the COVID-19 pandemic, such as the Understanding Society COVID-19 project with up to nine individual waves in 2020 and 2021 (e.g., Chaudhuri and Howley 2022).
Theoretical Considerations and Previous Findings
Job search is an important process affecting a person’s future income, job quality, and work–life balance, among other things. Ending a period of insecure employment or unemployment is also key to improving health and well-being (Clark, Diener, Georgellis, and Lucas 2008; Cygan-Rehm, Kuehnle, and Oberfichtner 2017; Reichert and Tauchmann 2017; Lawes et al. 2022). This means researchers would have to consider profound ethical issues beyond internal study validity if study participation were found to interfere with job search behavior as a form of a Hawthorne effect. For instance, feeling closely monitored could make job seekers accept job offers too quickly, resulting in poor job quality. Moreover, job seekers may try harder to align their behavior with the social norm to work (e.g., Stutzer and Lalive 2004; Günther, Conradi, and Hetschko 2025) when participating in a survey about job search makes their non-compliance with this norm more salient (Halpern-Manners, Warren, and Torche 2017). Overall, these are good reasons to expect survey participation to increase the probability of being employed in our context.
By comparison, study participation as a contribution to the greater good could be seen as a way of compensating for a lack of job search effort (Groves, Cialdini, and Couper 1992; Misra, Stokols, and Marino 2012). When survey participation becomes rather time-consuming, it may reduce the time spent on job search, similar to the lock-in effect of program participation (e.g., Sianesi 2008). As a result, the individual’s probability of being unemployed could increase and, hence, their dependency on public income support. In any case, it seems straightforward to assume that the intensity of study participation may amplify survey participation effects. In a longitudinal study, high participation intensity can originate from the frequency of measurements (e.g., monthly versus yearly) and the number of items that are to be answered at each measurement (“survey wave”).
Studying the effect of survey participation on real-life outcomes involves at least two key challenges: finding an adequate control group and measuring outcomes of interest independently from survey-related issues such as changes in reporting biases or panel attrition. When addressing the first challenge, the gold standard is to randomly assign study participants to a control group surveyed only once or not at all (e.g., Warren and Halpern-Manners 2012; Axinn, Jennings, and Couper 2015). Other approaches are to compare answers of longer-term panel participants with those of panel refreshers (Van Landeghem 2014), or to use instrumental variables for study participation (Bach and Eckman 2019).
Regarding panel studies, actual Hawthorne effects (i.e., changes in behavior) need to be disentangled from changes in reporting behavior, time trends, and other error sources such as interviewer effects (e.g., Das, Toepoel, and van Soest 2011; Bach 2021). Thus, to resolve the second challenge, matching survey data with administrative records is considered the gold standard for identifying such outcomes, as administrative data are usually reported independently from the survey in question. Alternatively, digital trace data may be used to investigate the impact of survey participation; however, they come with substantial measurement challenges of their own, such as non-response bias given that the willingness to share digital trace data is non-random (Bähr et al. 2022; Cernat and Keusch 2022; Cernat, Keusch, Bach, and Pankowska 2025).
We combine both gold standards to investigate if participation in a high-intensity panel survey affects the labor market outcomes of participants: First, we randomly assigned part of the individuals willing to participate in the German Job Search Panel (GJSP) to a control group excluded from participating in the survey. By contrast, treatment group individuals were allowed to continue participating. The GJSP was a high-intensity panel survey following the same people for up to two years (for details, see Hetschko et al. 2022). It used an innovative survey app for frequent detailed measurements every month, including real-time assessments, a diary method, and biomarker measurement (for details, see the following section).
Second, we link the GJSP survey data with administrative data of the Integrated Employment Biographies (IEB) to compare the labor market outcomes of the actual (treated) survey participants to the non-surveyed control group. The IEB are provided by the German Federal Employment Agency (Bundesagentur für Arbeit) and contain comprehensive information about periods in employment, unemployment, as well as participation in active labor market programs, among other things. We augment the data with additional information on whether job seekers attend meetings at their local employment agency to measure their job search efforts. None of these data can be influenced by attrition or changes in reporting behavior in the GJSP.
Three studies that also meet the two gold standards are Persson (2014), Crossley, de Bresser, Delaney, and Winter (2017), and Zwane et al. (2011), none of which examined labor market outcomes. Persson (2014) showed that being randomly assigned to participate in a high-intensity election survey prior to the election increased turnout compared to participating after the election. Presumably the survey triggered some interest in the election and/or increased the perceived social pressure to vote. Voting was measured by official register files. Crossley et al. (2017) implemented a random assignment to modules with detailed questions on needs in retirement within a population-representative internet panel. From administrative wealth data, they linked information on actual savings. They found that households reacted to being confronted with retirement questions by reducing their non-housing saving rate. The authors’ explanation for this finding is that surveyed individuals had a salience shock and realized that they indeed needed fewer savings. In a series of experiments in a development context, Zwane et al. (2011) found that being surveyed about health increased the demand for water treatment products and medical insurance, whereas being surveyed about borrowing behavior did not influence the demand for a microloan.
We are aware of only one study focusing on labor market effects of panel participation: Bach and Eckman (2019) used the random assignment of invitations to participate in the annual German Panel Labor Market and Social Security within a larger population of eligible households as an instrumental variable for actual participation. They focused on welfare recipients and found that participation in the panel increased the take-up of active labor market programs. While their instrumental variable strategy acknowledges the necessary assumptions, any such approach is open to speculation about whether the exclusion restriction holds. 2 Although their identification strategy is convincing, an experiment like ours eliminates even the most unlikely issues in this regard by design. Furthermore, the authors relate a specific population (welfare recipients) to a specific outcome (active labor market program participation). Although our population is also specific (originally registered job seekers), our analyses cover a broader range of labor market outcomes, and survey participation was more invasive in that it occurred at a greater frequency (monthly instead of annually).
Experimental Design and Data Sources
Our field experiment took advantage of the collection of data from the German Job Search Panel (GJSP; see Hetschko et al. 2022 for a detailed data report). The purpose of the data set is to provide longitudinal survey data for examining the effects of job search and unemployment on well-being and health on a monthly basis (e.g., Lawes et al. 2022, 2023, 2025; Schmidtke et al. 2024). Potential survey participants were drawn from not-yet unemployed job seekers who registered with the German Federal Employment Agency. To be eligible for unemployment benefits, job seekers must register three months before their employment ends, or within three days if they learn about the end of their employment later. Among these cases, a sizable fraction of workers actually entered unemployment, whereas a similarly large share remained employed (Stephan 2016). For instance, many workers register as job seekers as their fixed-term contract expires, but oftentimes the contract is eventually extended or made permanent. Others expect their company to close down, or that they will be part of a mass layoff, which then does not happen. Among registered job seekers, people identified as part of an upcoming mass layoff were oversampled. 3 The sample was restricted to individuals with German citizenship in order to avoid language issues with the survey questionnaires.
From November 2017 to May 2019, job seekers of ages 18 to 59 years meeting the criteria described above were invited to take part in the online entry survey of the GJSP. This survey provided access to the survey app if a number of inclusion criteria were met. These included a random group assignment for the purpose of our field experiment. Approximately two-thirds of the participants were randomly selected and allowed to further participate in the survey. In the following, we refer to these people as the treatment group. One-third of eligible participants were excluded from further participation. These constitute our control group. Comparing the labor market outcomes of these two groups produces causal evidence about Hawthorne effects from GJSP participation.
As mentioned above, additional time and other resources were needed to conduct the experiment, mostly because the sample filled up more slowly as people willing to participate were excluded. It is thus worthwhile to test whether a selection-on-observables approach simply comparing the labor market outcomes of people unwilling to partake with those of the survey participants produces the same insights regarding Hawthorne effects. We therefore also separately compare a no-sign-up group with the treatment group. These individuals were invited but did not participate in the entry survey. As Hetschko et al. (2022) showed in their analysis of non-response to the GJSP, non-participation was non-random with respect to observable characteristics: High-skilled workers, young individuals, and women were more likely to sign up. Reassuringly for users of the GJSP, however, the average non-response bias across all characteristics examined by Hetschko et al. (2022) is rather low, between 3% and 4%. We use the no-sign-up group in the analysis to find out about the scientific benefit of the costly field experiment and analyze whether controlling for observable characteristics would have led to the same conclusions regarding Hawthorne effects as the experimental design. A valuable alternative to this approach would have been to include a comparison with a randomly chosen group of registered job seekers who were not invited to take part in the survey. However, this was not possible for the larger part of our sample as all individuals registering as job seekers due to a mass layoff in Germany during the recruitment period were invited to participate in the GJSP.
Figure 1 provides an overview of the three groups and their roles in our study. After applying appropriate sample restrictions, our final sample comprises 1,526 persons in the treatment group, 804 persons in the control group, and 63,740 individuals in the no-sign-up group. In an Appendix at the end of this article, we document all sample restrictions in detail.

Figure 1. Overview of the Studied Samples and Timeline of the Study
For our analysis, we merge information from the GJSP entry survey and paradata on subsequent survey participation with data for all invited persons from the IEB (V16.00.01-202012; see Frodermann, Schmucker, Seth, and vom Berge 2021 for an IEB data report) and with appointment data from the meeting scheduling software of the Federal Employment Agency. The IEB have been widely used in previous labor market research (e.g., Bossler, Mosthaf, and Schank 2020; Dauth 2020; Bachmann, Demir, and Frings 2022). They contain administrative spell data (accurate to the day) on periods of employment subject to social security contributions, registered job search, unemployment or welfare benefit receipts, and participation in active labor market programs administered by the Federal Employment Agency. Trainings taken up in our sample mostly (90%) comprise active labor market programs with a firm or private provider and longer-lasting further trainings. Appointment data cover information on scheduled, attended, and missed appointments of job seekers.
For data preparation of the IEB, we compute all individual and job characteristics on the day of signing up for the entry survey (which is known for the treatment group and the control group). Furthermore, we compute the previous and subsequent labor market history before and after the day of signing up. As the date is not available for the no-sign-up group, we compute a hypothetical signing up day for this group using the mean number of days between the job seeker registration and survey sign-up observed in the treatment and control groups.
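The construction of the hypothetical sign-up day can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual data preparation code: the function names, the made-up dates, and the rounding of the mean lag to whole days are all hypothetical choices.

```python
from datetime import date, timedelta


def mean_signup_lag_days(pairs):
    """Mean number of days between job seeker registration and survey sign-up.

    pairs: list of (registration_date, signup_date) for persons whose
    sign-up day is observed (treatment and control groups).
    """
    lags = [(signup - registration).days for registration, signup in pairs]
    return sum(lags) / len(lags)


def hypothetical_signup(registration, mean_lag_days):
    """Registration date plus the mean lag (rounded to whole days; an assumption)."""
    return registration + timedelta(days=round(mean_lag_days))


# Made-up dates for three persons with an observed sign-up day
observed = [
    (date(2018, 1, 10), date(2018, 1, 24)),  # lag of 14 days
    (date(2018, 2, 1), date(2018, 2, 21)),   # lag of 20 days
    (date(2018, 3, 5), date(2018, 3, 31)),   # lag of 26 days
]
mean_lag = mean_signup_lag_days(observed)

# A person in the no-sign-up group has only a registration date
no_signup_registration = date(2018, 4, 2)
print(hypothetical_signup(no_signup_registration, mean_lag))  # 2018-04-22
```

Anchoring all pre- and post-period variables on this constructed date keeps the 360-day outcome window comparable across the three groups, at the cost of ignoring individual variation in the lag.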
We argue that taking part in the survey was a substantial burden on individuals in light of the monthly frequency of questionnaires and the numerous questions to be answered each month. Monthly experience sampling (six measurements on one day every month, to be answered within 30 minutes) and quarterly day reconstructions were used to elicit momentary happiness and time use (Kahneman et al. 2004; Stone and Litcher-Kelly 2006). 4 Cognitive well-being and mental health data were collected on a monthly basis using multiple items. Several instruments measured eudaimonic well-being, including a 24-item version of the Ryff (1989) scales. On a quarterly basis, respondents were invited to send in samples of their hair for the measurement of the stress hormone cortisol (for details, see Lawes, Hetschko, Sakshaug, and Eid 2024).
Participants were also asked each month to indicate information about various sociodemographic characteristics, personality traits (every three months), coping resources, and their current labor market status. If unemployed, they were asked, for instance, about their re-employment prospects, reservation wage, and job search activities (e.g., “Have you actively looked for a job in the last four weeks?”). Employed individuals were asked about job characteristics, earnings, working hours, and the likelihood of upcoming changes in their employment status (e.g., “How likely is it that the following changes […] will occur within the next six months?,” followed by separate items for specific events, such as “You look for a new position,” “You actually lose your job,” “You become self-employed,” and so on).
To spread out the burden of participation, different questionnaire modules would pop up in the survey app on up to eight days each month. The time needed to complete the daily surveys was about five minutes. On average, individuals responded to 150 items per survey month. On top of that, more time-consuming and burdensome measurements were carried out on selected days, especially experience sampling, the day reconstruction method, and hair sampling.
While we argue that participation was quite intense, panel attrition works against treatment intensity in terms of Hawthorne effects. An extreme example would be a situation where all participants (i.e., the treated) drop out quickly after the random exclusion of the control group, implying a weak treatment. Given that even the control group completed a short part of the entry survey until exclusion, one might argue that they were minimally treated as well. This process makes the issue of attrition in the treatment group particularly relevant. It is therefore reassuring that approximately 50% of the sample retained after the entry survey continued to participate for at least one year (i.e., they completed 13 monthly waves; see Hetschko et al. 2022). We examine the issues of treatment intensity and attrition in the course of our empirical analysis (see the Experimental Results section below).
Table A.1 in the Online Appendix shows the means of observed characteristics for the treatment and the control group, as well as the results from tests on equal means. To address the issue of multiple testing, we employ the Romano-Wolf multiple-hypothesis correction (Romano and Wolf 2005, 2016) using the Stata ado-file rwolf (Clarke, Romano, and Wolf 2021), with 250 bootstrap replications performed. This correction method safeguards against the likelihood of erroneously rejecting one or more true null hypotheses within a group of hypotheses being examined in the same way. The procedure considers the actual dependence structure among the test statistics by means of resampling, leading to enhanced power in comparison to previous multiple-testing approaches such as the Bonferroni method. We consider basic sociodemographic characteristics, such as age, sex, and education, as well as the characteristics of the last job, belonging to the mass layoff sample, and the employment history over the past five years (e.g., years in employment subject to social security contributions, in unemployment, and with benefit receipt). None of the means differ between the treatment group and the control group at conventional levels of significance, confirming randomization success. Note that this also holds true if we do not correct for multiple testing. Table A.2 in the Online Appendix additionally displays results from chi-square tests for differences in the distribution of these variables, which are in line with the previous findings.
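To make the logic of a step-down resampling correction concrete, the following sketch computes adjusted p values from observed test statistics and null-centered bootstrap statistics. It is a simplified illustration in the spirit of the Romano-Wolf procedure, not a reimplementation of the rwolf ado-file; the function name and the toy numbers are hypothetical, and the bootstrap draws are assumed to be given.

```python
def stepdown_adjusted_pvalues(t_obs, t_boot):
    """Step-down max-|t| adjusted p values.

    t_obs:  list of S observed test statistics.
    t_boot: list of B lists, each holding S null-centered bootstrap statistics.
    """
    n_hyp, n_boot = len(t_obs), len(t_boot)
    abs_obs = [abs(t) for t in t_obs]
    # Most significant hypothesis first
    order = sorted(range(n_hyp), key=lambda s: abs_obs[s], reverse=True)

    adjusted = [0.0] * n_hyp
    running_max = 0.0
    for step, s in enumerate(order):
        remaining = order[step:]  # hypotheses not yet processed
        # Count bootstrap draws whose largest remaining |t*| reaches the observed |t|
        exceed = sum(
            1
            for draw in t_boot
            if max(abs(draw[r]) for r in remaining) >= abs_obs[s]
        )
        p_value = (exceed + 1) / (n_boot + 1)
        running_max = max(running_max, p_value)  # enforce monotonicity
        adjusted[s] = running_max
    return adjusted


# Toy input: three hypotheses, four bootstrap draws (250 would be used in practice)
t_obs = [3.0, 1.0, 2.0]
t_boot = [
    [0.5, -1.2, 0.3],
    [2.5, 0.4, -0.1],
    [-0.2, 0.8, 2.2],
    [1.1, -0.3, 0.6],
]
print(stepdown_adjusted_pvalues(t_obs, t_boot))  # [0.2, 0.4, 0.4]
```

Because each step takes the maximum over the joint bootstrap draws, correlated test statistics inflate the adjusted p values less than a Bonferroni bound would, which is where the gain in power comes from.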
Table A.1 also shows the means of observed characteristics for the additional comparison group not signing up for the entry survey and the results from multiple-hypothesis corrected tests on equal means between the treatment group and the no-sign-up group. Here, we do find significant differences for many characteristics, implying that any comparison between the treatment group and the no-sign-up group is at risk of endogeneity bias. While the observed differences may partly be attributable to the considerably larger sample and, thus, enhanced statistical power, the mean deviations from the treatment group are also larger for the no-sign-up group than for the control group. This finding confirms that participation in the GJSP was non-random (Hetschko et al. 2022).
Labor Market Outcomes and Methods
We present findings for six outcome variables. With respect to duration outcomes, analyzing unconditional probabilities of transitions within certain durations is regarded as the most appropriate method (e.g., van den Berg, Hofmann, Stephan, and Uhlendorff 2025). The randomization is compromised if the analysis is conditioned on survival at a specific time point, as the composition of survivors may vary within groups over time (Abbring and van den Berg 2005). A competing risk analysis is thus unsuited for analyzing data from a randomized controlled trial, as it requires censoring the data as soon as a transition into one competing state occurs. We thus present results on three important unconditional labor market transitions and three outcomes that can be interpreted as job features or indicators of search effort. All outcomes are measured until 360 days after signing up for the survey, as Hawthorne effects might take some time to arise and/or require repeated participation.
We first investigate whether individuals transitioned out of regular employment during the 360 days after signing up, which we observe for half of all observations. “Regular employment” is subject to social security contributions, thus excluding marginal employment but including wage-subsidized employment. It may take place in a continuing or new employment relationship. Many job seekers search successfully for a new job when expecting to terminate an employment relationship without ever entering unemployment. 5 We bridge gaps between two separate episodes of employment of up to seven days to allow for short transitions between jobs.
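The bridging of short gaps between employment episodes can be sketched as a simple interval merge. The sketch assumes spells with inclusive start and end dates and counts the gap as the number of days strictly between two spells; both the function name and this way of counting are assumptions for illustration.

```python
from datetime import date


def bridge_spells(spells, max_gap_days=7):
    """Merge employment spells separated by at most max_gap_days days.

    spells: list of (start_date, end_date) tuples, both dates inclusive.
    """
    merged = []
    for start, end in sorted(spells):
        if merged:
            # Days strictly between the previous spell and this one
            gap = (start - merged[-1][1]).days - 1
            if gap <= max_gap_days:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                continue
        merged.append((start, end))
    return merged


# Made-up employment history: a 5-day gap is bridged, a 19-day gap is not
spells = [
    (date(2018, 1, 1), date(2018, 3, 31)),
    (date(2018, 4, 6), date(2018, 6, 30)),
    (date(2018, 7, 20), date(2018, 12, 31)),
]
for start, end in bridge_spells(spells):
    print(start, end)
```

With this rule, a direct job-to-job move with a few days between contracts does not count as a transition out of regular employment.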
As a natural counterpart, we analyze whether individuals entered unemployment after registering as a job seeker. This variable is not an exact mirror of employment exits. A substantial share of individuals transition from employment into states other than unemployment.
Individuals who register as job seekers or are unemployed may take part in active labor market programs. We thus also examine transitions into subsidized (short) training during the 360 days after signing up.
As an indicator of employment quality, we compute average daily earnings within this period. For days without labor earnings, we impute a wage rate of zero. We analyze average earnings over a period of time as our sample was drawn from registered job seekers. By conditioning on employment, we would lose the advantage of randomization.
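The difference between the unconditional earnings measure and a measure conditional on employment can be illustrated with a short sketch. The function name, the day-count representation of spells, and the wage figures are hypothetical, and spells are assumed not to overlap.

```python
def average_daily_earnings(spells, window_days=360):
    """Average daily earnings over the window, with zero earnings on days without a job.

    spells: list of (first_day, last_day, daily_wage), where days are counted
    from 1 (first day after sign-up) and spells do not overlap.
    """
    total = 0.0
    for first_day, last_day, wage in spells:
        first = max(first_day, 1)
        last = min(last_day, window_days)  # truncate at the end of the window
        if last >= first:
            total += (last - first + 1) * wage
    return total / window_days


# Made-up person: 90 days at 120 euros, a 90-day gap, then a job at 100 euros
person = [(1, 90, 120.0), (181, 400, 100.0)]
print(average_daily_earnings(person))  # 80.0
```

Averaging only over the 270 employed days would give about 106.67 euros instead of 80, but that measure conditions on employment, which is itself a potential outcome of the treatment.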
As a job-related indicator of job search outcomes, we investigate if individuals took up a job in a different municipality.
As another aspect of individual search behavior, we examine if individuals had at least one cancelled appointment with the employment agency during the 360 days after signing up.
Ideally, we would also have studied outcomes related to the GJSP’s focus on well-being and health, but we naturally lack the corresponding data for any non-participants and thus the control group. 6
For each outcome, we estimate two specifications of linear probability models or ordinary least squares (OLS) (for wages), respectively, to compare the treatment group with the control group (see the following section). First, we include only a dummy variable in the estimates for the treatment group, which constitutes a simple comparison of means. Second, we present the multivariate estimates controlling for a wide range of explanatory variables. For a well-conducted field experiment, however, a comparison of means should already be sufficient to identify causal effects.
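That a linear probability model with only a treatment dummy reduces to a comparison of means can be verified with a few lines of code. The sketch below uses a hand-written bivariate OLS formula and made-up data; all names are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Bivariate OLS: returns (slope, intercept) of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Made-up data: treatment dummy and a binary outcome (e.g., entered unemployment)
treated = [1, 1, 1, 1, 0, 0]
outcome = [1, 0, 1, 1, 0, 1]

slope, intercept = ols_slope_intercept(treated, outcome)

mean_treated = sum(y for d, y in zip(treated, outcome) if d == 1) / treated.count(1)
mean_control = sum(y for d, y in zip(treated, outcome) if d == 0) / treated.count(0)

# The slope equals the difference in means; the intercept equals the control mean
print(round(slope, 10), round(mean_treated - mean_control, 10))
print(round(intercept, 10), round(mean_control, 10))
```

Adding covariates in the second specification should therefore change the point estimate only slightly under successful randomization, while potentially improving precision.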
For further analyses of the treatment group and the control group, we include variables for the intensity of the treatment. In this context, we also discuss the possibility that attrition influences treatment effects via lowering treatment intensity.
Comparing the treatment group and the no-sign-up group (see the Comparison with the No-Sign-up Group section below), any estimated effects of survey participation might rather reflect differences correlating with the willingness to sign up than actual effects of survey participation. Observable characteristics are controlled for using OLS models. In addition to that, we present estimates using entropy balancing as a non-parametric way of controlling for observables (Hainmueller and Xu 2013). Here, observations in the no-sign-up group are reweighted upon the condition that they perfectly match the first and second moments of observables in the treatment group. 7
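The reweighting idea behind entropy balancing can be sketched for the simplest possible case: a single covariate and a single constraint on its mean. This is a deliberately reduced illustration, not the procedure used in the analysis, which balances first and second moments of many covariates. With uniform base weights, the solution has the exponential-tilting form w_i ∝ exp(λ x_i), and the sketch finds λ by bisection. The function name and data are hypothetical, and the target mean must lie strictly inside the range of the comparison-group values.

```python
import math


def entropy_balance_mean(x_comparison, target_mean, tol=1e-10):
    """Weights for the comparison group so that its weighted mean hits target_mean.

    Weights have the form exp(lam * x) / sum(exp(lam * x)), i.e., the
    minimum-divergence reweighting from uniform weights under one mean constraint.
    """
    def weights(lam):
        shift = max(lam * x for x in x_comparison)  # numerical stability
        raw = [math.exp(lam * x - shift) for x in x_comparison]
        total = sum(raw)
        return [r / total for r in raw]

    def weighted_mean(lam):
        return sum(w * x for w, x in zip(weights(lam), x_comparison))

    low, high = -50.0, 50.0  # the weighted mean is increasing in lam
    for _ in range(200):
        mid = (low + high) / 2
        if weighted_mean(mid) < target_mean:
            low = mid
        else:
            high = mid
        if high - low < tol:
            break
    return weights((low + high) / 2)


# Made-up covariate (e.g., years in employment) in the comparison group
x_no_signup = [0.0, 1.0, 2.0, 3.0, 4.0]
treatment_mean = 3.0  # mean of the same covariate in the treatment group

w = entropy_balance_mean(x_no_signup, treatment_mean)
print(round(sum(wi * xi for wi, xi in zip(w, x_no_signup)), 6))  # 3.0
```

The weights stay strictly positive and sum to one, so every no-sign-up observation is retained, but the reweighting can only balance characteristics that are observed.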
As we analyze several outcome variables, we account for multiple testing by conducting the Romano-Wolf correction with 250 bootstrap replications (Clarke et al. 2021). In the following, Tables 1 to 3 contain information on point estimates, uncorrected p values (in parentheses) as well as multiple-testing-corrected p values (in braced brackets). We consider all estimates using the same specification and sample as a group of tested hypotheses. For instance, we consider the comparison of the treatment group and the control group across six outcomes in Table 1 as one group of hypotheses as the sample and the specifications are the same.
Table 1. Estimated Effects of Survey Participation on Labor Market Outcomes until 360 Days after Sign-Up
Source: GJSP and IEB (V16.00.01-202012).
Notes: The table displays coefficients, uncorrected p values (in parentheses), and multiple-hypothesis corrected p values (in braced brackets) of linear probability models/ordinary least squares (OLS). Observations: 1,526 in the treatment group, 804 in the control group. List of control variables: See Table A.1 in the Online Appendix for descriptive statistics of the covariates and Table A.3 for full regression results. Corrected p values are computed using the Romano-Wolf correction for multiple-hypothesis-testing (Clarke et al. 2021).
Table 2. Estimated Effects of Participation Intensity on Labor Market Outcomes until 360 Days after Sign-Up
Source: GJSP and IEB (V16.00.01-202012).
Notes: The table displays coefficients, uncorrected p values (in parentheses), and multiple-hypothesis corrected p values (in braced brackets) of linear probability models/ordinary least squares (OLS). Observations: 1,526 in the treatment group, 804 in the control group. List of control variables: See Table A.1 in the Online Appendix. Corrected p values are computed separately for panels I and II, using the Romano-Wolf correction for multiple-hypothesis-testing (Clarke et al. 2021).
Table 3. Heterogeneous Effects of Participation Intensity on Labor Market Outcomes until 360 Days after Sign-Up
Source: GJSP and IEB (V16.00.01-202012).
Notes: The table displays coefficients, uncorrected p values (in parentheses), and multiple-hypothesis corrected p values (in braced brackets) of linear probability models/ordinary least squares (OLS). Observations: 1,526 in the treatment group, 804 in the control group. List of further control variables: See Table A.1 in the Online Appendix. Corrected p values are computed using the Romano-Wolf correction for multiple-hypothesis-testing (Clarke et al. 2021).
Experimental Results
Main Findings
Table 1 presents our main results for the randomized experiment, displaying estimates for the treatment variable with and without covariates. It also reports the respective mean values of the outcome variables for the reference group. In addition, Figure A.1 in the Online Appendix presents a coefficient plot for the estimates with covariates. Our first set of outcome variables focuses on transitions until 360 days after signing up. In the control group, approximately 53% of all individuals transitioned out of employment, 44% entered unemployment, and about 13% participated in a (short) training program. Table 1 shows that none of the three transitions differs significantly between the treatment and the control group. 8 This finding holds true with and without controlling for covariates, with respect to both statistical and economic significance, and does not depend on correcting for multiple testing. Compared to the control group mean, estimated relative effect sizes are generally of a small magnitude. This provides convincing evidence that participation in our demanding, high-intensity survey did not have an impact on the labor market outcomes of participants.
Even if we find no effects for labor market transitions, survey participation might still have an impact on job quality. Within 360 days after random assignment, individuals in the control group realized average daily wages of around 109 euros (imputing zeros for days without employment). Table 1 shows that daily earnings in the treatment group are approximately 4 euros higher than in the control group, but this difference is not statistically significant.
Most outcomes discussed up to here are not entirely controlled by the job seekers. For example, consider our finding that survey participation does not impact transitions out of regular employment. Here, survey participation might still increase the job search efforts of a newly registered job seeker, but not enough to actually improve job finding chances before entering unemployment, which also depends on labor market conditions. In the following, we therefore examine two more direct measures of job search efforts, namely job take-up in a different municipality and attending appointments with the employment agency.
First, we investigate if a job seeker took up a job in a different municipality within 360 days of signing up for the survey. Indeed, around one-third of job seekers started working in a different municipality. However, we find no statistically significant differences between the treatment and the control group. Second, we check whether the individual job seeker had a scheduled appointment at their local employment agency and whether that meeting took place. 9 Cancelled appointments are mostly due to the job seeker not showing up. As the duration of both job search and of potential unemployment varies across individuals, we analyze whether at least one appointment scheduled with the local employment agency did not take place within the 360 days after signing up for the survey. The share of individuals with at least one missed appointment was approximately one-third. We find, however, no statistically significant differences between the treatment and the control group.
Treatment Intensity and Attrition
To investigate issues of treatment intensity and attrition, we revisit the effect of survey participation for parts of the treatment group who were intensively treated. We interact survey participation with two indicators of treatment intensity: continued participation until at least month seven and participation in the cortisol study, which involved sending in strands of hair to receive an objective stress measure. For the control group, both treatment intensity dummies were assigned a value of zero. As a result of non-random nonresponse, the findings based on this indicator of treatment intensity should be cautiously interpreted. Only 26% of the sample were eligible (e.g., sufficient hair length) and willing to partake in the hair sampling (Lawes et al. 2024).
Table 2 presents estimates for the six outcome variables examined above, again controlling for the set of control variables described in Table A.1, and for 360 days after signing up. The correction for multiple-hypothesis-testing is applied again, too, but this does not alter our conclusions: We find no statistically or economically significant effects of treatment intensity.
A related issue arises from the fact that attrition is a non-random process. Treatment group individuals who continued to participate are potentially different from those who dropped out. Hence, while the randomization ensured balanced samples at the point of sign-up, treatment and control group observations may have started to differ at any later point, including month seven. That said, Hetschko et al. (2022) reported little evidence for systematic differences between participants and non-participants across a variety of individual characteristics, even as late as month seven. Women seem more likely to participate continuously; however, this effect is only weakly significant. As we show further below, treatment effects do not differ by gender. Overall, little evidence suggests that our results are confounded by attrition.
Heterogeneous Effects
We first conduct a heterogeneity analysis based on theoretical considerations. For our experimental sample, we interact the treatment indicator with registering due to a mass layoff, gender, having a temporary contract at the time of registering, and having experienced at least one recall during the past five years, that is, having been re-employed by a previous employer (e.g., seasonal workers). Economic research on the effects of unemployment often restricts itself to mass layoffs, which are less likely to be correlated with individual unobserved characteristics than other types of job terminations (e.g., Kassenboehmer and Haisken-DeNew 2009; Schmieder, Von Wachter, and Heining 2023). A gender-specific analysis seems appropriate as the labor market behavior of men and women differs in many respects (e.g., Borella, De Nardi, and Yang 2023). Individuals on temporary contracts often have to register as job seekers because of institutional constraints, even if the chance of a contract extension is high (Stephan 2016). Furthermore, individuals who expect a recall have a smaller incentive to exert search effort.
The results are presented in Table 3, controlling for the full set of covariates and correcting for multiple-hypothesis-testing. We find no significant interactions between survey participation and female gender, being on a temporary contract, or having been recalled to a previous job. However, individuals taking part in the survey who were dismissed as part of a mass layoff seem to enter unemployment somewhat earlier than those who were dismissed for other reasons. For this group, survey participation appears to cancel out the fact that they generally enter unemployment later than those who were dismissed for other reasons.
Second, we estimate causal forests to identify at a more general level whether treatment effect heterogeneity is present in any of our outcomes (e.g., Wager and Athey 2018). 10 The estimated average treatment effects of the causal forests are practically identical to those obtained from our regression analysis. The conditional average treatment effect (CATE) represents the expected treatment effect for an individual conditional on their covariates. Based on a median split, we compute a binary variable indicating whether an individual belongs to the low or the high CATE group. Next, we interact the treatment indicator with this CATE variable to analyze whether treatment effects differ significantly between the two groups. The resulting interaction effect is statistically significant for only two of our six outcomes, training participation and taking up a job in another municipality, suggesting some degree of treatment effect heterogeneity in these two outcomes. To analyze whether any of our covariates might explain the heterogeneity, we interact all control variables with our treatment indicator. The results are displayed in Table A.4 in the Online Appendix (for all outcomes). When correcting for multiple-hypothesis-testing, none of the six initially statistically significant interaction effects remains significant. This implies that variables not present in the data set explain the treatment effect heterogeneity in training participation and taking up a job in a different municipality. Overall, however, we find little evidence of treatment heterogeneity across subgroups. At least for most of our outcomes, both regression analysis and causal forest estimation imply zero effects that do not originate from mutually cancelling subgroup effects.
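The median split and the subsequent interaction test can be sketched as follows. This is a simplified illustration that assumes estimated CATEs are already available and that no further covariates enter the model. In that saturated case, the interaction coefficient equals the difference between the treatment-control mean differences in the two CATE groups. All function names are hypothetical.

```python
from statistics import mean, median


def median_split(cate):
    """Return 1 for units with a CATE above the sample median, else 0."""
    cutoff = median(cate)
    return [1 if c > cutoff else 0 for c in cate]


def group_effect(y, t, keep):
    """Treatment-control difference in mean outcomes among units with keep == 1."""
    treated = [yi for yi, ti, k in zip(y, t, keep) if k and ti == 1]
    control = [yi for yi, ti, k in zip(y, t, keep) if k and ti == 0]
    return mean(treated) - mean(control)


def interaction_effect(y, t, high):
    """Difference in treatment effects between the high and the low CATE group.

    Equals the coefficient on treatment x high in a regression of y on
    treatment, high, and their interaction (no further covariates).
    """
    low = [1 - h for h in high]
    return group_effect(y, t, high) - group_effect(y, t, low)
```

An interaction effect close to zero would indicate that the units the forest ranks as most responsive do not in fact react differently to the treatment.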
Comparison with the No-Sign-up Group
As outlined above, we are also interested in whether our randomized experiment was worth the effort. Therefore, we also compare the treatment group with a no-sign-up group. Table 4 presents findings without (panel I) and with (panel II) entropy balancing, and Figure A.1 in the Online Appendix displays coefficient plots for the full estimates. Without entropy balancing, and including only the treatment variable but no further control variables, we find no significant differences in transitions and the number of cancelled appointments (at least, once we correct for multiple-hypothesis-testing).
Comparison of Labor Market Outcomes until 360 Days after Sign-Up between Survey Participants and the No-Sign-Up Group
Source: GJSP and IEB (V16.00.01-202012).
Notes: The table displays coefficients, uncorrected p values (in parentheses), and multiple-hypothesis corrected p values (in braced brackets) of linear probability models/ordinary least squares (OLS). Observations: 1,526 in the treatment group, 63,740 in the no-sign-up group. List of control variables: See Table A.1 in the Online Appendix. Corrected p values are computed separately for panels I and II, using the Romano-Wolf correction for multiple-hypothesis-testing (Clarke et al. 2021).
Controlling for observable attributes of both groups, however, the results suggest a significantly positive relationship between survey participation and transitions into training, even when correcting for multiple-hypothesis-testing. These estimates are also economically significant as they account for approximately 20% of the constant from models without covariates. We obtain the same results if we use entropy balancing to achieve similar distributions of observable characteristics in the no-sign-up group and the treatment group. Taken together with the experimental analysis in which we found no such effects, this implies that at least some unobserved differences between the treatment group and the no-sign-up group remain after controlling for observable characteristics, and that these unobserved differences are correlated with the propensity to participate in training.
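The idea of entropy balancing can be illustrated with a minimal sketch for a single covariate and a single moment condition. Comparison units receive weights proportional to exp(lambda * x), and lambda is chosen so that their weighted mean matches the treatment group mean. This is a simplified illustration under these assumptions, not the procedure used for Table 4, which balances the full set of covariates. The function name and the bisection solver are illustrative choices.

```python
import math


def entropy_balance_weights(x_comparison, target_mean, iterations=200):
    """Weights proportional to exp(lam * x) whose weighted mean equals target_mean."""
    if not min(x_comparison) < target_mean < max(x_comparison):
        raise ValueError("target mean must lie strictly inside the comparison range")

    def weights(lam):
        # Subtract the largest exponent for numerical stability.
        shift = max(lam * x for x in x_comparison)
        raw = [math.exp(lam * x - shift) for x in x_comparison]
        total = sum(raw)
        return [r / total for r in raw]

    def weighted_mean(lam):
        return sum(w * x for w, x in zip(weights(lam), x_comparison))

    # The weighted mean increases in lam, so bracket the root and bisect.
    lo, hi = -1.0, 1.0
    while weighted_mean(lo) > target_mean:
        lo *= 2
    while weighted_mean(hi) < target_mean:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if weighted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return weights((lo + hi) / 2)
```

Reweighting of this kind can only align the distributions of variables that are observed, which is why differences in unobserved characteristics may persist.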
Finally, we find significant differences in daily earnings and work in another municipality, if we compare the treatment group with the no-sign-up group and do not control for further covariates (Table 4, upper part panel I). These differences, however, are no longer significant once we take the entire set of covariates into account and correct for multiple-hypothesis-testing or conduct entropy balancing.
Conclusions
We investigated whether participation in an intensive app-based survey on job search and well-being had an impact on labor market outcomes within a year of signing up for the survey. To this end, we combined two gold standards of research into Hawthorne effects: First, we conducted a field experiment, randomly excluding one-third of individuals willing to partake in a survey from participation for use as a control group of actual survey participants. Second, we merged information on survey participation with administrative data on labor market outcomes, ruling out that our results are in any way related to reporting bias.
Our most important finding is that participation in the survey, on average, had no impact on any of the investigated labor market transitions of initially employed job seekers, namely out of employment, into unemployment, and into subsidized (short) training. We also found no effects of survey participation on daily wages, taking up a job in a different municipality, and cancelled appointments with the employment agency. There was also little evidence for effect heterogeneity across subgroups. A causal forest analysis implied some heterogeneous effects only for the outcomes of training participation and taking up a job in a different municipality. This heterogeneity appeared unrelated to the variables in our data set and thus constitutes a topic for future research.
In addition, we showed that even when controlling for a wide range of observable characteristics and correcting for multiple-hypothesis-testing, a comparison of survey participants with individuals not signing up for the survey would have led to misleading conclusions. Regression results showed that survey participants take up subsidized (short) training statistically significantly more often than the no-sign-up group. Thus, there was some remaining selection into survey participation based on unobservable characteristics, creating a false sense of an impact of survey participation on training participation in a selection-on-observables setting. This finding reiterates the importance of experimental research designs for identifying effects in our context. In this sense, our field experiment was worth the effort, even though excluding the control group from the survey meant we had to spend more time and other resources to fill up our sample.
How generalizable are our main findings from the experimental study? Previous research found that outcomes such as saving (Crossley et al. 2017) or voting (Persson 2014) reacted to survey participation, while we found no such effects for labor market outcomes. Thus, the specific study context clearly matters. One reason for the lack of measurable reactivity effects in our experiment may be that most of our outcomes (e.g., having a job at a certain point in time) are less controllable by the survey respondent than those for which the previous literature has found effects (e.g., turning up at a polling station). The fact that our outcomes are partly influenced by characteristics reflecting a person’s employability (e.g., previous work experience, labor demand) may weaken the link between behavioral change triggered by survey participation and realized outcomes.
Other aspects of the survey we examined likely enhance the generalizability of our results, at least when it comes to labor market outcomes. Participation in the GJSP was highly intense, given the monthly measurements with modules appearing on respondents’ smartphones over several days of each month, including real-time measurement and diary methods, as well as hair sampling. A logical assumption regarding survey participation effects on labor market outcomes is that any such impact would increase with treatment intensity, defined by the extent and frequency of being surveyed. Yet our experimental study found no such effects, suggesting that most other surveys relevant to labor market research, such as less intense yearly household surveys, are also unlikely to influence real-world labor market outcomes.
Conversely, more targeted surveys, even if less intense than the GJSP, might impact behavior more significantly. Unlike the GJSP with its broad scope (job search, employment, well-being, health), a more targeted survey may focus respondents’ attention on certain areas, in particular if these are of relevance for the specific population. Bach and Eckman (2019), for instance, found survey effects on participation in active labor market programs among welfare recipients. Our population of registered job seekers was perhaps less likely to show these effects because they were not the primary target of such programs. A significant share of these job seekers did not enter unemployment at all, and those who became unemployed typically received unemployment insurance benefits. By contrast, Bach and Eckman (2019) analyzed welfare benefit recipients, who were overwhelmingly long-term unemployed and thus more strongly targeted by active labor market policies.
Notwithstanding the potential caveat of context-dependency, our findings provide good news for survey researchers, especially in the area of labor economics. The lack of reactivity effects speaks to the internal validity of research results obtained from analyzing survey data, even in cases where participation is frequent and burdensome. Further research should aim to obtain a more complete picture of the circumstances under which reactivity occurs. For instance, future studies could examine a variety of populations of survey participants, countries, and labor market conditions.
Supplemental Material
Supplemental material, sj-pdf-1-ilr-10.1177_00197939261444258 for Feeling Observed? A Field Experiment on the Effects of Intense Survey Participation on Job Seekers’ Labor Market Outcomes by Gesine Stephan, Clemens Hetschko, Julia Schmidtke, Michael Eid and Mario Lawes in ILR Review
Footnotes
Appendix: Sample Restrictions
Out of 127,201 persons who were invited to take part in the online entry survey of the GJSP, 4,698 persons signed up for the entry survey (see Hetschko et al. 2022 for details). 11 Of those starting to participate in the entry survey, 2,747 persons fulfilled all substantive criteria (i.e., other than the random assignment) for further participation in the survey and used the app at least once. 12 Of the 2,747 workers who signed up, 940 randomly chosen subjects were excluded for the purpose of our field experiment. The remaining 1,807 randomly selected participants were invited to further participate in the survey. Of the people invited, 122,503 did not sign up for the entry survey.
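The sample flow described above can be checked with simple arithmetic. The sketch below uses only the counts reported in this appendix and the exclusion counts listed in footnote 12; the variable names are illustrative.

```python
# Counts reported in the appendix text.
invited = 127_201
signed_up = 4_698
randomly_excluded = 940

# Exclusions among those who signed up (footnote 12).
exclusions = {
    "did not submit the entry survey": 246,
    "already unemployed": 1_424,
    "on job probation": 215,
    "never used the app": 35,
    "mistakenly took part": 31,
}

# Persons fulfilling all substantive criteria for further participation.
eligible = signed_up - sum(exclusions.values())

# Persons invited to continue after the random exclusion.
invited_to_continue = eligible - randomly_excluded

# Invited persons who never signed up for the entry survey.
no_sign_up = invited - signed_up

# Share of eligible persons randomly excluded from the survey.
excluded_share = randomly_excluded / eligible
```

The resulting values match the 2,747 eligible persons, the 1,807 persons invited to continue, and the 122,503 persons who did not sign up, and the excluded share is roughly one-third.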
Based on the IEB information, we include only the focus group of the GJSP in our analysis sample, namely German individuals who were regularly employed at the date of signing up and who had at least half a year of tenure at their current employer. This excludes disproportionately many individuals from the no-sign-up group, as they entered unemployment or started a new job between being invited to participate in the GJSP and the hypothetical sign-up date. One reason might be that our invitation letter made clear our sole interest in “still-employed” job seekers. This observation reiterates the non-random nature of the no-sign-up group, in contrast to the control group, when compared to the treatment group.
Individuals younger than 20 and older than 59 years at the date of (hypothetically) signing up for the survey were also not considered as the control group lacks any 18- or 19-year-old job seekers. For data preparation, we exclude employment spells with unrealistically low wages below 5 euros per day, and we impute missing values of the education variable based on entries in previous spells of a person. A small number of individuals are excluded as they could not be found in the IEB or information on their education is missing even after the imputation procedure. Our final analysis sample then consists of 1,526 persons in the treatment group, 804 persons in the control group, and 63,740 individuals in the no-sign-up group (see also Figure 1).
Acknowledgements
We are grateful for comments by Ruben Bach, Michael Cooper, Simon Trenkle, as well as participants of the yearly meetings of RES (Birmingham, 2025), ISQoLS (Luxembourg, 2025), AIEL (Milan, 2025), EALE (Bergen, 2024), BeWell (Magdeburg, 2024), and SES (Glasgow, 2024).
The DIM unit of IAB, in particular Stephan Grießemer, provided crucial support in carrying out the sampling. We also appreciate financial support by the German Science Foundation (DFG) through grants EI 379/11-1, SCHO 1270/5-1, and STE 1424/4-1. The experiment underpinning the article was approved by an ethics committee of Freie Universität Berlin as part of an overarching project (approval no. 169/2017) and pre-registered at the AEA registry in January 2018.
1
2
For instance, people who are selected for the survey, but cannot be reached due to missing, outdated, or incorrect address data, might also be more difficult to engage in a training program.
3
Our definition of mass layoffs largely follows §17(1) of the German employment protection act (Kündigungsschutzgesetz): > 5 layoffs in plants with up to 59 employees, 10% in plants with 60–250 employees, > 25 layoffs in plants with 251–499 employees, ≥ 30 layoffs in plants with 500+ employees.
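The thresholds above can be written as a simple rule. The sketch below encodes them as stated in this footnote; reading “10%” as “at least 10% of the workforce” is an assumption, and the function name is hypothetical.

```python
def is_mass_layoff(plant_size: int, layoffs: int) -> bool:
    """Classify a layoff event using the thresholds stated in the footnote."""
    if plant_size <= 59:
        return layoffs > 5
    if plant_size <= 250:
        # Assumption: "10%" is read as at least 10% of the workforce.
        # Integer arithmetic avoids floating-point comparisons.
        return layoffs * 10 >= plant_size
    if plant_size <= 499:
        return layoffs > 25
    return layoffs >= 30
```

For example, 10 layoffs in a plant with 100 employees would count as a mass layoff under this reading, whereas 9 would not.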
4
Recent work by Eisele et al. (2023) suggested there were reactivity effects of completing the experience sampling method, however not necessarily in the form of behavioral change. Previously,
reported that high attention to feelings can be beneficial to momentary well-being if individuals have strong mood regulation abilities, whereas it could be detrimental if mood regulation abilities are weak.
5
We do not analyze subsequent transitions out of unemployment and into employment as this would require us to condition on previously entering unemployment, at the cost of compromising the randomization.
6
Not all pre-registered outcomes (duration of job search, relocation, commuting when re-employed, wage when re-employed, future unemployment probability, characteristics of future employer) could be examined. In particular, we decided not to investigate the duration of a job search as a registered job search might take place during times of employment as well as unemployment and is therefore difficult to interpret. Instead, we added cancelled meetings with the employment agency as an alternative indicator of search effort. For mobility, we analyze changes in the address of the employer as information on the home address may not be consistent between employer notifications and data from the operative systems of the Federal Employment Agency.
7
Entropy balancing works well in our sample with respect to the distribution of observable variables. Alternatively, we could have used propensity score matching, which creates matched observations based on the estimated probability of receiving treatment. In additional analyses (not reported here), we found that this approach produces results similar to entropy balancing. Unlike the field experiment, neither method comprehensively tackles endogeneity issues arising from unobservable characteristics.
8
Note that under standard assumptions (power = 0.80, significance level = 0.05), for our randomized sample (N = 2,330) with a treatment share of 0.65, the minimum detectable effect for a dummy variable with a mean of 0.134 (the control group mean for training participation, which has the smallest mean value among our dependent variables) would be 0.042, or 4.2 percentage points.
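A minimum detectable effect of this kind can be approximated with a standard two-sided power formula for a difference in proportions. The sketch below uses a normal approximation with the outcome variance evaluated at the control group mean. The exact formula underlying the reported figure is not stated, so small deviations from it are to be expected. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n, treat_share, base_rate, power=0.80, alpha=0.05):
    """Approximate minimum detectable effect for a binary outcome."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    outcome_variance = base_rate * (1 - base_rate)
    # Standard error of the difference in means for the given group shares.
    standard_error = sqrt(outcome_variance / (n * treat_share * (1 - treat_share)))
    return (z_alpha + z_power) * standard_error


mde = minimum_detectable_effect(n=2330, treat_share=0.65, base_rate=0.134)
```

With the inputs from this footnote, the approximation yields a value of roughly 4 percentage points, in line with the reported magnitude.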
9
These meetings take place between the job seeker and a staff member responsible for their case. Employment agencies offer an appointment for an early meeting soon after registration as a yet-employed job seeker. While early meetings do not prevent unemployment, the literature shows they significantly accelerate subsequent job finding (Rosholm 2014; Schiprowski 2020).
10
We use the R package grf for estimating a causal forest for each of our six outcome variables. We apply cross-validation with five folds and loop over three random seeds to decrease potential dependence of our results on a specific random seed. We estimate 5,000 trees in each iteration using honest trees and tuning all model parameters by default.
11
12
We exclude all individuals who did not submit the entry survey (246), were already unemployed (1,424) or on job probation (215), never used the app (35), or mistakenly took part in the survey (31).
