Self-Assessment Variables as a Source of Information in the Evaluation of Intervention Programs: A Theoretical and Methodological Framework

Abstract

The article discusses the incorporation of individuals’ assessments regarding the effect of an intervention program on themselves as a source of information in commonly used quantitative program evaluation methods. Incorporating Self-Assessment Variables (SAV) into the evaluation process enables the researcher to draw on the information contained in SAV alongside other available sources of information (such as administrative data). The analysis is based on the assumption that individuals possess valuable and unique information which they employ before self-selection into a program. The theory of planned behavior is used as a framework for examining different aspects of integrating SAV into program evaluation. The article elaborates on the integration of SAV into the matching method and on the possible advantages of that approach. In addition, the article discusses different aspects of the process of eliciting SAV from individuals. Finally, the article outlines possible directions for future research.

Keywords

behavior change intervention, matching, program evaluation, self-assessment, self-expectation, self-selection, theory of planned behavior

Introduction

Public social intervention programs are a major policy tool used in many fields, such as economics, education, public health, and criminology. To engage in comprehensive policy planning, it is essential to evaluate the impact of these intervention programs on the participants. The fundamental difficulty encountered in quantitative evaluation of these intervention programs is the lack of information about the outcomes of individuals under both participation statuses. Notably, information is lacking because there is no way of observing an individual as both a participant and a nonparticipant in the same intervention program at a given point in time. Heckman et al. (1999) and Imbens and Wooldridge (2009) reviewed this fundamental evaluation difficulty and a variety of empirical methods developed to cope with it, which rely on experimental and nonexperimental data sets. The reviews demonstrate that no single method will always be optimal for achieving a reliable evaluation of the intervention programs.

This article analyzes the use of individuals’ assessments regarding the effect of the program on their own outcomes as a source of information to alleviate the fundamental evaluation problem. Individuals’ self-assessments are relevant in any social intervention program designed to change certain aspects of the participants’ lives: for example, self-assessments of the unemployed about the impact of vocational training on their employment prospects, self-assessments of college students about the impact of a program designed to reduce binge drinking, or self-assessments of youth at risk about the impact of a program designed to reduce dropping out of school. The analysis examines the theoretical justification for incorporating Self-Assessment Variables (SAV) into program evaluations and the methodological implications of that approach. In the analysis, the use of SAV is based on the assumption that individuals possess valuable and unique information about the program’s impact on their own outcomes, and that they use this information to decide whether or not to enroll in the program, as described by Heckman (1997). The analysis also explores the use of the theory of planned behavior (TPB; Ajzen, 1991, 2012) as a framework for examining different aspects of integrating SAV into program evaluation. In the context of mixed methods in program evaluation, the integration of SAV into a quantitative evaluation method allows the researcher to bring the personal “story” of each individual into the evaluation process. Thus, the integration of SAV is complementary to the use of mixed methods in program evaluation. (See Burch and Heinrich, 2015, regarding the use of mixed methods in program evaluation.)

Usually, researchers who use quantitative methods to evaluate the effect of intervention programs do not integrate SAV as a source of information. SAV is unique because it refers to the impact of the intervention program on the individuals, whose estimation is the goal of the evaluation process itself. Moreover, SAV is the outcome of the assessment by individuals of the program’s effect on themselves. The uniqueness of SAV has implications for its elicitation, its integration into the estimation model, and the interpretation of the evaluation results.

SAV can be incorporated into a variety of estimation methods. However, due to methodological considerations, the present analysis focuses on integrating SAV into the matching method. To conduct comprehensive empirical research on the contribution of SAV to program evaluation, it is necessary to have an exceptionally rich and carefully designed data set. The article lists the necessary characteristics of the data set and deals extensively with various aspects of the process of eliciting SAV from individuals. Furthermore, the article outlines possible directions and topics for future research on the use of SAV in program evaluation.

SAV as a Source of Information

In 1997, James Heckman defined a research environment in which the effect of intervention programs is heterogeneous and in which “individuals possess and act on private information about gains from the program that cannot be fully predicted by variables in the outcome equation” (Heckman, 1997, in the Abstract). Four assumptions establish the prevalence of Heckman’s Research Environment (HRE; Eyal, 2010):

A1. The impact of the intervention program is heterogeneous.

A2. Individuals have an assessment about the expected impact of the program on themselves.

A3. The self-assessments of individuals are based on valuable information. At least some of that information is unique (i.e., not available to the researcher).

A4. Individuals take the information at their disposal into account when deciding whether or not to enroll in the program.

The prevalence of HRE in a given research environment justifies the integration of SAV into program evaluation. If individuals possess valuable and unique information about the impact of an intervention program and if they use that information when they decide whether to enroll in a program, SAV will be a useful source of information for estimating the program’s effect.

The TPB offers another perspective regarding the use of SAV in program evaluation. For a description of the theory, see Ajzen (1991, 2012, 2018). Figure 1 (Ajzen, 2018) depicts the TPB. According to the TPB, the intention to act (e.g., to participate in an intervention program) is influenced by attitude toward the behavior, subjective norm, and perceived behavioral control. Attitude toward the behavior refers to the individual’s evaluation of the behavior as favorable or unfavorable; subjective norm refers to perceived social pressure to engage in the behavior or refrain from engaging in it; and perceived behavioral control refers to the individual’s perceived ability to act. According to the TPB, the actual behavior of an individual is a function of intention and actual behavioral control. The determinants of intention—that is, attitude, subjective norm, and perceived behavioral control—are, respectively, based on beliefs about the probability that the behavior will lead to specified outcomes (behavioral beliefs), beliefs about the normative expectations of significant others (normative beliefs), and beliefs about the presence of factors that may affect the performance of behavior (control beliefs). The attitudes, subjective norms, and perceived behavioral control are conceptually independent, but still empirically interrelated. Armitage and Conner (2001) conducted a meta-analysis of research that used the TPB and found that the TPB accounted for 39% and 27% of the variance in intention and behavior, respectively. A number of factors that may affect the predictability of the TPB regarding the future behavior of individuals are reviewed by Ajzen and Dasgupta (2015).

Figure 1. The theory of planned behavior.

HRE and the TPB are related in that SAV is a behavioral belief about the probability that participating in an intervention program will lead to a specified outcome. According to the TPB, SAV will affect the individual’s attitudes toward the program, which in turn will affect the individual’s intentions and ultimately the probability of participation in the program. In terms of A2, A3, and A4 (the three behavioral assumptions), the TPB refers to assumptions A2 and A4, that is, it refers to the assumptions that individuals have an assessment about the expected effect of the program on themselves, and that they use this assessment when deciding whether to enroll in a program. However, the TPB has no bearing on the prevalence of A3, that is, the assumption that SAV contains valuable and unique information about the program outcome (Ajzen, 2011, 2012). Thus, TPB itself cannot justify the use of SAV in program evaluation.

The following discussion of SAV as a source of information for program evaluation uses the two potential outcomes model (Roy, 1951), where Y₁ and Y₀ represent the outcomes of participants and nonparticipants in the program, respectively. The subscript i, which denotes individuals, has been omitted to simplify the expressions. X_{R=k, I=k} are individual and aggregate variables; R (Researcher) and I (Individual) indicate whether the specified variables are observed (k = 1) or not observed (k = 0) by the researcher or by the individual. Thus, the variables X_{R=1, I=1} are observed by the individual as well as by the researcher (e.g., gender), the variables X_{R=1, I=0} are observed only by the researcher (e.g., aggregate variables such as unemployment rate), and the variables X_{R=0, I=1} are observed only by the individual (e.g., SAT scores). Neither the researcher nor the individual observes the last group of variables, X_{R=0, I=0} (e.g., local demand for a specified vocation such as computer programmers). The classification of a specified variable by the above categories might change according to the specific research environment. For example, SAT scores may be available to the researcher in one research environment but not in another.

T is a binary variable: 1/0 for participation/nonparticipation in the program, respectively.

U₀, U₁, and U_T are errors, and α₀, α₁, and β are the coefficients of the model.

The following parametric estimation model is used by the researcher:

Y_{0} = g_{0} (X_{R = 1, I = 1}, X_{R = 1, I = 0}, α_{0}, U_{0})

(1)

Y_{1} = g_{1} (X_{R = 1, I = 1}, X_{R = 1, I = 0}, α_{1}, U_{1})

(2)

T = h (X_{R = 1, I = 1}, X_{R = 1, I = 0}, β, U_{T})

(3)

The first two equations (Equations 1 and 2) refer to the two potential outcomes Y₀ and Y₁, respectively. Naturally, only the variables observed by the researcher are used (X_{R=1, I=1}, X_{R=1, I=0}). The third equation refers to the selection process that determines who will actually participate in the program and whether Y₀ or Y₁ will be observed by the researcher for a specific individual. The errors (U₀, U₁, U_T) refer either to unobserved variables (i.e., not observed by the researcher) or to measurement errors (Marschak, 1953).

The treatment effect on the treated (TT) is a commonly used parameter for measuring the treatment effect:

\begin{array}{l} TT = E (Y_{1} - Y_{0} | X, T = 1) = E (Y_{1} | X, T = 1) \\ - E (Y_{0} | X, T = 1) \end{array}

(4)

where X represents conditioning variables, and TT is defined as the difference between the observed outcome (Y₁) that the participants (T = 1) attain in the program and the counterfactual outcome (Y₀) that they would have attained had they not enrolled in the program. The lack of information needed to identify the effect of the intervention program stems from the fundamental inability to observe the counterfactual outcome for the participants (Y₀| X, T = 1).
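The gap between TT and a naive comparison of participants with nonparticipants can be illustrated with a small simulation. The following Python sketch is purely illustrative: the data-generating process, the parameter values, and all names (simulate, true_tt, naive_difference, ability) are assumptions made for this example and are not taken from the article or from any data set. Because the simulation generates both potential outcomes for every individual, TT can be computed directly, which is never possible with real data.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Draw (y0, y1, t) triples under an assumed selection-on-gains process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.gauss(0, 1)                # hypothetical private information
        y0 = 10 + 2 * ability + rng.gauss(0, 1)  # outcome without the program
        gain = 1 + ability                       # heterogeneous program effect
        y1 = y0 + gain                           # outcome with the program
        # Individuals enroll when their (noisily perceived) gain is large enough.
        t = 1 if gain + rng.gauss(0, 1) > 1 else 0
        rows.append((y0, y1, t))
    return rows


def true_tt(rows):
    """E(Y1 - Y0 | T = 1); computable only because the simulation reveals Y0."""
    return mean(y1 - y0 for y0, y1, t in rows if t == 1)


def naive_difference(rows):
    """E(Y1 | T = 1) - E(Y0 | T = 0), the comparison that observed data allow."""
    treated = mean(y1 for y0, y1, t in rows if t == 1)
    untreated = mean(y0 for y0, y1, t in rows if t == 0)
    return treated - untreated


rows = simulate()
tt = true_tt(rows)
naive = naive_difference(rows)
print(f"TT = {tt:.2f}, naive difference = {naive:.2f}")
```

Under these assumed parameters, the naive difference should exceed TT, because the individuals who enroll would also have attained higher outcomes without the program.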

The following equation describes how individuals derive SAV:

SAV_{J} = sp_{J} (X_{R = 1, I = 1}, X_{R = 0, I = 1})

(5)

where J = 1, 0 for participation or nonparticipation in the program, respectively.

For example, SAV₁ and SAV₀ may refer to the individuals’ self-assessments of their earnings after they have either participated or not participated in a vocational training program. In this case, ∆_SAV = SAV₁ − SAV₀ denotes the individuals’ calculated assessments of the program’s effect on their future earnings based on SAV₁ and SAV₀. The derivation of SAV_J will usually vary across individuals, depending on the specific process (sp_J) and the specific data (X_{R=1, I=1}, X_{R=0, I=1}) that each individual uses. For example, Dominitz et al. (2003) found that some individuals may simply rely on the opinions of acquaintances when formulating their expectations about receiving social security benefits.

Both the individuals and the researcher(s) have an interest in estimating the effect of the program. Individuals are in an advantageous position in that they possess a broader set of data, at least regarding issues related to personal abilities, possibilities, and plans (X_{R=0, I=1}). Furthermore, individuals can choose the most appropriate assessment process for themselves (Equation 5), whereas researchers encounter an inherent difficulty in their attempt to construct a uniform quantitative model for the entire population (Equations 1 and 2). However, researchers possess theoretical and methodological knowledge, which may facilitate successful estimation of the program’s effect. Moreover, researchers have access to information (e.g., panel data) that is not available to individuals.

To alleviate the fundamental difficulty caused by the lack of information needed for program evaluations, researchers can utilize SAV as capsules of information that are elicited directly from individuals. As such, even though the researchers do not have full knowledge about the process or about the specific data that individuals use to derive SAV (Equation 5), these variables can still be a useful source of information.

The Value and Uniqueness of Self-Assessment Variables: Empirical Findings From the Literature

Unfortunately, literature on the use of SAV in program evaluation is scarce. Furthermore, it is limited in that it examines SAV as a criterion for evaluating the program effect, that is, as a possible substitute for conventional estimation methods (experimental or nonexperimental). In contrast, the present analysis seeks a way to integrate SAV into the conventional estimation methods as a supplementary source of information to improve their performance. The currently available empirical studies use an experimental data set to estimate the “real” program effect as a benchmark and compare it with the program effect as derived directly from SAV. These studies are important in that they examine the cognitive ability of individuals to make a meaningful assessment of the program’s effect on themselves. It should be noted, though, that this cognitive ability has no direct bearing on the usefulness of SAV as a source of information in program evaluation. For example, SAV may be accurate but still not informative, given other information (variables) available in the program estimation model; conversely, it may be informative though inaccurate.

The comparison of SAV to program impact yielded mixed results. Heckman and Smith (1998) and Smith et al. (2013) used the JTPA (U.S. Job Training Partnership Act) experimental data set to compare SAV to the impact of the JTPA on the participants’ outcomes in the labor market. The authors did not find evidence of a consistent relationship between the participants’ self-assessments and the estimations of program outcomes. Smith et al. noted that these findings should be interpreted with caution because the participants did not base their assessments on a well-posed question.

Mueller et al. (2014) and Mueller and Gaus (2015, 2018) conducted a series of studies of short-term interventions (surfing at an Internet portal, watching a televised documentary or an educational video) whose objectives were to bring about a change in a certain aspect of the participants’ behavior. Mueller et al. (2014) used experimental data on an intervention that aimed to change the motivation of consumers to engage in climate-friendly behavior. Six of the 12 intention variables that were examined using the participants’ self-assessments yielded an estimated treatment effect that was comparable to the one yielded by the experimental data. It was also found that gender and age were related to the precision of the participants’ self-assessments. A similar research design was used by Mueller and Gaus (2015) to examine an intervention that dealt with consumption of organic food. The authors examined the program impact on intentions and attitudes and on self-reported behavior. SAV was found to be comparatively reliable regarding intentions and attitudes, but the results were inconclusive in regard to self-reported behavior. Mueller and Gaus (2018) studied an intervention that informed the participants about organ donation and encouraged them to get an organ donor card. The study used a series of randomized controlled trials (RCTs) to create an experimental data set to explore the accuracy of SAV under different conditions. The examined conditions were individual characteristics (education level), the examined outcome variables (attitudes vs. knowledge), and the way the data were collected (the placement of the rating of SAV relative to the rating of the current situation in the questionnaire). The study results indicated that SAV is a reliable indicator of program impact. These results were unaffected by changes in the examined conditions.

The Integration of SAV in Program Evaluation

Although the prevalence of HRE implies that SAV would be a useful source of information in the evaluation process, it has no bearing on the causation between SAV and the potential outcomes (Y₀, Y₁). To clarify this point, Freedman’s (2006) approach was adopted for the Neyman–Rubin–Holland model (Holland, 1986; Neyman, 1923; Rubin, 1974). For a translation into English and discussion of Neyman (1923), see Splawa-Neyman et al. (1990). According to this model, to establish the causality of SAV, it is necessary to examine whether manipulation of SAV alone is related to a change in Y₀/Y₁. To further explore the causality of SAV, Equations 6 and 7 describe how Y₀ and Y₁ are determined:

Y_{0} = p_{0} (X_{R = 1, I = 1}, X_{R = 1, I = 0}, X_{R = 0, I = 1}, X_{R = 0, I = 0})

(6)

Y_{1} = p_{1} (X_{R = 1, I = 1}, X_{R = 1, I = 0}, X_{R = 0, I = 1}, X_{R = 0, I = 0})

(7)

Had we known p₀, p₁, and the values of X_{R=1, I=1}, X_{R=1, I=0}, X_{R=0, I=1}, and X_{R=0, I=0}, we could have fully predicted Y₀/Y₁ for each individual. Because a change in X_{R=1, I=1}, X_{R=1, I=0}, X_{R=0, I=1}, or X_{R=0, I=0} included in Equations 6 and 7 will affect Y₀ or Y₁, these variables have a causal effect on the individual’s potential outcomes. In the framework of HRE, a change in SAV_J alone either will or will not affect Y₀/Y₁, depending on the specific research environment. If SAV_J does not affect Y₀/Y₁, it must not be included either in X_{R=1, I=1} or in X_{R=0, I=1} in Equations 6 and 7. In that case,

\begin{array}{l} (Y_{J} | X_{R = 1, I = 1}, X_{R = 1, I = 0}, X_{R = 0, I = 1}, X_{R = 0, I = 0}) \\ = (Y_{J} | X_{R = 1, I = 1}, X_{R = 1, I = 0}, X_{R = 0, I = 1}, X_{R = 0, I = 0}, SAV_{0}, SAV_{1}) \end{array}

(8)

However, according to A3 (the assumption that SAV contains valuable and unique information), SAV_J is at least partially based on X_{R=1, I=1} and X_{R=0, I=1}, which are included in Equations 6 and 7 and have a causal relationship with Y₀/Y₁. Hence, HRE implies an associational inference between SAV_J and Y₀/Y₁ (Holland, 1986). Yet, a possible path that creates causality between SAV_J and Y₀/Y₁ is suggested by the TPB. That possibility is based on the assumption that behavioral beliefs will not only affect the probability of program participation but will also affect the probability of behaviors that affect the participants’ outcomes. For example, high expectations of a vocational training program (high behavioral belief) will lead to a positive attitude, high intention, and finally to high prevalence of behaviors that improve the participants’ outcomes in the labor market (Y₁). These kinds of behaviors will be evident during the training program itself (e.g., completing homework assignments, attendance in classes) and also at the program’s end (e.g., intensive job search in the field of training).

If a causal relationship between SAV_J and Y₀/Y₁ prevails, it will strengthen the value of SAV as a source of information for program evaluation. In any case, the researcher may include SAV in the estimation model to obtain better predictions of Y₀ and Y₁:

In the equations below, s₀ and s₁ denote the coefficients of SAV.

Y_{0} = g'_{0} (X_{R = 1, I = 1}, X_{R = 1, I = 0}, SAV_{0}, SAV_{1}, α_{0}, s_{0}, U_{0})

(1′)

Y_{1} = g'_{1} (X_{R = 1, I = 1}, X_{R = 1, I = 0}, SAV_{0}, SAV_{1}, α_{1}, s_{1}, U_{1})

(2′)

The inclusion of SAV₁ to estimate Y₀ in Equation 1′ and SAV₀ to estimate Y₁ in Equation 2′ is due to the possibility that both SAV₀ and SAV₁ will affect the probability of engaging in behaviors that may affect Y₀ and/or Y₁. Based on Equation 5, it is assumed that SAV₀ and SAV₁ correlate with X_{R=0, I=1}, which is included in the error terms of Equations 1′ and 2′. In that case, SAV₀ and SAV₁ may add valuable and unique information to the estimation process. However, it is assumed that SAV₀ and SAV₁ correlate with X_{R=1, I=1} as well (Equation 5). Because X_{R=1, I=1} is already used in the estimation, the correlation between these variables and SAV₀/SAV₁ may bias the estimates of α₀ and α₁. Either way, as mentioned, caution should be exercised when interpreting the relationship between SAV₀/SAV₁ and Y₀/Y₁ in terms of causality.

The integration of SAV into the nonparametric matching method is an appealing option for using SAV in the evaluation process, which circumvents the difficulties in interpreting the outcomes of the parametric estimation model. According to this method, each participant in the program is matched with one or more nonparticipants who have identical or similar observed characteristics to attain a balanced group for comparison with the treated individuals. The matching method is based on the conditional independence assumption (CIA):

(Y_{0}, Y_{1}) ⊥ T | X_{R = 1}, where X_{R = 1} = X_{R = 1, I = 1} \cup X_{R = 1, I = 0}

(9)

If Equation 9 holds, then given X_{R=1}, the individual’s outcomes are independent of participation or nonparticipation in the program. In this case,

E (Y_{0} | X_{R = 1}, T = 1) = E (Y_{0} | X_{R = 1}, T = 0) = E (Y_{0} | X_{R = 1})

(10)

In light of the need to find individuals in the untreated group who match each individual in the treated group, the treated and untreated groups must have common support:

0 < P (T = 1 | X_{R = 1}) < 1, for all values of X_{R = 1} in the examined set

(11)

In practice, instead of matching on the variables observed by the researcher (X_{R=1}), the matching procedure can be reduced to one dimension by matching on the propensity score, which is defined as P(T = 1 | X_{R=1}) (Rosenbaum & Rubin, 1983).
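The propensity score and the common support condition (Equation 11) can be illustrated with a toy example. The sketch below assumes a single discrete observed covariate, so that P(T = 1 | X) can be estimated as the share of participants within each cell; this is a simplification chosen for the example, whereas applied work typically estimates the score with a parametric model such as a logit. The records, the cell labels, and the function names are hypothetical.

```python
from collections import defaultdict


def propensity_by_cell(records):
    """Estimate P(T = 1 | X = x) as the share of participants in each cell x.

    records: iterable of (x, t) pairs with discrete x and t in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [n_nonparticipants, n_participants]
    for x, t in records:
        counts[x][t] += 1
    return {x: n1 / (n0 + n1) for x, (n0, n1) in counts.items()}


def common_support(scores):
    """Flag the cells in which 0 < P(T = 1 | X) < 1, as Equation 11 requires."""
    return {x: 0.0 < p < 1.0 for x, p in scores.items()}


# Hypothetical records: (education level, participation indicator).
records = [
    ("low", 0), ("low", 0), ("low", 1),
    ("high", 1), ("high", 1), ("high", 0), ("high", 1),
    ("none", 0),
]
scores = propensity_by_cell(records)
support = common_support(scores)
print(scores)
print(support)
```

In this toy data set, the cell "none" contains no participants, so its estimated score is 0 and it lies outside the common support; observations in such cells cannot be used for matching.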

Given Equations 9 and 11, it is possible to estimate TT by comparing the outcomes of the treated group with those of the matched comparison group:

\begin{array}{l} TT = E (Y_{1} | X_{R = 1}, T = 1) - E (Y_{0} | X_{R = 1}, T = 1) \\ = E (Y_{1} | X_{R = 1}, T = 1) - E (Y_{0} | X_{R = 1}, T = 0) \end{array}

(12)

Because both E(Y₁ | X_{R=1}, T = 1) and E(Y₀ | X_{R=1}, T = 0) can be directly estimated by means of the treated and matched comparison groups, TT can be identified.
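The estimator in Equation 12 can be sketched for the simplest case of exact matching on a discrete covariate. The example below is a minimal illustration under that assumption, not a full matching procedure: each participant is compared with the mean outcome of all nonparticipants in the same cell, and participants without any match are dropped. The data and the function name exact_matching_tt are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def exact_matching_tt(records):
    """Estimate TT by exact matching on a discrete covariate.

    records: iterable of (x, t, y) with cell x, participation t, observed outcome y.
    Returns (tt_estimate, number_of_unmatched_participants).
    """
    untreated = defaultdict(list)
    for x, t, y in records:
        if t == 0:
            untreated[x].append(y)

    diffs, unmatched = [], 0
    for x, t, y in records:
        if t != 1:
            continue
        if x not in untreated:       # no nonparticipant in this cell: off support
            unmatched += 1
            continue
        # Observed Y1 minus the matched estimate of the counterfactual Y0.
        diffs.append(y - mean(untreated[x]))
    return mean(diffs), unmatched


# Hypothetical data: (cell, participation, outcome).
records = [
    ("a", 1, 12), ("a", 1, 14), ("a", 0, 10), ("a", 0, 10),
    ("b", 1, 20), ("b", 0, 15), ("b", 0, 17),
    ("c", 1, 30),                    # participant with no comparable nonparticipant
]
tt_hat, dropped = exact_matching_tt(records)
print(tt_hat, dropped)
```

The estimate is valid as an estimate of TT only if the CIA holds for the matching variables, and it refers only to the participants who lie within the common support.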

The main advantage of the matching method is that it does not impose any structural constraints on the potential outcomes (Y₀/Y₁). In addition, the matching method is intuitively appealing, making it relatively easy for policy makers to interpret and utilize the evaluation outcomes. Nevertheless, the CIA is not a trivial precondition, and it holds in two situations (Heckman et al., 1997):

a. There is no individual or institutional selection into the program based on potential outcomes.

b. There are no variables unobserved by the researcher (X_{R=0, I=1} or X_{R=0, I=0}) that affect selection into the program as well as potential outcomes (Y₀/Y₁).

Assumption (a) is not consistent with Roy’s (1951) model and is implausible in most, if not all, relevant research environments; and assumption (b) requires a rich data set, which includes all the variables that affect selection into the program as well as the potential outcomes (Y₀/Y₁). The question regarding the actual prevalence of the CIA is an empirical one, and the answer may vary depending on the specific research environment. For a discussion on the use of the matching method, including the prevalence of the CIA and the data required to use that method, see Caliendo et al. (2017), Cook et al. (2008), Dehejia and Wahba (1999), Heckman et al. (1997, 1998), Lechner and Wunsch (2013), and Smith and Todd (2005). All these researchers except Cook et al. dealt exclusively with the evaluation of active labor market programs.

The main weakness of the matching method lies in its inherent inability to cope with selection into the program deriving from unobserved variables that also affect the program outcome. This selection process contravenes the CIA. Because SAV may carry some of the information contained in these unobserved variables, incorporating SAV into the data set used for the evaluation may reduce the estimation bias. Equation 13 presents the estimation bias in the matching method without incorporating SAV:

\begin{array}{l} B_{Match} (X_{R = 1}) = \left[ E (Y_{1} | X_{R = 1}, T = 1) - E (Y_{0} | X_{R = 1}, T = 1) \right] \\ - \left[ E (Y_{1} | X_{R = 1}, T = 1) - E (Y_{0} | X_{R = 1}, T = 0) \right] \\ = E (Y_{0} | X_{R = 1}, T = 0) - E (Y_{0} | X_{R = 1}, T = 1) \end{array}

(13)

If E(Y₀ | X_{R=1}, T = 0) = E(Y₀ | X_{R=1}, T = 1), or in other words, if the CIA holds, then B_Match(X_{R=1}) = 0. Even if the CIA does not hold given X_{R=1} alone, it does hold given X_{R=1}, X_{R=0, I=1}, and X_{R=0, I=0}, assuming that p₀ and p₁ are identical for all the individuals; that is, (Y₀, Y₁) ⊥ T | X_{R=1}, X_{R=0, I=1}, X_{R=0, I=0} (see Equations 6 and 7). Thus,

\begin{array}{l} B_{Match} (X_{R = 1}, X_{R = 0, I = 1}, X_{R = 0, I = 0}) = E (Y_{0} | X_{R = 1}, X_{R = 0, I = 1}, X_{R = 0, I = 0}, T = 0) \\ - E (Y_{0} | X_{R = 1}, X_{R = 0, I = 1}, X_{R = 0, I = 0}, T = 1) = 0 \end{array}

(14)

Unfortunately, researchers have no access to X_{R=0, I=1} or to X_{R=0, I=0}. Yet, if HRE prevails, the researcher may use SAV(sp_J, X_{R=1, I=1}, X_{R=0, I=1}) as an additional source of information, which gives access, though indirectly, to the information contained in X_{R=0, I=1}. In that case, the bias of the matching method will be

\begin{array}{l} B_{Match} (X_{R = 1}, SAV_{0}, SAV_{1}) = E (Y_{0} | X_{R = 1}, SAV_{0}, SAV_{1}, T = 0) \\ - E (Y_{0} | X_{R = 1}, SAV_{0}, SAV_{1}, T = 1) \end{array}

(15)

If (Y₀, Y₁) ⊥ T | X_{R=1}, SAV₀, SAV₁, the CIA holds and B_Match(X_{R=1}, SAV₀, SAV₁) = 0. The actual effect of integrating SAV into the evaluation process on B_Match(X_{R=1}, SAV₀, SAV₁) compared with B_Match(X_{R=1}) depends on the specific research environment. In general, if a significant estimation bias remains, the researcher may employ an additional estimation method which uses the matched comparison group as a basis for further adjustments. For example, Ho et al. (2007) used parametric methods, and Heckman et al. (1997) used the “difference in difference” method.
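The comparison between the bias in Equation 13 and the bias in Equation 15 can be explored in a simulation, in which Y₀ is known for the participants as well. The following Python sketch rests on strong, purely illustrative assumptions: one binary variable observed by the researcher (x_obs), one binary variable observed only by the individual (x_priv), enrollment that depends on both, and self-assessments that reflect x_priv but are misreported with an assumed probability of 10%. None of these names or values comes from the article or from empirical data.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate(n=20000, seed=1, misreport=0.10):
    """Generate a stylized population in which selection depends on private information."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x_obs = rng.randint(0, 1)     # observed by researcher and individual
        x_priv = rng.randint(0, 1)    # observed only by the individual
        y0 = 10 + 2 * x_obs + 3 * x_priv + rng.gauss(0, 1)
        p_enroll = 0.2 + 0.2 * x_obs + 0.4 * x_priv
        t = 1 if rng.random() < p_enroll else 0
        # Assumed elicitation error: the stated belief about x_priv is sometimes flipped.
        believed = x_priv if rng.random() >= misreport else 1 - x_priv
        sav0 = 10 + 2 * x_obs + 3 * believed   # stated expected outcome without program
        sav1 = sav0 + 1 + 2 * believed         # stated expected outcome with program
        rows.append({"x_obs": x_obs, "t": t, "y0": y0, "sav0": sav0, "sav1": sav1})
    return rows


def matching_bias(rows, keys):
    """E(Y0 | cell, T = 0) - E(Y0 | cell, T = 1), averaged over participants."""
    def cell(r):
        return tuple(r[k] for k in keys)

    y0_untreated = defaultdict(list)
    for r in rows:
        if r["t"] == 0:
            y0_untreated[cell(r)].append(r["y0"])
    cell_means = {c: mean(v) for c, v in y0_untreated.items()}
    gaps = [cell_means[cell(r)] - r["y0"]
            for r in rows if r["t"] == 1 and cell(r) in cell_means]
    return mean(gaps)


rows = simulate()
bias_without_sav = matching_bias(rows, ["x_obs"])
bias_with_sav = matching_bias(rows, ["x_obs", "sav0", "sav1"])
print(f"bias without SAV: {bias_without_sav:.2f}, with SAV: {bias_with_sav:.2f}")
```

In this stylized setting, one would expect matching on SAV₀ and SAV₁ to reduce the bias without eliminating it, because the self-assessments are assumed to be reported with error. With different assumptions about selection or about the quality of SAV, the reduction could be larger, smaller, or absent.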

Eliciting Self-Assessment Variables

As an output of a cognitive process, SAV must be elicited directly from the individuals themselves, including participants in the program as well as nonparticipants. Furthermore, SAV must be elicited from the participants before the intervention takes place. Notably, the participants’ SAV is expected to change during the program as they gather information and update their assessments accordingly (Eyal, 2010). Thus, SAV elicited from participants after the intervention has begun is not comparable with the participants’ SAV before the intervention or with nonparticipants’ SAV.

Moreover, SAV should be obtained by asking well-posed questions that measure a clearly defined, relevant aspect of the individual’s performance after the intervention has taken place. For example, a question such as “If you do not attend the vocational training program, how would you predict your chances of being employed a year from now?” would yield much more useful information than a question such as “If you do not attend the vocational training program, how would you predict your chances of being successful in the job market a year from now?” The scale of responses also needs to be constructed carefully. SAV may be based on a verbal scale (e.g., very high, high, neither high nor low, low, very low) or a quantitative scale (e.g., 0%–100%). Another concern is whether to add a “don’t know” option to the possible responses. The advantage of adding this option is that it enables interviewees who do not have an assessment (because they are either unable to make an assessment or unwilling to invest the effort in doing so) to give an accurate answer to the question. Furthermore, the rate of respondents who choose that option may be applied toward the empirical examination of whether HRE prevails in the specific research environment (Eyal, 2010). The disadvantage of providing the “don’t know” option is that some of the respondents may use it to avoid the cognitive burden of making an assessment.

Finally, to elicit valuable and unique self-assessments, interviewees must have a comprehensive picture of the relevant intervention program. In addition, they need the ability to fully comprehend the assessment question as well as the answers. Thus, it would be useful to provide the interviewees (participants and nonparticipants) with information about the program (e.g., the target population, the length, and the contents) before they are asked about their assessments. However, eliciting SAV from people with low literacy levels might be challenging, even when they have comprehensive information about the program. For a study on eliciting probabilistic assessments in developing countries, in which a significant portion of the population is illiterate, see Delavande (2014).
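One possible way of recording elicited responses is sketched below in Python. It assumes a quantitative 0–100 scale and represents a “don’t know” answer as a missing value; the class SavResponse, its field names, and the sample responses are hypothetical and serve only to illustrate how ∆_SAV and the share of “don’t know” answers could be derived from such records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavResponse:
    """One pre-intervention response; None stands for a "don't know" answer."""
    person_id: str
    sav0: Optional[float]  # stated chance (0-100) of employment WITHOUT the program
    sav1: Optional[float]  # stated chance (0-100) of employment WITH the program

    def delta(self) -> Optional[float]:
        """Self-assessed program effect, SAV1 - SAV0, if both were given."""
        if self.sav0 is None or self.sav1 is None:
            return None
        return self.sav1 - self.sav0


def dont_know_rate(responses):
    """Share of respondents for whom no self-assessed effect can be computed."""
    missing = sum(1 for r in responses if r.delta() is None)
    return missing / len(responses)


# Hypothetical responses collected before the intervention begins.
responses = [
    SavResponse("p1", 40.0, 65.0),
    SavResponse("p2", 55.0, 50.0),
    SavResponse("p3", None, 70.0),   # "don't know" for the no-program question
    SavResponse("p4", 30.0, 30.0),
]
deltas = [r.delta() for r in responses]
rate = dont_know_rate(responses)
print(deltas, rate)
```

Keeping “don’t know” as an explicit missing value, rather than recoding it to a midpoint, preserves the information needed to report the rate of respondents who chose that option.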

Discussion

The TPB and HRE as a Framework for Program Evaluation

The current analysis used the HRE and the TPB as a framework for examining the use of SAV in program evaluation. It is worth noting that the TPB framework may be beneficial for program evaluation in other ways as well. First, the TPB could be used to construct a model of self-selection into the program (Equation 3), as a component of the overall program evaluation. The ability to appropriately model the process of selection into the program is especially important when using a nonexperimental database (Burch & Heinrich, 2015). Still, the TPB focuses on predicting a specific possible behavior rather than a choice between several behavioral options (i.e., self-selection). Thus, on the face of it, according to the TPB, SAV₁ alone should be considered when predicting the probability of attending a program, whereas SAV₀, which refers to the option of not attending a program, should not be included. However, it is possible to adjust the model by adding other predictors (Ajzen, 2011). Furthermore, as mentioned above, attitudes toward the behavior may directly affect program outcome. Similarly, by the same rationale, subjective norms and perceived behavioral control may affect program outcome as well. It should be noted that findings in the literature support the notion that perceived behavioral control influences the amount of effort expended and the extent of perseverance in applying the intended behavior (Ajzen, 2012). In that case, the researcher may use attitudes, subjective norms, and perceived behavioral control to obtain better predictions of Y₀ and Y₁.

Finally, to conduct a reliable and useful program evaluation, it is important to portray the broad picture of the program and its mechanisms (Deaton, 2010; Deaton & Cartwright, 2018; Heckman & Smith, 1995; Kabeer, 2019; White, 2009). Thus, exploring the process of selection into the program and the relationship of this process to the individuals’ program outcomes in the framework of both HRE and TPB will enhance the evaluation and the usefulness of its outcomes for policy makers. In fact, the TPB is already being used as a basis for planning interventions aimed at changing behavior and is often used to gain insight into the mechanisms through which these programs affect (or do not affect) the participants’ relevant behavior. See, for example, the review by Hardeman et al. (2002) on the application of TPB in program planning and evaluation; Van Ryn and Vinokur (1992) on job search behavior; Elliott and Armitage (2009) and Rosenbloom et al. (2009) on road safety; Todd and Mullan (2011) on reducing binge drinking; Kothe et al. (2012) and Lv and Brown (2011) on eating habits; and Aarø et al. (2006), Schmiege et al. (2009), and Tyson et al. (2014) on promoting healthy sexual behavior. In a meta-analysis conducted by Sheeran et al. (2016), modifying attitudes, norms, or self-efficacy was shown to be effective in changing health behavior.

The Usefulness of SAV in a Specific Research Environment

Examination of the prevalence of HRE is essential in assessing the potential for using SAV in a specific research environment. As a first step, it will be useful to assess whether the assumption that HRE prevails is plausible. For example, if the program is mandatory, SAV will not affect selection into the program, contradicting A4 and implying that HRE does not prevail. One should also observe whether the participants have the knowledge and cognitive abilities required to make informative assessments (A3). If the assumption that HRE prevails is plausible, one can further follow the empirical method proposed by Eyal (2010), which examines each of the assumptions (A1–A4) to empirically establish the prevalence of HRE.

One of the key findings of the analysis is that the value of SAV as a source of information in program evaluation stems from its predictive power given X_R₌₁, not from its accuracy (Equation 15). Notably, there are systematic and predictable cognitive biases in individuals’ assessments (Kahneman & Tversky, 1979; Tversky & Kahneman, 1974), which would make SAV inaccurate in many cases. When SAV is integrated with commonly used evaluation methods to complement other sources of information (i.e., other variables), researchers can utilize the information inherent in SAV even when SAV itself is biased. Juster (1966), for example, found that although the average assessments that individuals made regarding their purchasing probabilities were lower than the actual probabilities, their assessments were still a significant predictor of future purchasing. Dominitz (1998) and Eyal (2010) obtained similar findings regarding earning expectations and working in the field of training after vocational training, respectively.
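The point that predictive power, rather than accuracy, is what matters can be illustrated with a small simulation. The sketch below does not use data from any of the cited studies: the variable names (x, u, y, sav), the size of the downward bias, and the noise levels are all assumptions chosen for illustration. It generates self-assessments that are systematically too low, in the spirit of Juster's (1966) finding, and then checks how strongly they still predict the outcome once an observed covariate is held fixed.

```python
import random

def mean(values):
    return sum(values) / len(values)

def ols_residuals(y, x):
    """Residuals from a least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical data-generating process (an assumption, not the article's model):
x = [rng.gauss(0, 1) for _ in range(n)]   # covariate observed by the researcher
u = [rng.gauss(0, 1) for _ in range(n)]   # information known only to the individual
y = [xi + ui + rng.gauss(0, 0.5) for xi, ui in zip(x, u)]          # realized outcome
sav = [xi + ui - 2.0 + rng.gauss(0, 0.5) for xi, ui in zip(x, u)]  # pessimistic, noisy SAV

# Accuracy: average gap between assessment and outcome.
mean_bias = mean(sav) - mean(y)

# Predictive power given x: correlation of SAV and outcome after removing x from both.
partial_r = pearson(ols_residuals(sav, x), ols_residuals(y, x))

print(f"mean bias of SAV: {mean_bias:.2f}")
print(f"partial correlation of SAV with outcome given x: {partial_r:.2f}")
```

Under these assumed parameters the assessments are on average about 2 units too low, yet the partial correlation is designed to be roughly 0.8, so the biased SAV still carries most of the private information in u. With real data, the size of both quantities is an empirical matter.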

Future Directions

To empirically examine the contribution of SAV as a source of information in the evaluation process, there is a need to conduct within-studies, which rely on a combined experimental and nonexperimental data set. This data set should include a measure of SAV that relates to the treatment under examination and that is elicited appropriately. Applying nonexperimental methods with and without SAV, and comparing the results of these methods with the results of the experimental estimation (the “real” program effect), allows for examination of the contribution of SAV as a source of information. For examples of the use of within-studies, see Cook et al. (2008), Dehejia and Wahba (1999), Heckman et al. (1997, 1998), Heckman and Hotz (1989), LaLonde (1986), and Smith and Todd (2005); see also Steiner and Wong (2018), who dealt with possible criteria for determining whether the experimental and nonexperimental outcomes correspond. For a comprehensive discussion of the design and implementation of within-studies, see Wong and Steiner (2018).

To obtain the data sets required for studies of this nature, the necessary steps need to be taken in the early stages of program planning and data collection. Evaluations of behavior-changing interventions based on the TPB may provide the infrastructure necessary to collect data and conduct these studies. Another opportunity for data collection may arise when using mixed methods to evaluate the intervention program. If a survey among the program target population is carried out during the course of the mixed methods study, it will create an opportunity to elicit SAV for later use when estimating the program impact. The proposed approach is consistent with the concept underlying the use of mixed methods, namely collecting information from a variety of sources by using qualitative and quantitative methods and integrating it into the overall program evaluation. For more information on mixed methods in general, see Creswell et al. (2011), Fetters et al. (2013), Greene et al. (1989), and Pluye and Hong (2014). For the use of mixed methods in program evaluation, see Burch and Heinrich (2015).

When a variety of suitable data sets are available, they will be useful in mapping the settings and conditions under which SAV will contribute most substantially to program evaluation. One possible direction is to examine the usefulness of SAV in different areas of intervention (e.g., vocational training and treatment of drug abusers). Another possible direction is to explore the usefulness of SAV by various characteristics of the target population (e.g., age, education level, and cognitive abilities). A different direction would be to look at the impact of factors relating to the process of eliciting SAV (e.g., using verbal vs. quantitative scale assessments and offering the “don’t know” option).
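The logic of the within-study comparison described above can be sketched with simulated data. Everything in the block is a hypothetical illustration rather than the article's procedure: the true effect TAU, the data-generating process, the 0.25-wide SAV bins, and the use of exact matching on cells as a simple stand-in for the matching method are all assumptions. The experimental arm supplies the benchmark, and the self-selected arm is evaluated twice, once matching on the observed covariate alone and once matching on the covariate together with SAV.

```python
import random
from collections import defaultdict

TAU = 2.0  # true program effect built into the simulation (assumption)

def mean(values):
    return sum(values) / len(values)

def simulate(n, rng, randomized):
    """Return rows (x, sav, d, y) from a hypothetical data-generating process."""
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1])               # covariate observed by the researcher
        u = rng.gauss(0, 1)                  # private information of the individual
        sav = u - 1.0 + rng.gauss(0, 0.3)    # biased, noisy self-assessment
        if randomized:
            d = rng.random() < 0.5           # experimental assignment
        else:
            d = 0.5 * x + u + rng.gauss(0, 1) > 0   # self-selection that depends on u
        y = x + u + rng.gauss(0, 0.5) + (TAU if d else 0.0)
        rows.append((x, sav, d, y))
    return rows

def diff_in_means(rows):
    """Experimental estimate: mean outcome of treated minus mean outcome of controls."""
    treated = [y for _, _, d, y in rows if d]
    controls = [y for _, _, d, y in rows if not d]
    return mean(treated) - mean(controls)

def stratified_att(rows, cell):
    """Exact matching on cells: treated-weighted mean of within-cell differences."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        d, y = row[2], row[3]
        groups[cell(row)][0 if d else 1].append(y)
    total, n_treated = 0.0, 0
    for treated, controls in groups.values():
        if treated and controls:             # keep cells on common support only
            total += len(treated) * (mean(treated) - mean(controls))
            n_treated += len(treated)
    return total / n_treated

rng = random.Random(42)  # fixed seed so the sketch is reproducible
benchmark = diff_in_means(simulate(10000, rng, randomized=True))
observational = simulate(20000, rng, randomized=False)

att_x_only = stratified_att(observational, cell=lambda r: r[0])
att_x_sav = stratified_att(observational, cell=lambda r: (r[0], round(r[1] / 0.25)))

print(f"experimental benchmark:   {benchmark:.2f}")
print(f"matching on X only:       {att_x_only:.2f}")
print(f"matching on X and SAV:    {att_x_sav:.2f}")
```

In this constructed setting, matching on X alone overstates the effect because u drives both selection and outcomes, while adding the binned SAV moves the estimate toward the experimental benchmark. Whether and by how much SAV helps in practice is exactly the empirical question that real within-study data sets would have to answer.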

Another possible direction for future research is to explore the use of assessments made by people involved in the institutional selection process (e.g., caseworkers) regarding the effect of the program on the outcomes of individuals (participants and nonparticipants). The same rationale that justifies the use of SAV also justifies the use of Institutional Assessment Variables (IAV). The use of assessments made by the individuals themselves (i.e., SAV) as well as by the people involved in the institutional selection process (i.e., IAV) can provide researchers with a wide range of information possessed by all parties involved in the selection process.

Summary and Conclusion

HRE and TPB were used to explore the theoretical and methodological aspects of integrating SAV as a source of information in the evaluation of social intervention programs. The analysis focused on using the matching method to integrate SAV into the evaluation process to enable researchers to utilize the information contained in the self-assessments while utilizing other available sources of information (variables) as well. SAV may allow researchers to benefit from the advantages of the matching method while at least partially overcoming the inherent inability of that method to control for unobserved variables which affect both selection into the program and program outcomes.

To shed further light on the possible contribution of SAV to program evaluation, there is a need for unique data sets that enable a within-study design. The article described the required data sets and expanded on various issues that should be considered to elicit useful SAV. A variety of suitable data sets can be used to map the conditions under which SAV contributes most substantially to program evaluation, with emphasis on different fields of research and different target populations, as well as on the process of eliciting SAV. Information about the different aspects of employing SAV will provide a comprehensive view of the empirical value of SAV and the proper way to use it so that the full potential of SAV in program evaluation can be realized.

Footnotes

Acknowledgements

I would like to thank Andrew Clark, Carolyn Heinrich, and Jeffery Smith for commenting on earlier drafts of the paper. I would also like to thank Mimi Schneiderman for her tremendous help in editing the manuscript. Judy Dotan and my son, Omer, assisted with the editing as well. Last, but certainly not least, I would like to thank my wife Sara for reviewing the paper and for her help in editing and organizing it.

Author’s Note

The views expressed in this paper do not necessarily reflect those of Myers-JDC-Brookdale Institute. Any errors are my sole responsibility.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Yonatan Eyal

Author Biography

Yonatan Eyal holds a BSc in industrial and management engineering and a PhD in economics. His area of interest is the theoretical and practical aspects of program evaluation. He has a special interest in the use of individuals’ self-assessments in program evaluations.

References

Aarø L. E., Flisher A. J., Kaaya, Onya, Fuglesang, Klepp K. I., Schaalma (2006). Promoting sexual and reproductive health in early adolescence in South Africa and Tanzania: Development of a theory- and evidence-based intervention programme. Scandinavian Journal of Public Health, 34, 150–158.

Ajzen (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211.

Ajzen (2011). The theory of planned behaviour: Reactions and reflections. Psychology and Health, 26, 1113–1127.

Ajzen (2012). The theory of planned behavior. In Van Lange P. A. M., Kruglanski A. W., Higgins E. T. (Eds.), Handbook of theories of social psychology (pp. 438–459). Sage.

Ajzen (2018). Theory of planned behavior [Ajzen Icek’s website]. http://people.umass.edu/aizen/tpb.html

Ajzen, Dasgupta (2015). Explicit and implicit beliefs, attitudes, and intentions. In Haggard, Eitan (Eds.), The sense of agency (pp. 115–144). Oxford University Press.

Armitage C. J., Conner (2001). Efficacy of the theory of planned behavior: A meta-analytic review. British Journal of Social Psychology, 40, 471–499.

Burch, Heinrich C. J. (2015). Mixed methods for policy research and program evaluation. Sage.

Caliendo, Mahlstedt, Mitnik O. A. (2017). Unobservable, but unimportant? The relevance of usually unobserved variables for the evaluation of labor market policies. Labour Economics, 46, 14–25.

Cook T. D., Shadish W. R., Wong V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27, 724–750.

Creswell J. W., Klassen A. C., Plano Clark V. L., Smith K. C. (2011). Best practices for mixed methods research in the health sciences. Office of Behavioral and Social Sciences Research. http://twhworkshop.com/wp-content/uploads/2017/03/Best_Practices_for_Mixed_Methods_Research.pdf

Deaton A. S. (2010). Instruments, randomization, and learning about development. Journal of Economic Literature, 48, 424–455.

Deaton A. S., Cartwright (2018). Understanding and misunderstanding randomized controlled trials. Social Science & Medicine, 210, 2–21.

Dehejia R. H., Wahba (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94, 1053–1062.

Delavande (2014). Probabilistic expectations in developing countries. Annual Review of Economics, 6, 1–20.

Dominitz (1998). Earning expectations, revisions, and realizations. The Review of Economics and Statistics, 80, 374–388.

Dominitz, Manski C. F., Heinz (2003). Will social security be there for you? How Americans perceive their benefits (Working paper no. 9798). National Bureau of Economic Research.

Elliott M. A., Armitage C. J. (2009). Promoting drivers’ compliance with speed limits: Testing an intervention based on the theory of planned behaviour. British Journal of Psychology, 100, 111–132.

Eyal (2010). Examination of the empirical research environment of program evaluation: Methodology and application. Evaluation Review, 34, 455–486.

Fetters M. D., Curry L. A., Creswell J. W. (2013). Achieving integration in mixed methods designs—Principles and practices. Health Services Research, 48, 2134–2156.

Freedman D. A. (2006). Statistical models for causation: What inferential leverage do they provide? Evaluation Review, 30, 691–713.

Greene J. C., Caracelli V. J., Graham W. F. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11, 255–274.

Hardeman, Johnston, Bonetti, Wareham, Kinmonth A. L. (2002). Application of the theory of planned behaviour in behaviour change interventions: A systematic review. Psychology and Health, 17, 123–158.

Heckman J. J. (1997). Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources, 32, 441–462.

Heckman J. J., Hotz J. V. (1989). Choosing among alternative nonexperimental methods for estimating the impact of social programs: The case of manpower training. Journal of the American Statistical Association, 84, 862–874.

Heckman J. J., Ichimura, Smith J. A., Todd P. E. (1998). Characterizing selection bias using experimental data. Econometrica, 66, 1017–1098.

Heckman J. J., Ichimura, Todd P. E. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training program. Review of Economic Studies, 64, 605–654.

Heckman J. J., LaLonde R. J., Smith J. A. (1999). The economics and econometrics of active labor market programs. In Ashenfelter, Card (Eds.), Handbook of labor economics (pp. 1865–2097). Elsevier Science.

Heckman J. J., Smith J. A. (1995). Assessing the case for social experiments. Journal of Economic Perspectives, 9(2), 85–110.

Heckman J. J., Smith J. A. (1998). Evaluating the welfare state. In Storm (Ed.), Econometrics and economics in the 20th century: The Ragnar Frisch centennial (pp. 241–318). Cambridge University Press.

Ho D. E., Imai, King, Stuart E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15, 199–236.

Holland P. W. (1986). Statistical and causal inference. Journal of the American Statistical Association, 81, 945–960.

Imbens G. W., Wooldridge J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47, 5–86.

Juster T. F. (1966). Consumer buying intentions and purchase probability: An experiment in survey design. Journal of the American Statistical Association, 61, 658–696.

Kabeer (2019). Randomized control trials and qualitative evaluations of a multifaceted programme for women in extreme poverty: Empirical findings and methodological reflections. Journal of Human Development and Capabilities, 20, 197–217.

Kahneman, Tversky (1979). Intuitive prediction: Biases and corrective procedures. In Makridakis, Wheelwright S. C. (Eds.), Forecasting: TIMS studies in management science (Vol. 12, pp. 313–327). North Holland Publishing.

Kothe E. J., Mullan B. A., Butow (2012). Promoting fruit and vegetable consumption: Testing an intervention based on the theory of planned behaviour. Appetite, 58, 997–1004.

LaLonde R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76, 604–620.

Lechner, Wunsch (2013). Sensitivity of matching-based program evaluations to the availability of control variables. Labour Economics, 21, 111–121.

Lv, Brown J. L. (2011). Impact of a nutrition education program to increase intake of calcium-rich foods by Chinese-American women. Journal of the American Dietetic Association, 111, 143–149.

Marschak (1953). Economic measurements for policy and prediction. In Hood W. C., Koopmans T. C. (Eds.), Studies in econometric method (pp. 1–26). John Wiley.

Mueller C. E., Gaus (2015). Assessing the performance of the “counterfactual as self-estimated by program participants”: Results from a randomized controlled trial. American Journal of Evaluation, 36, 7–24.

Mueller C. E., Gaus (2018). Treatment effect estimation using self-estimated counterfactuals under varying conditions. Journal of Multidisciplinary Evaluation, 14(30), 16–36.

Mueller C. E., Gaus, Rech (2014). The counterfactual self-estimation of program participants: Impact assessment without control groups or pretests. American Journal of Evaluation, 35, 8–25.

Neyman J. S. (1923). Sur les applications de la théorie des probabilités aux experiences agricoles. Essai des principles. Roczniki Nauk Rolniczych, 10, 1–51. (In Polish)

Pluye, Hong Q. N. (2014). Combining the power of stories and the power of numbers: Mixed methods research and mixed studies reviews. Annual Review of Public Health, 35, 29–45.

Rosenbaum P. R., Rubin D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.

Rosenbloom, Levi, Peleg, Nemrodov (2009). Effectiveness of road safety workshop for young adults. Safety Science, 47, 608–613.

Roy A. D. (1951). Some thoughts on the distribution of earnings. Oxford Economic Papers, 3, 135–146.

Rubin D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.

Schmiege S. J., Broaddus M. R., Levin, Bryan A. D. (2009). Randomized trial of group interventions to reduce HIV/STD risk and change theoretical mediators among detained adolescents. Journal of Consulting and Clinical Psychology, 77, 38–50.

Sheeran, Maki, Montanaro, Avishai-Yitshak, Bryan, Klein W. M., . . . Rothman A. J. (2016). The impact of changing attitudes, norms, and self-efficacy on health-related intentions and behavior: A meta-analysis. Health Psychology, 35, 1178–1188.

Smith J. A., Todd P. E. (2005). Does matching overcome LaLonde’s critique of nonexperimental estimators? Journal of Econometrics, 125, 305–353.

Smith J. A., Whalley, Wilcox N. T. (2013). Are program participants good evaluators? https://pdfs.semanticscholar.org/0b4c/4d4ed563c645b9982bf7766c30fec1b56e9e.pdf?_ga=2.207827474.183340695.1577527974-1189006956.1577527974

Splawa-Neyman, Dabrowska D. M., Speed T. P. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5, 463–480.

Steiner P. M., Wong V. C. (2018). Assessing correspondence between experimental and nonexperimental estimates in within-study comparisons. Evaluation Review, 42, 214–247.

Todd, Mullan (2011). Using the theory of planned behaviour and prototype willingness model to target binge drinking in female undergraduate university students. Addictive Behaviors, 36, 980–986.

Tversky, Kahneman (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.

Tyson, Covey, Rosenthal H. E. (2014). Theory of planned behavior interventions for reducing heterosexual risk behaviors: A meta-analysis. Health Psychology, 33, 1454–1467.

Van Ryn, Vinokur A. D. (1992). How did it work? An examination of the mechanisms through which an intervention for the unemployed promoted job-search behavior. American Journal of Community Psychology, 20, 577–597.

White (2009). Theory-based impact evaluation: Principles and practice. Journal of Development Effectiveness, 1, 271–284.

Wong V. C., Steiner P. M. (2018). Designs of empirical evaluations of nonexperimental methods in field settings. Evaluation Review, 42, 176–213.