Abstract
Open-ended survey questions can provide researchers with nuanced and rich data, but content analysis is subject to misinterpretation and can introduce bias into subsequent analysis. We present a simple method to improve the semantic validity of a codebook and test for bias: a “self-coding” method where respondents first provide open-ended responses and then self-code those responses into categories. We demonstrated this method by comparing respondents’ self-coding to researcher-based coding using an established codebook. Our analysis showed significant disagreement between the codebook’s assigned categorizations of responses and respondents’ self-codes. Moreover, this technique uncovered instances where researcher-based coding disproportionately misrepresented the views of certain demographic groups. We propose using the self-coding method to iteratively improve codebooks, identify bad-faith respondents, and, perhaps, to replace researcher-based content analysis.
Introduction
Political scientists often use open-ended survey questions to allow respondents to express ideas in their own words, and then the researchers code those responses using a pre-established codebook. For example, a wealth of scholarship in American and comparative politics on policy and valence issue priorities relies on open-ended questions (e.g., by Gallup or MORI) that gauge citizens’ perceptions of the most important problem/issue currently facing the country (for overviews see Green and Hobolt, 2008; Wlezien, 2005). Other studies use open-ended questions to assess citizens’ ideological positions (Swyngedouw, 2001). And in the running example we use here, more than three decades of scholarship based on Iyengar’s (1994) canonical study commonly use open-ended survey responses, often embedded in experiments, to examine how framing an issue differently can affect who/what people hold responsible for that issue. The approach of giving people open-ended survey questions and then having researchers code the responses can provide rich and nuanced information about respondents’ attitudes, helping advance and inform these and other areas of research.
Yet, coding open-ended responses means inferring preferences and sentiments by applying summary codes, opening the possibility that some responses are coded in ways that diverge from what respondents intended to say. Relatedly, when we apply a standard codebook to all open-ended responses, we may end up with biased results, with some types of respondents more likely to have their responses miscoded—an error that undermines the semantic validity of the data. In short, by assuming that all respondents who mentioned a certain phrase meant the same thing and should be coded the same way, we run the risk of misrepresenting the true opinions of some (non-random) portion of our samples.
Here, we present a “self-coding” method where respondents first provide open-ended responses and then code their own responses into categories. 1 We demonstrated this technique by comparing data from a traditional coding process using an established codebook to data using the self-coding method. We found that agreement between the codes assigned by the two processes fell below accepted standards for intercoder reliability. Moreover, we found that the variation in agreement between the codebook-based coding and respondents’ self-coding was significantly correlated with key demographic characteristics. This demonstration suggests that the standard approach of researchers coding open-ended responses may not only lead to misinterpretation of respondents’ attitudes but also introduce bias into the data. However, by iteratively using the self-coding method, refining the codebook to account for problems identified, and then repeating until no more problems are uncovered, researchers can arrive at a codebook that is optimally positioned to measure the attitudes of the survey respondents. In addition, as online data quality concerns become more prevalent, self-coded open-ended questions could help ensure that respondents are acting in good faith. And, where appropriate, researchers might even choose to use the self-coding method for their entire data collection process in order to maximize validity while saving resources.
The benefits and challenges of open-ended survey questions
Public opinion research has long struggled with the trade-off between rich open-ended responses and easy-to-analyze closed-ended responses. This choice is particularly vexing for large datasets that require quantitative analysis, and it has meaningful consequences for the data collected (e.g., Reja et al., 2003; Schuman and Presser, 1996). Closed-ended questions are much easier to analyze, but suffer other limitations such as response priming and measurement error (Bishop, 1987; Krosnick and Alwin, 1987; Schuman and Presser, 1996; Schwarz et al., 1992).
Open-ended questions can provide responses that are specific and nuanced in ways that are useful to researchers seeking to understand public opinion (Hickey and Kipping, 1996; Lazarsfeld, 1944; Montgomery and Crittenden, 1977). Yet in coding open-ended statements into categories for analysis, researchers run the risk of misrepresenting respondents’ true intentions. A noted example is research on the judicial political knowledge questions on the American National Election Study (Gibson and Caldeira, 2009), which are coded according to a strict codebook and most likely underestimate public knowledge of the court (Boudreau and Lupia, 2011).
Krippendorff addresses the concern over misinterpreting respondents’ intentions in his discussion of “semantic validity,” which he defines as “the extent to which the results of coding correspond to how ordinary readers, or better still the respondents themselves would categorize what they say” (Krippendorff, 2008). Semantic validity is especially salient when we consider the possibility that a researcher-designed codebook might systematically misinterpret the responses of those with specific demographic characteristics.
Self-coding as a diagnostic method for semantic validity
Accurately coding opinions into discrete categories is a difficult task, even for a professional coder. Best practices call for developing a detailed codebook (i.e., a coding schema that instructs researchers to categorize one type of observation into category X, another type of observation into category Y, etc.) based on a close read of a random sample of observations (e.g., Baumgartner et al., 2008; Simon and Xenos, 2000). Yet these approaches maximize the degree to which researchers can consistently code responses from the point of view of the codebook they designed. It is possible that codebooks could be improved even further—and in some cases even replaced—by asking respondents to code their own responses, since they should know best what they are trying to say. 2 Although survey research is fraught with various measurement error challenges, including those arising from the potential instability and/or malleability of attitudes, most researchers agree that respondents indeed have “true attitudes,” and that our job is to try to capture them (Geer, 1988; Zaller and Feldman, 1992). 3
We suggest that researchers embarking on quantitative content analysis of open-ended survey responses employ the “self-coding” diagnostic method to support the semantic validity of their codebook and guard against biased data. The self-coding method involves first asking respondents to offer an open-ended response to a question about their opinion on something, and then asking them to code their own open-ended responses into categories. Then, without consulting the respondents’ self-codes, the researchers code the open-ended responses into the same categories using the codebook. Finally, researchers compare the self-codes to the researcher-based codes to assess their similarity overall and to test for any bias in the researcher-based coding, asking two research questions (a minimal code sketch of both checks follows below):
RQ1: How similar is respondents’ self-coding of their open-ended responses to the researcher-based coding of their responses using the codebook?
RQ2: Do any variances in the degree of agreement between the self-coding and the researcher-based coding correlate with respondents’ demographic characteristics?
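For concreteness, both checks can be run in a few lines of analysis code. The sketch below assumes a pandas DataFrame with one row per coded response; the column names (self_code, researcher_code, income_bracket) and the toy data are our own illustrative choices, not fixed parts of the method.

```python
# A minimal sketch of the two diagnostic checks, assuming one row per
# coded response. All column names and values here are illustrative.
import pandas as pd

df = pd.DataFrame({
    "self_code":       ["gov", "ind", "gov", "ind", "gov", "gov"],
    "researcher_code": ["gov", "gov", "gov", "ind", "ind", "gov"],
    "income_bracket":  ["low", "low", "high", "high", "low", "high"],
})

# RQ1: how often do self-codes and researcher-based codes agree overall?
df["agree"] = df["self_code"] == df["researcher_code"]
print(f"Overall agreement: {df['agree'].mean():.0%}")

# RQ2: does the agreement rate vary across demographic groups? Large gaps
# would suggest the codebook misrepresents some groups more than others.
print(df.groupby("income_bracket")["agree"].mean())
```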
Demonstration of the self-coding method
In October 2018, we recruited a sample of 1,746 respondents based in the United States using Amazon’s Mechanical Turk (MTurk) platform (Berinsky et al., 2012; Coppock, 2019; Mullinix et al., 2016). 4 We recruited subjects with a brief call stating, “Complete an online survey about your attitudes and opinions, approximately 7 minutes” and offered to pay them $0.70 for a completed survey. We followed best practices for MTurk data collection, limiting respondents to those with an approval rating of 95% or higher who had completed 100 or more tasks (Peer et al., 2014). However, even with these measures, there are increasing concerns about data quality on MTurk (Ahler et al., 2021; Kennedy et al., 2020; Ryan, 2020), including inattentive respondents, foreign respondents, and spoofed IP addresses. As we will discuss, an added benefit of open-ended questions is their ability to help identify good-faith respondents (Ryan, 2020).
For other research-related purposes, we were interested in replicating and extending the foundational research by Iyengar (1994) on media framing effects and so we used his questions and codebook here. 5 Iyengar found that when respondents are exposed to news reports that frame poverty in an episodic way (stories about individual poor people), respondents are more likely to hold individuals (vs. government/society) responsible for the problem of poverty than when exposed to news reports that frame poverty in a thematic way (stories about broad trends in poverty rates). The study by Iyengar used open-ended responses that were then coded into 93 categories and later aggregated according to a codebook into three broader categories used for analysis: “Government/Society,” “Individual,” and “Other.” Although this is only one codebook, this foundational study has led to 30 years of research furthering our understanding of how framing influences responsibility attribution for social problems, from gun violence (Haider-Markel and Joslyn, 2001) to business failures (Williams et al., 2011) to economic adversity and voter turnout (Arceneaux, 2003).
Following Iyengar, we asked our respondents to read a brief news report about poverty, randomized to include either an episodic or thematic frame, in line with the substantive interests of our larger research agenda (see Appendix A). We then asked, “In your opinion, what are the most important causes of poverty?” and allowed them to enter up to 10 open-ended responses. Unlike Iyengar’s original survey, however, we added a follow-up question that piped in each cause of poverty the respondent had entered, surrounded by the following text:
When you say that one of the most important causes of poverty is [response from text box], do you think that cause is mostly because of individual people and the choices they make or is it mostly because of how government and society are these days?
Mostly about government/society
Mostly about individual people
Mostly about something else
This question was repeated for each of the open-ended causes the respondent provided. Of the 1,746 respondents, 1,541 provided at least one cause of poverty (on average each respondent offered 3.8 causes). In total, respondents offered 6,649 unique, open-ended causes of poverty, which they then coded into one of the three categories listed above. We then coded a sample of the open-ended responses ourselves using Iyengar’s codebook. Finally, we focused on four specific codes (Education, Laziness, Unemployment, and Cost of Living), and used a series of tests to compare the respondents’ coding of their own responses to our researcher-based coding in order to assess the semantic validity of our codebook and identify any potential bias in our researcher-based coding.
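To illustrate the piping mechanics, the sketch below builds one follow-up self-coding item per open-ended cause a respondent entered. The question wording and three response options come from the survey text above; the function name and data structure are hypothetical.

```python
# A sketch of the piping step: each open-ended cause is inserted into the
# follow-up self-coding question. Wording is from the survey shown above.
FOLLOW_UP = (
    "When you say that one of the most important causes of poverty is "
    "[{response}], do you think that cause is mostly because of individual "
    "people and the choices they make or is it mostly because of how "
    "government and society are these days?"
)
OPTIONS = [
    "Mostly about government/society",
    "Mostly about individual people",
    "Mostly about something else",
]

def build_self_coding_items(causes):
    """Return one follow-up item per non-empty open-ended cause."""
    return [{"question": FOLLOW_UP.format(response=c), "options": OPTIONS}
            for c in causes if c.strip()]

items = build_self_coding_items(["no jobs", "lack of education"])
print(items[0]["question"])
```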
Analysis
We performed our researcher-based coding by drawing a random sample of 200 respondents (approximately 13% of the total 1,541 respondents who offered at least one response), yielding 740 open-ended responses. Each of these open-ended responses was then coded by a researcher according to the three higher-level codes (Government/Society, Individual, and Other) using the original codebook. Crucially, this was done without looking at the self-codes the respondents gave to their open-ended responses. We then calculated the intercoder reliability between our traditional coding and the respondents’ self-coding, dropping responses that were coded “Other” by either respondents or researchers, leaving 667 responses to compare. 6 Comparing the two sets of “coders”—the researcher and the survey respondents—the agreement was 75% but the Cohen’s Kappa was only 0.46, indicating substantial coding divergence (Freelon, 2013). In other words, when we asked respondents to code themselves, they provided codes that differed substantially from what we assumed when following the codebook.
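As a sketch of this reliability check, the code below treats respondents and researchers as two “coders” of the same items, drops pairs where either coder chose Other, and computes percent agreement and Cohen’s kappa via scikit-learn. The label vectors are illustrative stand-ins, not our data.

```python
# A minimal reliability check between self-codes and researcher codes.
from sklearn.metrics import cohen_kappa_score

self_codes       = ["gov", "gov", "ind", "ind", "gov", "ind", "gov", "gov"]
researcher_codes = ["gov", "ind", "ind", "ind", "gov", "gov", "gov", "gov"]

# Drop pairs where either "coder" chose Other before comparing, as in the text.
pairs = [(s, r) for s, r in zip(self_codes, researcher_codes)
         if s != "other" and r != "other"]
s_kept, r_kept = zip(*pairs)

agreement = sum(a == b for a, b in pairs) / len(pairs)
kappa = cohen_kappa_score(s_kept, r_kept)  # chance-corrected agreement
print(f"Agreement: {agreement:.0%}; Cohen's kappa: {kappa:.2f}")
```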
To better understand this disagreement, we looked closely at four categories of causes given for poverty: Education, Laziness, Unemployment, and Cost of Living. We selected these categories for two reasons. First, they offer balance on higher-order codes: Education and Laziness fall under Individual in the original codebook, whereas Unemployment and Cost of Living fall under Government/Society. Second, in our previous replication of Iyengar’s study (Feezell et al., 2019), these four categories were some of the most frequently coded. In that study, 897 respondents gave a total of 2,699 responses on the causes of poverty, which researchers then coded into the same 93 possible coding categories mentioned above: 10% were coded as Education, 5% as Laziness, 16% as Unemployment, and 4% as Cost of Living.
In order to identify all the responses that should fall under each of these four categories, we developed a set of keywords to flag any responses that might be about the given category (see Appendix B for details). The responses identified by the keyword searches for each category were then checked by hand and false positives were eliminated. To clarify: our intent in this exercise was not to assign each response in our dataset a specific code, or even to examine all the instances of responses under these four causal categories according to the codebook. Rather, our goal was to select a subset of responses we were certain belonged in these four categories of the codebook as applied by researchers, allowing us to compare whether the self-codes for these items (Government/Society, Individual, Other) assigned by respondents matched the researcher codes assigned using the codebook.
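A sketch of the flagging step is below. The keyword patterns are placeholders (the actual lists appear in Appendix B), and the subsequent hand-check of false positives is not shown.

```python
# Keyword flagging of open-ended responses into candidate categories.
# Patterns here are illustrative placeholders, not the Appendix B lists.
import re

KEYWORDS = {
    "education":      [r"educat", r"school", r"college"],
    "laziness":       [r"\blazy\b", r"lazi"],
    "unemployment":   [r"unemploy", r"no jobs?", r"jobless"],
    "cost_of_living": [r"cost of living", r"\brent\b", r"inflation"],
}

def flag_categories(response):
    """Return every category whose keywords appear in the response text."""
    text = response.lower()
    return {cat for cat, patterns in KEYWORDS.items()
            if any(re.search(p, text) for p in patterns)}

print(flag_categories("No jobs and the rising cost of living"))
# e.g. {'unemployment', 'cost_of_living'}
```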
Figure 1 shows the respondents’ self-codes of their responses for each specific causal category (Cost of Living, Unemployment, Education, Laziness) into the three higher-level categories of Individual, Government/Society, and Other. When a respondent identified Cost of Living as a cause of poverty (the first column of Figure 1), the codebook would have instructed researchers to code this response as attributing responsibility to Government/Society. Respondents’ self-codes reflected a similar instinct, with 93% of respondents self-coding their Cost of Living responses as Government/Society. In this case, the codebook and the respondents performed quite similarly on the coding task.
Figure 1. Respondents’ Self-Coding of Four Causes of Poverty.
The second column of Figure 1, however, shows significant divergence. Seventy-six percent of respondents who offered an Unemployment response coded it as the responsibility of Government/Society (in line with our codebook), but nearly 16% of respondents coded their Unemployment responses as Individual instead. The vast majority of all the Unemployment responses were short phrases like “no jobs” or “unemployment,” meaning researchers would be hard-pressed to find anything in the open-ended text indicating to whom/what the respondent attributed responsibility.
The third column of Figure 1 shows responses that identify Education as a cause of poverty. Following the codebook directives, each Education response should be categorized as attributing responsibility to the Individual. When respondents coded themselves, however, they were almost evenly split between attributing responsibility to the Individual (45%) and Government/Society (50%). That is, half of the respondents indicated the researcher-assigned code was not an accurate interpretation of their response.
Finally, the fourth column of Figure 1 shows that the vast majority (94%) of Laziness causes were self-coded by respondents as Individual, consistent with the original codebook. Thus, for two of the topic categories in Figure 1, the codebook and the respondents performed similarly, but for two of them, there was substantial divergence. Looking both at Figure 1 and at the open-ended responses, we can see that while our strict codebook rules allowed researchers to easily categorize responses as either Government/Society or Individual (leaving only four Other responses), our respondents saw more nuance. They selected Other for 73 of the 740 responses, choosing to categorize responses like “lack of education,” “lack of employment,” and “bad luck” as Other.
This mismatch in terms of the codes assigned by the codebook and those that respondents assigned themselves should give researchers pause. When we assume what respondents mean by their responses and apply strict codebook rules, we risk misrepresenting respondent attitudes and biasing results in the initial coding and subsequent analysis.
Even more concerning is the possibility that we may be systematically biased in our interpretation of verbatim response coding. To assess this potential, we looked at the two categories with the greatest divergence in coding among the four we examined: Education and Unemployment. We ran a series of tests comparing how different demographic groups self-coded their responses in these two categories; Table 1 presents the results.
The rows in Table 1 compare the extent to which different demographic groups attributed responsibility to Government/Society, as a share of all the responses they attributed to either the Individual or Government/Society (thus dropping “Other” items). Looking at the Education-related responses in the first column, we see significant differences by income: those who reported making less than $75,000 a year (the bottom third of the sample) were significantly more likely to attribute responsibility for Education-related causes to Government/Society.
Table 1. Testing demographic differences in attributing responsibility to Government/Society (vs. the Individual) for two identified causes of poverty: Education and Unemployment.
Turning to the Unemployment-related responses in the second column, we see that women were significantly more likely than men to attribute responsibility for poverty resulting from Unemployment to Government/Society.
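The excerpt above does not name the specific test, but a two-proportion z-test is one standard way to run comparisons of this kind. The sketch below uses statsmodels’ proportions_ztest on made-up counts; the group labels and numbers are illustrative only.

```python
# One way to test a demographic difference in the share of responses
# self-coded as Government/Society; counts below are illustrative.
from statsmodels.stats.proportion import proportions_ztest

# Successes: responses self-coded Government/Society in each group.
# Trials: responses self-coded either Government/Society or Individual
# ("Other" dropped), e.g. for lower- vs. higher-income respondents.
gov_counts = [40, 22]
n_responses = [60, 55]

z_stat, p_value = proportions_ztest(gov_counts, n_responses)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```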
But to what extent does traditional researcher-based coding harm our overall analysis? For our study of whether news framing of poverty as episodic or thematic affects respondents’ responsibility attributions, we tried to replicate the substantive findings of Iyengar’s original study. Specifically, we used respondents’ self-coded data to calculate whether respondents in the thematic condition were more likely to attribute responsibility to Government/Society (vs. to the Individual), compared with respondents in the episodic condition. These analyses confirmed the substantive findings already documented in the literature: respondents who received the thematic news frames were more likely to provide open-ended responses on the causes of poverty that attributed responsibility to government and society.
Yet the fact that we replicated the substantive findings of Iyengar’s earlier research does not change our documented evidence that, for two major codes, we misrepresented many respondents’ intended answers, and in a non-random way. The self-coding method allows researchers to identify inconsistencies between self-codes and researcher-assigned codes and improve their codebooks accordingly. As our test case illustrates, where drawing a new sample of respondents to engage in self-coding is possible, the self-coding method could be used in replication studies to ensure previous findings are robust to any demographic-related bias in how the codebook treated respondents’ answers.
Applying the self-coding method
Researchers can apply the self-coding method in multiple ways. In the case of our demonstration above, we could refine the codebook instructions based on the problems of semantic validity and bias we found, for example by coding responses involving education and unemployment into finer-grained categories, some of which would be categorized as Individual and some as Government/Society. 7 Our test would then be repeated, iterating until we reach satisfactorily high levels of semantic validity and low levels of bias. Or, at a minimum, if unable to change a codebook long established in the literature, we would be able to interpret our substantive findings in the context of any bias we discovered through the self-coding method, potentially opening up new avenues of research. If at any point we were satisfied with the broad nature of the coding categories we gave respondents, we could decide to proceed with the “code yourself” method for the remainder of our data collection, perhaps repeating the researcher-based coding of a random sample of observations as a reliability check. Either way, we would have a stronger understanding of the degree to which our original conceptualization of categories matched—and did not match—how people themselves think about their responses.
As an added benefit, the open-ended questions used in the self-coding method are also valuable because they can help identify good-faith respondents on online survey platforms like MTurk. Professional survey-takers who are completing surveys as quickly as possible, or foreign survey-takers who do not understand the nuances of experimental treatments, are more likely to skip open-ended questions or to give nonsensical responses or responses that frequently repeat words like “good” and “nice” (Ryan, 2020). Researchers could search for these words, or create a word matrix to identify outlying words, to identify respondents they may want to exclude from analyses. High-level open-ended questions may also lead bad-faith actors to skip those questions altogether (Ryan, 2020), making it easy for researchers to identify them (approximately 12% of our original respondents did not provide any open-ended responses at all). The increasing use of web surveys has the potential to revitalize the use of open-ended questions (Smyth et al., 2009). By including open-ended questions, researchers can not only get richer data, but they can also quickly identify bad-faith respondents to be removed from the sample.
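As one concrete way to operationalize this screen, the sketch below flags respondents who skipped every open-ended item or whose answers are dominated by filler words; the filler list and the 50% threshold are our own illustrative assumptions, not values from the studies cited.

```python
# A simple good-faith screen for open-ended answers: flag respondents who
# skipped all items or mostly repeat filler words like "good" and "nice".
# The filler list and threshold are illustrative assumptions.
FILLER_WORDS = {"good", "nice", "very", "ok"}

def looks_bad_faith(responses):
    """Return True if a respondent's open-ended answers look low-effort."""
    words = [w for r in responses for w in r.lower().split()]
    if not words:                       # skipped every open-ended question
        return True
    filler_share = sum(w in FILLER_WORDS for w in words) / len(words)
    return filler_share > 0.5

print(looks_bad_faith(["good", "nice good"]))            # True
print(looks_bad_faith(["lack of affordable housing"]))   # False
```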
In addition, the self-coding method may serve as an effective way to measure individual-level attitudes without priming respondents with closed-ended options, yielding more accurate opinion measurement; this is particularly useful when investigating questions with high degrees of social desirability. As one final advantage of this technique, contingent on agreement between self-coded and researcher-coded responses, the self-coding method may serve as a cost-effective way to accurately code large samples of open-ended responses.
Conclusion
Open-ended questions can give researchers a depth of understanding on complicated issues, like the causes of poverty. But sometimes we want to categorize those detailed responses into broad groupings. In these cases, a self-coding method could help ensure that researchers do so in a way that reflects the intent of the respondent.
We offer the self-coding method as one way to help researchers improve the semantic validity of open-ended response coding and reduce unintentional bias introduced in the coding process. Our demonstration of this method revealed an unacceptably low level of intercoder reliability between our coding of responses according to an established codebook and the respondents’ own coding of their responses. Moreover, we showed cases where respondents’ demographic factors were significantly correlated with their choice of code, meaning that if we coded all their responses for a given cause into a single lumped code, we would be misrepresenting some demographic groups significantly more than others. By using the self-coding method, researchers could develop more nuanced codebooks or, were they using set codebooks for longitudinal or replication studies, they could at least be aware of the non-random ways in which the codebook might be misrepresenting respondents’ intentions.
It is important to note that our test relied on a single codebook. Yet other codebooks may suffer from similar problems of semantic validity, manifested in different ways. It is easy enough to realize in hindsight that our codebooks should be more nuanced. But as the 30-year-old literature based on—and continuing to use—Iyengar’s responsibility attribution codebook illustrates, seeing the potential problems with a codebook is not always as easy or practical as we would like, especially given that researchers traditionally develop codebooks without any iterative “feedback” from the non-researcher humans whose latent attitudes these codebooks are designed to capture. In cases where researchers are building on—and thus need to start by replicating—past research using established codebooks, the self-coding method can at least make them aware of any non-random ways in which the codebook might be misrepresenting respondents’ intentions.
For example, with better insights from respondents, we could differentiate how people think about education as a cause of poverty, separating Education-related responses that attribute responsibility to individual choices from those that attribute responsibility to government and society.
Supplemental Material
Supplemental material for this article, “Self-coding: A method to assess semantic validity and bias when coding open-ended responses” by Rebecca A. Glazier, Amber E. Boydstun and Jessica T. Feezell, is available online in Research & Politics.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Notes
Carnegie Corporation of New York Grant
This publication was made possible (in part) by a grant from the Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.
References