Abstract
Most social science research uses data that originate, in one form or another, as written or spoken text. Quantitative researchers code these data very strictly, categorizing answers to questions into fixed groups. In contrast, qualitative researchers typically code free-form text by marking it up according to a set of ideas about the nature and content of the text. This article suggests the use of some elementary techniques from the field of statistical natural language processing to partially automate the process of coding large quantities of free-form textual data. The article presents CodeRead, a set of tools that implement these techniques. The system’s principal innovation is its ability to generate coding rules from a precoded sample of text. This capacity allows for the analysis of much larger bodies of textual data than was previously practical. It also ensures that the rules used for coding such data are specific and uniformly applied.
