The CodeRead System

Abstract

Most social science research uses data that originate, in one form or another, as written or spoken text. Quantitative researchers code these data strictly, sorting answers to questions into fixed categories. In contrast, qualitative researchers typically code free-form text by marking it up according to a set of ideas about the nature and content of the text. This article suggests using some elementary techniques from statistical natural language processing to partially automate the coding of large quantities of free-form textual data. The article presents CodeRead, a set of tools that implement these techniques. The system’s principal innovation is its ability to generate coding rules from a precoded sample of text. This capacity allows for the analysis of much larger bodies of textual data than was previously practical. It also ensures that the rules used for coding such data are specific and uniformly applied.
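The idea of generating coding rules from a precoded sample can be illustrated with a minimal sketch. This is not CodeRead's own algorithm, which the abstract does not specify. It assumes a simple word-association heuristic, and the function names (`tokenize`, `learn_rules`, `apply_rules`), the thresholds, and the sample passages are all hypothetical.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def learn_rules(coded_sample, min_count=2, min_share=0.8):
    """Derive word -> code rules from (text, code) pairs.

    Assumed heuristic: a word becomes a rule for a code when it occurs in
    at least `min_count` coded passages and at least `min_share` of those
    passages carry that code.
    """
    word_code_counts = defaultdict(Counter)
    for text, code in coded_sample:
        # Count each word once per passage, not once per occurrence.
        for word in set(tokenize(text)):
            word_code_counts[word][code] += 1

    rules = {}
    for word, counts in word_code_counts.items():
        code, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_count and hits / total >= min_share:
            rules[word] = code
    return rules


def apply_rules(rules, text):
    """Return the sorted list of codes whose rule words occur in the text."""
    return sorted({rules[w] for w in tokenize(text) if w in rules})


# Hypothetical precoded sample: (passage, code assigned by a human coder).
sample = [
    ("The union called a strike at the plant", "labor"),
    ("Workers voted to strike over wages", "labor"),
    ("The council held an election last March", "politics"),
    ("Turnout in the election was low", "politics"),
]

rules = learn_rules(sample)
codes = apply_rules(rules, "A strike was announced before the election")
```

Because the rules are explicit word-to-code mappings, a researcher can inspect and correct them before applying them to uncoded text, and every passage is coded by the same rules.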

Keywords

coding, content analysis, text analysis, automated coding
