This article describes txttool, a command that provides a set of tools for managing free-form text. The command integrates several built-in Stata functions with new text capabilities. The new capabilities include a utility to create a bag-of-words representation of text and an implementation of Porter's (1980, Program: Electronic library and information systems 14: 130–137) word-stemming algorithm. Together, these utilities provide a text-processing suite for text mining and other text-based applications in Stata.
Belotti F., and Depalo D. 2010. Translation from narrative text to standard codes variables with Stata. Stata Journal 10: 458–481.
Benoit K., Laver M., and Mikhaylov S. 2009. Treating words as data with error: Uncertainty in text statements of policy positions. American Journal of Political Science 53: 495–513.
Grimmer J., and Stewart B. M. 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21: 267–297.
Laver M., Benoit K., and Garry J. 2003. Extracting policy positions from political texts using words as data. American Political Science Review 97: 311–331.
Lowe W., and Benoit K. 2013. Validating estimates of latent traits from textual data using human judgment as a benchmark. Political Analysis 21: 298–313.
Porter M. F. 1980. An algorithm for suffix stripping. Program: Electronic library and information systems 14: 130–137.
Raciborski R. 2008. kountry: A Stata utility for merging cross-country data from multiple sources. Stata Journal 8: 390–400.