Abstract
Turning text into numbers—and ultimately into an interpretable result—typically involves a processing pipeline. Whether explicit or hidden inside code, each step in that pipeline involves a host of decisions. This article follows a pipeline and unpacks some of the attendant decisions that analysts face at various stages, highlighting the specific challenges that confront analysts investigating historical data. Data, for example, can be less available; the quality of those data can be poor; and the development of tools and models can lag when compared to what is available for present-day English. In the face of such challenges, options are explored, but no argument is made for a particular approach or method—results and their interpretation will be shaped by analysts’ epistemic commitments and expertise. An argument is made, however, that an effective processing pipeline will be generative. It will be one that not only can be explained and defended but also can help us see the histories of English—its structures, its uses, and its users—in new ways. In that spirit, a processing pipeline can be understood not as a series of immobilizing decisions, but as an invitation.
