Looking for the Smoking Gun

Abstract

As a result of litigation over the past decade, major tobacco companies were compelled to make public a broad range of previously confidential documents. We have created a series of corpora from the tobacco industry documents (TIDs) for three purposes: (1) to establish baseline descriptions of various linguistic features of this unique set of texts; (2) to identify TIDs in which rhetorical manipulation (“deception”) may have occurred and to estimate the extent and prevalence of that manipulation; and (3) to analyze manipulation in order to classify it and to develop means of identifying similar manipulation in other industry document sets. Our three-part corpus creation strategy employed rigorous sampling methods. First, we drew a limited sample from the largest collection of TIDs to determine a representative classification of text types and to estimate their proportions within the overall body of texts. Then, we created a reference corpus (500,000+ words) constituting a stratified random sample of all TIDs, whether or not they exhibit manipulation. Finally, we compiled a corpus of texts presumed to exhibit rhetorical manipulation. We assumed that multiple drafts of a text, or versions of a text prepared for different audiences, constituted rhetorical manipulation. This article presents our experience with the sampling methods used in this corpus-building process and our findings regarding the text types that make up the reference corpus.
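The first two sampling steps described above (estimating text-type proportions from a pilot sample, then drawing a stratified random sample) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the tuple layout for documents, and the rule of filling each stratum up to a proportional share of a target word count are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def estimate_proportions(pilot_labels):
    """Estimate the share of each text type from a pilot sample of labels."""
    counts = defaultdict(int)
    for label in pilot_labels:
        counts[label] += 1
    total = len(pilot_labels)
    return {label: n / total for label, n in counts.items()}


def stratified_sample(documents, proportions, target_words, seed=0):
    """Randomly draw documents within each text type (stratum) until the
    stratum reaches its proportional share of the target word count.

    `documents` is a list of (doc_id, text_type, word_count) tuples;
    this layout is hypothetical, chosen only for the sketch.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group the documents by text type.
    by_type = defaultdict(list)
    for doc in documents:
        by_type[doc[1]].append(doc)

    sample = []
    for text_type, share in proportions.items():
        quota = share * target_words      # word budget for this stratum
        pool = by_type.get(text_type, [])
        rng.shuffle(pool)                 # random order within the stratum
        words = 0
        for doc in pool:
            if words >= quota:
                break
            sample.append(doc)
            words += doc[2]
    return sample
```

For example, a pilot sample labeled 60% memos, 30% reports, and 10% letters would give a 500,000-word reference corpus a budget of roughly 300,000, 150,000, and 50,000 words for those three strata. Because whole documents are drawn, each stratum may overshoot its budget by up to one document.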
