Abstract
This article describes a new database of English word-usage patterns. It improves on older efforts by including television and personal commentaries among the sources for the main corpus. More than a third of a million words were sampled from media and nonmedia sources and analyzed to produce a parsimonious listing of 6505 words (types) and their frequencies. The reliability and validity of this list were established in a variety of ways, and a computer program based on the list was used to analyze two different sets of data (an exploratory set and one representing an a priori hypothesis about word usage). A mere 206 different words accounted for 57% of all the words in the corpus, and 95% of this small set had roots in Middle English or some older form of English.
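The coverage figure reported above (a small set of types accounting for a large share of all running words) can be illustrated with a minimal sketch. This is not the article's program: the function names, the toy sentence, and the simple regular-expression tokenizer are all assumptions made for illustration, and a real corpus study would need more careful tokenization.

```python
from collections import Counter
import re


def type_frequencies(text):
    """Count how often each word type occurs, ignoring case.

    Assumption: a word is a run of letters and apostrophes. This is a
    simplification, not the article's tokenization rule.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def coverage_of_top_types(freqs, k):
    """Return the fraction of all tokens covered by the k most frequent types."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in freqs.most_common(k))
    return top / total


# Hypothetical toy text standing in for a sampled corpus.
sample = "the cat sat on the mat and the dog sat on the log"
freqs = type_frequencies(sample)

print(len(freqs))                       # number of distinct types
print(coverage_of_top_types(freqs, 3))  # share of tokens covered by the top 3 types
```

Applied to a full frequency list, the same calculation with k set to 206 would give the kind of coverage proportion the abstract reports.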
