Abstract
This study reports a series of quantitative analyses of the Corpus of Scientific Journal Articles (CSJA), a special-purpose corpus consisting of 360 journal articles in 10 major scientific fields. Major findings include: (1) the average word length is 6.31 characters; (2) a word-form occurs 36.8 times on average; (3) a text category with a larger number of running words tends to have a higher word recurrence rate; (4) most of the 100 most frequent word-forms are function words; (5) numbers and letters are used much more frequently in the CSJA than in the COBUILD and LOB corpora; (6) only a very limited number of word-forms have a high recurrence rate, while more than half of the vocabulary occurs only once or twice; (7) despite disciplinary differences, the word frequency profiles of the ten scientific fields are very similar, showing that different scientific fields follow similar patterns of word use.
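As an illustration of how statistics like those in findings (1), (2) and (6) could be computed, the following is a minimal sketch and not the procedure used in the study. The function name `word_stats`, the tokenization rule (lowercased runs of letters and digits), and the choice to average word length over running words rather than distinct word-forms are all assumptions made for this example; the abstract does not specify them.

```python
import re
from collections import Counter


def word_stats(text):
    """Compute simple word-frequency statistics for a text.

    Assumption: a token is a lowercased run of ASCII letters or digits.
    The study's own tokenization is not described in the abstract.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    if not tokens:
        raise ValueError("text contains no tokens")

    counts = Counter(tokens)   # word-form -> number of occurrences
    n_tokens = len(tokens)     # running words
    n_types = len(counts)      # distinct word-forms

    return {
        "tokens": n_tokens,
        "types": n_types,
        # Mean length in characters, averaged over running words (assumption).
        "mean_word_length": sum(len(t) for t in tokens) / n_tokens,
        # Average number of occurrences per word-form: tokens / types.
        "mean_recurrence": n_tokens / n_types,
        # Share of the vocabulary that occurs only once or twice.
        "rare_share": sum(1 for c in counts.values() if c <= 2) / n_types,
    }


# Toy input for illustration only; not data from the CSJA.
sample = "the cell and the gene and the cell"
stats = word_stats(sample)
print(stats)
```

Under these assumptions, an average recurrence of 36.8 would simply mean that the corpus contains about 36.8 running words for every distinct word-form.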
