Abstract
Interpreting teaching and research require a large amount of authentic, high-quality interpreting data, but existing interpreting corpora have many shortcomings, such as small scale, a limited range of types, and uneven quality. In this paper, we use big data technology to build a powerful, easy-to-use, and openly shared English-Chinese interpreting corpus database that provides rich and diverse high-quality interpreting examples for interpreting teaching and research. We collect English-Chinese interpreting data of various types, scenarios, topics, and levels from the Internet, TV broadcasts, and other channels. We then clean, standardize, segment, align, and annotate the data, and store the metadata in XML format. Finally, we design and implement the structure, functions, and interfaces of the corpus. This paper describes the data methodology, model construction, and application effectiveness of the corpus, covering its collection, organization, annotation, storage, management, retrieval, analysis, display, and application.
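The abstract states that each aligned segment's metadata is stored as XML but does not give the schema. The sketch below is illustrative only. It assumes a hypothetical `<segment>` element with a `<metadata>` child, holding fields such as type, scenario, topic, and level, plus paired `<source>`/`<target>` texts. It uses Python's standard `xml.etree.ElementTree` to write one record and read it back for retrieval. The element names, attribute names, and the helper functions `segment_to_xml` and `parse_segment` are assumptions, not the authors' design.

```python
import xml.etree.ElementTree as ET


def segment_to_xml(seg_id, metadata, source_text, target_text):
    """Build a <segment> element for one aligned English-Chinese slice.

    Hypothetical schema: the paper does not specify its element or
    attribute names.
    """
    seg = ET.Element("segment", {"id": seg_id})
    meta = ET.SubElement(seg, "metadata")
    for key, value in metadata.items():
        # One child element per metadata field (e.g. type, scenario, topic, level)
        ET.SubElement(meta, key).text = str(value)
    # Aligned source utterance and its interpretation
    ET.SubElement(seg, "source", {"lang": "en"}).text = source_text
    ET.SubElement(seg, "target", {"lang": "zh"}).text = target_text
    return seg


def parse_segment(xml_string):
    """Read a <segment> back into a plain dict for retrieval or analysis."""
    seg = ET.fromstring(xml_string)
    return {
        "id": seg.get("id"),
        "metadata": {child.tag: child.text for child in seg.find("metadata")},
        "source": seg.findtext("source"),
        "target": seg.findtext("target"),
    }


if __name__ == "__main__":
    record = segment_to_xml(
        "seg-0001",
        {"type": "consecutive", "scenario": "press conference",
         "topic": "economy", "level": "professional"},
        "Thank you for coming.",
        "感谢各位的到来。",
    )
    xml_string = ET.tostring(record, encoding="unicode")
    print(xml_string)
    print(parse_segment(xml_string))
```

Storing each metadata field as its own child element keeps the record easy to extend with new annotation fields. A production corpus would more likely validate records against a fixed schema, such as a DTD or XSD.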
