Knowledge Engineering for the JASPER Fact Extraction System

Abstract

Although complete understanding of an unlimited range of text is beyond the scope of current natural language understanding technologies, it is possible to build systems that extract specific pieces of information from a limited range of texts, and to deploy such applications to meet real-world needs. JASPER is one such fact extraction system, recently developed and deployed by Carnegie Group for Reuters Ltd. It uses a template-driven approach and partial analysis techniques to extract certain key pieces of information from a limited range of text. We have restricted the scope of the problem and separated the domain knowledge component of the system from the underlying core technology. As a result, the knowledge engineering task, often a bottleneck in developing text processing systems, is manageable, and it is feasible to use the JASPER approach to build deployable systems that solve real business problems. JASPER achieves excellent results in both accuracy (83%) and speed (less than 25 s per text). It does this by combining frame-based knowledge representation, object-oriented processing, powerful pattern matching, and heuristics that take advantage of stylistic conventions, including lexical, syntactic, semantic, and pragmatic regularities observed in the text corpus. The shallow, localized processing approach focuses on the information to be extracted and ignores irrelevant text.
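The template-driven, shallow approach described above can be illustrated with a minimal sketch. This is not JASPER's implementation: the earnings-report domain, the slot names, the regular expressions, and the `Slot` and `Template` classes are all hypothetical, chosen only to show how domain knowledge (the template and its patterns) can be kept apart from a generic matching core, and how text that matches no pattern is simply ignored.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One field of a template, with the local patterns that can fill it."""
    name: str
    patterns: list  # regexes with one capture group, tried in order


@dataclass
class Template:
    """A frame-like record type: a name plus the slots to be filled."""
    name: str
    slots: list = field(default_factory=list)

    def extract(self, text):
        """Fill each slot from the first of its patterns that matches.

        Text that matches no pattern is ignored, so no full parse is needed.
        """
        frame = {}
        for slot in self.slots:
            for pattern in slot.patterns:
                match = re.search(pattern, text, flags=re.IGNORECASE)
                if match:
                    frame[slot.name] = match.group(1).strip()
                    break
        return frame


# Hypothetical domain knowledge, separate from the generic code above.
EARNINGS = Template(
    name="earnings_report",
    slots=[
        Slot("company", [r"^([A-Z][\w&. ]+?) (?:said|reported)"]),
        Slot("net_income",
             [r"net (?:income|profit) of \$?([\d.,]+ (?:million|billion))"]),
        Slot("period", [r"for the (first|second|third|fourth) quarter"]),
    ],
)

# Invented sample text; the second sentence fills no slot and is skipped.
text = (
    "Acme Widgets Inc. reported net income of $12.4 million for the third "
    "quarter. Analysts had expected weaker results because of the weather."
)
frame = EARNINGS.extract(text)
print(frame)
```

Under these assumptions, moving to a new domain would mean writing a new `Template` instance rather than changing `extract`, which is one plausible reading of how separating domain knowledge from core technology keeps the knowledge engineering task manageable.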
