Document identification using shape trees

Abstract

We present a technique for identifying documents from an abstraction of the scanned image. The document types to be identified must be known to the system a priori. To this end, the distinguishing features of each type are stored as shape trees in a case base [8], which also contains rules for possible further processing. Even in an extremely reduced image, the significant, distinguishing information can be filtered out and recognized using case-based reasoning (CBR) [8]. The method was evaluated in experiments on medical order forms: on average, 97% of the forms were correctly identified, and none were identified incorrectly.
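
The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the shape-tree representation or the similarity measure, so the `Shape` and `Case` classes, the box-based `tree_similarity` function, and the `threshold` value are all hypothetical choices made for illustration. The sketch assumes that a query whose best match falls below the threshold is rejected rather than assigned to a type, which is one way to obtain a system that leaves some forms unidentified without misidentifying any.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shape:
    """Hypothetical shape-tree node: a bounding box normalised to [0, 1]."""
    x: float
    y: float
    w: float
    h: float
    children: List["Shape"] = field(default_factory=list)


@dataclass
class Case:
    """Hypothetical case-base entry: a document type, its tree, and its rules."""
    name: str
    tree: Shape
    rules: List[str] = field(default_factory=list)


def box_similarity(a: Shape, b: Shape) -> float:
    # 1 minus the mean absolute difference of the four box parameters
    # (an assumed measure, not the one used in the paper).
    diff = abs(a.x - b.x) + abs(a.y - b.y) + abs(a.w - b.w) + abs(a.h - b.h)
    return max(0.0, 1.0 - diff / 4.0)


def tree_similarity(a: Shape, b: Shape) -> float:
    # Compare the two nodes, then compare children pairwise in order.
    # Children without a partner contribute a similarity of 0.
    score = box_similarity(a, b)
    n = max(len(a.children), len(b.children))
    if n == 0:
        return score
    child_total = sum(
        tree_similarity(ca, cb) for ca, cb in zip(a.children, b.children)
    )
    return 0.5 * score + 0.5 * child_total / n


def identify(query: Shape, case_base: List[Case],
             threshold: float = 0.9) -> Optional[Case]:
    # Retrieve the most similar case; reject the query if it is not
    # similar enough, instead of guessing a document type.
    best = max(case_base, key=lambda c: tree_similarity(query, c.tree),
               default=None)
    if best is None or tree_similarity(query, best.tree) < threshold:
        return None
    return best
```

Calling `identify(query, case_base)` returns the matching `Case`, whose `rules` could then drive further processing, or `None` when no stored document type is similar enough.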

Keywords

Document identification; shape tree; document processing; case-based reasoning; optical mark reading; similarity; segmentation
