Abstract
With the rapid development of information technology, converting paper books into electronic documents is becoming increasingly common. Document images of thick books captured by imaging devices exhibit some degree of distortion, which degrades OCR recognition accuracy. To address this problem, this paper proposes a fast restoration method for Chinese document images with arbitrary warping. First, Chinese characters are extracted step by step using the Characters and Text lines Locate Alternately (CTLA) method, and text lines are identified with a nearest aggregation method. Then, the vertical positions of the extracted characters are corrected line by line, and the reconstructed text is saved in a new image. Experiments on nearly 200 document images show that the method significantly improves the average recognition rate, to 95%, at high speed.
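The line-grouping and vertical-correction steps summarized above can be illustrated with a small sketch. It is not the authors' CTLA or nearest aggregation implementation. It assumes the characters have already been extracted as bounding boxes, substitutes a simple greedy nearest-neighbour chaining for the paper's nearest aggregation method, and uses hypothetical names (`Box`, `group_into_lines`, `straighten`, `render`) and thresholds (`max_gap`, `max_dy`, `line_pitch`). Images are plain lists of grey-level rows so the example needs no imaging library.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one extracted character, in pixels (hypothetical type)."""
    x: int  # left column
    y: int  # top row
    w: int  # width
    h: int  # height

    @property
    def cy(self) -> float:
        """Vertical centre of the box."""
        return self.y + self.h / 2


def group_into_lines(boxes, max_gap=5, max_dy=6):
    """Greedy nearest-neighbour chaining of characters into text lines.

    Hypothetical stand-in for the paper's nearest aggregation method.
    Characters are visited left to right; each joins the line whose most
    recent character is horizontally close (gap <= max_gap) and vertically
    close (centre offset <= max_dy), picking the nearest such line.
    Chaining to the *last* character lets a line drift gradually up or
    down, as a curved (warped) line does.
    """
    lines = []
    for b in sorted(boxes, key=lambda box: box.x):
        best, best_dist = None, None
        for line in lines:
            last = line[-1]
            gap = b.x - (last.x + last.w)
            dy = abs(b.cy - last.cy)
            if gap <= max_gap and dy <= max_dy:
                dist = max(gap, 0) + dy
                if best_dist is None or dist < best_dist:
                    best, best_dist = line, dist
        if best is None:
            lines.append([b])  # no close line: start a new one
        else:
            best.append(b)
    # Order lines top to bottom by their mean vertical centre.
    lines.sort(key=lambda line: sum(c.cy for c in line) / len(line))
    return lines


def straighten(lines, line_pitch, top_margin=0):
    """Give every character of line i the same vertical centre; keep x.

    Returns (source_box, target_box) pairs.
    """
    pairs = []
    for i, line in enumerate(lines):
        centre = top_margin + i * line_pitch + line_pitch // 2
        for c in line:
            pairs.append((c, Box(c.x, centre - c.h // 2, c.w, c.h)))
    return pairs


def render(src, pairs, width, height, background=255):
    """Copy each character's pixels from src into a new blank image."""
    dst = [[background] * width for _ in range(height)]
    for s, d in pairs:
        for r in range(s.h):
            for col in range(s.w):
                ty, tx = d.y + r, d.x + col
                if 0 <= ty < height and 0 <= tx < width:  # clip to canvas
                    dst[ty][tx] = src[s.y + r][s.x + col]
    return dst
```

Because each character is compared only with the last character already assigned to a line, a text line that slopes or curves across the page stays connected as long as neighbouring characters differ by at most `max_dy` in height. Moving whole character patches vertically, rather than resampling the page, leaves the glyph shapes untouched, which matches the abstract's description of correcting vertical positions and saving the reconstructed text in a new image.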
