Abstract
Translating text in real-world images presents several challenges: text detection, text extraction, recognition, and translation. A multilingual translation system must account for fundamental differences between the characteristics of writing systems such as Latin, Cyrillic, Chinese, Korean, and Arabic. The system presented in this paper extracts text from real-world images using heuristics appropriate to these writing systems, then de-skews, binarizes, recognizes, and translates it. OCR is used to recognize the text, and translation is performed using Translation Memory and Machine Translation. Experiments were also conducted to determine the most suitable segmentation and binarization algorithms for translation.
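As one concrete example of the binarization step mentioned above, the following is a minimal sketch of Otsu's method, a threshold-selection algorithm commonly evaluated in OCR preprocessing. It is illustrative only: the paper does not specify which algorithms were compared, and the function names here are hypothetical.

```python
def otsu_threshold(pixels):
    """Pick the gray level that maximizes between-class variance (Otsu's method).

    `pixels` is a flat list of 8-bit grayscale values (0-255).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_bg = 0.0      # running sum of gray levels in the background class
    weight_bg = 0     # running pixel count of the background class
    best_thresh, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; larger means a cleaner foreground/background split.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_thresh = var_between, t
    return best_thresh

def binarize(pixels, thresh):
    """Map each pixel to pure black (0) or white (255) around the threshold."""
    return [255 if p > thresh else 0 for p in pixels]
```

On a synthetic image with dark text (value 10) on a bright background (value 200), `otsu_threshold` lands between the two clusters, and `binarize` produces the two-tone image an OCR engine expects.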
