Abstract
Translating text in real-world images presents several challenges: text detection, text extraction, recognition, and translation. A multilingual translation system must account for fundamental differences between the characteristics of writing systems such as Latin, Cyrillic, Chinese, Korean, and Arabic. The system presented in this paper extracts text from real-world images using heuristics appropriate to these writing systems, then de-skews, binarizes, recognizes, and translates it. OCR is used to recognize the text, and translation is performed using Translation Memory and Machine Translation. Experiments were also conducted to determine the most suitable segmentation and binarization algorithms for translation.
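As one concrete example of the binarization step mentioned above, the following is a minimal sketch of Otsu's method, a threshold-selection algorithm commonly evaluated in OCR preprocessing. It is illustrative only: the paper does not specify which algorithms were compared, and the function names here are hypothetical.

```python
def otsu_threshold(pixels):
    """Pick the gray level that maximizes between-class variance (Otsu's method).

    `pixels` is a flat list of 8-bit grayscale values (0-255).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_bg = 0.0      # running sum of gray levels in the background class
    weight_bg = 0     # running pixel count of the background class
    best_thresh, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; larger means a cleaner foreground/background split.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_thresh = var_between, t
    return best_thresh

def binarize(pixels, thresh):
    """Map each pixel to pure black (0) or white (255) around the threshold."""
    return [255 if p > thresh else 0 for p in pixels]
```

On a synthetic image with dark text (value 10) on a bright background (value 200), `otsu_threshold` lands between the two clusters, and `binarize` produces the two-tone image an OCR engine expects.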
