Abstract
In this work, we present a morphological segmenter for Wixarika, an indigenous language of Mexico. Segmentation is fundamental for morphologically rich languages, a category that includes many Native American languages, because it improves downstream tasks such as machine translation, dialogue systems, and summarization. Beyond the agglutinative nature of the language, the scarcity of resources and the lack of an orthographic standard across dialects add to the challenge. Our proposal is based on a probabilistic finite-state approach that exploits regular agglutinative patterns and requires little linguistic knowledge. We show that our approach outperforms unsupervised and semi-supervised methods in a low-resource context. The dataset used in this work has been openly released for future work by the community.
