Abstract
The process of assigning each word its appropriate part of speech is known as part-of-speech (POS) tagging. Building a well-performing tagger requires a substantial amount of well-organized data (corpora) and significant research on the target language. Mizo is an under-resourced language that needs more research attention in computational linguistics, and the limited availability of corpora and relevant literature makes assigning POS labels to Mizo text more difficult. This paper explores two methods for improving the Hidden Markov Model (HMM)-based POS tagger for the Mizo language. The proposed taggers are compared with the baseline HMM tagger and with N-gram taggers on a Mizo corpus designed for this study, which consists of 72,077 manually tagged tokens. The experimental results show that both proposed taggers improve on the HMM-based Mizo POS tagger, achieving accuracies of 81.52% and 84.29%, respectively. Moreover, a comprehensive analysis of the proposed hybrid tagger's performance yielded a weighted average precision, recall, and F1-score of 83.09%, 77.88%, and 79.64%, respectively.
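The abstract does not describe the internals of the baseline HMM tagger. As a point of reference only, the following is a minimal sketch of a generic bigram HMM tagger with Viterbi decoding. It assumes add-one (Laplace) smoothing for unseen words and transitions, and it uses a tiny hypothetical English-token corpus with hypothetical tag names (DET, NOUN, VERB). It is not the authors' implementation and does not reproduce either of the two proposed improvements.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count bigram tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # transitions[prev_tag][tag]
    emissions = defaultdict(Counter)    # emissions[tag][word]
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            prev = tag
    return transitions, emissions


def log_prob(counter, key, outcomes):
    # Add-one smoothing (an assumption of this sketch) keeps unseen events non-zero.
    total = sum(counter.values())
    return math.log((counter[key] + 1) / (total + outcomes))


def viterbi(words, transitions, emissions):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    vocab = {w for counts in emissions.values() for w in counts}
    n_vocab = len(vocab) + 1  # +1 reserves probability mass for unknown words
    words = [w.lower() for w in words]

    # best[i][t] = (log score of the best path ending in tag t at position i, back-pointer)
    best = [{}]
    for t in tags:
        score = (log_prob(transitions["<s>"], t, n_tags)
                 + log_prob(emissions[t], words[0], n_vocab))
        best[0][t] = (score, None)

    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = log_prob(emissions[t], words[i], n_vocab)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], t, n_tags) + emit, p)
                for p in tags
            )

    # Backtrack from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Hypothetical toy training data, for illustration only.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]
transitions, emissions = train_hmm(train)
print(viterbi(["the", "dog", "sleeps"], transitions, emissions))
```

With this toy data, the call prints `['DET', 'NOUN', 'VERB']`. Because of the smoothing, an unseen word such as `bird` in `["a", "bird", "runs"]` is still tagged, relying on the transition probabilities rather than on observed emissions. This sparsity of emission evidence is one reason why under-resourced languages with small corpora are hard for a plain HMM tagger.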
