Abstract
AdaBoost.MH is a boosting algorithm that is considered to be one of the most accurate algorithms for multilabel classification. It works by iteratively building a committee of weak hypotheses of decision stumps. To build the weak hypotheses, in each iteration, AdaBoost.MH obtains the whole extracted features and examines them one by one to check their ability to characterize the appropriate category. Using Bag-Of-Words for text representation dramatically increases the computational time of AdaBoost.MH learning, especially for large-scale datasets. In this paper we demonstrate how to improve the efficiency and effectiveness of AdaBoost.MH using latent topics, rather than words. A well-known probabilistic topic modelling method, Latent Dirichlet Allocation, is used to estimate the latent topics in the corpus as features for AdaBoost.MH. To evaluate LDA-AdaBoost.MH, the following four datasets have been used:
Get full access to this article
View all access options for this article.
