Abstract
Machine learning algorithms are often used in content-based recommender systems, since a recommendation task can naturally be reduced to a classification problem: a recommender needs to learn a classifier for a given user, where the learning examples are characteristics of items previously liked, bought, or seen by that user. However, multi-valued and continuous attributes require special handling when implementing a classifier, as they can significantly influence its accuracy. In this paper we propose novel approaches for handling multi-valued and continuous attributes that are suited to the naïve Bayes classifier and the decision tree classifier, and we tune them for content-based movie recommendation. We evaluate the performance of the resulting approaches using the MovieLens data set enriched with movie details retrieved from the Internet Movie Database. Our empirical results demonstrate that the naïve Bayes classifier is more suitable for content-based movie recommendation than the decision tree algorithm. In addition, the naïve Bayes classifier achieves better results with smart discretization of continuous attributes than with an approach that models continuous attributes with a Gaussian distribution. Finally, we combine our best-performing content-based algorithm with the k-means clustering algorithm typically used for collaborative filtering, and evaluate the performance of the resulting hybrid approach on a movie recommendation task. The experimental results clearly show that the hybrid approach significantly increases recommendation accuracy compared to collaborative filtering, while reducing the risk of overspecialization, a typical problem of content-based approaches.
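To make the reduction of recommendation to per-user classification concrete, here is a minimal sketch. It is not the authors' method, and the abstract does not describe their novel attribute-handling approaches in detail. It assumes a hypothetical movie representation with a multi-valued attribute `genres` (a set) and a continuous attribute `year`. The multi-valued attribute is treated as one independent binary feature per known value. The continuous attribute is discretized into equal-width bins, a simple stand-in for the "smart discretization" mentioned above. The class name `NaiveBayesRecommender`, the year range, the bin count, and the Laplace smoothing parameter are all illustrative assumptions.

```python
import math
from collections import defaultdict


def discretize(value, lo, hi, bins):
    """Equal-width discretization (an assumption, not the paper's scheme)."""
    if value <= lo:
        return 0
    if value >= hi:
        return bins - 1
    return int((value - lo) / (hi - lo) * bins)


class NaiveBayesRecommender:
    """Hypothetical per-user naive Bayes classifier over movie attributes."""

    def __init__(self, year_range=(1900, 2020), bins=5, alpha=1.0):
        self.lo, self.hi = year_range
        self.bins = bins
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, movies, labels):
        self.classes = sorted(set(labels))
        self.class_count = {c: 0 for c in self.classes}
        self.genre_count = {c: defaultdict(int) for c in self.classes}
        self.bin_count = {c: [0] * self.bins for c in self.classes}
        self.genres = set()
        for movie, label in zip(movies, labels):
            self.class_count[label] += 1
            # Multi-valued attribute: count each genre value separately.
            for g in movie["genres"]:
                self.genre_count[label][g] += 1
                self.genres.add(g)
            # Continuous attribute: count the bin the value falls into.
            b = discretize(movie["year"], self.lo, self.hi, self.bins)
            self.bin_count[label][b] += 1
        return self

    def log_posterior(self, movie, c):
        n_c = self.class_count[c]
        total = sum(self.class_count.values())
        lp = math.log((n_c + self.alpha) / (total + self.alpha * len(self.classes)))
        # Each known genre is a binary feature: present or absent.
        for g in self.genres:
            p = (self.genre_count[c][g] + self.alpha) / (n_c + 2 * self.alpha)
            lp += math.log(p if g in movie["genres"] else 1 - p)
        b = discretize(movie["year"], self.lo, self.hi, self.bins)
        lp += math.log((self.bin_count[c][b] + self.alpha) / (n_c + self.alpha * self.bins))
        return lp

    def predict(self, movie):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.classes, key=lambda c: self.log_posterior(movie, c))


# Toy training data for one hypothetical user.
movies = [
    {"genres": {"Comedy", "Romance"}, "year": 1995},
    {"genres": {"Comedy"}, "year": 1999},
    {"genres": {"Horror"}, "year": 1980},
    {"genres": {"Horror", "Thriller"}, "year": 1978},
]
labels = ["like", "like", "dislike", "dislike"]
model = NaiveBayesRecommender().fit(movies, labels)
print(model.predict({"genres": {"Comedy"}, "year": 1997}))  # like
```

Treating each genre as a separate binary feature lets a movie with several genres contribute evidence from all of them. Replacing the binning step with a per-class Gaussian density over `year` would give the Gaussian alternative that the abstract compares against.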
