Abstract
Reddit is a popular social media website where users submit content, such as direct links and text posts, to forums called subreddits. An average of 500 new subreddits are created each day. With such a vast and growing number of subreddits, users must discover and familiarize themselves with the existing communities before submitting a post. In this paper, we propose new feature sets for an online community: the text post ratio, the average length of text in a post, and domain-specific features. We design a community recommendation framework and evaluate it on a Reddit dataset. The framework identifies and collects textual communities by finding their representatives with the DBSCAN clustering algorithm, and then applies logistic regression to recommend a list of communities with high content similarity to a given post. Comprehensive experimental evaluations on the Reddit dataset show that the proposed framework achieves a precision of 90%.
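To make the first two proposed features concrete, the following is a minimal sketch of how they might be computed per community. It is an illustration under stated assumptions, not the implementation used in the paper: the `Post` record and the `community_features` function are hypothetical names, a post with no text body is treated as a direct link, and text length is measured in whitespace-separated words (the abstract does not say whether words or characters are counted). The domain-specific features, the DBSCAN step, and the logistic regression step are not shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Post:
    """Hypothetical post record; not a schema taken from the paper."""
    subreddit: str
    text: Optional[str]  # None is assumed to mean a direct-link post


def community_features(posts: Iterable[Post]) -> Dict[str, Tuple[float, float]]:
    """Return {subreddit: (text_post_ratio, avg_text_length_in_words)}."""
    # subreddit -> [total posts, text posts, total words in text posts]
    totals: Dict[str, list] = {}
    for post in posts:
        stats = totals.setdefault(post.subreddit, [0, 0, 0])
        stats[0] += 1
        if post.text is not None:
            stats[1] += 1
            # Assumption: length is counted in whitespace-separated words.
            stats[2] += len(post.text.split())

    features: Dict[str, Tuple[float, float]] = {}
    for name, (n_posts, n_text, n_words) in totals.items():
        ratio = n_text / n_posts
        # A community with no text posts gets an average length of 0.0.
        avg_len = n_words / n_text if n_text else 0.0
        features[name] = (ratio, avg_len)
    return features
```

Under these assumptions, a community made up mostly of long text posts yields a ratio close to 1 and a large average length, which is the kind of signal that could help separate textual communities from link-sharing ones.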
