Abstract
Many individuals, organizations, and companies have to answer large numbers of emails. Often, many of these emails contain variations of relatively few frequently asked questions. We address the problem of predicting which of several frequently used answers a user will choose to respond to an email. We map this problem to a semi-supervised text classification problem. In a case study with emails sent to a corporate customer service department, we investigate the ability of naive Bayes and support vector classifiers to identify the appropriate answers to emails. We study how effectively the transductive Support Vector Machine and the co-training algorithm utilize unlabeled data, and we investigate why co-training is beneficial only when very few labeled data are available. In addition, we describe a practical assistance system.
