Abstract
Email is one of the most popular and effective ways of communicating over the internet. Many applications are available today for email messaging on computers and mobile devices. As email grows in popularity, the number of emails sent and received is also increasing. It is therefore difficult for a user to remember emails and to relate new incoming emails to previous communications on similar topics. Email threads provide a mechanism by which a user can retrieve the sequence of emails belonging to a particular communication within a time frame, which offers users a number of benefits. In this work, two email thread identification algorithms based on a nested textual clustering approach are presented. The work proceeds in two stages. In the first stage, two popular text clustering approaches, latent Dirichlet allocation and non-negative matrix factorization, are applied to the email messages to form email clusters. In the second stage, clustering is performed again over the created email clusters to identify the email threads using threading features. Performance measures such as accuracy, precision, recall and F-measure are evaluated for the presented thread identification algorithms.
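To make the evaluation measures concrete, the following is a minimal sketch of how accuracy, precision, recall and F-measure could be computed for a thread assignment. It assumes a pairwise formulation, in which every pair of emails is checked for whether it shares a thread in the reference labelling and in the predicted labelling; the abstract does not state which formulation the work uses. The function name `pairwise_scores` and the toy labels are hypothetical and serve only as illustration.

```python
from itertools import combinations


def pairwise_scores(true_threads, predicted_threads):
    """Score a predicted thread assignment against a reference one.

    Assumption: a pairwise formulation. Each pair of emails counts as a
    positive when both emails are placed in the same thread.
    """
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_threads)), 2):
        same_true = true_threads[i] == true_threads[j]
        same_pred = predicted_threads[i] == predicted_threads[j]
        if same_true and same_pred:
            tp += 1  # correctly placed together
        elif same_pred:
            fp += 1  # placed together, but belong to different threads
        elif same_true:
            fn += 1  # separated, but belong to the same thread
        else:
            tn += 1  # correctly separated

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical toy data: five emails, two reference threads ("a" and "b").
reference = ["a", "a", "a", "b", "b"]
predicted = [0, 0, 1, 1, 1]
scores = pairwise_scores(reference, predicted)
print(scores)
```

Only co-membership matters in this formulation, so the predicted thread identifiers need not match the reference identifiers.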
