Authorship attribution as a case of anomaly detection: A neural network model

Abstract

Writings by the same author usually share specific traits, the so-called stylome, defined as an abstraction of the constraints and characteristic sequences of words and phrases used in the texts. Although the stylome has proved elusive to identify, some progress has been made in this area. Here, we present a system that is trained with texts from a given author, unveils some features of that author's stylome and, in turn, detects texts that were not written by that author or were written in a different style. The system is based on the time series processing capabilities of an unsupervised neural network model known as the self-organizing map. The core idea is that a system trained with texts by one author should detect an anomaly when presented with texts from other authors. We present authorship identification results in several contexts, including known benchmarks as well as examples from literature, journalism, and popular science.
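
The core idea can be illustrated with a minimal sketch in Python. It is not the system described in the article: the abstract does not specify the text encoding, the map size, or the anomaly criterion, so everything below is an assumption chosen for illustration. Texts are encoded as letter frequencies over sliding windows (a simple stand-in for the time series representation), a small self-organizing map is trained on one "author", and the mean quantization error (the average distance from each window to its best-matching unit) serves as the anomaly score. All function names, parameters, and toy texts are hypothetical.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def window_features(text, size=40, step=10):
    """Encode a text as one letter-frequency vector per sliding window (assumed encoding)."""
    text = text.lower()
    rows = []
    last_start = max(len(text) - size, 0)
    for start in range(0, last_start + 1, step):
        chunk = text[start:start + size]
        counts = [chunk.count(ch) for ch in ALPHABET]
        total = sum(counts)
        rows.append([c / total if total else 0.0 for c in counts])
    return rows

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda k: distance(weights[k], x))

def train_som(data, rows=3, cols=3, epochs=10, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM with online updates and linear decay."""
    rng = random.Random(seed)
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each unit from a randomly chosen training vector.
    weights = [list(rng.choice(data)) for _ in coords]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            decay = 1.0 - step / total_steps
            lr = lr0 * decay                 # learning rate shrinks over time
            sigma = sigma0 * decay + 0.1     # neighbourhood radius shrinks too
            bmu_r, bmu_c = coords[best_matching_unit(weights, x)]
            for k, (r, c) in enumerate(coords):
                d2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            step += 1
    return weights

def anomaly_score(weights, data):
    """Mean quantization error of the data on the trained map."""
    errors = [distance(weights[best_matching_unit(weights, x)], x) for x in data]
    return sum(errors) / len(errors)

# Toy "authors": the same vocabulary reordered versus a very different letter profile.
known = "the cat sat on the mat and the rat ran " * 30
same = "the rat ran and the cat sat on the mat " * 10
other = "zzz qqq xxjj zzqq jjxx qqzz xjxj zqzq " * 10

som = train_som(window_features(known))
score_same = anomaly_score(som, window_features(same))
score_other = anomaly_score(som, window_features(other))
print(score_same < score_other)
```

In this toy setting the text that reuses the training vocabulary scores lower than the text with a different letter profile. A practical detector would also need a decision threshold, for example one derived from the distribution of scores on held-out texts by the known author; that choice is likewise an assumption and is not taken from the article.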

Keywords

Authorship attribution, anomaly detection, self-organizing maps, time series
