Abstract
Argumentation in academic writing is a challenging task, yet it is required to communicate ideas clearly. The ideas put forward have to be supported by reasoned arguments, which are composed of components such as premises and conclusions. In this paper, we present an approach to classifying argumentative components using language models and machine learning algorithms on a new corpus of academic theses and research proposals. We explore the use of lexical, syntactic, semantic, and indicator features for this task. We found that lexical features were the most effective for classification, whereas for language models the best features were syntactic. Nevertheless, our experiments showed that a document occurrence representation with unigrams achieved the best accuracy. We also tested whether these conclusions about the representation and classifier hold for theses at each study level (undergraduate, master's, and doctoral). Finally, we analyzed the information gain of the features and found patterns that correspond to argumentative markers.
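To make the two technical notions in the abstract concrete, the following is a minimal sketch of a binary unigram occurrence representation and of information gain computed per unigram. It is an illustration under stated assumptions, not the implementation used in the paper: the whitespace tokenizer, the function names, and the four toy sentences with their labels are all hypothetical and are not drawn from the corpus.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def occurrence_vectors(docs):
    # Binary occurrence: 1 if the unigram appears in the text, 0 otherwise
    # (frequency within the text is ignored).
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        present = set(tokenize(doc))
        rows.append([1 if term in present else 0 for term in vocab])
    return vocab, rows


def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(column, labels):
    # Reduction in label entropy after splitting on a binary feature.
    total = len(labels)
    remainder = 0.0
    for value in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data, invented for illustration only.
sentences = [
    "therefore the method is effective",
    "because the data show a clear trend",
    "therefore we conclude the hypothesis holds",
    "because prior studies report similar results",
]
labels = ["conclusion", "premise", "conclusion", "premise"]

vocab, rows = occurrence_vectors(sentences)
gains = {
    term: information_gain([row[i] for row in rows], labels)
    for i, term in enumerate(vocab)
}

# Rank unigrams by information gain, highest first.
ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

In this toy setting, unigrams such as "therefore" and "because" separate the two classes perfectly and so rank at the top, which is the kind of pattern that ranking features by information gain would surface as an argumentative marker.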
