Abstract
This work investigates the role of semantic clause types and modality in argumentative texts. We annotate argumentative microtexts with situation entity (SE) classes and additionally label the segments that contain modal verbs with their modal senses. We analyse the correlation both of SE classes and of modal verbs and senses with components of argument structures (such as premises and conclusions) and their functions (such as support and rebuttal). We find interesting tendencies in the correlations of SE types with both argument components and argumentative functions.
We also see interesting differences in the distributions of modal verbs and senses within different argumentative components and functions, as well as evidence that modal senses can be helpful to distinguish conclusions and premises. We conclude that both semantic clause types and modal senses can be deployed for automatic recognition and fine-grained classification of argumentative text passages.
Introduction
Argumentation mining has recently gained attention as a novel task in automatic discourse analysis which is of high relevance, for instance in the analysis of opinions in social media. Argumentation analysis aims at automatically identifying and structuring arguments in natural language texts. Informally, an argument can be understood as a discussion – or a monological debate – in which reasons (so-called premises) are advanced for or against some controversial issue, proposition or proposal (often called the conclusion) [18].
The aim of this study is to better understand linguistic features of argument components in argumentative texts. We take an analytical approach, investigating the roles of both modality and semantic clause types in an annotated corpus of argumentative microtexts [24].
In previous work [3], we annotated the 112 microtexts in this corpus with semantic clause types in the form of situation entities [8].
See Section 4 for details on situation entities. The argumentative microtext corpus with pre-annotated argumentative functions comes from [24], while we added annotations of SE types and modality as new layers of annotation. For details of the dataset and the annotation process please refer to [3].
Do particular
Do particular
Do particular
Do particular
The remainder of the paper is structured as follows. Section 2 presents the microtext corpus [24] on which we base our analysis. Section 3 introduces the annotation categories for situation entities and modal senses, both of which were established and annotated in prior work for genres other than argumentative texts [8,10,16]. Section 4 analyzes the distributional properties of semantic clause types in the argumentative microtexts and their correlation with specific argument structure components. Section 5 performs a similar analysis for modal verbs and senses. Section 6 summarizes our observations and draws conclusions on the contribution of semantic annotation categories to the linguistic characterization of arguments and argument components.
The argumentative microtexts are a collection of short written texts (usually of 5 sentences) elicited in response to a trigger question, such as (1).
Of the 112 texts, 23 were written by one of the authors of the microtext corpus as teaching tools; they were used mainly for teaching and testing students’ argumentative analysis and are thus not associated with trigger questions.
Sollten Videospiele in den Reigen der Olympischen Disziplinen aufgenommen werden?
Should video games be made Olympic?
Figure 1 shows one complete microtext (b036), written in response to the trigger question above.

Argumentation graph of a microtext.
The microtexts are dense argumentative texts; each segment contributes to the argument. Each text contains one segment stating the
Situation entity types
To investigate the semantic types of clauses found in
SE types are recognizable (and distinguishable) through a combination of linguistic features of the clause and its main verb, and it was found that the distribution of SE types in text passages correlates to some extent with whether the text passage is (e.g.) narrative, informative, or argumentative [3,17,21].
The use of linguistic features for distinguishing text passages is closely related to Argumentative Zoning [20,32], where linguistic features are used to distinguish genre-specific types of text passages in scientific texts. In this manner, those texts are segmented into types of text passages such as Methods or Results. Notions related to SE type have been widely studied in theoretical linguistics [1,4,6,29,33,34] and have seen growing interest in computational linguistics [5,11,12,19,25,28,35].
Annotation categories. The inventory of SE types considered in our analysis starts with
Modality, negation, future tense, and conditionals, when coupled with an
Carlo got the job. (
Georg has blue eyes. (
Darya answered. (
Reza is short. (
He won the race. (
It is warm today. (
The other two frequently-occurring SE categories in argumentative texts are
Birds can fly.
Scientists make arguments.
Fei travels to India every year.
Three of these situation entity types can be used to express general knowledge about the world, to varying extents.
The next category of SE types is broadly referred to as Abstract Entities. This type of clause presents semantic content in a manner that draws attention to its epistemic status. We focus primarily on a small subset of constructions – factive and propositional predicates with clausal complements. Of course a wide range of linguistic constructions can be used to convey such information, and to address them all would require a comprehensive treatment of subjective language. In the examples below, the matrix clause is in both cases a
Finally, the labels
Annotators had access to the full inventory of ten SE types described above, but only four of them occur often enough in the microtext corpus to allow for meaningful analysis (see Section 4 for details).
Modal verbs and modal senses
Modal verbs convey extra-propositional meaning of clauses, encoding information about possibility, necessity, obligation, permission, wishes, or requests [13]. Features of the verb such as modal auxiliaries, tense, and mood have been widely used in previous work for classifying argumentative vs. non-argumentative sentences [7,18]. For example, [18] include verb lemmas and modal auxiliaries as features, and [7] find that, for Greek web texts related to public policy issues, tense and mood features of the verbal constructions are helpful for determining the role of the sentences within argumentative structures. Classifiers that automatically distinguish modal senses in context have been developed by [26] and improved in subsequent work [15,16,36].
We annotate all modal verb occurrences in the microtexts with their modal sense, following the inventory of modal senses and annotation guidelines from [36]. They distinguish three senses:
DYNAMIC (DY):
Außerdem können sich nur große Unternehmen die zusätzlichen Personalkosten leisten.
In addition, only large companies can afford the additional personnel costs. (Ability)
EPISTEMIC (EP):
Eine ungewollte Schwangerschaft kann sowohl für die Eltern als auch das Kind eine schwere Belastung für den Rest des Lebens darstellen.
An unwanted pregnancy can be a heavy burden for the rest of life for both the parents and the child. (Possibility)
DEONTIC (DE):
Die Krankenkassen sollten Behandlungen beim Natur-oder Heilpraktiker nicht zahlen.
Health insurance companies should not cover treatment in complementary medicine. (Obligation)
DEONTIC (DE):
Zwar wollen die Vermieter möglichst viel verdienen […]
The landlords want to earn as much as possible […] (Wish)
DEONTIC (DE):
Der Staat darf ein medizinisches Produkt bzw. eine medizinische Behandlung nicht aus moralischen Gruenden einschränken.
The state may not restrict a medical product or medical treatment for moral reasons. (Permission)
It is important to mention that the interpretation of modal verbs can vary with their context. [36] find that specific sense ambiguities such as dynamic vs. deontic readings of can, epistemic vs. dynamic readings of could, or epistemic vs. deontic readings of should are difficult to discriminate. Likewise, senses can be expressed through different modal verbs. In the following example, a possibility (epistemic sense) is expressed by two different German modal verbs (mögen and können):
EPISTEMIC (EP):
Manche mögen das Konzept der Schul-Uniform anachronistisch finden.
Some may find the concept of school uniform anachronistic. (Possibility)
EPISTEMIC (EP):
Obwohl die Geschäftszahlen von IBM in letzter Zeit nicht umwerfend waren
Although IBM’s numbers haven’t been staggering recently,
Extending argumentative annotations of microtexts to SE types and modal senses
We annotate all segments with their situation entity types and, for those containing modal verbs, with their modal senses. Table 1 and Fig. 2 give examples of SE types, modal verbs, and modal senses in argumentative texts. The premise Sure, other people have to work in the shops on the weekend (Table 1) is an example of a
Sample microtext (micro_b015), German and English versions, with SE labels, proponent/opponent status, and argumentative functions (support, undercut, rebuttal, addition)
Sample microtext (
Modal verbs and their senses are marked in boldface;
Segmentation. Before annotating the microtexts, we first segment them into clauses (cf. [3]). As seen in Table 1, an argumentative component can in fact consist of several SE segments. The granularity of situation-evoking clauses is different from that of argument components, requiring that the texts be re-segmented prior to SE annotation. For segmentation we use DiscourseSegmenter [27], a Python package offering both rule-based and machine-learning-based discourse segmenters.
[31] annotated the microtext corpus additionally with two alternative approaches to discourse structure, Rhetorical Structure Theory (RST, [14]) and Segmented Discourse Representation Theory (SDRT, [1]). The authors make similar observations to ours regarding the granularity of SE segments and argument components, finding that argument components may consist of several elementary discourse units (EDUs) as used in RST and SDRT.

Argumentation graph with clause type and modal sense annotations.
Agreement statistics. We train two student annotators with backgrounds in linguistics to label texts with both SE types and modal senses. The two annotators then label the microtexts independently. We compute agreement for both SE types and modal senses and obtain an inter-annotator agreement of 0.40 for the former and 0.75 for the latter, both reported as Cohen’s Kappa. As reported by several studies [23], the annotation of argumentative texts is a difficult task for humans. The agreement statistics reported above suggest that this is also the case for the task of labeling argumentative texts with SE types. The SE-labels that caused most disagreement among the annotators are
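Cohen’s Kappa corrects the observed agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of that computation, not the tooling used for this study; the function name `cohens_kappa` and the toy label lists are assumptions introduced for illustration and are not corpus data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy modal sense labels (invented for illustration only).
annotator_1 = ["de", "de", "ep", "dy", "de", "ep"]
annotator_2 = ["de", "de", "ep", "de", "de", "dy"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

On the toy data the annotators agree on four of six items, but the chance correction lowers the score noticeably, which illustrates why Kappa values are lower than raw agreement rates.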
Distributions of SE types within different genres
We find that the most important SE types within the microtexts are
In order to investigate whether argumentative texts can be distinguished from non-argumentative texts with respect to the distribution of SE types (cf. [30]), we compare the distribution of
The microtexts, which can be described as “purely” argumentative texts, are characterized by a high proportion of

Distribution of SE types within different genres.
Distributions of SE types for argument components and functions, along with absolute number of each type of segment in the microtext corpus (in brackets: subset without coerced cases)
GEN:
Looking at premises only,

SE types and argument components and propositions.
Table 2 shows the distributions over the four major SE types for the various types of argument components in the microtext corpus.
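Distributions of this kind can be derived from the annotated segments by counting SE labels per argument component and normalizing. The sketch below assumes each segment is available as a (component, SE type) pair; the function name `se_distribution`, the label strings, and the toy data are placeholders for illustration and do not reproduce the figures in Table 2.

```python
from collections import Counter, defaultdict


def se_distribution(segments):
    """Relative frequency of SE types for each argument component."""
    counts = defaultdict(Counter)
    for component, se_type in segments:
        counts[component][se_type] += 1
    return {
        component: {
            se_type: count / sum(counter.values())
            for se_type, count in counter.items()
        }
        for component, counter in counts.items()
    }


# Toy segments (invented; label names are placeholders).
segments = [
    ("conclusion", "STATE"), ("conclusion", "STATE"),
    ("premise", "STATE"), ("premise", "GENERIC"),
    ("premise", "GENERIC"), ("premise", "EVENT"),
]
dist = se_distribution(segments)
```

The same function can be applied to (argumentative function, SE type) pairs to obtain per-function distributions.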
Turning to correlations between SE types and argumentative functions of premises, we focus on the three most frequent functions: support, rebuttal, and undercut (Table 2, rightmost section, and Fig. 5). Supporting premises have an SE type distribution very similar to that of proponent premises overall. This is not surprising given that most microtexts contain only a single opponent premise. Undercutting premises show an even higher frequency of
Die Krankenkassen sollten Behandlungen beim Natur-oder Heilpraktiker nicht zahlen, (
Health insurance companies should not cover treatment in complementary medicine
What is additionally notable about rebutting premises is that they show the highest proportion of (overall low-frequency)

Correlations between SE types and argumentative functions.
These results suggest that situation entity types could be helpful even for a finer-grained analysis of argumentative functions.
Of course, our annotated corpus is very small, and the phenomena we observe can be interpreted only as tendencies. Nevertheless, we hypothesize that the prevalence of the
As mentioned above, linguistic features such as modality, negation, future tense or conditionals cause a coercion from
To better understand the effect of these coercions on our analyses, we repeated the analyses on the same dataset with the coercions undone. That is, we use a version of the dataset in which cases that would ordinarily be treated as coercion are instead labeled with the SE type that holds prior to coercion. Example (26), e.g., would be labeled as
Deshalb sollte Deutschland die Todesstrafe nicht einführen!
That’s why Germany should not introduce capital punishment!
The distributions of SE types within the dataset with and without coercions can be found in Table 2 (version without coercions in brackets) and are visualized in Fig. 6.
Please note that all other figures in this paper report results on the original dataset, with coercions.
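The comparison between the two dataset versions can be pictured as a relabeling step followed by a recount. The sketch below is an illustration under stated assumptions, not the procedure used to produce Table 2: the `Segment` class, its `pre_coercion` field, and the label strings `STATE`, `EVENT`, and `GENERIC` are hypothetical, and the segments are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    text: str
    se_type: str                         # label in the original dataset (after coercion)
    pre_coercion: Optional[str] = None   # label prior to coercion, if coercion applied


def undo_coercions(segments):
    """Return copies labeled with the SE type that holds prior to coercion."""
    return [Segment(s.text, s.pre_coercion or s.se_type) for s in segments]


def type_counts(segments):
    """Count how often each SE type occurs."""
    return Counter(s.se_type for s in segments)


# Invented segments; only the first one is a coerced case.
segments = [
    Segment("segment 1", "STATE", pre_coercion="EVENT"),
    Segment("segment 2", "STATE"),
    Segment("segment 3", "GENERIC"),
]
with_coercions = type_counts(segments)
without_coercions = type_counts(undo_coercions(segments))
```

Keeping the pre-coercion label as a separate field, rather than overwriting the annotation, allows both versions of the distribution to be computed from a single dataset.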

Distributions of SE types within argument components and argumentative functions, with and without coercions.
This supports the assumption that episodic events are dispreferred for supporting a conclusion or a premise, but preferred for attacking a conclusion or another premise. This second claim is supported by the high proportion of
Befürworter der Todesstrafe setzen auf die abschreckende Wirkung sowie die endgültige Eliminierung einer potentiellen Gefahr. (
Proponents of the death penalty count on its deterring effect as well as the ultimate elimination of any potential threat.
Our motivation for comparing the datasets with and without coercions was to learn more about the effect that modals and other coercion-triggering linguistic features may have on the distribution of SE types within argumentative texts. We find that there is indeed a considerable effect: as reported above, we naturally find fewer
Modal verbs within argumentative and non-argumentative texts
In our annotation, we found that modal verbs are a high-frequency linguistic phenomenon in the microtexts. The 188 modal verbs in the microtexts occur in 31% of all argumentative segments (57% of conclusions and 26% of premises) and are thus at least partly responsible for the high number of
Table 3 shows the German modal verbs and their available senses. Note that wollen is not a modal verb. We nevertheless included it because mögen (may/want to) is ambiguous and wollen is synonymous with its "want to" reading; we tagged this sense as deontic (wish).
Our analysis looks closely at distributions of both modal verbs and modal senses and is performed on a subset of the segments from the SE type analysis, as we are concerned only with segments containing modal verbs (cf. first row in Table 4).
German modal verbs with English translations and available modal senses (ep = epistemic; de = deontic; dy = dynamic), as well as number of occurrences in the data
Distributions of modal verbs and modal senses for various argument components and functions, plus absolute number of modal-verb-containing segments in the microtext corpus
Overall, the most frequent modal verb in the microtexts is sollen (72 occurrences, 41% of modal verbs), and the most frequent sense across all modal verbs is deontic (133 occurrences, 75%).

Modal senses and argument components and propositions.
Of the
The prevalence of sollen within conclusions (86%) warrants a caveat: most of the trigger questions contain some form of sollen, so this predominance may be a result of the form of the trigger questions.
The frequency of certain modal verbs could also be a feature of a specific type or subclass of argumentative texts: While most of the microtexts can be characterized as persuasive/suggestive towards certain future actions, argumentative texts may also be of an epistemic nature (e.g. arguments in scientific discourse). Therefore, our observations of features of argumentative texts first of all hold for the subclass of argumentative texts as represented in the microtext corpus, while we leave the analysis of features of other subclasses as future work.
The remaining nearly 40% of modal verbs in premises carry epistemic or dynamic senses, which occur in conclusions with a combined frequency of only 3%. The prevalence of these two senses is reflected in the high frequency of können, which frequently occurs as dynamic or epistemic. As is generally the case, müssen in its epistemic sense is rather rare. These tendencies can be seen clearly in the upper part of Table 4, which displays the modal sense distributions.
In
Epistemic modals refer to alternative, often counterfactual worlds, and as such they are well suited for rebutting preceding or hypothesized arguments or premises (Fig. 8). An example of this from our data is given in example (28), where bold face marks a rebutting premise:
Obwohl die Geschäftszahlen von IBM in letzter Zeit nicht umwerfend waren
Although IBM’s numbers haven’t been staggering recently,

Correlations between modal senses and argumentative functions.
Turning to dynamic sense, we find it more strongly associated with
Bereits heute dominieren Supermärkte und grosse Einkaufszentren den Markt.
Supermarkets and large shopping centres dominate the market today already.
Finally, even though deontic is the most frequent modal sense within all of the microtexts, it is especially frequent within
As an example, we find (30):
Note that the English translation does not render the deontic sense of müssen in the German original.
Man sollte einem langjährigen Eigentümer nicht die Chance auf Angleichung seiner Rendite auf Marktniveau verwehren.
One should not deny a longtime owner the opportunity to approximate his return to market level.
In
Dennoch sollten Frauen vor der Einnahme über die Risiken aufgeklärt werden und gynäkologisch untersucht werden.
However, women should be informed about the risks and examined gynecologically before taking it.
This echoes the finding in the situation entity analysis that rebuttals pattern somewhat differently from the other argumentative functions.
This study extends our understanding of linguistic features of argumentative texts by investigating the utility of SE types and modal verb senses for modeling regions of argumentative texts. Although the microtext corpus is small and the observed phenomena can be interpreted only as tendencies, our analyses show that both semantic clause types and modal senses can be useful features for automated argumentation analysis – not as stand-alone features, but as part of a larger system. Table 5 summarizes our findings.
Note that the results for SE types are based on the original dataset, cf. Section 4.4.
Clause types and modality as features for argumentation mining: Summary
Our analyses revealed some clear-cut distinctions as well as weaker tendencies. First, SE types, especially
The high frequency of
Proponent premises were found to be characterized by a high proportion of
Modal verbs and senses can be used to distinguish conclusions from premises since conclusions show a strong tendency to be deontic. Modal senses can also be helpful for classifying proponent and opponent premises, since the former show a higher proportion of deontic sense.
Finally, we found rebutting premises to contain more epistemic modal verbs than supporting or undercutting premises.
Moving forward, we will extend our analysis to longer texts with both argumentative and non-argumentative passages. With more data, we will be able to consider finer distinctions in the usage of modal senses, for example investigating the sense distributions for particular modal verbs in particular components of argumentative structures. We also consider broadening the scope of modality-indicating expressions to modal adjectives (possible) and adverbials (probably) or attitudinal expressions (to be expected).
Using automatic classifiers to assign situation entity [9] and modal sense labels [15,16], the contribution of these phenomena to argument structure parsing can be examined on a broader scale.
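One way such labels could enter a larger argumentation mining system is as categorical features of each segment. The following sketch shows a possible encoding and is not a component of any system described here; the function name `segment_features`, the feature names, and the `STATE` label string are assumptions made for illustration, while the sense abbreviations follow those used in this paper (ep, de, dy).

```python
MODAL_SENSES = ("ep", "de", "dy")  # epistemic, deontic, dynamic


def segment_features(se_type, modal_senses):
    """Encode a segment's SE type and modal senses as binary features."""
    features = {f"se_type={se_type}": 1}
    # One indicator per modal sense found in the segment.
    for sense in MODAL_SENSES:
        features[f"modal_sense={sense}"] = int(sense in modal_senses)
    # Whether the segment contains any modal verb at all.
    features["has_modal"] = int(bool(modal_senses))
    return features


# A segment with one deontic modal verb, and a segment without modal verbs.
with_modal = segment_features("STATE", ["de"])
without_modal = segment_features("STATE", [])
```

A dictionary of this form can be combined with other lexical or structural features before being passed to a feature-based classifier.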
Acknowledgements
We thank Michael Staniek for building the annotation tool and preprocessing the texts, and Sabrina Effenberger and Rebekka Sons for carrying out the annotations and for their helpful feedback on the annotation manual. We also thank the reviewers for their insightful comments and suggestions on the paper. This research is funded by the Leibniz Science-Campus “Empirical Linguistics and Computational Language Modeling”, supported by Leibniz Association grant no. SAS-2015-IDS-LWC and by the Ministry of Science, Research, and Art of Baden-Württemberg.
