An annotation scheme for Rhetorical Figures

Abstract

There is a driving need to interrogate large bodies of text computationally for a range of non-denotative meaning (e.g., to plot chains of reasoning, detect sentiment, diagnose genre, and so forth). But such meaning has always proven computationally elusive. It is often implicit, ‘hidden’ meaning, evoked by linguistic cues, stylistic arrangement, or conceptual structure – features that have hitherto been difficult for Natural Language Processing systems to recognize and use. Non-denotative textual effects are the historical concern of rhetorical studies, and we have turned to rhetoric in order to find new ways to advance NLP, especially for sophisticated tasks like Argument Mining. This paper highlights certain rhetorical devices that encode levels of meaning that have been overlooked in Computational Linguistics generally and Argument Mining particularly, and yet lend themselves to automated detection. These devices are the linguistic configurations known as Rhetorical Figures. We argue for the importance of these devices for Argument Mining, especially in collocations, and we present an XML annotation scheme for Rhetorical Figures to make figuration more tractable for computational approaches, particularly with an eye on the improvements they offer Argument Mining. We also discuss the intellectual and technical challenges involved in figure annotation and the implications for Machine Learning.

Keywords

Rhetoric, rhetorical figures, annotation, argument mining, XML

1. Introduction

Rhetorical Figures are cognitively moulded linguistic devices that serve functional, mnemonic, and aesthetic purposes. Take the famous maxim from John F. Kennedy’s 1961 United States Presidential inaugural address:

(1) Ask not what your country can do for you. Ask what you can do for your country. [34]

This expression quickly became proverbial in the American consciousness for the way it captures the spirit of a particular historical moment, the ethos of a particular administration, and the aspirations of a particular generation. Countless more prosaic formulations, by Kennedy and others, expressed that same confluence, but they left a distinctly less memorable impression. Why? Two reasons. Firstly, the formal structure and the functional structure are virtually isomorphic: Kennedy (and speechwriter Ted Sorensen) expressed the rejection of one civic attitude and its replacement by an opposite one, in the iconicity of identical terms of reference repeated in reverse order. Secondly, the form in that snug form/function coupling, on its own, is cognitively very sticky. The Kennedy–Sorensen phrase has become so widely known, in short, so easily shared, so frequently invoked and quoted and recited because of (1) the schematic congruence with which the form matches the Reject–Replace function its arrangement serves, and (2) the cognitive affinities humans have for its structural properties (opposition, repetition, sequence, and symmetry).

The cognitive affinities explain its mnemonic and aesthetic effects, but its tight form-functional correlation, studied in an approach known as Figural Logic, is what makes it so interesting for the computational modeling of natural argumentation. Form/function pairings are the pot at the end of the Argument Mining rainbow. The form is tractable for automatic detection. The function gives us its rhetorical purpose. In terms of Argument Mining, an application with the right knowledge of rhetorical figures could generate an epitome of Kennedy’s inaugural address virtually on the basis of this expression (Example 1) alone.

There is a growing interest in the convergence of rhetoric, argumentation, and Advanced Natural Language Processing (ANLP) – that is, in the new field of Computational Rhetoric – sparked by such works as [10,21–24,53,61,62].¹

¹ We do not put Mann and Thompson’s [41] Rhetorical Structure Theory (RST) in this category because, while it has made some valuable insights into text linguistics, it is simply incorrectly named, by scholars who appear to know little or nothing about rhetoric. RST has really to do with text coherence rather than with rhetoric as traditionally understood, as the style-sensitive study of suasive language.

But aside from passing mentions here and there, Rhetorical Figures have been almost wholly neglected in these developments. Our work addresses this surprising omission [2,3,19,20,28–30,49,50,54,60].

Meanwhile, over in Computational Linguistics, there is an equally surprising omission. Computational Linguists have had success at detecting some Rhetorical Figures, but have shown little interest in the Rhetorical Functions those figures serve. We are building an approach to Computational Rhetoric that combines the insights of Figural Logic with the goals of Argument Mining – namely, finding patterns of reasoning in texts – which promises equally rich payoffs for both computer science (specifically, in Argument Mining and Computational Linguistics) and for rhetorical theory (specifically, the understanding of Rhetorical Figures).

Serious challenges need to be resolved in Computational Rhetoric for us to reach the Argument Mining pot of gold. We concentrate on one of the most immediate of those challenges in this paper: the “serious bottleneck” in Computational Rhetoric, identified by Marie Dubremetz and Joakim Nivre, stemming from “the lack of annotated data” [12]. This problem is especially acute for Machine Learning (ML), where we believe rich insights will come in the future, about which they note “there has been no training data available” [14]. They are clever researchers, so Dubremetz and Nivre manage some interesting results all the same, using a tiny training set, but the lack of training data remains urgent – in particular, for more significant and complex Computational Rhetoric problems than they address, the form/function correlations of Rhetorical Figures, and the combinatoric properties of Rhetorical Figures (what we call figural collocation).

Our approach involves a more sophisticated conception of Rhetorical Figures than has been adopted heretofore – in computational approaches, but also in humanistic and social science approaches. Our research operates at layers of formal and functional abstraction that have not been previously explored (though see [37] for promising new advances). It depends fundamentally on an annotation format for Rhetorical Figures, which we report in this paper.

In this paper, we argue for the importance of Rhetorical Figures for ANLP generally and Argument Mining specifically; we identify the challenges and opportunities of integrating a rich knowledge of Rhetorical Figures into ANLP; and, most specifically, we offer an eXtensible Markup Language (XML) annotation scheme for Rhetorical Figures that meets some of these challenges and therefore opens up new opportunities for Argument Mining and Computational Linguistics, especially in concert with ML.

2. Opportunities and challenges

Computationally, Rhetorical Figures are critical to the understanding of natural language for three central reasons. First, they are endemic to human language. This fact is established beyond dispute for a few tropes, such as metaphor, which is a central focus of Cognitive Linguistics and deeply entrenched in ontologies like FrameNet and WordNet. But it is equally true of literally (a word we do not use lightly) hundreds of other figures. If we want to develop language-perceptive algorithms, they must have knowledge of the figurative dimensions of language. Secondly, Rhetorical Figures epitomize argument structure, increasingly a prime concern for ANLP. Again, this is well understood for metaphor (and simile, though it gets much less overt attention), which epitomize analogic argumentation, but is largely unrecognized for most figures. Thirdly, many figures (especially the ones called schemes) are formal patterns that algorithms can detect through surface analysis; our Example 1 illustrates this aspect clearly. It has positionally bonded repetitions: position and identity are a snap for text algorithms.

The contemporary scholar most responsible for the claim that Rhetorical Figures are constructions with especially tight couplings of form and function is Jeanne Fahnestock, whose Figural Logic is brilliantly articulated in Rhetorical Figures in Scientific Argumentation [17] (see also [63]:69–85, [26]). Fahnestock charts Rhetorical Figures for the way they epitomize lines of argument, relying almost exclusively for her data on a domain with particularly rigorous standards of argumentation, the natural sciences. As she shows, the Figural-Logic view actually goes back at least to Aristotle, who links specific figures directly to specific lines of argument (that is, to topoi), and it occupies a significant place in the rhetorical tradition as late as the 19th century. But, aside from a very few exceptions, such as Perelman and Olbrechts-Tyteca [51], this view was largely forgotten in modern rhetoric, with figures coming to be associated with style, and style coming to be associated with aesthetics, aesthetics with superficiality.

Rhetorical Figures are not without their computational challenges, of course. Metaphor remains elusive, for instance, despite all the attention it has attracted in Cognitive Science, Artificial Intelligence (AI), and linguistics, including Computational Linguistics, over the last several decades. Metaphor is a type of figure known as a trope, which depends on semantic features in a kind of constrained and harmonious conflict with each other. We are not yet successful enough with straight-line semantics to get very far with these kinds of semantic turns. Some tropes (such as oxymoron, which is a juxtaposition of antonymic terms, such as square circle or deafening silence) can be reliably detected [19]. We believe antithesis (juxtaposed opposite predications, as in Sentence 2, a double antithesis) has a similar potential for reliable detection (we adopt the convention of identifying the defining figural elements parenthetically):²

² Some rhetoricians, including Fahnestock, regard antithesis as a scheme. See [27] for an argument that it is a trope.

(2) The young would choose an exciting life; the old a happy death. (young/old; life/death) [1]:155

But most tropes are far from tractable computationally.

Another type of Rhetorical Figure, the scheme, is not a semantic turn but a formal turn, a shift away from expected mundane structure, as in Example 1, an antimetabole (reverse lexical repetition; in this case, you and your country). Like tropes, schemes are pervasive in ordinary language, and like tropes they occur in novel configurations that stand out against a ground of conventionality. Consider rhyme, a prototypical scheme in which syllables repeat at the ends of words. There are many ordinary language terms and expressions that feature repeated word-final syllables (hoi polloi, willy nilly, bye-bye, fat cat; “birds of a feather flock together”), but there are far more which do not; so the rhymes stand out, sometimes to considerable effect, as in this famous example:

(3) If it doesn’t fit, you must acquit. (fit/acquit) [42]

The computational detection of figures, including antimetabole, is finding success [2,12–15,19,20,33,37,48–50]. Virtually none of this work, however, has gone beyond detection to consider the argumentation context. Dubremetz and Nivre, for instance, in their impressive work with antimetabole [12–15], appear to be unfamiliar with the Rhetorical Functions antimetabole serves. But antimetabole is a parade example of form/function correlation. It has a small set of Rhetorical Functions, keyed to the iconicity of its formal structure (which evokes balance and opposition, as well as sequence or priority). We have very limited space in this paper to demonstrate these Rhetorical Functions, so a few examples will have to suffice.

One paradigmatic function of antimetabole is to convey Reciprocal Force, illustrated by Sentence 4, Newton’s third law of motion.

(4) If you press a stone with your finger, the finger is also pressed by the stone. (stone/finger) [47]:15

Newton’s third law is often expressed as “for every action, there is an equal and opposite reaction” (which features another Rhetorical Figure, polyptoton – a repetition of the same lexical stem with different morphology – here, action and reaction). But Newton’s own argument favored the antimetabole, whose very structure expresses “equal and opposite.” (We give the example in English, but Newton’s original Latin is also antimetabolic.)

A very similar Rhetorical Function of antimetabole is to convey Reciprocal Specification, a kind of mutual definition, illustrated by Sentence 5:

(5) Gay rights are human rights, and human rights are gay rights. (human rights/gay rights) [9]:0:08–0:12

In this phrase the notions of ‘human rights’ and ‘gay rights’ are reciprocally identified with each other. Sentence 5 says not only that you can’t have one without the other but also that neither makes sense without the other.

A third Rhetorical Function of the antimetabole is to convey Comprehensiveness, illustrated by the ordinary-language example, Sentence 6:

(6) A place for everything, and everything in its place. (place/everything) [Traditional]

The reverse repetition in Sentence 6 implicates a kind of reciprocal coverage – largely, we think, because it has prepositional predication rather than the transitive-verb predication of Newton’s Example 4. We call this function Comprehensiveness because the sequential iconicity evokes a back-and-forth, alpha-to-omega, omega-to-alpha, coverage of some domain – in this case, the domain of tidiness. All things have assigned places; all places have their assigned things.

A fourth Rhetorical Function of the antimetabole is to convey Irrelevance-of-Order, well known from algebra and predicate calculus:

(7) $m + n = n + m$ (m/n) [Traditional]

There are other ways to express the principle of commutation, but none as natural and iconic as formulae like 7. Opposite sequences of the same variables, on either side of the same operator, pivoted by a predication of identity, equivalence, or equality, inescapably mean that neither sequence has priority. Order doesn’t matter to addition (multiplication, union, etc.).

We have built a curated list of over 400 antimetaboles illustrating these functions, but only have space for a few more representative examples:

Reciprocal Force

(8) A corollary of PHC [the Principle of Hierarchical Coincidence] is that resources flow toward political power, and political power flows toward resources; or, the power of state and of capital typically appear in conjunction and are mutually reinforcing. (resources/political power) [56]

(9) Women are changing the universities and the universities are changing women. (women/universities) [25]:629

Reciprocal Specification

(10) The negation of a conjunction is the disjunction of the negations and the negation of a disjunction is the conjunction of the negations. (negation of a conjunction/disjunction of the negations) [“De Morgan’s law;” Traditional; see, e.g., [57]:84]

(11) Anger and depression, the pop-psych books tell us, are two sides of the same coin: depression is anger suppressed, anger is depression liberated. (depression/anger) [31]

Comprehensiveness

(12) I meant what I said and I said what I meant. (I meant/I said) [58]

(13) Whether we bring our enemies to justice or bring justice to our enemies, justice will be done. (our enemies/justice) [5]

Irrelevance of Order

(14) With a similar qualification, in the Cambridge Grammar of the English Language, a head ‘plays the primary role’ in ‘determining the distribution of the phrase’ (introductory chapter signed by Pullum and Huddleston, in Huddleston and Pullum 2002:24) (Pullum/Huddleston) [43]:24; the alternation of first author position is a scholarly convention for signaling that neither author has intellectual priority

(15) “Spanglish,” [is] the combination of Spanish and English (or English and Spanish) (Spanish/English) [65]

It is such clearly identifiable Rhetorical Functions as these, coupled with the relative ease of rhetorical-scheme detection, that make Rhetorical Figures so promising for computational tasks in which comprehension is central, like Argument Mining and text summarization.

Again, however, there are challenges. They are not as thorny as the challenges of most tropes because they concern formal surfaces, not semantic depths. But they exist. In particular, figures rarely come in isolation. Sometimes figures get in each other’s way. Take an example like 16:

(16) We didn’t land on Plymouth Rock. Plymouth Rock landed on us. (We/Plymouth Rock) [38]

These two sentences look like an antimetabole (we and us are just different cases of the first-person-plural pronoun, so they are the same word). But look again. Plymouth Rock has a different signification in each sentence. In the first sentence, it is the literal geographical location where the Mayflower Pilgrims landed to found their colony. In the second sentence, it is a metonym, evoking an American mythology of progress and liberation. Example 16 occurs in the movie, Malcolm X, a biography of the titular American dissident, and is a slight variation of a remark he made in his famous “Ballot or Bullet” speech [71], compressing into one phrase the argument that Americans of African descent are decidedly outside the American ideas of progress and freedom from persecution the Plymouth colony represents for Americans of European descent.³

³ A version of Example 16 predates its use by Malcolm X. It first shows up in a Cole Porter song, “Anything Goes” [52], where it signals topsy-turviness of sexual mores rather than of progress and freedom.

The reverse repetition is clearly functional. It signals a topsy-turviness of some kind. Things are not as they should be; the two sentences express a root-level balance or reciprocality, which the antithesis throws out of whack by focalizing one of the alternatives, so there are important functional similarities to antimetabole. But we don’t see the same tight form/function mapping among such instances as we do with examples such as 1–15. The crucial figure here is antanaclasis, a trope in which two words or phrases are used with different significations. When they show up in a chiastic (reverse-repetition) structure like 16, we call the resulting compound figure, antanametabole (a neologism). Humans are pretty good with examples like this; machines, not so much. So, it is important to have taxonomies that can discriminate between true antimetaboles (1, 4–15) and pseudo antimetaboles (16).

But often the combinatorics of Rhetorical Figures work the other way; in such cases, which are frequent, they reinforce and constrain each other in predictable ways. In some combinations of Rhetorical Figures these effects lead to clearer, even monovalent Rhetorical Functions; often, they aggregate into robust nuggets that offer particularly rich rewards for Argument Mining. When this happens – in figural collocations – the functions of two or more figures work in concert to narrow the range of meaning. We’ve already seen a clear example of such a collocation, in fact: the Kennedy–Sorensen maxim (Example 1). It instances antimetabole (you/your country), of course. But it also instances antithesis (ask not X/ask X′ – where X represents a predication, and X′ represents a predication, possibly the same, possibly very different, which is conceptually parallel to X). Thirdly, it instances mesodiplosis (clause-medial repetition; here, can do occurs between the relevant terms of both clauses).

Figural collocations present a technical challenge that has not been met, but which strikes us as quite soluble, and they present rich opportunities. The challenge is to detect multiple overlapping figures; so far, researchers have only detected individual figures. The opportunities arise because Rhetorical Functions are often enhanced and stabilized under the combinatorics of collocation; some collocations are much more common than others, because of their cognitive similarities and functional implications. When two or more figures combine in certain recurrent collocations, the functions they convey tend to be highly consistent. Collocations, that is, often lead to functional conspiracies.

For instance, when antimetabole collocates with mesodiplosis and antithesis, the combined function is primarily to reject the negated predication utterly and replace it with the positive predication. Example 1 is our paradigm, but here are two more:

Reject–Replace

(17) We don’t build services to make money; we make money to build better services. (services/make money; to; We don’t X, we X′) [39]

(18) “Nonviolence has not failed us,” he said, “we have failed nonviolence.” (nonviolence/us; failed; has not X, have X′) [40]:273.

Unpacking these, we have reciprocal causal relationships – between making money and building services in 17; between nonviolence and human agents in 18 – with one causal relationship rejected, the other promoted in its place. All three figures are important here. Antimetaboles give us two terms in reverse order. Mesodiploses join two sets of terms in the same grammatical and semantic relationship, a Rhetorical Function of Equivalent Relation; it is not exactly proportional or analogical, but it puts two sets of terms (A&B and C&D) into something very much like the familiar Aristotelian formula, A:B::C:D. That is, A has the same grammatico-semantic relations to B as C has to D. In 17, we have two Subject/Complement causal relationships mediated by to (itself a short form of the expression in order to); with 18 we have the Subject/Object transitive relations mediated by have failed. Or, rather, we would have those Equivalent Relations, if it weren’t for the negations, the antitheses, that expressly deny the equivalence. Antithesis, for its part, has a small range of Rhetorical Functions, one of the most prevalent being to present Binary Alternatives. It is the collocation of all three figures here – antimetabole, mesodiplosis, and antithesis – that forces a choice between two completely inverse equivalent relations, promoting the positive option and dismissing the negated option. All three of 1, 17, and 18 Reject AB (citizens desiring entitlements from country, building services to make money, nonviolence failing the African National Congress (ANC)), and Replace it, either propositionally or hortatively, with BA (country needing support of citizens, making money to build services, the ANC failing its policy of nonviolence). (In a trickier way, 16 works similarly, of course – Rejecting the proposition that African Americans are part of the so-called American Dream, Replacing it with, as Malcolm X put it elsewhere, that African Americans are the victims of an American Nightmare [70].)

We see inescapably now, too, that mesodiplosis is central to the Rhetorical Functions of 4, 5, and 7–15. We wouldn’t get Reciprocal Force without the Equivalent Relation enforced by the mesodiplosis of verbs like press, flow, and change, or Reciprocal Specification without the mesodiplosis of copula verbs. The precise way in which figural bundles and grammatical features (like transitivity or copular predication) constrain meaning remains to be explored, but they work closely together in at least some cases, to enact quite specific Rhetorical Functions. As one last example of Rhetorical Figure combinatorics, sticking with the same set of figures and features, consider what happens when antithesis collocates with antimetabole in its Reciprocal Specification function. We get the very specific Subclassification function, as in Examples 19 and 20, which say, respectively, that ultrabooks are a class of laptop, and compounds are a class of molecules:

Subclassification

(19) Ultrabooks are laptops after all, but not all laptops are ultrabooks. (ultrabooks/laptops; are; X, not X′) [64]

(20) All compounds are molecules (since compounds consist of two or more atoms), but not all molecules are compounds (since some molecules contain only atoms of the same element). (compounds/molecules; are; all X, not all X′) [67]:7.

Here the Equivalent Relation of the copulative mesodiplosis is crucial, exactly as it is with Reciprocal Specification – the most natural coding of the IsA relation – but the addition of antithesis negates one direction of the predication. In predicate calculus terms, the antithesis turns a bidirectional (mutual) entailment into a unidirectional entailment.
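One way to write this shift out, as a sketch in first-order notation, is given below. The predicate letters $A$ and $B$ are labels introduced here for the two repeated terms (for instance, ultrabook and laptop in 19); they are not part of any notation used elsewhere in this paper.

$$\text{Reciprocal Specification:}\quad \forall x\,\bigl(A(x) \leftrightarrow B(x)\bigr)$$

$$\text{Subclassification:}\quad \forall x\,\bigl(A(x) \rightarrow B(x)\bigr) \;\land\; \neg\,\forall x\,\bigl(B(x) \rightarrow A(x)\bigr)$$

On this reading, 19 asserts that every ultrabook is a laptop while denying that every laptop is an ultrabook, so only the left-to-right half of the biconditional survives the antithesis.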

We do not pretend to have a full and complete mapping of form to function, nor a chart of Rhetorical Figure combinatorics. This work is still in the very early stages, but these examples show that it holds considerable promise, and we believe ML corpus studies can be extremely helpful, especially for figural collocation.
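The handful of pairings illustrated in Examples 1–20 can be collected into a small lookup, purely to show what a machine-usable form/function table might look like. This is a minimal sketch, not a claimed inventory: the rule list, the `predication` labels, and the function name `candidate_function` are all hypothetical, and the use of copular predication to separate Subclassification from Reject–Replace is an assumption drawn from the examples rather than a rule stated above.

```python
# Each rule: (figures that must all be present, required predication or None,
#             candidate Rhetorical Function). More specific rules come first.
# The rules restate only the pairings illustrated in the text; they are
# illustrative assumptions, not a complete or validated mapping.
RULES = [
    ({"antimetabole", "mesodiplosis", "antithesis"}, "copular", "Subclassification"),
    ({"antimetabole", "mesodiplosis", "antithesis"}, None, "Reject–Replace"),
    ({"antimetabole", "mesodiplosis"}, "copular", "Reciprocal Specification"),
    ({"antimetabole", "mesodiplosis"}, "transitive", "Reciprocal Force"),
]


def candidate_function(figures, predication=None):
    """Return the first candidate function whose rule matches, else None."""
    present = set(figures)
    for required, needed_predication, function in RULES:
        figures_match = required <= present
        predication_matches = needed_predication in (None, predication)
        if figures_match and predication_matches:
            return function
    return None


if __name__ == "__main__":
    # Example 19: all three figures with a copula.
    print(candidate_function({"antimetabole", "mesodiplosis", "antithesis"}, "copular"))
    # Example 18: all three figures with a transitive verb.
    print(candidate_function({"antimetabole", "mesodiplosis", "antithesis"}, "transitive"))
```

A bare antimetabole with no collocated figure returns `None` here, which reflects the limits of the table and not a claim that such instances lack a function.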

Figural collocation, as we come to understand the functional combinatorics better, holds perhaps the greatest promise of Rhetorical Figures for computational understanding of natural language. Our paradigm example (1), a collocation of the schemes antimetabole and mesodiplosis with the trope antithesis, provides a pitch-perfect example of the Rhetorical Function, Reject–Replace. A computational analysis of Kennedy’s inaugural address tuned to the workings of Rhetorical Figures could tell us what the address was about – namely, the rejection of an ethos of entitlement and its replacement with an ethos of responsibility – virtually on the basis of this particular collocation (along with, of course, the lexical semantics of you, your country, and so on).

We can, and should, rely on rhetoricians to guide us in the functions of certain figures and certain figure-bundles, at least in these early stages. But the rhetorical tradition is haphazard at best, and too frequently conflictual. The terminology alone is forbidding, not just because of its Greek and Latin roots but because of its historical inconsistency. As much as Computational Argumentation studies can benefit from a better understanding of Rhetorical Figures, Rhetorical Figure theories can benefit from computational studies of form and meaning. (And, yes, that sentence was an antimetabole, bundled with mesodiplosis; the Rhetorical Function is Reciprocal Force, modulated by the possibility-modality of can.)

The path forward is to bootstrap rhetoricians’ knowledge by way of annotated corpora and Machine Learning, so that computationally mined data can start to clarify what functions certain figures and figural collocations have, through confirmation, through refinement, and through new discoveries – all of which we have good reason to anticipate.

We can discover the proportionality of certain collocations (observationally, both antithesis and mesodiplosis seem strongly to co-occur with antimetabole), and the correlation of the collocations with the Rhetorical Functions (as sketched above, on the basis of limited and anecdotal research). At its best, this work can revolutionize Computational Argumentation studies and rhetoric in the way corpus linguistics revolutionized lexicography and generated ontologies like WordNet and FrameNet. But even if it does not prove entirely revolutionary, we are very confident of finding important form/function correlations that can significantly inform Computational Argumentation and Rhetorical Theory in novel ways.

Linguists have a role to play here as well. We have seen that grammatical features such as transitivity and copularity interact with Rhetorical Figures to determine their Rhetorical Functions, but Grammatical Constructions also interact with Rhetorical Figures intriguingly ([6]:55–60). For instance, the well-known chiastic Easier-to-take-the-A-out-of-B-than-the-B-out-of-A catchphrase clearly fits the criteria of Construction Grammarians [32]:

(21) [I]t was easier to take the girl out of the brothel than to take the brothel out of the girl. (the girl/the brothel) [68]:72.

(22) It was much easier to take Kuhn out of Harvard than Harvard out of Kuhn. (Kuhn/Harvard) [18]:387.

(23) It was found easier to take the evacuee out of the slum than to take the slum out of the evacuee. (the evacuee/the slum) [69]:30.

(24) After twenty-five years in the field. I’ve traded the front seat of a 4 × 4 for a swivel chair and a desk. The change did not come easily for me. As the old saying goes – it’s a lot easier to take the man out of the field than to take the field out of the man. (the man/the field) [48]:61.

(25) I could take Tarzan out of the jungle. Could I take the jungle out of Tarzan? (Tarzan/Jungle) [44]:254.

The tricky aspect of these examples is the presence of a trope, metonymy. The word in the first A location (A1) in all of these cases is the physical person (e.g., the girl), the word in the first B location (B1) is a physical location (e.g., the brothel). But the word in the second B location (B2), in every case refers to a more general phenomenon (e.g., the culture of prostitution), very similarly to the Malcolm X example (16), while the word in the second A location (A2) refers to the person’s psychology.
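The surface form of this construction (A1 out of B1, then B2 out of A2) is regular enough to match with a single pattern. The sketch below uses a Python regular expression with named groups and backreferences. The pattern and the function name `match_take_out_of` are illustrative assumptions, and the match is purely formal: it recovers the A and B strings but does nothing to resolve the metonymic shift between B1 and B2 or between A1 and A2.

```python
import re

# take A1 out of B1 ... B2 out of A2, where B2 repeats B1 and A2 repeats A1.
# Backreferences (?P=b) and (?P=a) enforce the reverse repetition.
TAKE_OUT_OF = re.compile(
    r"\btake\s+(?P<a>.+?)\s+out\s+of\s+(?P<b>.+?)\b"  # take A1 out of B1
    r".*?"                                            # "than", "to take", a sentence break
    r"\b(?P=b)\s+out\s+of\s+(?P=a)\b",                # B2 out of A2
    re.IGNORECASE,
)


def match_take_out_of(text):
    """Return the A and B terms of the chiastic construction, or None."""
    match = TAKE_OUT_OF.search(text)
    if match is None:
        return None
    return {"A": match.group("a"), "B": match.group("b")}


if __name__ == "__main__":
    print(match_take_out_of(
        "It was much easier to take Kuhn out of Harvard than Harvard out of Kuhn."
    ))
```

The intervening `.*?` lets the same pattern cover the reduced variant in 22 (no second take) and the two-sentence variant in 25.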

3. Figure detection

There have been limited successes in figure detection over the past several years [2,12–15,19,20,33,37,48–50]. But they have been restricted both in method and in scope, with no concern for figural combinatorics.

Hromada’s work [33], for instance, is very successful at the detection of antimetabole, but he defines antimetabole in a narrow, overdetermined way. Using the Waterloo Rhetorical Figure Notation (WRFN) [30]⁴ (where W stands for Word, the indices indicate identity, and “…” represents other linguistic matter, extraneous to the figure, possibly null), Hromada defines antimetabole as <W^A W^B W^C … W^C W^B W^A>, whereas a more accurate definition (as in Harris and DiMarco (2009) [30]) is simply [W]^A … [W]^B … [W]^B … [W]^A. That is, Hromada searched only for antimetaboles when they collocate with mesodiploses (the W^Bs, with W^A and W^C coding the inverted words of the antimetabole), but this was clearly incidental. He does not mention mesodiplosis at all, nor discuss his definition as incorporating any figures beyond antimetabole.

⁴ Hromada calls this notation Rhetoric Figure Representation Formalism or RFRF [33], which he adapts from Harris and DiMarco [30]. Harris and DiMarco did not label their formalism in their paper, but we use their term for it here. The WRFN is a formalism for the general structure of rhetorical schemes, but it does not represent functions.
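The more general pattern, [W]^A … [W]^B … [W]^B … [W]^A, can be searched for directly over word positions. The following is a minimal sketch under several assumptions that are not part of the WRFN or of any cited system: the stop list is hypothetical, matching is on single lowercase word forms (so multi-word terms such as your country are not treated as units), and the whole input is treated as one search window.

```python
import re

# Hypothetical stop list (an assumption of this sketch): function words recur
# so often that they would otherwise produce spurious reverse repetitions.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in", "on",
             "for", "is", "are", "not", "what", "can", "do"}


def tokenize(text):
    """Return lowercase word tokens, keeping word-internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def find_antimetaboles(text, stopwords=STOPWORDS):
    """Return sorted (A, B) pairs occurring in the order A ... B ... B ... A."""
    positions = {}
    for index, word in enumerate(tokenize(text)):
        if word not in stopwords:
            positions.setdefault(word, []).append(index)
    found = set()
    for a, a_idx in positions.items():
        if len(a_idx) < 2:
            continue
        # The widest frame for A runs from its first to its last occurrence.
        first, last = a_idx[0], a_idx[-1]
        for b, b_idx in positions.items():
            if b == a:
                continue
            inside = [i for i in b_idx if first < i < last]
            if len(inside) >= 2:
                found.add((a, b))
    return sorted(found)


if __name__ == "__main__":
    print(find_antimetaboles(
        "Women are changing the universities and the universities are changing women."
    ))
```

On Example 9 this reports the women/universities pair together with the nested pairs involving changing, which is the fully mirrored W^A W^B W^C … W^C W^B W^A shape. On Example 1 it reports country/you but also your/you, a reminder that word-level matching overgenerates and that phrase-level units and a bounded window would be needed in practice.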

Little of the figure-detection work looked for more than one figure at a time, and none looked for figures specifically in terms of recurrent collocations. Hromada also searched for three other repetition figures (anadiplosis, epanaphora, and epiphora), but only in isolation. Dubremetz and Nivre regularly find antitheses collocated with antimetabole [12–15], because they use negation as a correlative of antimetabole (which markedly improved their detection success), and they regularly find parison (syntactic parallelism, discussed in more detail below) as well. But they were not looking for those figures and do not report their results. Only Gawryjołek [19,20] was genuinely looking for collocated figures, but again that was largely incidental to his focus. He did not interpret the collocations at all, nor report on the statistics. He was merely looking for multiple figures in the same corpus, many of which overlapped.⁵

⁵ Anadiplosis is clause-final-clause-initial lexical repetition (< …W^x >< W^x …>). Epanaphora is clause-initial lexical repetition (< W^x …>< W^x …>). Epiphora (also known as epistrophe) is clause-final lexical repetition (< …W^x >< …W^x >). See the glossary at the end of [29] for more precise definitions, with examples. Note that these researchers use somewhat different terminology. Hromada [33] uses anaphora for our epanaphora, while Dubremetz and Nivre use chiasmus overwhelmingly for our antimetabole [12–15]. In the first case, we avoid anaphora (a synonym in the rhetorical tradition for epanaphora) because of its more prominent designation in Computational Linguistics, for pronouns. In the second, chiasmus is better understood as a general pattern of reverse repetition with any constituent, not just words. Dubremetz, for instance, regards the expression “Readers don’t need to write, but writers do need to read” as a chiasmus (see http://stp.lingfil.uu.se/~marie/), as do we. But she does not seem to distinguish it from the lexical chiasmi (that is, the antimetaboles) her algorithms overwhelmingly detect. The domain of Rhetorical Figures needs more precision – in this case, we need to notice that writer and write, reader and read, are different words. In fact, they exhibit the Rhetorical Figure, polyptoton (repetition of the same lexical stem with different morphology). Such instances are chiastic, since we have reverse repetition of constituents. But the repetition is of sub-lexical material, not of words; with, therefore, different functional implications. We call such instances, antimetaptoton (a neologism). The nomenclature of Rhetorical Figures, resulting from over two millennia of too-often haphazard research, is, to understate the case dramatically, a shambles. There are different labels for the same linguistic configurations (a many-one problem), multiple linguistic configurations corresponding to the same label (a one-many problem), and much linguistic activity that doesn’t align with the core figures but is labeled as figurative all the same (just a problem). We have developed a much more rigorous, consistent, and principled taxonomy of figures at Waterloo. See [8,26], and [29] for clarification, explication, and argumentation about this taxonomy.

But detecting Rhetorical Figures must be only the beginning of the story. We know from millennia of humanistic research that linguistic forms correlate with Rhetorical Functions – that figures do communicative work beyond ‘mere aesthetics’ – and we can thank Fahnestock for collating and expanding this research so clearly in the contemporary era [17]. On the basis of this work, Fahnestock’s Figural Logic, we can use the detected figures to help chart meanings – sometimes very fundamental meanings.

But it is still an open question how well the form–function couplings that humanists have found stand up beyond the small sampling of discourse they have cared to explore – the orations, poetry, and killer ripostes that stock their examples. Worse, much of the collection and curation of figures has also been quite lazy, with the same examples propagating century after century for the same figures. The data is highly skewed.

Do these form–function couplings hold up in the conversations, news stories, opinion pieces, blogs, review articles, short stories, tweets, scientific arguments, and so on, that fill the vast sea of everyday and specialist human discourse? We have our hunches, but we don’t know. Corpus studies can tell us. Do Reciprocal Force antimetaboles always occur with transitive verbs, for instance? Do Reciprocal Specification and Subclassification antimetaboles always occur with copulas? Do Irrelevance-of-Order antimetaboles always show up with conjunctions and disjunctions? How frequently does mesodiplosis collocate with antimetabole? What other robust collocations are there, with what functional implications? We have intuitions, and there is a good deal of particularized research (that is, specific works of rhetorical criticism), but intuitions and particularized research need to be tested on corpora.

How do figures cluster in terms of genres? Do individual authors have identifiable figural proclivities? Is sentiment a trigger for certain figures? Do certain argument types favour certain figures? Are there author-genre figural effects? Argument-sentiment figural effects? Author-sentiment? Again, intuitions and particularized research suggest answers; again, these need to be tested.

When multiple figures co-occur, as they almost always do, which functions conspire, which remain independent, and which take precedence in the event of a conflict? Are there functional differences between “accidental” figures and “designed” figures? If figures are form–function couplings, does it even make sense to speak of ‘accidental’ figures?

This work can undoubtedly be strengthened by ML, one of the motivations for the format we have developed for annotating Rhetorical Figures, to complement (and interact with) the annotation formalisms developed for part-of-speech tagging, speech-act tagging, and so on. Corpora annotated with Rhetorical Figures can be used to train systems on new and more sophisticated detection tasks, especially for collocations and functional correlations.

Marie Dubremetz [12]:25 calls the work of Jakub Gawryjołek [19,20] and Daniel Devatman Hromada [33] “pioneering;” we might choose, rather, the term “exploratory,” in the sense of crucial preliminary work charting landmarks and features of a new territory, a term we would also extend to the research of Claus Strommer [60] and Mohammad Alliheedi [2,3], and certainly to Dubremetz herself, who has gone (with her doctoral main advisor, Joakim Nivre) deeper into the territory of Rhetorical Figures than any other Computer Scientist. Dubremetz’s work inaugurates a second phase of figure detection, the systematic refining of methods in recurrent studies. She has been particularly preoccupied in this work with two issues, annotation and ranking. She has redefined figure detection “as a ranking task” [12]:27, rather than a binary task. Examples 26 and 27 are from her data: 26 gets a rank of 1, the premier instance of a chiastic figure in her corpus, and a chiastic score of 99.77%; 27 is ranked 3000, the worst chiastic instance in the corpus, with a chiastic score of 0.01% (both examples, with their ranking, along with the 2,998 between them, can be found at stp.lingfil.uu.se/~marie/corpus/quote_chiasmus.txt):

There are only two kinds of men: the righteous who think they are sinners and the sinners who think they are righteous. (righteous/sinners)

You hear about constitutional rights, free speech and the free press. Every time I hear these words I say to myself, ‘That man is a Red, that man is a Communist!’ (hear/free)

All figure-detection researchers, like Dubremetz, worry about what is a ‘true figure’ and what is an ‘accidental figure,’ or a ‘nonfigure’ that just happens to resemble a ‘true figure.’ But they are not thinking in terms of instances like Example 16 above (the antanametabole), which resembles one figure but is really a different figure. They are thinking, rather, about rhetorical intention. Did the speaker/writer intend their utterance to be a Rhetorical Figure, they worry, or did the words just fall into a pattern that looks like a Rhetorical Figure? Is the figure ‘accidental’? We think this is a non-issue, but Dubremetz’s ranking strategy is an important way of showing how figures operate. In particular, it illustrates clearly that figural activity is not just a binary on/off situation. We can see instantly that the reverse repetitions of 26 (along with the medial repetitions, the mesodiplosis, of “who think they are”) serve a clear Rhetorical Function – the one we call Comprehensiveness (reinforced in the preceding clause by the phrase “only two kinds of men”), while the repetitions in 27 are looser, almost wholly untethered to each other functionally. We would reserve Dubremetz’s term “pioneering” for work that not only searches for Rhetorical Figures but probes their Rhetorical Functions as well, work that has only just begun [37]. Such research turns computational figure detection from a pattern-matching game to a genuine investigation of Natural Language. It seeks to understand rather than just to identify.

Dubremetz has also improved our understanding of annotation, by reflecting the same themes of ‘true figures’ and ‘accidental figures’. She develops a protocol for ensuring reliable human annotation of figures to build ML training corpora, involving the comparison to curated exempla, substitution tests, and translation tests [12]:33. Her protocols are valuable because they do indeed help to find figurative instances that align with the most established and venerable examples from the rhetorical tradition – as one would expect, since her instances are chiefly calibrated against such examples.

But she helps (unintentionally) to expose how much the curating rhetoricians and literary critics have fallen short on this matter, setting figure-detection researchers from Gawryjołek to Dubremetz on the wrong path. A closer look at Examples 26 and 27 should make this point. You already know, from previous discussions, that 26 contains a mesodiplosis, the clause-medial repetition of “who think they are” – contributing to the referential stability of the instance. But there’s more. There are two figures of parallelism in 26 as well, parison and isocolon, figures which frequently collocate with chiastic figures. Parison, mentioned briefly above, is syntactic repetition: both of the clauses have the same syntactic structure; that is, the second clause repeats the structure of the first clause:

[Det N]_NP [Wh-Pro]_NP [V [[Pro]_NP [V[Adj]_AP]_VP]_S]_VP

Isocolon is prosodic repetition: both of the clauses have precisely the same prosody; the second clause repeats the prosodic contour of the first clause:

Prosodic and syntactic repetition patterns are known as parallelisms, and parallelism “serves as a linguistic icon, the formal sameness pointing to a semantic sameness” [59]:47. Its Rhetorical Function is Conceptual Amalgamation, bringing the meanings close together because the forms are close together. It works toward what Kenneth Burke calls Formal Assent. “[Y]ielding to the form,” Burke says, “prepares for assent to the matter identified with it” [4]:58. The best-known expression of Conceptual Amalgamation and Formal Assent, at least in Cognitive Science, is the ‘rhyme as reason’ effect, in which expressions like 28 are judged to be more true, more accurate, and more credible than expressions like 29:

Woes unite foes.

Woes unite enemies.

These two instances come from a study of this effect entitled, “Birds of a Feather Flock Conjointly” [45], which attributes the effect to “enhanced processing” [45]:427. Perceptual sequences that activate the same neural pathways – as parison, isocolon, and rhyme (syllabic parallelism) do – are perceived to function as a rational unit. Note that the famous Example 3, above, exemplifies rhyme as reason/formal assent.

The moral of the story, then, is that Example 26 is not so much a ‘true’ antimetabole in comparison with the ‘accidental’ or ‘false’ or ‘non-’ antimetabole 27, as it is a more effective collocation of figures in which antimetabole is made more salient and cohesive through figures of parallelism and referentially stabilized by mesodiplosis. Rhetoricians and literary scholars have routinely catalogued instances like Example 26 without identifying – for the most part, without even noticing – all the other figural activity that collectively contributes to their aesthetic and suasive effects.

Example 27 also has additional figuration, but none of that figuration collaborates with the reverse repetitions. It has, for instance, some routine metonymy (press, Red) and an unrelated epanaphora (phrase- or clause-initial lexical repetition: “that man is a”). More confounding, another expression of epanaphora, of Noun-Phrase-initial adjectives, binds one repetition (the free of “free speech and the free press”) independently of the other (the hear of “You hear about constitutional rights” and “Every time I hear these words”).

The absence of parison is particularly crucial. Indeed, now that you know what parison is, you can look back over previous examples and see how regularly it collocates with antimetabole: the relevant phrases or clauses of Examples 5 and 7–15 are all syntactically parallel; and it is only the presence of negation that puts 1, and 16–20 slightly out of syntactic parallel. One particularly central job that parison does with antimetabole is to ensure that the terms repeating in reverse order swap syntactic roles. The syntactic (and often semantic) roles almost always interchange when terms reverse under parison: subjects become objects, objects become subjects, and so on. Dubremetz does not appear to be aware of the figure parison, but she notes that “the first salient common point we notice [with paradigmatic cases of antimetabole] is their perfectly symmetrical switch of syntactic roles” [12]:37. Accordingly, she tunes her detection algorithms to look for such “syntactic features,” and (unwittingly) searches for parison.

Example 27 does not exhibit parison. There is no symmetrical switch of syntactic roles. Indeed, the two clauses are so far out of parallel that repetitions of the relevant words are interrupted by a clause boundary (the first hear and both frees are in one clause, the second hear in another), and even their lexical categories are different (hear is a verb, free an adjective). The correct way to characterize 26 versus 27 in terms of Rhetorical Figures, then, does not involve the presence or absence or degree of antimetabole. Rather, it is to say that 26 is an utterance whose figural action focalizes antimetabole, while the figural action of 27 complicates and interferes with its antimetabole. But if antimetabole is reverse lexical repetition, as every definition ever formulated has it, then both 26 and 27 exhibit antimetabole. Dubremetz’s scoring system is not, therefore, diagnosing the relative presence of antimetabole but the relative effectiveness of figural collocation.

We stress again that Dubremetz, Hromada, Strommer, and the other computational figure detectors, back to the beginnings of this work with Gawryjołek, are not to be faulted on their true/false or graduated treatment of figures. They have simply and reasonably followed what rhetoricians and literary scholars propagated, rather negligently, for millennia. Rhetoricians have consistently defined figures in terms such as “two or more words are repeated in inverse order” [66]:492 and “inverting the order of repeated words (ABBA)” for antimetabole [36]:14, and then placed those definitions alongside representative examples like 1, without noting the other figures in the instance, without identifying the importance of collocation, and without tying Rhetorical Functions to the collective operation of Rhetorical Figures. Indeed, computational figure detectors like Gawryjołek and Dubremetz are to be commended. Were it not for the issues they have kicked up in their research, attempting to find the essential, computationally tractable character of certain figures, working from the paradigmatic examples that Rhetoricians and literary scholars assembled, these fundamental aspects of Rhetorical Figures may never have surfaced.

4. Rhetorical figure annotation

If we want to resolve the serious annotated-data bottleneck Dubremetz and Nivre identify for Computational Rhetoric [12], especially if we want to tap into the resources of ML to meet the challenge of Rhetorical Figure combinatorics, we will need texts annotated for occurrences of multiple figures – mutually reinforcing, often interpenetrating, sometimes wholly overlapping multiple Rhetorical Figures, but also independent, or even mutually interfering, ones. That prospect requires a standardized annotation scheme. We have developed such a scheme.

The XML Waterloo Annotation Scheme (WAS) for Rhetorical Figures is quite straightforward, crucially identifying the figure-span and its defining elements, with standard start (<x>) and end (</x>) tags. For the figure-span, we just use the name of the figure itself; for the elements, it is the name of the figure; followed by an alphabetic variable, to distinguish the elements; followed by a numeric designation, to mark the sequence in which the elements occur; separated by dashes; all terms mandatory. Antimetabole, for instance, includes the tags <antimetabole>, <antimetabole-A-1>, <antimetabole-B-1>, <antimetabole-A-2>, and <antimetabole-B-2>. If we take the classic example of antimetabole, familiar from The Three Musketeers, the novel, various movies, assorted popular culture spin-offs, and everyday language (30a), the antimetabole is annotated as (30b), but since 30a, as you would surely expect by now, is a particularly rich collocation of figures, 30c–e complete the annotation.

All for one, one for all. [16]:80

<antimetabole><antimetabole-A-1>All</antimetabole-A-1> for <antimetabole-B-1>one</antimetabole-B-1>, <antimetabole-B-2>one</antimetabole-B-2> for <antimetabole-A-2>all</antimetabole-A-2></antimetabole>.

All <mesodiplosis><mesodiplosis-A-1>for</mesodiplosis-A-1> one, one <mesodiplosis-A-2>for</mesodiplosis-A-2></mesodiplosis> all.

<parison><parison-A-1>All for one</parison-A-1>, <parison-A-2>one for all</parison-A-2></parison>.

<isocolon><isocolon-A-1>All for one</isocolon-A-1>, <isocolon-A-2>one for all</isocolon-A-2></isocolon>.
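The naming convention described above (figure name, alphabetic variable, sequence number, separated by dashes) lends itself to mechanical generation. The following Python sketch is illustrative only and is not part of the WAS tooling: the function names `element_tag` and `annotate`, the use of character offsets, and the choice to open the figure-span at the first element and close it after the last are all assumptions made for the sake of the example.

```python
def element_tag(figure: str, variable: str, occurrence: int) -> str:
    """Build an element tag name: figure name, alphabetic variable, sequence number."""
    return f"{figure}-{variable}-{occurrence}"


def annotate(text: str, figure: str, elements: list) -> str:
    """Wrap each (start, end, variable, occurrence) character span in element tags.

    Assumption: the figure-span opens at the first element and closes
    immediately after the last one.
    """
    out = []
    cursor = 0
    for i, (start, end, variable, occurrence) in enumerate(sorted(elements)):
        tag = element_tag(figure, variable, occurrence)
        out.append(text[cursor:start])          # untagged text before the element
        if i == 0:
            out.append(f"<{figure}>")           # open the figure-span
        out.append(f"<{tag}>{text[start:end]}</{tag}>")
        cursor = end
    out.append(f"</{figure}>")                  # close the figure-span
    out.append(text[cursor:])                   # untagged remainder
    return "".join(out)


# The four antimetabole elements of "All for one, one for all."
spans = [(0, 3, "A", 1), (8, 11, "B", 1), (13, 16, "B", 2), (21, 24, "A", 2)]
print(annotate("All for one, one for all.", "antimetabole", spans))
```

Because each figure is annotated on its own copy of the text, the same function can be called once per figure (antimetabole, mesodiplosis, parison, isocolon) with different spans.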

This scheme is scalable and adaptable, allowing for expansion. So, while antimetabole so frequently involves two inversely repeated words that rhetoricians use the shorthand ABBA for it, sometimes there are three (which would technically be ABCCBA), four (ABCDDCBA), or more inverse repetitions, as in 31a (antimetabole annotation, 31b; there is also a parison, 31c).

A most beastly place. Mudbank, mist, swamp, and work; work, swamp, mist, and mudbank. [11]:216

A most beastly place. <antimetabole><antimetabole-A-1>Mudbank</antimetabole-A-1>, <antimetabole-B-1>mist</antimetabole-B-1>, <antimetabole-C-1>swamp</antimetabole-C-1>, and <antimetabole-D-1>work</antimetabole-D-1>; <antimetabole-D-2>work</antimetabole-D-2>, <antimetabole-C-2>swamp</antimetabole-C-2>, <antimetabole-B-2>mist</antimetabole-B-2>, and <antimetabole-A-2>mudbank</antimetabole-A-2></antimetabole>.

A most beastly place. <parison><parison-A-1>Mudbank, mist, swamp, and work</parison-A-1>; <parison-A-2>work, swamp, mist, and mudbank</parison-A-2></parison>.

Phenomena such as 31 have led us, incidentally, to redefine antimetabole as “reverse repetition of at least two words.”
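This purely formal definition can be sketched as a search for the longest chain of nested word pairs that repeat in reverse order. The Python below is a minimal illustration under stated assumptions, not any of the detectors discussed above: the function name `antimetabole_pairs`, the regular-expression tokenizer, and the small stopword list are all hypothetical choices. As the discussion of Examples 26 and 27 would predict, a matcher this permissive will also fire on loose instances like 27; it diagnoses reverse repetition, not the effectiveness of a collocation.

```python
import re
from functools import lru_cache

# Illustrative stopword list (an assumption, not part of the annotation scheme).
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "for", "to", "in"})


def antimetabole_pairs(text, stopwords=STOPWORDS):
    """Return the longest chain of nested (first, second) token-index pairs whose
    words repeat in reverse order; [] unless at least two distinct words do."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    content = [i for i, t in enumerate(tokens) if t not in stopwords]

    @lru_cache(maxsize=None)
    def best(lo, hi):
        # Maximum number of nested matching pairs within content[lo..hi].
        if lo >= hi:
            return 0
        if tokens[content[lo]] == tokens[content[hi]]:
            return 1 + best(lo + 1, hi - 1)
        return max(best(lo + 1, hi), best(lo, hi - 1))

    # Walk inward from both ends, following the choices that achieve best().
    pairs, lo, hi = [], 0, len(content) - 1
    while lo < hi:
        if tokens[content[lo]] == tokens[content[hi]]:
            pairs.append((content[lo], content[hi]))
            lo, hi = lo + 1, hi - 1
        elif best(lo + 1, hi) >= best(lo, hi - 1):
            lo += 1
        else:
            hi -= 1

    distinct = {tokens[first] for first, _ in pairs}
    return pairs if len(distinct) >= 2 else []


print(antimetabole_pairs("Mudbank, mist, swamp, and work; work, swamp, mist, and mudbank."))
```

Each returned pair gives the token positions of the first and second occurrence of one repeated word, outermost pair first, so a four-pair result corresponds to the ABCDDCBA pattern.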

Note that we use a system of standoff markup, as did Gawryjołek [19,20], with each figure annotated individually, rather than tagging all figures concurrently in the same text. Not only is this a cleaner approach than inline markup with respect to the data structure, it avoids one particularly troublesome problem for XML – the complication of overlapping tags, where one figure starts inside another’s tags but ends outside them. In our first attempt to deal with this problem we used XML attributes [55], but standoff markup is a more efficient solution. We illustrate this with another instance, while also introducing a new suite of figures, the zeugmas. Zeugmas are figures in which one term or lexical string, very frequently implicating a verb, ‘controls’ at least two other terms or phrases, as in 32a, in which “He hated” applies to both Noun Phrases (NP) in the first clause (“white oppression” and “white domination”) as well as to the NP in the final clause (“white people themselves”). (The zeugmatic suite of figures is divided in terms of the location of the governing term or phrase – in prozeugmas, as in 32, the governing term appears before any of the governed elements; in mesozeugmas, it is between the governed elements; in hypozeugma, it follows all governed elements.) Example 32a also exhibits epanaphora, the repetition of the adjective white at the beginning of all three of those NPs, annotated as 32b, parison (all three NPs are parallel), annotated as 32c, and antithesis, annotated as 32d:

He hated white oppression and white domination, not white people themselves. [40]:108b

<prozeugma><prozeugma-A-1>He hated white oppression and</prozeugma-A-1> <prozeugma-A-2>[he hated] white domination, not</prozeugma-A-2> <prozeugma-A-3>[he hated]</prozeugma-A-3></prozeugma> white people themselves.

He hated <epanaphora><epanaphora-A-1>white</epanaphora-A-1> oppression and <epanaphora-A-2>white</epanaphora-A-2> domination, not <epanaphora-A-3>white</epanaphora-A-3></epanaphora> people themselves.

<antithesis><antithesis-A-1>He hated white oppression and white domination</antithesis-A-1>, <antithesis-B-1>not white people themselves</antithesis-B-1></antithesis>.

He hated <parison><parison-A-1>white oppression</parison-A-1> and <parison-A-2>white domination</parison-A-2>, not <parison-A-3>white people</parison-A-3></parison> themselves.

It is common to treat zeugma as ‘omitting’ a version or versions of the governing term. Indeed, aficionados of the recent history of formal linguistics will remember that Transformational Grammar treated zeugmas with Raising operations that deleted the relevant terms, and various forms of Lexical Theories (like Government-Binding Theory) handled zeugmas with null proforms, such as traces or PROs. We have adopted a similar convention, signaling the governing structure in zeugmas with the kind of ghost repetition, or echo, you see in 32b (“[he hated]”).

We have developed an interface for displaying multiple figures in the same instance, an evolution of Gawryjołek’s JANTOR interface [19,20]. Gawryjołek’s interface uses different colours simultaneously, turning them on with check boxes, to signal individual figures in the same passage of text. But since Rhetorical Figures do not respect one another’s boundaries, since they overlap and interpenetrate, the colours quickly cancel each other out just when the signaling is most valuable, and the passage can become a wash of brown; some figures (as with Example 30’s parison and isocolon) overlap completely. We avoid representing them simultaneously therefore, and provide a radio-button toggling interface to display them. Figure 1 illustrates this approach for Example 32.

Fig. 1. Depiction of interface for displaying multiple figures in a single instance, representing Example 32. For printing reasons, this depiction substitutes bold italics for colours.

We can now return to the standoff-markup solution to overlapping tags: Example 32 demonstrates the sort of problem that the interpenetration of figures causes for inline markup using XML. Figure 2 is a diagram of the figural elements of Example 32, showing that it is possible to tag some of the Rhetorical Figures in nesting ways – for instance, by marking the open and close of epanaphora inside the tags for parison; and both of those Rhetorical Figures naturally nest within antithesis and prozeugma. However, the latter two figures interpenetrate, with prozeugma starting within the text-span of antithesis but ending outside its text span. That is, inline markup would result in a syntax error for antithesis and prozeugma: <antithesis> <prozeugma> …</antithesis> </prozeugma>. Standoff markup avoids this problem altogether.

Fig. 2. The interpenetration of figures can lead to syntax errors for inline XML markup, illustrated by the crossing antithesis and prozeugma arcs in this representation of Example 32.
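The difference between nesting and interpenetration can be stated as a simple test on text spans. The following Python sketch shows one way to flag the figure pairs that could not share a single inline XML encoding; the function names `relation` and `crossing_pairs` are hypothetical, and the character offsets in the usage lines are invented for illustration rather than measured from Example 32.

```python
from itertools import combinations


def relation(a, b):
    """Classify two (start, end) character spans, with end exclusive."""
    (a0, a1), (b0, b1) = a, b
    if a1 <= b0 or b1 <= a0:
        return "disjoint"          # no shared text
    if (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1):
        return "nested"            # one span lies wholly inside the other
    return "crossing"              # starts inside, ends outside: invalid inline XML


def crossing_pairs(spans):
    """Return the pairs of figure names whose spans interpenetrate."""
    return [
        (f, g)
        for (f, s), (g, t) in combinations(spans.items(), 2)
        if relation(s, t) == "crossing"
    ]


# Illustrative offsets only (assumed, not taken from the annotated instance).
figures = {
    "antithesis": (0, 40),
    "prozeugma": (10, 55),
    "epanaphora": (12, 30),
    "parison": (12, 36),
}
print(crossing_pairs(figures))
```

Any pair reported by `crossing_pairs` would force a tag sequence of the form open-A, open-B, close-A, close-B in inline markup; annotating each figure separately sidesteps the check altogether.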

There are other solutions to the interpenetration problem – most notably using an inline XML annotation scheme utilizing attributes, and avoiding XML altogether, with something like a Resource Description Framework (RDF) approach – which have certain advantages. We outlined an attribute-based solution in the previous iteration of this paper, delivered at CMNA 16 [55], and we have explored an RDF approach in unpublished work. But the standoff annotation scheme is the cleanest and mnemonically most workable for human annotators, and by keeping the scheme stable we can use and maintain conversion tools for other formats, for various purposes, as it becomes necessary. For instance, conversion to RDF makes the instances highly definable in an ontology (for instance, in OWL or RDFS) that can capture much more detail about constraints, relationships and categorization, as that data becomes available.

Our approach also makes use of a unique and persistent URI for each Instance in our database, so that, for instance, algorithms can easily recognize the All of 30a as the All of 30b–e, the mist of 31a as the mist of 31b–c, and so on.
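One way to picture how a shared identifier ties the separate annotations back to the same words is a table of element records keyed by instance URI. The Python sketch below is an assumption-laden illustration, not a description of our database: the URI is a made-up placeholder, and storing character offsets, the `Element` record, and the helper names `tags_covering` and `surface` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    figure: str
    variable: str
    occurrence: int
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive

    @property
    def tag(self) -> str:
        return f"{self.figure}-{self.variable}-{self.occurrence}"


INSTANCE_URI = "https://example.org/figures/instance/30"  # hypothetical placeholder
TEXTS = {INSTANCE_URI: "All for one, one for all."}
ANNOTATIONS = {
    INSTANCE_URI: [
        Element("antimetabole", "A", 1, 0, 3),
        Element("antimetabole", "B", 1, 8, 11),
        Element("antimetabole", "B", 2, 13, 16),
        Element("antimetabole", "A", 2, 21, 24),
        Element("mesodiplosis", "A", 1, 4, 7),
        Element("mesodiplosis", "A", 2, 17, 20),
        Element("parison", "A", 1, 0, 11),
        Element("parison", "A", 2, 13, 24),
    ]
}


def tags_covering(uri, offset):
    """List every element tag, across figures, that covers one character offset."""
    return [e.tag for e in ANNOTATIONS[uri] if e.start <= offset < e.end]


def surface(uri, element):
    """Recover the words an element refers to from the shared instance text."""
    return TEXTS[uri][element.start:element.end]


print(tags_covering(INSTANCE_URI, 0))   # elements that include the initial "All"
```

Because every record points into the same text under the same URI, a query about one word returns its role in each figure at once.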

5. Conclusion

Daniel Devatman Hromada concludes his important computational exploration of Rhetorical Figures by suggesting that the goal of such work is not just to produce yet “another tool for stylometric analysis of textual corpora,” but to get at a fundamental question of Natural Language: “What is the essence of figures of speech and how could they be represented within&by an artificial and/or organic symbol-manipulating agent?” [33]. While Hromada doesn’t go there himself, his goal is precisely the objective of the kind of research we have labelled pioneering above and associated with [37]. Seen in this light, Computational Rhetoric takes us back to the root goals of Artificial Intelligence – to replicate human intelligence so that we might understand human intelligence. AI researchers have known all along that Natural Language is absolutely key to this goal, but until recently they have not had much success getting at the way language functions at deeper levels of thought, such as inferencing. Argument Mining is at the forefront of these new developments. The importance of Rhetorical Figures in this field is now indisputable.

Our current annotation scheme does not solve all the problems. The eventual target is to develop a markup scheme that provides computationally accessible information for all Rhetorical Figures, but we have concentrated almost wholly on Rhetorical Schemes in our research to this point, figures of surface phenomena (phonology, morphology, lexis, and syntax). Tropes, which concern the trickier phenomena of semantics, provide significant complications that we are just beginning to tackle. But we believe they are largely tractable within our annotation scheme. Example 33, for instance, exhibits a general metaphor (“mattresses of kelp”) and a particular subtype of metaphor known as personification (“advancing tide pushed,” which ascribes intention to a natural object), which we annotate as in 33b and 33c respectively.

The advancing tide pushed mattresses of kelp up the scarp of the beach. [7]:143

The advancing tide pushed <metaphor><metaphor-S-1>mattresses</metaphor-S-1> of <metaphor-T-1>kelp</metaphor-T-1></metaphor> up the scarp of the beach.

The <personification><personification-S-1>advancing</personification-S-1> <personification-T-1>tide</personification-T-1> <personification-S-2>pushed</personification-S-2></personification> mattresses of kelp up the scarp of the beach.

What is important about the annotations of 33 is that sequence is much less important for metaphor, perhaps irrelevant, though some scalar tropes (such as incrementum, a series of terms increasing on some semantic scale, such as size or monetary value) do require ordering. More important for metaphor and similar tropes (metonymy, synecdoche, and the subtypes of all three; e.g., personification) are the notions of Source (the term that marks the semantic ‘turn’, to which we assign the variable S) and Target (the term that marks the literal base, to which we assign T) [35]. For instance, in the personification (33c), advance and push are terms associated with intentional agents; that is, people. They mark the Source. Tide marks the literal base, the non-intentional, non-agentive entity about which intentional agency (personification) is predicated; it is the Target. But other figurative phenomena, such as irony, which is best regarded as discourse tone or mood, and which can pervade an entire argument or novel, or define the attitude of a certain speaker, and prolepsis, which is a strategy of anticipation (see [46]), present greater obstacles.
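Because the tag names carry the figure, the variable, and the sequence number, the Source and Target roles can be read back off an annotated instance with a standard XML parser. The following Python sketch assumes a well-formed annotation in which the figure-span is explicitly opened (here with a `<metaphor>` tag); the `<instance>` wrapper element and the function name `parse_elements` are hypothetical conveniences, not part of the scheme.

```python
import xml.etree.ElementTree as ET


def parse_elements(annotated):
    """Return the plain text and a list of (figure, variable, occurrence, words)."""
    # Wrap the annotated string in a hypothetical root so that text outside
    # the figure-span is still part of a single well-formed document.
    root = ET.fromstring(f"<instance>{annotated}</instance>")
    plain = "".join(root.itertext())
    elements = []
    for node in root.iter():                      # document order
        parts = node.tag.split("-")
        if len(parts) == 3:                       # element tags only, not figure-spans
            figure, variable, occurrence = parts
            elements.append((figure, variable, int(occurrence), "".join(node.itertext())))
    return plain, elements


annotated = (
    "The advancing tide pushed <metaphor><metaphor-S-1>mattresses</metaphor-S-1> of "
    "<metaphor-T-1>kelp</metaphor-T-1></metaphor> up the scarp of the beach."
)
plain, elements = parse_elements(annotated)
sources = [words for _, variable, _, words in elements if variable == "S"]
targets = [words for _, variable, _, words in elements if variable == "T"]
print(sources, targets)
```

For the schemes, the same parse yields the alphabetic variables and their order of occurrence; for tropes of this kind, only the S and T variables need to be consulted.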


Acknowledgements

We would like to thank our colleagues at the University of Waterloo, including Elena Afros, Tyler Black, Adam Bradley, Kyle Gerber, Alice Leung, Isabel Li, Ashley Mehlenbacher, Devon Moriarty, Omar Nafees, Ricky Rong, George Ross, Terry Stewart, Katherine Tu, and Yetian Wang; our international colleagues, including Marie Dubremetz, John Lawrence, Jelena Mitrović, Chris Reed, Jacky Visser, and James Wynn; and the Social Sciences and Humanities Research Council of Canada for financial assistance. We also thank four anonymous reviewers for CMNA and Argument & Computation, for their helpful queries and suggestions. Our figure-annotation research is part of an overall project of Computational Rhetoric at the University of Waterloo, organized around a comprehensive OWL-based ontology of Rhetorical Figures.

References

Alexis, Fifteen Dogs, Coach House Books, Toronto, 2015.

Alliheedi, Multi-document summarization system using rhetorical information, Master’s thesis, 2012, supervised by Chrysanne DiMarco, Randy Allen Harris, second reader.

Alliheedi and

DiMarco, Rhetorical figuration as a metric in text summarization, in: Proceedings, 2014 Canadian Artificial Intelligence Conference, 2014 May 6–9, Montreal, QC, 2014.

Burke, The Grammar of Motives, University of California Press, Berkeley, CA, 1969.

G.W.

Bush, [Frum, D.], Address before a joint session of the congress on the United States response to the terrorist attacks of September 11. 2001 Sep 20 [cited 2018 Jan 25], in: The American Presidency Project [Internet], G. Peters and J.T. Woolley, eds, available at: http://www.presidency.ucsb.edu/ws/?pid=64731.

Cacciari,

Gibbs,

Katz and

Turner (eds), Turner M. Figure, in: Figurative Language and Thought, Oxford University Press, New York, 1997, pp. 44–87.

Chatwin, In Patagonia, Vintage, London, 2005.

Chien and

R.A.

Harris, Scheme trope chroma chengyu: Figuration in Chinese four-character idioms, Cognitive Semiotics10(6) (2010), 155–178. doi:10.3726/81610_155.

Clinton, Statement for the Americans for Marriage Equality campaign [Internet], [place unknown]: Human Rights Campaign, 2013 March 18 [cited 2018 Jan 25], Video: 6 min, available at: http://www.hrc.org/videos/videos-hillary-clinton-supports-marriage-equality#.UXAbPys4Xvl.

10.

Crosswhite, Rhetoric and computation. Symposium on Argument and Computation, 2000 June 27, Bonskeid House, Perthshire, Scotland.

11.

Dickens, Great Expectations, R. Worthington, London, 1884.

12.

Dubremetz, Detecting Rhetorical Figures based on repetition of words: Chiasmus, epanaphora, epiphora, PhD dissertation, Uppsala (Studia Linguistica Upsaliensia 18), 2017.

13.

Dubremetz and

Nivre, Rhetorical figure detection: The case of chiasmus, in: NAACL-HLT Fourth Workshop on Computational Linguistics for Literature, 2014 June 4, Denver, CO [Internet], Curran Associates, New York, 2015, pp. 23–31, available at: http://www.aclweb.org/website/old_anthology/W/W15/W15-07.pdf#page=37.

14.

Dubremetz and

Nivre, Machine learning for rhetorical figure detection: More chiasmus with less annotation, in: Proceedings of the 21st Nordic Conference of Computational Linguistics, 2017 May 23–24, Gothenburg, Sweden, 2017, pp. 37–45.

15.

Dubremetz and

Nivre, Syntax matters for rhetorical structure: The case of chiasmus, in: Proceedings of the Fifth Workshop on Computational Linguistics for Literature, NAACL-HLT, 2016 June 1, San Diego, CA, pp. 47–53.

16.

Dumas, The Three Musketeers, Simon & Schuster, New York, 2010(translation anonymous).

17.

Fahnestock, Rhetorical Figures in Scientific Argumentation, Oxford University Press, New York, 1999.

18.

Fuller, Thomas Kuhn: A Philosophical History for Our Times, Chicago University Press, Chicago, 2001.

19.

J.J.

Gawryjołek, Automated annotation and visualization of Rhetorical Figures, Master’s thesis, 2009, supervised by Chrysanne DiMarco, Randy Allen Harris, second reader.

20.

J.J.

Gawryjołek,

R.A.

Harris and

DiMarco, An annotation tool for automatically detecting Rhetorical Figures, in: Proceedings, CMNA IX (Computational Models of Natural Argument), Held with IJCAI-09, 2009 July 13, Pasadena, CA.

21.

Grasso, Towards a framework for rhetorical argumentation, in: EDILOG 2002 – Proceedings of the 6th Workshop on the Semantics and Pragmatics of Dialogue, 2002 September 4–6, Edinburgh, UK,

Bos,

M.E.

Foster and

Matheson, eds, 2002, pp. 53–60.

22.

Grasso, Towards computational rhetoric, Informal Logic29(3) (2002), 195–229.

23.

Green, Representation of argumentation in text with rhetorical structure theory, Argumentation24(2) (2010), 181–196. doi:10.1007/s10503-009-9169-4.

24.

Green, Identifying argumentation schemes in genetics research articles, in: Proceedings of the Second Workshop on Argumentation Mining, North American Conference of the Association for Computational Linguistics, NAACL, Denver, CO, 2015, pp. 12–21.

25.

Greer, The proper study of womankind, Times Literary Supplement9 1988 June 3.

26.

R.A.

Harris, Figural logic in Mendel’s Experiments on plant hybrids, Philosophy and Rhetoric46(4) (2013), 570–602. doi:10.5325/philrhet.46.4.0570.

27.

R.A.

Harris, The fourth master trope, antithesis, Advances in the History of Rhetoric21(2) (2018).

28.

R.A. Harris and DiMarco (eds), Special issue on Rhetorical Figures, Argument & Computation 8(3) (2017).

29.

R.A. Harris and DiMarco, Rhetorical Figures, arguments, computation, Argument & Computation 8(3) (2017), 211–231.

30.

R.A. Harris and DiMarco, Constructing a rhetorical figuration ontology, in: Symposium on Persuasive Technology and Digital Behaviour Intervention, Convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB), 2009 April, Edinburgh, Scotland.

31.

Hertzberg, The spat, New Yorker, 2008 Feb 11.

32.

Hoffmann and Trousdale (eds), The Oxford Handbook of Construction Grammar, Oxford University Press, New York, 2013.

33.

D.D. Hromada, Initial experiments with multilingual extraction of rhetoric figures by means of PERL-compatible regular expressions, in: Proceedings of the Second Student Research Workshop associated with RANLP 2011, Hissar, Bulgaria, 2011.

34.

J.F. Kennedy, [T. Sorensen], Inaugural Address, 1961 Jan 20 [cited 2018 Jan 25], in: The American Presidency Project [Internet], G. Peters and J.T. Woolley, eds, available at: http://www.presidency.ucsb.edu/ws/index.php?pid=8032.

35.

Lakoff and Johnson, Metaphors We Live By, University of Chicago Press, Chicago, 1980.

36.

Lanham, A Handlist of Rhetorical Terms, University of California Press, Berkeley, CA, 1991.

37.

Lawrence, Visser and Reed, Harnessing Rhetorical Figures for argument mining, Argument & Computation 8(3) (2017), 289–310. doi:10.3233/AAC-170026.

38.

Lee, Malcolm X, Warner Bros, 1992.

39.

Magid, Zuckerberg claims we don’t build services to make money, Forbes [Internet], 2012 Feb 1 [cited 2018 Jan 25], available at: http://www.forbes.com/sites/larrymagid/2012/02/01/zuckerberg-claims-we-dont-build-services-to-make-money/.

40.

Mandela, The Long Walk to Freedom, Little, Brown and Company, New York, 1994.

41.

W.C. Mann and S.A. Thompson, Rhetorical structure theory: Toward a functional theory of text organization, Text 8(3) (1988), 243–281. doi:10.1515/text.1.1988.8.3.243.

42.

Margolick, Simpson’s lawyer tells jury that evidence ‘doesn’t fit’, New York Times [Internet], 1995 September 28 [cited 2018 Jan 25], available at: http://www.nytimes.com/1995/09/28/us/simpson-s-lawyer-tells-jury-that-evidence-doesn-t-fit.html?pagewanted=all.

43.

P.H. Matthews, Syntactic Relations: A Critical Survey, Cambridge Studies in Linguistics, Vol. 114, Cambridge University Press, Cambridge, 2007.

44.

Maxwell, Jane: The Woman Who Loved Tarzan, Macmillan, New York, 2012.

45.

M.S. McGlone and Tofighbakhsh, Birds of a feather flock conjointly (?): Rhyme as reason in aphorisms, Psychological Science 11(5) (2000), 424–428. doi:10.1111/1467-9280.00282.

46.

A.R. Mehlenbacher, Rhetorical Figures as argument schemes – the proleptic suite, Argument & Computation 8(3) (2017), 233–252.

47.

Newton, The mathematical principles of natural philosophy, Three volumes, H.D. Symonds, London, 1803, 1687, translated by Andrew Motte.

48.

Oklahoma Department of Wildlife Conservation, Outdoor Oklahoma, Vol. 51–52, 1995.

49.

O’Reilly, Lassoing rhetoric with OWL and SWRL [Internet], [unpublished MSc dissertation], 2010 [cited 2018 Jan 25], available at: http://computationalrhetoricworkshop.uwaterloo.ca/wp-content/uploads/2016/06/LassoingRhetoricWithOWLAndSWRL.pdf.

50.

O’Reilly and Paurobally, Lassoing rhetoric with OWL and SWRL, Unpublished [Internet], 2010 [cited 2018 Jan 25], available at: http://www.academia.edu/2095469/Lassoing_Rhetoric_with_OWL_and_SWRL.

51.

Perelman and Olbrechts-Tyteca, The New Rhetoric: A Treatise on Argumentation, Notre Dame University Press, Notre Dame, 1969, translated by John Wilkinson.

52.

Porter, Anything goes, in: G. Bolton and P.G. Woodhouse, eds, 1934.

53.

Reed and T.J. Norman, eds, Argumentation machines: New frontiers in argument and computation, Kluwer, Dordrecht, The Netherlands, 2003.

54.

Reed and G.W.A. Rowe, Araucaria: Software for argument analysis, diagramming and representation, International Journal of AI Tools 13(4) (2004), 961–980. doi:10.1142/S0218213004001922.

55.

Ruan, DiMarco and R.A. Harris, Rhetorical figure annotation with XML, in: Computational Models of Natural Argumentation (CMNA) 16, a Workshop at the 2016 International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, 2016.

56.

Sartwell, The left-right political spectrum is bogus, The Atlantic [Internet], 2014 June 20 [cited 2018 Jan 25], available at: http://www.theatlantic.com/politics/archive/2014/06/the-left-right-political-spectrum-is-bogus/373139/.

57.

M.L. Schagrin, The Language of Logic: A Programed Text, Random House, New York, 1968.

58.

Seuss and T.S. Geisel, Horton Hatches the Egg, Random House, New York, 1940.

59.

Sopher, Parallelism in modern English prose: Its formal patterns, rhetorical functions and notional relations, English Studies: A Journal of English Language and Literature 63 (1982), 37–48. doi:10.1080/00138388208598156.

60.

Strommer, Using Rhetorical Figures and shallow attributes as a metric of intent in text, PhD dissertation, University of Waterloo, Waterloo, 2011, Supervised by Chrysanne DiMarco, Randy Allen Harris, Second Reader, Chris Reed, External Examiner.

61.

Teufel, The Structure of Scientific Articles: Applications to Citation Indexing and Summarization, CSLI Publications, San Francisco, 2010.

62.

Teufel, Carletta and Moens, An annotation scheme for discourse-level argumentation in research articles, in: Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics, Stroudsburg, PA, 1999, pp. 110–117. doi:10.3115/977035.977051.

63.

C.W. Tindale, Acts of Arguing: A Rhetorical Model of Argument, State University of New York Press, Albany, NY, 2000.

64.

Unknown, Ultrabooks vs Laptops, Java [Internet], 2013 Jan 26 [cited 2018 Jan 25], available at: http://java-maheshyadav.blogspot.ca/2013/01/ultrabooks-vs-laptops.html.

65.

Unknown, Western Spanglish language: The United States unofficial language. Western women in leadership and innovation: Discovering the wellsprings of metaphorical voices [Internet], Unknown date, previously available at: http://westernwomenleadershipinnovation.net/western-spanglish-language.html.

66.

Vickers, In Defence of Rhetoric, Oxford University Press, Toronto, 1989. doi:10.1093/acprof:oso/9780198117919.001.0001.

67.

P.E. Volpe, Man, Nature, and Society: An Introduction to Biology, W. C. Brown Company, Dubuque, IA, 1975.

68.

Walker, God in a Brothel: An Undercover Journey Into Sex Trafficking and Rescue, InterVarsity Press, Downers Grove, IL, 2011.

69.

Waller, War and the Family, The Dryden Press, Hinsdale, IL, 1940.

70.

M[alcolm] X, 2002 [1964], I’m not an American, I’m a victim of Americanism, Delivered at the University of Ghana, May 13, 1964, The Militant [Internet], 2002 February 11 [cited 2018 Jan 25] 66(6), available at: http://www.themilitant.com/2002/6606/660649.html.

71.

M[alcolm] X, It shall be the ballot or the bullet, Delivered in Washington Heights, NY, March 29, 1964, AMDOCS: Documents for the Study of American History [Internet], [cited 2018 Jan 25], available at: http://www.vlib.us/amdocs/texts/malcolmx0364.html.

2 Some rhetoricians, including Fahnestock, regard antithesis as a scheme. See [27] for an argument that it is a trope.