Abstract
The paper reports on the results of an exploratory study into the topical organisation and stylistic features of argumentation in a corpus of ophthalmic clinical research papers. The study responds to the need for systematised and generalisable argumentation models in knowledge-intensive fields. We present here a schematised superstructure of the arguments from the corpus, charting the configurations of stylistic features, which signal the elements of this superstructure, epistemic topoi. We pay special attention to the role of lexical categories (or semantic fields) in the configurations, to the relations between the fields, and to their interactions with other elements of the configurations, including semantic, grammatical, syntagmatic, deictic, and coreferential features. Epistemic topoi are a promising discourse constituent in argumentation because, as we found, they are distinct from syntagmatic units, such as phrases, clauses, or argumentative zones, and because they are signalled with substantially distinctive stylistic features despite having no fixed order in the superstructure. They hold considerable promise for computational argumentation analysis and processing, perhaps especially in scientific and technical discourses, where the need for reliable detection and summarisation is particularly high. Our investigation shows that despite the complex and interpenetrating semantic and stylistic attributes of argumentation, there are significant, computationally tractable regularities.
Keywords
Argumentation analysis is gaining popularity (e.g. Association for Computational Linguistics, 2014; Palau & Moens, 2009; Saint-Dizier, 2012; Wyner, Mochales-Palau, Moens, & Milward, 2010)1
Also refer to the websites of the Computational Models of Natural Argument (CMNA) workshops (http://www.cmna.info/) and the Computational Models of Argument (COMMA) conference (http://www.comma-conf.org/) for recent developments in computational modelling of argumentation.
Topoi (singular, topos), deriving from classical rhetoric, are most fully articulated in the early literature by Aristotle, who distinguishes between common topoi, lines of argument present in all genres and discourses (such as CONTRAST, COMPARE, and FROM THE IMPOSSIBLE), and specific topoi, lines of argument present in particular genres and/or argument fields.2
See Aristotle's Rhetoric (1924), especially 1358a. There are many debates concerning topoi and many interpretations of Aristotle's text, which we will not enter. We are content with the classical insight, aligned with the term topoi, that (1) ways of arguing have structural signatures and (2) some of those ways of arguing are ‘universal’ while others are local to particular argument fields.
In this paper we report on the results of the first, exploratory part of our project inquiring into the topical organization of arguments in research papers and its surface manifestations. We offer an analytic framework that can be deployed profitably in computational argumentation research. Specifically, we chart the argumentative functions of epistemic topoi and their stylistic features in a corpus of clinical ophthalmic publications. During the first stage of the project we restricted our analysis to manual annotation of the corpus by one researcher with subsequent linguistic analysis of the annotation results. Despite the obvious limitations of the exploratory format of the study, we believe the findings it produced are robust enough to be deployed computationally. Another necessary caveat is that the superstructures and the specific features of particular topoi will likely show significant variation between argument domains and genres. For example, in our corpus we did not find a tense shift similar to the one that Malcolm (1987) and Myers (1992) established in their materials. Despite such variation, however, we are confident that other practitioners will derive helpful insights from both our approach and our findings, that is, from our description of the features of argumentative meanings and of the functions of those meanings in biomedical research papers.
The particular type of surface features that we present here is the recognizable configurations of stylistic elements identifying the epistemic topoi. Such configurations are not random combinations of linguistic cues but sets of interrelated semantic, lexico-grammatic, deictic, and coreferential features. We start by situating our approach in an overview of earlier research on the metadiscursive signals of statement types in research papers. The next two methodological sections outline our study design and offer insights into the stylistic properties of epistemic topoi. The fourth body section describes the system of topoi in our corpus, and the last two summarize and interpret these findings in the context of computational argumentation.
Introduced in Teun A. van Dijk’s important early text-linguistics study, Macrostructures (1980), the term superstructure refers to “the schematic form that organizes the global meaning of a text” (pp. 108–9). This form “indicates which textual ‘functions’ are relevant for this kind of discourse” (p. 69).
The superstructure (which van Dijk also calls [conventional] schema) is a semantic, not syntagmatic3
The term syntagm means units of linear organization in text and discourse, such as words, phrases, sentences, or text sections (De Beaugrande, 1997, p. 354).
IMRD stands for a now popular format of empirical publications, which includes four main sections: introduction, methods, results, and discussion.
The linguistic ‘signposts’ of argumentative organisation are typically analysed in terms of their distributions in texts. Their various types have enjoyed much attention in the literature, from passages of explicit authorial commentary (Crismore & Farnsworth, 1990) to typographic features (Kumpf, 2000) and intonation (Thompson, 2003). Hyland (2005) provides an extensive inventory of metadiscursive signals, which includes phrases, content and form words. Of all these types, content words have long been analysts’ favourites. Such lexicalised, explicit references to argumentative meanings are often referred to as metalanguage (Berry, 2005).
Despite its great significance, however, metalanguage has limited direct correlations with the argumentative meanings of sentences or clauses. It is not present in every statement, and, even when present, it does not necessarily label epistemic topoi. For example, most academic readers will probably recognize the semantic types of the following two statement types from our corpus. The first type talks about consistency between the reported results and earlier findings:
Crichton et al. confirmed the same issue. (E10) To the best of our knowledge, the effect of brimonidine on POBF has not yet been reported. (G24)
(We label this topos
Note that there is no direct link between the metalanguage used in these statements (issue, knowledge, effect, and reported) and their topical designations. By simple lexical search, one might expect that the metalanguage of 1.a signals the
The question is then how to reconcile common intuitions about the metadiscursive functions of lexis with the negative results of empirical findings based on metalanguage. The answer to this question lies, we think, in the notion of stylistic configurations, which include lexis along with other features.
Configurations of stylistic features
Statistical correlations between certain linguistic features and the IMRD structure are frequently reported in the literature (e.g. Channell, 1990; Malcolm, 1987; Salager-Meyer, 1992). Yet distribution studies, such as Teufel’s (1999), have also demonstrated that at the level of statements no linguistic feature taken in isolation is sufficient for analysis of argumentative meanings. Even the distributions of multiple types of metadiscourse are not helpful for this purpose. The smallest units that can be identified statistically by the distributions of their linguistic features appear to be about a hundred words long (Biber, Csomay, Jones, & Keck, 2004, p. 57). This is an order of magnitude larger than an average sentence in most texts, including argumentative texts (and certainly including our corpus).
Comprehensive descriptions of the linguistic signatures of particular statement types are laborious and rare. But from such work we draw confidence that combinations of linguistic features can provide access to text semantics at the statement level. Swales's influential 1990 CARS model of rhetorical moves in research papers suggests that the linguistic features of statements can index their meanings even in abstraction from their particular contents and textual environments. For instance, statements that evoke recent developments in the relevant field are signalled with recognizable sets of features:
The possibility … has generated interest in … Recently, there has been wide interest in … The study of … has become an important aspect of … The theory that … has led to hope that … The effect of … has been studied extensively in recent years. Many investigators have recently turned to … (p. 144)
Similarly valenced diction (classic, great importance, central), however, when combined with the present indefinite tense, creates a different profile:
The time development … is a classic problem in fluid mechanics. The explication of the relationship between … is a classic problem of … Knowledge of … has a great importance for … A central issue in … is the validity of … (p. 144)
Swales’ examples demonstrate that argumentative meanings are communicated by combinations of linguistic features, not by specific isolated markers. Constellations of features, like tense, aspect, and diction, create the stylistic profiles of statements. Following Swales’ analysis, several other authors have confirmed his insights that configurations of features may indeed operate as metadiscourse (Hersh, 2009, p. 406; Liakata, Thompson, Waard, Nawaz, Maat, & Ananiadou, 2012; Litman, 1996; Stirling, Fletcher, Mushin, & Wales, 2001; Taboada, 2009).
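The contrast between the two sets of Swales' examples can be sketched computationally. The following is a minimal illustration, not Swales' procedure or ours: the diction list, the two regular expressions, and the labels `recent_developments` and `established_importance` are all hypothetical names introduced here. The sketch shows only that a label is assigned when diction and tense co-occur, never on the strength of one feature alone.

```python
import re

# Hypothetical, deliberately tiny diction inventory (an assumption made
# for illustration; it is not taken from Swales or from our corpus).
IMPORTANCE_DICTION = {"interest", "important", "importance", "classic", "central"}

# Crude present-perfect cue: has/have (+ optional -ly adverb) + participle.
PRESENT_PERFECT = re.compile(
    r"\b(?:has|have)\s+(?:\w+ly\s+)?(?:been|become|led|\w+ed)\b"
)
# Crude present-simple cue, consulted only when no present perfect is found.
PRESENT_SIMPLE = re.compile(r"\b(?:is|are|has|have)\b")


def tense_feature(sentence):
    """Return a rough tense label for the sentence, or None."""
    text = sentence.lower()
    if PRESENT_PERFECT.search(text):
        return "present_perfect"
    if PRESENT_SIMPLE.search(text):
        return "present_simple"
    return None


def stylistic_profile(sentence):
    """Label a statement only when valenced diction and tense co-occur."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    has_diction = bool(tokens & IMPORTANCE_DICTION)
    tense = tense_feature(sentence)
    if has_diction and tense == "present_perfect":
        return "recent_developments"
    if has_diction and tense == "present_simple":
        return "established_importance"
    return None  # a single feature in isolation is not treated as a signal
```

A realistic implementation would replace the regular expressions with a part-of-speech tagger or parser; the point of the sketch is the conjunction of features, not the quality of the tense detection.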
Our findings join these confirmations of Swales’ thesis. And we go further. Work in these areas has up till now been carried out as two parallel lines of analysis, one focused on identifying the meanings, and the other on describing (or computing) their surface features. We do both in concert. We catalogue argumentative meanings based on their surface manifestations. Our contribution, that is, consists in bringing these two lines of analysis together. Our analysis covers the entire argumentative superstructure along with the stylistic profiles of its elements as they are represented in our corpus.
Study design and corpus
Our study design had significant differences from the methods currently popular in computational argumentation. Rather than evaluate an imported taxonomy based on inter-annotator agreement, we developed our typology from the bottom up using the methods of analytic induction (Strong, 1988) and triangulation (Lazaraton, 2002). The corpus annotation results produced by one annotator were triangulated with the results of linguistic analysis and insights gained by studying the field's metatheory. Thus, we conceived our study along the lines of Willard’s human as scientist approach (1989, p. 18), on the assumption that analysts can learn about significant textual and discursive patterns of a domain in the same way its novices do (cf. Wilbur, Rzhetsky, & Shatkay, 2006). As is usually the case with exploratory research, our study was iterative and cyclical (Brown, 2004). We view this approach as necessary for the development of a typology that is based on both human readers' perceptions of argumentative meanings and the surface features of those meanings, which may allow for more elaborate annotation guidelines and more effective machine training materials in the future.
The choice of the publication topic for the corpus was motivated by a combination of the lead researcher's professional background and personal interest in glaucoma. Our work with the corpus consisted of a survey of a set of NTG articles followed by close reading of the literature reviews from this set and manual annotation of a smaller subset of clinical research papers. The survey helped us to gain understanding of the nature of NTG and its treatment and to achieve insight into the conceptual and methodological tools of the field's research and the argumentative features of the publications. The literature reviews introduced us to the insiders' perspectives on the state of the art. Finally, the annotation and linguistic analysis allowed us to identify, describe, and classify the most salient and significant argumentative patterns in the clinical studies and identify their stylistic manifestations.
The larger set consisted of fifty-seven NTG papers: all MEDLINE (PubMed) full-text English-language articles published by 1994 and all MEDLINE (PubMed) free full-text English-language articles with abstracts published after 1994. It included several research genres, most notably case studies, literature reviews, methodological inquiries, and clinical, experimental, and laboratory investigations. This set was narrowed down to a manageable size and format for manual annotation, based on the genre and technical parameters of the publications. Specifically, the annotation corpus was restricted to clinical research papers (the largest subset in the larger set), from which two papers were excluded because of their no-copy format. It consisted of seventeen articles (45,599 words), listed in Appendix 1.
To concentrate on the textual mechanics of the argument, the papers were stripped of figures, tables, end-of-text citations, and front- and end-matter. Parenthetical citations were replaced with ellipses. Using the technique of visual annotation (Gladkova, 2010), the lead author identified the recurrent statement types comprising the argumentative superstructure of the papers.5
We believe visual annotation to be a very important tool for text analysis, but do not have sufficient room and scope to include details about this method here. We will publish a justification and illustration of visual annotation elsewhere. In the meantime, Gladkova's dissertation is the best source (2010, pp. 88–93).
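The citation-masking step of the corpus preparation described above can be sketched as follows. This is an illustration under a stated assumption: it treats a citation as any parenthetical that contains a four-digit year, which suits author-year citations only. The pattern and the function name `replace_citations` are hypothetical, and the sketch is not the procedure actually used to prepare the corpus.

```python
import re

# Assumption: a parenthetical citation is any "(...)" span containing a
# four-digit year such as 1998 or 2004a. Numeric or superscript citation
# styles would need a different pattern.
CITATION = re.compile(r"\s*\([^()]*\b(?:19|20)\d{2}[a-z]?\b[^()]*\)")


def replace_citations(text):
    """Replace each parenthetical author-year citation with an ellipsis."""
    return CITATION.sub(" …", text)
```

Parentheticals without a year, such as percentages, are left untouched, while a parenthetical that happens to contain a year-like number (for instance a sample size) would be masked by mistake, so the output of such a step needs manual checking.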
Syntagmatic indeterminacy of topoi
Syntagmatic realisations of argumentative functions and meanings may lie anywhere between a lexical unit and a text. This is because meanings occupy a different plane of organisation from syntax. Halliday and Hasan (1976) usefully observe that they are communicated by ‘texture’, a network of semantic links in the text (p. 8), and are subject to a different ‘kind of STRUCTURAL integration’ than clauses and sentences (Halliday and Hasan, 1976, p. 2; their emphasis).
Consider the We cannot directly compare our results with the previous data, because our work has a different In all examples the emphasis is ours.
Our
The purpose of the present study was to classify patients with untreated NTG by the degree of nocturnal BP reduction; to study BP, IOP, and MOPP parameters in each classification; and to investigate predictor variables of circadian MOPP fluctuation (CMF). (G5)
This is a retrospective review of a large number of NPG patients referred to a single hospital based glaucoma service. (G33)
In this study, we tried to evaluate the difference in nerve fibre layer (NFL) defects between NTG and POAG through analysis of RNFL photographs. (G21)
The flexible unit sizes that can perform argumentative functions highlight both the difference and the interaction between semantic and syntagmatic organization and show that argumentative meanings, or epistemic topoi, are primarily associated with the former, rather than the latter. The syntagmatic indeterminacy of topoi creates challenges for analysis and calls for methodological decisions. One obvious decision concerns the size of analytic units. The current practice for information-retrieval and knowledge-management systems favours lifting whole sentences from texts, rather than generating synthetic content from words and phrases. In keeping with this practice, we decided to focus on sentence-size units and dismiss argumentative meanings manifest below or above the sentence level. In statements with ambiguous sentence boundaries, our default analysis unit was the clause. A different decision was required when we encountered loose lexically cohesive links between sentences (connectives, anaphoric pronouns, adverbs, and adjectives). When extracted from the text, statements with such links seem incomplete or even incomprehensible:
Unless loose links were a recurrent feature of a topos (such as
Overlapping and nesting topoi also called for methodological decisions. We had to decide which configurations of features to count as basic types and which as composite. Our method consisted of treating as a basic topos any statement type with an identifiable and recurrent semantic makeup. Combinations of such basic topoi were classified as composite categories. For example, the major semantic elements of the topos
To go along with
The elements of the stylistic configurations of topoi do not function as unmalleable nuggets of semantic information but interact with one another and with their semantic and syntagmatic environments in rich but regular ways. To illustrate this property of the topoi from our corpus, we will first consider some of their lexical attributes.
Most of the lexemes that we found to be significant for our purposes provide expressions for people’s activities and for the theory that they construct about health and disease. The expressions allowing authors to talk about their activities concern clinical practice, data acquisition, and argument and discourse organisation. Theory expressions include language related to observations and knowledge.
Clinical practice lexis containing professional terminology is a prominent category in our corpus:
Clinical procedures (e.g. follow-up, management, operate, surgery, therapy, wash-out)
Diseases, syndromes, symptoms, clinical instruments, and medications (e.g. latanoprost, NTG, POAG, [visual] field loss)
The next category, which also accounts for much volume in the papers and is associated with a substantial body of expressions, is data acquisition:
Analysis (e.g. assess, calculate, determine, divide)
Examination (e.g. [de]note, examine, measure, monitor, record, test)
Designation (e.g. calculate/define/record/take as, classify as/in[to], consider as/to be, criteria, exclude, include, judge)
Epistemic lexis is related to the organisation of arguments and discourse:
Reasoning (e.g. conclude, consider, data, find, know, propose, reveal, suggest, support)
Research (e.g. address, investigate, literature, paper, publish, question, report)
Observations of natural phenomena inform the next large body of expressions:
Phenomena and attributes (e.g. age, deteriorate, duration, high, range, time, value)
Circumscription (e.g. ≥, at least, maximum, only, or better)
Generalisation (e.g. all, average, both, each, few, majority, mean, most, none, range)
Numerals (e.g. two, 14.6 ± 1.7)
Participants (e.g. controls, eye, patients, subjects)
Knowledge lexis indexes available theory and relations between what the community considers as relevant aspects and parameters of the observed phenomena:
Association (e.g. associated, correlated, involve, linked, more likely, predictor)
Cause and effect (e.g. cause, contribute, influence, lead to, mechanism, pathogenesis, risk [factor], role, susceptible; affect verbs: e.g. affect, improve, increase, reduce)
Identity (e.g. characteristic, distinct, form, normal, reliable, reproducible, subset, typical)
Presence and appearance (e.g. have, indicator, present with/as, reflect, show, sign)
Several categories of expressions communicate more abstract meanings than those listed above and are used in the stylistic configurations of topoi across the board:
Congruity and consistency (e.g. agreement, comparable, confirm, consistent, similar, surprising)
Enablement and possibility (e.g. able, can, easy, likely, may, possible)
Deontic modality (e.g. have to, need, must, should)
Diminution or negation (e.g. few, hardly, not, small)
Evaluation (e.g. advantage, ideal, successful)
Time and aspect (e.g. further, future, recent, new, no longer, remain, today’s)
In our corpus, we found that most lexemes function within their topical configurations as members of their semantic categories (or fields) rather than as unique individuals, which suggests powerful computational applications. But, as always, there are complications. We found exceptions to this principle that require special attention.
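The idea that lexemes operate as members of their fields lends itself to a simple lookup structure. The sketch below is illustrative only: it uses a small excerpt of the lexical fields listed above, and the dictionary layout, the field keys, and the function name `field_profile` are assumptions introduced here. It matches surface forms without lemmatisation, so an inflected form such as suggests is not recognised as a member of the reasoning field.

```python
import re

# A small excerpt of the lexical fields listed above. The data layout is
# an assumption made for illustration, not part of the study.
LEXICAL_FIELDS = {
    "clinical_procedures": {"follow-up", "management", "surgery", "therapy"},
    "examination": {"examine", "measure", "monitor", "record", "test"},
    "reasoning": {"conclude", "find", "propose", "suggest", "support"},
    "participants": {"controls", "eye", "patients", "subjects"},
    "association": {"associated", "correlated", "linked", "predictor"},
    "enablement_possibility": {"able", "can", "likely", "may", "possible"},
}

# Invert the lexicon: lexeme -> set of fields it belongs to.
LEXEME_TO_FIELDS = {}
for field, lexemes in LEXICAL_FIELDS.items():
    for lexeme in lexemes:
        LEXEME_TO_FIELDS.setdefault(lexeme, set()).add(field)


def field_profile(sentence):
    """Return the set of lexical fields whose members occur in the sentence."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    fields = set()
    for token in tokens:
        fields |= LEXEME_TO_FIELDS.get(token, set())
    return fields
```

Applied to the corpus statement "This suggests that an increased POBF may be associated with favourable prognosis of glaucoma", the sketch returns the association and enablement-and-possibility fields. The exceptions discussed next are the reason why such a profile can only be one input to topos identification.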
First, as we learned, lexical synonymy does not always imply identical semantic functions. For example, significant and important are close synonyms in most contexts, and this is true to an extent in our corpus as well. That is why both lexemes are frequent in
Second, we found a few expressions working as unique identifiers of their argumentative meanings. Such are the phrases in summary and in conclusion, which in our corpus occur only in the
Overall, however, stand-alone or unique lexical markers were anomalies to the general stylistic complexity of epistemic topoi. First, most configurations signalling the topoi in our corpus include not single but multiple lexical field members, typically two or more per topos. Second, lexical signals are usually combined with other stylistic features. Third, in a significant number of topoi some lexical fields were found to be interchangeable within their stylistic configurations with other types of features.
To illustrate some of these points, consider the Therefore, POAG patients with uniocular field loss represent an In view of these, the As the eye with the most serious progression was operated on, the fact that in the operated eye progression was stopped and in the non-operated eye progression went on has double The This makes measurement of POBF Our analysis
As we explained above, many
Of course a relevance expression is not sufficient on its own to make a On the other hand, there are also a number of studies by different authors that describe the It is unknown whether this effect is of Therefore, POAG
A careful look at the
Thus, the identification of
Interaction between the elements of stylistic configurations
Lexical analysis is indispensable for the computational processing and modelling of arguments. Yet not only is lexis seldom the only significant attribute of a topos, it is also hardly the most reliable one. We found that lexicalised expressions may be interchangeable with morphological and grammatical meanings. Also frequent are interactions of the meanings and functions of lexis with grammar, morphology, and syntax.
One striking feature of the topoi in our corpus is the morphological fluidity of their stylistic configurations. Consider these two This study is This study was conducted
In terms of their semantic makeups, both statements have expressions of purpose. Yet, while in 6a the idea is communicated with an explicit metalinguistic tag, aimed, in 6b the same effect is achieved syntactically, with the presence of an adverbial modifier (or adjunct) of purpose, to determine. The particular lexical meaning of the adjunct plays no role here. A different word with the same grammatical meaning would work just as well: This study was conducted to analyse/ to test/ to study/ to inquire, etc.
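The interchangeability of the lexical tag and the purpose adjunct can be sketched as two alternative cues for one feature. The cue lists, the passive-plus-infinitive pattern, and the function name `purpose_cue` are hypothetical, and the test sentences in the usage note are invented for illustration rather than drawn from the corpus.

```python
import re

# Lexical route: an explicit metalinguistic tag (assumed, minimal list).
PURPOSE_TAG = re.compile(r"\b(?:aim|aims|aimed|purpose)\b")

# Syntactic route: passive verb followed by a to-infinitive, standing in
# very crudely for an adverbial modifier of purpose. The particular verb
# after "to" is deliberately left unconstrained.
PURPOSE_ADJUNCT = re.compile(r"\b(?:was|were)\s+\w+ed\s+to\s+[a-z]+\b")


def purpose_cue(sentence):
    """Return 'lexical', 'syntactic', or None for the purpose feature."""
    text = sentence.lower()
    if PURPOSE_TAG.search(text):
        return "lexical"
    if PURPOSE_ADJUNCT.search(text):
        return "syntactic"
    return None
```

For example, `purpose_cue("This study aimed to compare two treatments.")` yields `"lexical"`, while `purpose_cue("This study was conducted to test the device.")` yields `"syntactic"`. The pattern also fires on strings such as "was expected to rise", so a parser that identifies purpose adjuncts properly would be needed in practice.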
In the same way, the idea of possibility, ability, or enablement may be expressed either explicitly, with the metalinguistic tag possible, or with a modal verb such as can. It may even be expressed morphologically, with the suffix –ible/-able. The idea of identity can be conveyed with identity lexis (e.g. characteristic, subtype) or with the help of a copula establishing a relationship of identity between the subject and predicative (e.g. Phospholipids
Another linguistic phenomenon that analysis of topical organisation must take into account is lexical polysemy. In our corpus, we found that some rather frequent or significant lexemes shift their meanings depending on the configurations in which they are integrated to form a particular topos. One such lexeme is group, a definitive feature of a number of topoi, which is typically associated with participant lexis but also communicates the idea of organisation. In particular, group is a lexical attribute of the In 10 Of the remaining 83 Six These ‘visual field failure’ events were modelled on baseline values for
On the other hand, where the organisation motifs take the upper hand, we find group in the company of such abstract analytic notions as category, sample, parameter, factor, and event. Consider this representative example of the
Here the authors categorise group as a factor in parallel with another abstraction, sex. In such statements, group does not mean a collective of study participants but an analytic entity with certain characteristics deemed significant for the investigation.
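The two readings of group can be approximated by looking at the company the word keeps. The following sketch assumes that the presence of one of the analytic abstractions named above (category, sample, parameter, factor, event) in the same sentence is enough to select the organisational reading, and that the participant reading is the default. Both the decision rule and the function name `sense_of_group` are illustrative assumptions, and the example sentences are invented.

```python
import re

# Analytic abstractions named in the text, with plural forms added.
ANALYTIC_ABSTRACTIONS = {
    "category", "categories", "sample", "samples", "parameter",
    "parameters", "factor", "factors", "event", "events",
}


def sense_of_group(sentence):
    """Guess the sense of 'group' from co-occurring lexis, or return None."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if not tokens & {"group", "groups"}:
        return None  # the lexeme does not occur at all
    if tokens & ANALYTIC_ABSTRACTIONS:
        return "organisation"  # group as an analytic entity
    return "participants"      # default: a collective of study participants
```

So `sense_of_group("Group and sex were entered as factors in the model.")` returns `"organisation"`, whereas `sense_of_group("Six patients in the treatment group were excluded.")` returns `"participants"`.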
In some cases, the meaning that a lexeme brings to a topos depends on its morphological form. For example, the adjective normal communicates the meaning of identity, in the same way as characteristic or typical. Witness the similarity of function between the underscored words in these To make the diagnosis of NTG, patients must have had … a reproducible visual field defect A diagnosis of NTG was made if … glaucomatous optic disc changes and visual field defects Levels of IOP, which are Compressive optic neuropathy is
The adverb normall
Within lexical fields, the semantic distinctions between abstract nouns and other morphological forms turned out to play an important part in the organisation of arguments. Consider the role of the word women in this Thirty-four patients (63%) were
It talks about the sex of the patients using a concrete noun, women. Most of such statements are found in the results section. On the other hand, when the authors analyse and interpret observations – as distinct from reporting them – more abstract nouns come into play. In the following Analysis using Cox univariate and multivariate regression techniques revealed strong evidence of independent associations between time to onset of field loss and both the
Similarly, the presence of results in ‘It should be noted that the
Syntagmatic environments influence meanings in a number of ways. First, some lexemes take on different meanings in different syntactic structures. For example, the word classify may refer to an analytic activity, along with calculate, define, or divide. However, in combination with as or into, the same word takes on the meaning of designation and thus falls into the same field as to define as, exclude, or include. Second, the syntactic roles that certain lexical elements play or the positions they occupy in their sentences are a factor in our corpus. One group of such elements is the analytic abstractions difference, significance, and correlation, some of the most frequent abstract nouns in our corpus. They are particularly prominent in the
Another syntagmatic feature that may interact with the meanings of some lexical elements is the word order. For example, autoreferential phrases like our study or this paper are one of the elements indicating the
Last but not least, the interaction between the lexical features of topoi and their temporal and modal profiles creates rather interesting semantic patterns. Some of the most important distinctions between various kinds of primary and secondary information in our corpus are signalled through the modal, temporal, and aspectual features of the statements: verbal tenses, infinitive forms, modal expressions, and adverbial modifiers. For examples of these distinctions, we will turn to two groups of statements: one talking about well-known relationships among the phenomena at issue, and the other about relationships that have been found or are proposed by the authors. These relationships include causal links, correlations, associations, differences, similarities, groupings, and divisions. In our corpus, we have divided the statements addressing such relationships into the following topoi:
K K F F E
The first two topoi talk about the state of knowledge. The last three present the authors' own findings and thoughts. To communicate these argumentative meanings, the authors deploy a system of lexical signals combined with certain syntactic structures.
Some statements conveying received knowledge are presented as more or less unproblematic information:
Included as positive factors are a higher incidence of disc haemorrhage … , more pronounced peripapillary atrophy … , higher incidence of retinal occlusive vascular diseases … , coexistence of immunocompromised conditions … , increased resistance index in orbital vessels … , and alterations in the diurnal variation of systemic blood pressure … (G12) In normal subjects, a higher IOP is associated with a higher degree of myopic refraction, and myopia is more prevalent in patients with primary open-angle glaucoma or NTG than in normal subjects … (E1)
Oftentimes, however, the authors will use hedging to tone down their sense of confidence in the propositions:
In each of these pairs, the first statement (10a and 10c) talks about
The stylistic configurations of the set of statements below are somewhat similar to the ones above in that the first one talks about Dorzolamide lead [sic.] to a significant acceleration of systolic blood flow in the short posterior ciliary artery (Table 4). (G10) Eyes from older patients were more likely to lose visual acuity over the follow-up period. (G33) Probably the deficient perfusion of the optic nerve head is due to an imbalance between intraocular pressure and the blood-pressure in the small branches of the short, ciliary arteries. (E15) This suggests that an increased POBF may be associated with favourable prognosis of glaucoma … (G24)
What are the features telling the readers that statements 10a through 10d refer to
The former of these statements hypothesises about a causal link, and the latter points to a possible correlation. In contrast with the
In summary, it is safe to say that topoi consist of meanings, rather than words. Many of these meanings can be expressed not only by lexical means but also morphologically and grammatically. Lexical meanings also frequently depend on their morphological and grammatical forms and syntagmatic environments.
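The dependence of lexical meaning on syntagmatic environment, illustrated above with classify, can be sketched as a context-sensitive field assignment. The rule below simply checks whether as or into follows the verb within the same clause; the clause boundary (a full stop or semicolon), the field labels, and the function name `field_of_classify` are simplifying assumptions.

```python
import re

# Match any form of "classify" and capture the rest of its clause,
# where a clause is assumed to end at the next full stop or semicolon.
CLASSIFY = re.compile(r"\bclassif(?:y|ies|ied|ying)\b(?P<rest>[^.;]*)")


def field_of_classify(sentence):
    """Assign 'classify' to the designation or the analysis field."""
    match = CLASSIFY.search(sentence.lower())
    if match is None:
        return None
    # With "as" or "into", the verb takes on the meaning of designation.
    if re.search(r"\b(?:as|into)\b", match.group("rest")):
        return "designation"
    return "analysis"
```

On the corpus statement beginning "The purpose of the present study was to classify patients with untreated NTG by the degree of nocturnal BP reduction;" the sketch returns `"analysis"`, while an invented sentence such as "Eyes were classified into three groups." returns `"designation"`. An unrelated as later in the clause would mislead the rule, which is one reason to prefer syntactic parsing over string matching.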
Corpus annotation and analysis results: topical organisation of ophthalmic research papers
An important motive behind the studies reported on in the papers from our corpus, as one would expect, is understanding the nature of the disease, its treatment, and management options. Such understanding requires a complex conceptual system involving phenomenal, methodological, and technical knowledge. The phenomenal knowledge includes the description of NTG in terms of its signs and symptoms, unique cases and general patterns, causes, effects, and risk factors, types and distinctions from other similar diseases, diagnosis protocols, treatment methods and their effects. The knowledge also accounts for interactions among these aspects and the ways that the disease affects lives and society.
Of all modes of reasoning used in biomedical research, medical metatheory especially favours problem-solving and decision-making (Connelly & Johnson, 1980; Levene, 1980; see also Kneale, 1949, where they are called, respectively, primary and secondary induction). In addition to inductive reasoning, we identified a significant number of topoi associated with the projected interpersonal relations between the authors and readers (cf. Hyland, 2005).
Each of these three argumentation modes (problem-solving, decision-making, and interpersonal) performs a number of functions. Problem-solving (primary) induction is aimed at the solutions of the specific biomedical problems addressed in the papers. Decision-making (secondary) induction allows the authors to make research decisions, interpret findings, and propose recommendations for future research and clinical practice. Interpersonal argumentation is used to impress the significance of the studies on the readers, to engage them in the integration of the findings into the field’s theory, and to make the arguments reader-friendly. The distinctions between the reasoning modes are loosely linked to the conventional IMRD structure of the papers. The methods and results sections mostly deal with problem-solving induction, while introductions and discussions are dominated by decision-making induction. Interpersonal argumentation also tends to gravitate towards introductions and discussions. The layers of argument created by the three reasoning modes have various degrees of cohesion between them. Interpersonal argumentation is the most autonomous. Problem-solving and decision-making are also fairly independent in their objectives, materials, and results, but one would make little sense without the other.
Problem-solving topoi
Problem-solving, based on primary information, is comparatively straightforward. It follows a set of highly standardised procedures, including statistical analysis and comparison. The relative simplicity of its operations means that, within their methodological frameworks, the authors have little influence on their results (though, of course, they have great influence over the input to such procedures). Such straightforwardness is underscored by rather uniform temporal features of this mode of reasoning. Problem-solving reports on what was done and found during the investigation, and it is overwhelmingly written in the past tense. The problem-solving topoi are divided between the method and results narratives depending on whether they deal with methodological or observational content.
Method narratives
The method narratives, mostly found in the methods sections, communicate information about the researchers’ actions, stipulations, and decisions. The following basic topoi are used to convey these meanings:
This part of the argument is constructed as a matter-of-fact account of procedures and techniques, most of which are expected to be familiar to the readers. It has highly standardised terminology and few citations. Despite their transparent coding, such narratives play an important part in the community’s discourse. Detailed accounts of study designs are present in each paper, which suggests that verifiability and replicability (at least in principle) are highly valued appeals. An average methods section in our corpus is almost as long as an average discussion, twice as long as a results section, and three times as long as an introduction.
There is also a conventionalised order to the topoi. Authors first tend to talk about their patients (
The
The
I
Cf. Trawiński’s (1989) ‘schedule of testing method’, ‘place where testing was carried out’, ‘time of testing’, and ‘specification of procedures employed in testing’, and Salager-Meyer’s (1994) ‘describe the process which led to the obtaining of the data’.
Isolated peaks of 26 mmHg
The
I
Cf. Trawiński’s (1989) ‘specification of objects used in testing’.
The
For
Cf. Trawiński’s (1989) ‘specification of equipment used’, ‘source of objects’, and ‘source of equipment’.
Stereophotographs of the optic discs were taken with the simultaneous stereo fundus camera (Topcon TRC-SS2), using Kodak Ektachrome 100 HC film. (E1)
Visual field examinations were performed with the 24-2 full-threshold program on the Humphrey field analyzer (HFA; Carl Zeiss Meditec, Inc., Dublin, CA). (G5)
The next group of methodological topoi refers to the processing and analysis of data. The
Cf. Trawiński’s (1989) ‘model used’ and ‘data reductions, calculations’.
Statistical
The
Cf. Aristotle's ‘definition’ (Huseman, 1994), Trawiński’s (1989) ‘evaluation criteria used’, Liddy’s (1991) ‘new terms defined’, and Swales’s (2004) ‘definitional clarifications’.
The intensity decrease was an
Early complications
Altitudinal visual field asymmetry
In the results sections, the authors develop their arguments from ‘raw’ data to problem-solving inferences. Here the authors present their observations in the form of quantified data of various levels of specificity. This content relies on a relatively small number of topoi:
The outcomes of problem-solving induction drawn from the data presented in the methods and results narratives take the following forms in our corpus:
Continuing from the methods topoi, such statements are written overwhelmingly in the past tense, which identifies their content as strictly local, confined to the study.
The data for the most part refer to the study participants, clinical interventions, and their effects. In our corpus, we identified the following data topoi:
Twelve patients had less than five visual field examinations after surgery. (G31)
None of the patients was lost to follow up. (G36)
Four eyes had central islands, six eyes had defects in one centrocecal area, 16 eyes had extensive arcuate defects. (E15)
Thirty-four patients (63%) were women. (G36)
In the negative control group, all parameters were stable over time (Table 3). (G10)
A common feature for these topoi is quantifiers: numerals, generalisation, and circumscription expressions (e.g. all, at least, average, maximum, none). (Generalisation expressions are more typical of
In their problem-solving inferences, the writers make comparisons and report the links and intervention effects they found, as well as the statistical significance of these results:
Angle a in the NTG group (35.1 (20.0)°) was significantly smaller than that of the POAG group (45.9 (21.9)°) (p=0.02), while angle b in the NTG group (49.0 (31.9)°) was significantly larger than that of the POAG group (33.1 (23.9)°) (p=0.01) (Fig 3). (G21)
As in previous studies … , we found that the higher the baseline IOP, the greater the IOP reduction, and that a statistically significant IOP reduction is more likely to occur at pre-treatment IOP levels of over 15 mmHg. (G6)
Interestingly, age of patient had a significant effect on response to latanoprost. (G16)
There was no significant difference between these figures. (G31)
As in the data topoi above, quantifiers are frequent here, but not essential. Instead, a distinctive feature of two topoi in this group is comparison and juxtaposition. In the
The other two topoi from the group of problem-solving inferences,
Decision-making topoi
Decision-making is more complex than problem-solving, and the degree of the authors’ involvement in the outcomes of this reasoning mode is quite high. In this reasoning mode, the authors interpret their studies in practical terms and generally make them meaningful for their readers. While the generation of primary results is a matter of technique, decision-making induction (Kneale’s (1949) secondary induction) involves numerous choices. To a great extent, these choices account for the theoretical frameworks and methodologies applied to the problem. Other important functions of decision-making are the formulation of study objectives and the interpretation and evaluation of findings. One more important outcome of decision-making induction is higher-level analysis of theory and practice, which cannot be based on primary results alone. To arrive at such interpretations, the authors juxtapose and synthesise their methods and results with relevant information from their sources.
Framing and cohesion in the arguments
Much of the distance from the title to major findings and recommendations is covered with the help of transformation and translation of ideas. In our corpus, these procedures are enacted with the following group of topoi:
The exigencies or opportunities motivating the study are formulated early in introductions and discussions. They may figure as generalisations about the However, the However,
This topos is used to point to lacking knowledge or insufficient literature on the issue pursued in the paper. Such lack or insufficiency is typically conveyed with negative diction (not, few). The stylistic features of
Like
Cf. Salager-Meyer’s (1994) ‘justify the reason for the investigation’ and Swales’s (2004) ‘indicating a gap’.
The incidence of this pathology
A more important outcome after filtering surgery is the prevention of further visual field deterioration;
The A Latanoprost is a ‘
Such statements have strong affective appeals coming from lexicalised expressions of novelty or promise (new, proposed, potential) or recency and currency adverbials (recently), as well as the present perfect or indefinite tense in the main clause.
The
Cf. Trawiński’s (1989) ‘idea of testing method’, Liddy’s (1991) ‘research questions’ and ‘research topic’, Myers’s (1992) ‘self-referential introductory statements’, and Swales’s (2004) ‘announcing present research descriptively and/or purposively’.
In
H
More typical in conclusions than
Cf. Trawiński’s (1989) ‘possible ways of improving solution’, Liddy’s (1991) ‘practical applications’ and Salager-Meyer’s (1994) ‘make suggestions’.
In patients with progressive
Circulatory
A common feature of all framing topoi is their numerous coreference ties with the titles of their papers. These ties are created with terms from the titles, their synonyms, hyponyms, or hypernyms (cf. Halliday, 1994, Ch. 9). For example, every term from the title of the E15 paper (‘Results of a Filtering Procedure in Low Tension Glaucoma’) is invoked at least once in its
A big part of decision-making induction, utilising secondary information, is background reviews, mostly incorporated into introductions. Secondary information is also frequently cited in discussion sections. Such content is realised with the following topoi:
The first three of these topoi refer to well-established theory and practice. The similarity of their content is accentuated by their shared time and aspect features. They typically have present-tense predicates (with simple infinitives, if any) in the main clause:
Aside from the similarities, the distinctive content types of these topoi are apparent from their phrasing and forms. K
In contrast to the previous three topoi,
In all examples parenthetical citations are represented with ellipses.
Of a
In
The last topos in the secondary information group,
Cf. Aristotle's ‘existing decisions’ (Huseman, 1994) and Liddy’s (1991) ‘relation to other research’.
Appraisal and interpretation of primary results are indispensable for their synthesis with secondary information. Such synthesis may take various forms:
The
Cf. Aristotle's ‘proportional results’ and ‘identical results’ (Huseman, 1994), Trawiński’s (1989) ‘comparison with results obtained by other authors’, and Thompson’s (1993) ‘statements citing external consistency’.
Concerning the postoperative
This is a considerably
In our corpus, the authors often comment on the Diagnostic Our In the present study, we modified the standard for
This topos shares many features with
Cf. Trawiński’s (1989) ‘evaluation of data completeness’ and ‘analysis of possible errors’ and Thompson’s (1993) ‘evaluative comments on the quality of experimental data’.
Given the
The
Many authors go beyond simple commentary on results and methods. They also use statistical methods and combine multiple analytic methods to probe into their
After
Apart from getting acknowledged, analysed, and reckoned with, research contingencies can work as a type of evidence. Some authors take them into account when interpreting their results. Such interpretations take the form of A significant This may be
One obvious feature of this topos is the presence of cause and effect or association expressions (because, related). The circumstances of the study may also be invoked with explicit topos-specific lexis, such as selection criteria and confounding factors. In addition,
The next group of topoi serves the purposes of interpreting primary and secondary observations and drawing up the major outcomes of the study:
E
Cf. Trawiński’s (1989) ‘explanation of results obtained’.
These findings
These differences
As their
In their
Cf. Liddy’s (1991) ‘future research needs’ and Salager-Meyer’s (1994) ‘propose further questions’, as well as Trawiński’s (1989) ‘new problems encountered during research’, Thompson’s (1993) ‘calls for further research in the results section’, and Harmsze and Kircz’s (1998) ‘new problems’.
Hence, our
Whether
Whether an
Interpersonal argumentation can be seen as a way of translating the writers’ categories into the readers’ in a bid to influence their perceptions and behaviours. Topoi used in this reasoning mode have comparatively weak links with the problem-solving and decision-making operations. They seem to be brought in for the sake of delivering the results to the readers and engaging the community in their interpretation, dissemination, and integration into current theory and practice. In our study, we followed Spinoza (Ethics, 1677/2007) in dividing interpersonal argumentation into affective and logical dimensions, rather than following the more popular ethical-emotional-logical division.
Affective appeals
Affective argumentation allows the authors to set up their credibility, project their characters, signal their memberships in discourse communities, and express their concern with the patients’ and other stakeholders’ interests. In our corpus, affective topoi produce these effects by relating arguments to professional and social contexts:
Cf. Aristotle's ‘the expediency or the harmfulness’ (Rhetoric, I.3.1358b), Trawiński’s (1989) ‘possible usage areas in practice’ and ‘possible usage areas in science’, Salager-Meyer’s (1994) ‘motivate the study’, and Swales’s (2004) ‘stating the value of the present research’.
However, the The Therefore,
Such statements tend to contain expressions of clinical practice, phenomena and attributes, associations, or cause and effect lexis (disease, therapy, treatment, NTG, therapeutic, POAG, field loss, factors, influencing, period of time) combined with analysis, examination, or argument and discourse organisation lexis (population, investigate). Another distinctive feature of this topos is the meaning of evaluation. It tends to have expressions with negative or positive connotations (hardly, challenge, ideal) often combined with enablement and possibility expressions (accessible). In addition to their suasive function,
In Japan Shiose found a
In our corpus,
The The visual fields The study was approved by the Norwich District ethics committee and all patients underwent informed consent. (G16)
This statement type is marked by topos-specific lexis: either expressions of impartiality (such as external or masked) or explicit references to organisations or procedures created for the protection of research participant rights (such as Norwich District ethics committee and informed consent). The local, transient, nature of such information is communicated with the past tense of the main clause verbs.
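The co-occurrence of features described above can be illustrated with a minimal Python sketch. It is not the method used in this study: the cue lists contain only the expressions quoted in the paragraph above, the function names are hypothetical, and the past-tense check is a crude word-list stand-in for a part-of-speech tagger that does not identify main clauses.

```python
import re

# Hypothetical cue lists, seeded only with the expressions quoted in the text.
IMPARTIALITY_CUES = ("external", "masked")
PROTECTION_CUES = ("ethics committee", "informed consent")

# Crude assumption: a handful of past-tense forms stands in for a tagger.
# A pattern such as "-ed" is avoided because it would match "informed" and "masked".
PAST_TENSE = re.compile(r"\b(was|were|had|did|underwent)\b", re.IGNORECASE)


def has_topos_lexis(statement: str) -> bool:
    """Check for impartiality or participant-protection expressions."""
    text = statement.lower()
    return any(cue in text for cue in IMPARTIALITY_CUES + PROTECTION_CUES)


def has_past_tense(statement: str) -> bool:
    """Check for one of the listed past-tense forms anywhere in the statement."""
    return PAST_TENSE.search(statement) is not None


def matches_configuration(statement: str) -> bool:
    """Return True only when both features co-occur in the statement."""
    return has_topos_lexis(statement) and has_past_tense(statement)
```

Under these assumptions, the G16 example quoted above satisfies both conditions, whereas a past-tense statement without the topos-specific lexis (such as 'This was a retrospective clinical study') does not.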
Logical topoi help walk readers through the argument, as it were. The authors use them to make their texts reader-friendly while at the same time managing the readers’ responses to the paper content. They do so by providing clarifications and addressing the readers’ likely questions:
Such statements provide commonplace information on research techniques and clinical procedures. The
Cf. Trawiński’s (1989) ‘characteristics’ content elements.
The
Performing multiple statistical
The
Cf. Salager-Meyer’s (1994) ‘describe the process of manipulating the data obtained during the experimental stage’.
This was a retrospective clinical study. (G21)
The study was designed as an interventional, randomized, prospective, institutional, single-blinded, controlled, clinical trial. (G10)
The next category of logical topoi refers the readers to additional information within the texts, in the literature or the media environment:
The morphometric characteristics of the optic discs are summarised in Table 2. (G36)
Figure 2 shows the mean diurnal curves for both randomised groups at baseline and at follow up. (G16)
All data are given as mean ± standard error of means (SEM). (G10)
The mode of progression of LTG patients has recently been described … (E15)
The effect of such a lowering in IOP is to be addressed in a companion paper … (G35)
The method has been presented in detail elsewhere … (G34)
Latanoprost (50 µg/ml) was obtained from Pharmacia Pfizer (Karlsruhe, Germany) as Xalatan®. Further ingredients are benz alkonium chloride, sodium chloride, sodium dihydrogene phosphate 1H2O, sodium monohydrogene phosphate, and water. (G10)
Such statements are more detailed than parenthetical remarks, cross-references, and citations but essentially perform the same functions. The first of these topoi,
One interpersonal topos,
Cf. Aristotle's ‘incentives and deterrents’ (Huseman, 1994), Trawiński’s (1989) ‘justification’ content elements, and Thompson’s (1993) ‘justifications for methodological selections’.
This
We have presented forty-four epistemic topoi that form the argumentative superstructure of ophthalmic research papers in our corpus. These topoi are associated with three modes of reasoning in the texts: problem-solving, decision-making, and interpersonal. Problem-solving is focused on answering the technical questions formulated for the study. In decision-making, the formulation of the questions, the selection of methods and procedures, and the interpretation of results take centre stage. Finally, interpersonal warranting includes affect and logos, the former appealing to the readers' sensibilities, and the latter making the arguments reader-friendly and educational.
Our findings bear out the idea that argumentative organisation is not signalled with isolated linguistic features but with their configurations. The elements of these configurations are not uniform. In our corpus they include lexico-grammatical and semantic relations, syntax, deixis, and coreference.
Related work and implications for argumentation
Our work extends naturally and richly into the area of computational argumentation. Grasso (2002) indicates that there are
three possible avenues for research should an AI scholar wish to undertake the task of creating a computational model of rhetorical argumentation (Crosswhite, 2000). The first is the exploitation of the argumentative schemata, of which literature in rhetoric provides a rich repository. The second is the exploitation of the figures of speech, and the ways they influence argumentation. The third is the explicit representation of the audience. (p. 59)
Although our work here is not concerned with figures of speech, it is entirely consonant with that approach. Fahnestock (1999, 23–24, et passim) argues convincingly that figures of speech epitomize topoi; conversely, topoi are elaborations of the argumentative structures that figures can crystallise. The computational detection and plotting of figures is a fine-grained approach that reveals much (Harris & DiMarco, 2009; Gawryjolek, DiMarco, & Harris, 2009), especially in stylistically rich argument discourses like political speeches and opinion pieces. But it also misses larger units of argument structure, particularly in the texts of authors not given to crystalline phrasing. The computational detection and plotting of topoi would operate at a mid-grained level, and should prove especially profitable for scientific and technical argumentation.
Based on our review of past CMNA workshops, COMMA conferences, and the First Workshop on Argumentation Mining, held at the 2014 Association for Computational Linguistics Conference.
There is a key difference, however, between the stylistic configurations that we describe here and the more traditional loose collections of stylistic features (in what might be termed, somewhat facetiously, a “bag-of-features” approach). The features of stylistic configurations interact with one another and with their semantic and syntagmatic environments in rich but regular ways. We have found many benefits to configuration analysis. One benefit is the small unit size it can ‘pick out.’ Our model describes epistemic topoi at statement level, but if necessary configuration analysis can go down to the level of clauses and even phrases. Another benefit is that stylistic configurations allow for analysis of topoi regardless of their location in the text. Our findings confirm that authors arrange their ideas with an eye to the writing conventions in their research fields. Yet for most topoi these conventions translate into fairly loose location predictors, not hard rules. Finally, stylistic configuration analysis allows for comprehensive modelling of argumentation. Rather than focus on one argumentation mode (such as problem-solving or interpersonal reasoning), our superstructure covers three modes: decision-making, problem-solving, and interpersonal reasoning. In this respect our superstructure is somewhat similar to the model of Liakata, Thompson, Waard, Nawaz, Maat, and Ananiadou (2012), which combines three schemes:
“Core scientific concepts”: Hypothesis, Motivation, Background, Goal, Object-New, Object-New-Advantage, Object-New-Disadvantage, Method-New, Method-New-Advantage, Method-New-Disadvantage, Method-Old, Method-Old-Advantage, Method-Old-Disadvantage, Experiment, Model, Observation, Result, Conclusion
“Event Meta-knowledge”: Investigation, Observation, Analysis, Fact, Method, and Other (subdivided into three certainty levels and two source categories, which show whether the information comes from the current study or another source)
“Discourse Segment Types”: Fact, Hypothesis, Problem, Goal, Method, Result, Implication, Other-Hypothesis, and Regulatory-Hypothesis.
In capturing these meanings, Liakata and her colleagues relied on a broad range of linguistic ‘clues,’ such as verbal forms and semantic classes, modality markers, deixis, syntactic structures, as well as combinations of these features. For example, they found that in their corpus “experimental goals are often given as a (mostly sentence-initial) clause with a to-infinitive…” often preceding a past-tense methods clause (p. 41). Liakata and co-authors established significant correspondences between the three schemes they used for annotating their three-paper corpus. Yet, as far as we know, they have not yet developed a unified schema incorporating the full range of meanings and clues that they talk about in their 2012 papers. So it would be premature for us to compare our findings with theirs despite the significant similarities between our approaches.
Our future treatment of computational argumentation will complement existing methods of argument mining, such as Moens, Boiy, Palau, and Reed’s (2007) automatic detection of arguments in legal texts. This approach, which uses stylistic phenomena and machine learning algorithms to automatically detect and classify arguments, appears to be a feasible one for us to adopt in extending our analysis of epistemic topoi to the computational domain. Our obvious next step will be to formalize our taxonomy for the purposes of annotation and verify it in terms of inter-annotator agreement against a corpus of NTG articles. We hope to recruit domain experts for our annotator team. Our annotated corpus and annotation guidelines will be made publicly available for other researchers interested in testing or advancing our taxonomy, or in adjusting it for other publication domains.
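As an illustration of how inter-annotator agreement might be quantified, the following minimal Python sketch computes Cohen's kappa for two annotators. The text above does not name an agreement statistic, so the choice of kappa is an assumption made here for illustration, as are the function name and the topos labels in the usage note.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same statements.

    labels_a and labels_b are equal-length sequences of category labels,
    one label per annotated statement (hypothetical input format).
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty set of statements.")

    n = len(labels_a)
    # Observed agreement: proportion of statements given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, with the hypothetical labels `["METHOD", "METHOD", "RESULT", "RESULT"]` from one annotator and `["METHOD", "METHOD", "RESULT", "METHOD"]` from another, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.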
We would expect the specific topoi we have found to remain very robust in other NTG arguments, and probably ophthalmic arguments more generally, but some of them will likely be more consistently present than others. The lexical fields should also prove fairly robust, with some variation of specific lexemes across research genres and argument domains. We would also expect that some topoi would be robust across argument fields but within genres (such as clinical trials).
As a phenomenon of human collective reasoning, argumentation is not a simple object to study, and it will not yield to simple computational tools. We have found a tractable conceptual instrument for computational argumentation, combining semantic, structural, and relational attributes. We are confident that this work can add new dimensions to argument mining. For example, more intelligent systems could extract not just basic propositional content from certain parts of documents (such as in the text, in the title, or among keywords) but also from specific statement types (such as the
Footnotes
Acknowledgements
This publication was made possible with financial support from the University of Waterloo. The authors gratefully acknowledge John Swales’ and Olga Vechtomova’s feedback on the research materials on which this paper is based. We also wish to thank Argument and Computation’s anonymous reviewers for their rich and helpful commentary on earlier versions of the paper. We are especially grateful for their insights into the current state of argumentation analysis and mining.
Disclosure statement
No potential conflict of interest was reported by the authors.
Appendix 1. Annotated NTG corpus
Appendix 2. Abbreviations
Appendix 3. Epistemic topoi in our corpus
BASIC TOPOI fall into argumentation categories (like problem-solving and decision-making), within which they cluster into smaller topoi classes, like METHOD and RESULTS NARRATIVES.
BASIC TOPOI
COMPOSITE TOPOI
Unlike BASIC TOPOI, which are monads indivisible into other recurrent topoi in our corpus, COMPOSITE TOPOI incorporate the BASIC ones. In this paper, we do not expand on COMPOSITE TOPOI. For a fuller delineation of the superstructure, with definitions and examples, see Gladkova (2010).
