Abstract
Many secondary school students’ second language (L2) speaking skills are deficient, which is detrimental to their academic and career opportunities in a globalized world that highlights the importance of oral communication skills. Debate has been considered a potentially effective pedagogical tool for speaking that can scaffold learning processes in ways that lead to language development. This study investigates the effect of a debate intervention on the English L2 speaking competence of Dutch secondary school students. Following a pretest–posttest control group design, we elicited speech samples through opinion tasks and coded them for measures of speech quantity, fluency, complexity, accuracy and cohesion. Multilevel analysis results indicate that, after the intervention, the intervention group produced more language than the control group, and that this language was more fluent, accurate, coherent and lexically sophisticated. These findings, which have significant implications for L2 speaking development, are discussed in relation to specific characteristics of L2 debate pedagogy.
Keywords
I Introduction
Learning to speak a second (L2) or foreign language (FL) well is an arduous task, given the many interwoven factors that come into play when acquiring this ability (Richards & Renandya, 2002; Romaña Correa, 2015; Shumin, 2002). Speaking is a cognitively and socially taxing skill; it entails encoding thoughts and expressing them in streams of speech that make sense and are contextually appropriate (Goh, 2017). The need to conceptualize, formulate and articulate thoughts (see Levelt, 1989) places heavy demands on working memory, which prevents learners from attending adequately to all aspects of speech (Skehan, 1998).
Richards and Renandya (2002) stated that ‘a large percentage of the world’s language learners study English in order to develop proficiency in speaking’ (p. 201). Learners need to speak English well for better academic and career opportunities in a globalized world that highlights the importance of oral communication skills. Despite its importance, however, L2 speaking practice has been marginalized in many educational contexts (Baker & Westrup, 2003; Goh, 2017; Thornbury, 2005). It is therefore no wonder that many studies (e.g. Zare & Othman, 2015) have expressed concern about the speaking ability of L2/FL learners.
L2 speaking in Dutch secondary schools faces similar neglect (Haijma, 2013; West & Verspoor, 2016). Many students believe that speaking is insufficiently practised in class (Fasoglio & Tuin, 2017) and feel frustrated that they cannot express themselves fully in the target language after many years of instruction (Haijma, 2013; Piggott, 2019). As a result, many secondary school students’ speaking skills are deficient, and the effects of these deficiencies become more noticeable and detrimental at university and college (Beeker, 2012).
Dutch teachers and students alike have expressed concern about the current situation. Interviews with Dutch secondary school teachers of English revealed that speaking receives the least attention of all skills in their teaching practice. The teachers ascribed this neglect to the absence of viable teaching tools that would enable their students to practise speaking effectively (see also Fasoglio & Tuin, 2017). In addition, Brown (2009) has maintained that in the FL/L2 context, instructors face the challenge of ensuring language development under limited time and budgetary constraints. To overcome these limitations, Brown (2009) proposed employing innovative instructional tools such as debates, as their gains ‘can equal if not exceed uptake that occurs in extended immersion environments’ (p. 547).
Debate has been considered a potentially effective pedagogical tool for speaking, which can scaffold and feed the learning process in ways that can lead to language development (e.g. Lustigova, 2011; Stewart, 2003). Speaking occupies the lion’s share of attention during debate. In addition to planned speech, debates involve a lot of impromptu speaking, as debaters have to think quickly and respond to opponents’ arguments, especially during the ‘clash’ stage (see Section IV.1.a).
Various studies have reported improvements in students’ speaking competence after participation in debates. In El Majidi, de Graaff and Janssen’s (2018) study, debaters rated the extent to which debate improved their speaking skills at a mean of 4.26 on a 5-point Likert scale. All the respondents in O’Mahoney’s (2015) study reported that debates honed their speaking skills. Likewise, the participants in Zare and Othman’s (2015) and Al-Mahrooqi and Tabakow’s (2015) studies reported improvements in their speaking skills as a result of debating. However, all the studies that have linked debate participation with oral competency development were based on self-reported data (questionnaires and interviews) and instructors’ observations. Experimental evidence that substantiates these anecdotal data is notably absent, and without it the claimed effects of debate on speaking skills remain unsubstantiated. The main objective of this study is, therefore, to provide empirical evidence about the effects of an in-class debate intervention on various aspects of speaking competence, using a pretest–posttest design. The study is premised on the hypothesis that debate pedagogy constitutes an effective avenue for enhancing many areas of L2 speaking skills. We discuss the rationale behind this hypothesis in the next section.
II Debate as an effective L2 speaking pedagogy
1 Theoretical grounds
A number of theoretical approaches to L2 acquisition provide a rationale for assuming a potential effect of debates on speaking skills. For example, the interaction hypothesis of L2 acquisition (Long, 1996, 2018) pointed out that interactive tasks, such as conversations, set the stage for the negotiation of meaning and that through this channel input and output are connected in a productive way. Likewise, Ellis and Shintani (2014) maintained that interaction can operate ‘as a source of input and opportunities for output which foster the internal processing that results in acquisition’ (p. 194). Interaction provides learners with multiple opportunities to negotiate meaning and form in ways that lead to L2 development (Loewen & Sato, 2018). Gass and Mackey (2015) concluded that ‘there is a robust connection between interaction and learning’ (p. 181).
Debating involves meaningful interactions at multiple levels (i.e. interaction with content, learner-learner and learner-instructor interactions). These interactions, which are fuelled and enriched by the competitive atmosphere of debates, help learners notice gaps in their language and modify and refine their L2 output accordingly. Wade (1998) praised the efficacy of debate-induced interactions, stating that ‘there are certainly trends in education which encourage interactive and dialogic pedagogies, but few are as potent as debate’ (p. 63).
The output hypothesis of Swain and Lapkin (1995) provides another theoretical perspective supporting the view that debates could foster L2 acquisition. Swain and Lapkin argued that output provides learners with unique opportunities to process language. Output can assist language learning by prompting learners to notice gaps in their language, to test out hypotheses (i.e. to use forms that are at the cutting edge of their linguistic ability) and to reflect consciously on forms. Engagement in bidirectional output (as is the case in debates) highlights gaps in L2 learners’ interlanguage system and hence helps them attend to the problematic areas in their language (Swain, 2013; Swain & Lapkin, 1995). Benati (2017) argued that involving L2 learners in structured collaborative output tasks can ‘facilitate the accurate and appropriate use of language forms and structures’ (p. 389).
Debate, by its nature, prompts a great deal of oral output, as debaters challenge each other’s perspectives and strive to outdo each other’s arguments and the way these are framed (El Majidi et al., 2018). In addition to oral output, debate pedagogy may also induce a considerable amount of written output, which can in turn boost oral output (i.e. speaking skills). Furthermore, the debate environment not only raises awareness of linguistic deficiencies but also stimulates experimenting with new forms and using language consciously (El Majidi, de Graaff & Janssen, 2020).
2 Pedagogical grounds
Speaking activities are commonly considered ‘PRACTICE activities rather than LEARNING activities’ (Goh, 2017, p. 250). Manchón (2011) hypothesised that the act of writing holds language learning potential, since ‘composition writing elicits attention to form-meaning relations that may prompt learners to refine their linguistic expression – and hence their control over their linguistic knowledge’ (Cumming, 1990, p. 483). Because our debate intervention fits this pedagogical mould, we make a similar assumption about speaking: the act of speaking holds a comparable learning potential.
Debates create a genuine environment for a meaningful, functional and purposeful use of the target language. In debates, students argue with a communicative and functional purpose in mind: defending their proposition and weakening that of their opponents. Attaining this goal necessitates the use of accurate and sophisticated language. As we shall see, in our debate intervention the act of speaking is not an end in itself, but it functions as a vehicle for synthesizing and analysing arguments and as ‘a task through which language practice can be orchestrated’ (Stewart, 2003, p. 15). Anderson (2016) stated that ‘it is hard to imagine a more harmonious integration of content and language skills than in the teaching of debate’ (p. 76). In short, the debate environment promotes the interface and synergy of two speaking perspectives: learning-to-speak and speaking-to-learn perspectives.
Debate activities place students at the centre of learning, with the teacher assuming the role of a coach, advisor and facilitator. Blumberg (2009) argued that when students are central in the learning process, they are empowered to gain benefits, such as higher rates of content retention, interaction, enjoyment of class activities and deeper understanding of material. Similarly, Emaliana (2017) pointed out that student-centred learning provides opportunities for a ‘conducive atmosphere of learning, dynamic classroom activities, and [offers opportunities] to do autonomous learning’ (p. 63). Debates lend themselves readily to the philosophy of student-centred pedagogy. A well-designed debate pedagogy gives students the tools and power to manage the learning process with minimal intervention on the part of the instructor. In debates, students do most of the talking and thinking, which promotes deep learning (Bellon, 2000; Cinganotto, 2019).
In addition, debates promote a healthy competitive pedagogy that serves language learning in many ways (Cinganotto, 2019; Warner & Bruschke, 2001). The inherent competitiveness of debate drives students to engage in rich and lengthy negotiations. Moreover, research has shown that students hold positive attitudes towards debates and describe them as fun and instructive (El Majidi, de Graaff & Janssen, 2015). This positive task attitude is beneficial to learning, as recent empirical research has revealed a positive correlation between task attitude and language acquisition (Dewaele, Witney, Saito & Dewaele, 2018).
III This study
Taken together, several theoretical and pedagogical perspectives on L2 acquisition indicate that debate can be a fruitful avenue for oral language learning. Yet only a limited body of research has investigated how debates can affect oral competence (Omelicheva & Avdeyeva, 2008). Littlefield (2001) noted that this dearth of research is particularly noticeable in the secondary school context, as ‘very few manuscripts dealing with high school debate have been published in academic journals’ (Littlefield, 2001, p. 83). The paucity of research pointed out by Omelicheva and Avdeyeva and by Littlefield concerns the L1 context; in the L2/FL context, debate research is extremely scarce (Al-Mahrooqi & Tabakow, 2015). To the best of our knowledge, no study has used a well-controlled empirical design to examine the impact of L2 debate instruction on oral proficiency across all main dimensions of speech production: speech quantity, fluency, accuracy, complexity and cohesion. Providing empirical evidence may encourage instructors to employ debates in their teaching practice. The current study was guided by the following research question: What are the effects of debate pedagogy on different aspects of L2 speaking proficiency in secondary education, including speech quantity (i.e. number of words, speaking time), fluency, syntactic complexity, lexical complexity, accuracy and cohesion?
In light of the extant literature on debate, as well as the theoretical and pedagogical grounds discussed above, we hypothesized that debate pedagogy would have a positive effect on the assessed aspects of speaking proficiency.
To test our hypothesis, we conducted an intervention with a pretest–posttest control group quasi-experimental design. The data consisted of speech samples that the students produced on opinion tasks on two occasions: at the beginning of the intervention (pretest) and towards the end of the intervention (posttest).
IV Method
1 Setting and participants
The study was conducted in eight intact classes in three public secondary schools located in three urban areas in the Netherlands (N = 147). Five classes were in the fifth year of the higher general secondary education track (havo 5 in Dutch) 1 (n = 89), and three classes were in the fourth year of the pre-university education track (vwo 4 in Dutch) 1 (n = 58). Five classes served as the intervention group (n = 96) and three as the control group (n = 51). There were 88 female and 59 male students aged between 15 and 18. To ensure the comparability of the groups, both the intervention and the control group included havo 5 and vwo 4 classes.
According to their teachers’ estimates, the English proficiency (including speaking) of all classes, both havo 5 and vwo 4, was on average at B1, the intermediate level of the Common European Framework of Reference (CEFR). With the exception of one intervention class, which received on average two 50-minute English sessions per week, all classes received three 50-minute sessions per week. Both groups received regular instruction consisting of activities dealing with the four language skills. For the purpose of this study, the intervention students participated in one debate a week; during that session, the control students received extra regular instruction in which the four language skills were further practised.
a Intervention group
Our debate task design was validated in a previous study following the principles and guidelines of educational design research. Students in the intervention group participated in ten debates, one per week. The topics of debates (e.g. the right to bear arms) were selected in consultation with the debating students, who received one week of preparation time for each debate.
Each debate consisted of three stages: pre-debate, during-debate and post-debate. In the pre-debate stage (prior to each debate), the students received a reading assignment (article) relevant to the topic under debate and were asked to find and read one additional article. We instructed the students to summarize the articles and to write a case 2 in which they had to defend their standpoints.
During actual debates, each student presented a speech and a rebuttal (in which the arguments of opponents were addressed) and participated in a clash. While listening to each other, the students were instructed to note down mistakes and the words they learned from each other’s contributions. We used two debate formats: debating in a group of four debaters (two students in favor and two against) and a one-to-one debating format. All debates had three phases: speech, rebuttal and clash (see Snider & Schnurer, 2006). In the post-debate stage, the teachers provided feedback on the students’ written cases and asked them to revise and resubmit them.
b Control group
For the purposes of this study, while the intervention students took part in the weekly debate session, the control students received coursebook-based lessons during that session in which the four language skills were further practised. These coursebooks, which were published in the Netherlands, were the same for the intervention and control groups. Each coursebook lesson mainly targets one language skill. However, even when speaking is not the main focus of a lesson, it often features in it, for example, to activate prior knowledge. In speaking lessons, the control group mainly practised traditional controlled face-to-face discussions 3 and discussed newspaper articles they read weekly.
2 Procedures
We used so-called ‘opinion tasks’ involving different topics to elicit the students’ oral performance in the pre- and posttest. Opinion tasks (in which students had to argue in favour/against a point of view) induce learners to focus on meaning and are accessible to middle schoolers (Dobbs, 2014). In addition, argumentative tasks are ‘flexible in terms of content as speakers could conceptualize their own arguments relating to the topic’ (Suzuki & Kormos, 2020, p. 161). Furthermore, opinion tasks lend themselves more readily to the elicitation and assessment of cohesion than, for example, narrative tasks.
The opinion tasks in the current study covered a range of accessible topics (e.g. smoking should be banned) (see Malloy et al., 2020). These topics were randomly assigned to students, so that all students in both groups had the same chance of receiving any of the topics. We made sure that no student received the same prompt at pretest and posttest. The task prompts and conditions had been piloted with similar students and proved suitable for our participants. The students were explicitly instructed to provide as many arguments as possible. They were given seven minutes of planning time before performing the task, and there was no time limit during performance. Parental consent forms were obtained prior to the beginning of the intervention.
V Data analysis
To assess the quality of the participants’ speech samples in the pretest and posttest, we used a variety of measures, tapping into different aspects of performance. In the field of second language acquisition, L2 speaking skills have been conceptualized as a composite construct spanning various linguistic areas, including accuracy, fluency and complexity (Skehan, 2009), as well as cohesion (Hyland, 2005; Lee & Subtirelu, 2015). For this reason, we analysed the speech samples produced for indicators of fluency, syntactic and lexical complexity, accuracy and cohesion. In addition, we also took the quantity of performance (the amount of oral output) into account because research has shown that the amount of speech produced by learners can partly reflect their language ability (see, for example, Li, Chen & Sun, 2015). The measures were a mixture of automatically coded features and measures that required hand-coding.
In line with procedures used in previous studies (e.g. Derwing & Munro, 2013; Derwing, Munro, Foote, Waugh & Fleming, 2014), speech samples had to last at least 20 seconds, after initial hesitations had been removed, to be included in the fluency analysis. For the other measures, we applied a cut-off of 50 words, as speech samples of fewer than 50 words do not provide sufficient linguistic information to reliably assess the language features relevant to our study (e.g. Crossley et al., 2015).
The speech samples were transcribed verbatim using PRAAT (Boersma & Weenink, 2016) by the first author and checked three times at intervals of approximately one month. Nonverbal fillers such as ‘eh’ and ‘um’ were transcribed and treated as filled pauses. A pause was defined as silence or a nonverbal filler of 250 ms or longer (De Jong, 2016). After pruning the students’ transcripts by excluding filled pauses, verbatim repetitions, false starts and self-corrections, we segmented them into AS-units (Analysis of Speech units) following the guidelines of Foster, Tonkyn and Wigglesworth (2000). An AS-unit is defined as ‘a single speaker’s utterance consisting of an independent clause, or sub-clausal unit, together with any subordinate clause(s) associated with either’ (Foster et al., 2000, p. 365). All units of analysis and errors were manually identified by the first author.
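The two inclusion thresholds above can be expressed as two simple checks. The sketch below is illustrative only: the `SpeechSample` record and its field names are hypothetical, and it assumes that the duration of initial hesitations has already been measured (for example in PRAAT).

```python
from dataclasses import dataclass

MIN_FLUENCY_SECONDS = 20.0  # minimum duration for the fluency analysis
MIN_WORDS = 50              # minimum length for all other measures


@dataclass
class SpeechSample:
    """Hypothetical record for one student's speech sample."""
    student_id: str
    duration_s: float            # total duration of the recording in seconds
    initial_hesitation_s: float  # duration of initial hesitations to discard
    word_count: int              # number of words in the transcript


def eligible_for_fluency(s: SpeechSample) -> bool:
    # At least 20 s must remain once initial hesitations are removed.
    return s.duration_s - s.initial_hesitation_s >= MIN_FLUENCY_SECONDS


def eligible_for_other_measures(s: SpeechSample) -> bool:
    # Samples of fewer than 50 words are excluded from the other measures.
    return s.word_count >= MIN_WORDS


short = SpeechSample("s01", duration_s=24.0, initial_hesitation_s=5.0, word_count=62)
print(eligible_for_fluency(short), eligible_for_other_measures(short))  # False True
```

Keeping the two checks separate mirrors the text: a sample can be too short for fluency analysis yet still long enough for the word-based measures.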
To check the interrater reliability for the hand-coded measures, a research assistant, who was masked for condition, verified a randomly selected 25% of the data after training and discussion with the first author. Cohen’s Kappa indices for intercoder reliability were high for the assessed measures: .94 for mistakes identification, .98 for mistakes categorization and discourse markers, and .97 for error-free clause identification.
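Cohen’s kappa corrects raw agreement for the agreement expected by chance from each coder’s marginal category proportions: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch, assuming each coder’s decisions are given as parallel lists of category labels; the labels and codes in the usage example are hypothetical and are not the study’s data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one and the same category throughout; kappa is
        # undefined here, and returning 1.0 is only a convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten clauses ("E" = contains an error, "F" = error-free).
coder_1 = ["E", "E", "F", "F", "F", "E", "F", "F", "F", "F"]
coder_2 = ["E", "F", "F", "F", "F", "E", "F", "F", "F", "F"]
kappa = cohens_kappa(coder_1, coder_2)  # p_o = 0.90, p_e = 0.62, kappa about 0.74
```

The example shows why kappa is stricter than raw agreement: the coders agree on 90% of clauses, but because most clauses are error-free, much of that agreement is expected by chance.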
1 Measures
The speech samples were coded for speech quantity, fluency, syntactic and lexical complexity, accuracy and cohesion in the following ways.
a Speech quantity
Following Freed, Segalowitz and Dewey (2004), Li et al. (2015) and Lys (2013), we employed two indices to measure the quantity of oral production: production time (in seconds), i.e. the duration of the student’s speech, and the total number of words produced in each speech sample.
b Fluency
Tavakoli and Skehan (2005) suggested that three dimensions of fluency – speed, breakdown and repair – best capture the characteristics of different aspects of temporal fluency. Following De Jong (2016), we used the following indices, which we measured via PRAAT software (Boersma & Weenink, 2016):
Speed fluency was operationalized as inverse articulation rate, i.e. mean syllable duration (speech time excluding pauses / number of syllables).
As to breakdown fluency, we employed two indices: mean length of pause (number of silent pauses/speech time) and the number of filled pauses (number of filled pauses/speech time).
For repair fluency, we employed two measures: number of repetitions (number of repetitions/speech time) and number of repairs (number of repairs and restarts/speech time).
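The fluency formulas above can be sketched as follows, assuming the relevant durations and counts have already been extracted from the PRAAT annotations. The `FluencyInput` fields are hypothetical names. The 250 ms pause threshold comes from the pause definition given earlier in Section V, and each breakdown and repair index is computed as a count per second of speech time, following the formulas as written above.

```python
from __future__ import annotations

from dataclasses import dataclass

PAUSE_THRESHOLD_S = 0.25  # silences/fillers of 250 ms or longer count as pauses


@dataclass
class FluencyInput:
    """Hypothetical per-sample annotations (e.g. read off a PRAAT TextGrid)."""
    speech_time_s: float        # total speech time in seconds, pauses included
    syllables: int              # number of syllables produced
    silences_s: list[float]     # durations of all silent stretches
    fillers_s: list[float]      # durations of nonverbal fillers such as 'eh', 'um'
    repetitions: int            # verbatim repetitions
    repairs_and_restarts: int   # self-corrections and false starts


def fluency_measures(x: FluencyInput) -> dict[str, float]:
    """Speed, breakdown and repair indices following the formulas in the text."""
    silent_pauses = [d for d in x.silences_s if d >= PAUSE_THRESHOLD_S]
    filled_pauses = [d for d in x.fillers_s if d >= PAUSE_THRESHOLD_S]
    # Speech time excluding (silent and filled) pauses.
    time_without_pauses = x.speech_time_s - sum(silent_pauses) - sum(filled_pauses)
    return {
        # Speed: mean syllable duration in seconds (inverse articulation rate).
        "inverse_articulation_rate": time_without_pauses / x.syllables,
        # Breakdown: pause counts per second of speech time.
        "silent_pauses_per_s": len(silent_pauses) / x.speech_time_s,
        "filled_pauses_per_s": len(filled_pauses) / x.speech_time_s,
        # Repair: repetitions and repairs per second of speech time.
        "repetitions_per_s": x.repetitions / x.speech_time_s,
        "repairs_per_s": x.repairs_and_restarts / x.speech_time_s,
    }


sample = FluencyInput(
    speech_time_s=60.0,
    syllables=150,
    silences_s=[0.1, 0.5, 1.0, 0.3],  # 0.1 s is below threshold and ignored
    fillers_s=[0.4, 0.2],             # 0.2 s is below threshold and ignored
    repetitions=3,
    repairs_and_restarts=2,
)
measures = fluency_measures(sample)
```

Because inverse articulation rate is a duration, lower values indicate faster speech. Very short silences stay inside articulation time rather than counting as pauses.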
c Syntactic complexity
Norris and Ortega (2009) have argued that syntactic complexity is multifaceted and should therefore be measured multidimensionally. To capture this multidimensionality, we selected three measures tapping into three dimensions that were frequently assessed in previous studies:
global complexity (i.e. number of words per AS-unit);
subordination (i.e. mean number of clauses per AS-unit);
length (i.e. mean length of clauses).
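A sketch of the three syntactic complexity indices, assuming each AS-unit has already been hand-segmented and reduced to a pruned word count and a clause count (the `ASUnit` type is hypothetical). Here the indices are computed as ratios of totals across the whole sample; averaging per-unit ratios instead would give slightly different values.

```python
from typing import NamedTuple, Sequence


class ASUnit(NamedTuple):
    """Hypothetical hand-coded AS-unit after pruning."""
    words: int    # words in the unit (fillers, repetitions, false starts removed)
    clauses: int  # independent plus subordinate clauses in the unit


def syntactic_complexity(units: Sequence[ASUnit]) -> dict:
    n_units = len(units)
    total_words = sum(u.words for u in units)
    total_clauses = sum(u.clauses for u in units)
    return {
        "words_per_as_unit": total_words / n_units,       # global complexity
        "clauses_per_as_unit": total_clauses / n_units,   # subordination
        "words_per_clause": total_words / total_clauses,  # mean clause length
    }


units = [ASUnit(12, 2), ASUnit(7, 1), ASUnit(14, 3)]
print(syntactic_complexity(units))
# {'words_per_as_unit': 11.0, 'clauses_per_as_unit': 2.0, 'words_per_clause': 5.5}
```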
d Lexical complexity
We used two measures of lexical sophistication and one measure of lexical diversity obtained from the computational tool Coh-Metrix (McNamara, Crossley & McCarthy, 2010):
measure of textual lexical diversity (MTLD);
average word length;
word frequency.
We used the MTLD index to measure lexical diversity because it is less affected by text length and ‘also allows for comparisons between text segments of considerably different lengths (at least 100 to 2,000 words) and produces reliable results over a wide range of genres’ (McNamara et al., 2010, p. 69). Word length is widely used as an approximation of lexical sophistication and is regarded as an effective predictor of sophisticated vocabulary, with longer words indicating greater sophistication (Yoon, 2017; Yoon & Polio, 2017). The word frequency index, which calculates the mean logarithmic frequency of all words, describes how often particular words occur in the English language, drawing on the CELEX database (Baayen, Piepenbrock & Gulikers, 1995). A lower word frequency score thus indicates higher sophistication.
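Coh-Metrix computes these indices internally. The sketch below only illustrates the underlying logic and is not the Coh-Metrix implementation. MTLD follows one common formulation of the procedure: a factor is counted each time the running type-token ratio falls to 0.72 or below, a partial factor is added for the leftover segment, and the forward and backward passes are averaged. The frequency table is a made-up stand-in for CELEX counts, and tokens are assumed to be already tokenized and lower-cased.

```python
import math
from typing import Mapping, Sequence

MTLD_THRESHOLD = 0.72  # conventional type-token ratio threshold for MTLD


def _mtld_pass(tokens: Sequence[str], threshold: float = MTLD_THRESHOLD) -> float:
    """One directional MTLD pass: number of tokens divided by number of factors."""
    factors = 0.0
    types: set = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1.0          # a full factor is complete; start a new one
            types, count = set(), 0
    if count > 0:                   # partial factor for the leftover segment
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    return len(tokens) / factors if factors > 0 else math.nan


def mtld(tokens: Sequence[str]) -> float:
    """Average of the forward and backward passes."""
    return (_mtld_pass(tokens) + _mtld_pass(list(reversed(tokens)))) / 2


def mean_word_length(tokens: Sequence[str]) -> float:
    """Mean number of characters per word (higher = more sophisticated)."""
    return sum(len(t) for t in tokens) / len(tokens)


def mean_log_frequency(tokens: Sequence[str], freq: Mapping[str, float]) -> float:
    """Mean log10 frequency of tokens found in the frequency table (lower = rarer)."""
    logs = [math.log10(freq[t]) for t in tokens if freq.get(t, 0) > 0]
    return sum(logs) / len(logs) if logs else math.nan


print(mtld(["a", "a", "b"]))                      # 3.0
print(mean_word_length(["debate", "is", "fun"]))  # about 3.67
toy_freq = {"the": 1000.0, "debate": 10.0}        # made-up counts, not CELEX
print(mean_log_frequency(["the", "debate", "xyz"], toy_freq))  # 2.0
```

MTLD is undefined (returned here as NaN) for very short samples whose type-token ratio never drops, which is one reason a minimum sample length matters for lexical measures.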
e Accuracy
To measure accuracy, we have chosen the ratio of error-free clauses, which is widely recognized as a reliable global measure for tracking changes in accuracy (Skehan & Foster, 1999; Tavakoli & Skehan, 2005) and which is also considered suitable for an experimental design (Skehan & Foster, 1999). In addition, we computed ratios for four local accuracy measures tapping into different linguistic categories (see Yoon & Polio, 2017 for the operationalization and examples of the first three measures) 4 :
error-free clauses (EFC);
syntactic errors per 100 words;
morphological errors per 100 words;
prepositional errors per 100 words;
lexical errors per 100 words.
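Once errors and error-free clauses have been hand-coded, the accuracy ratios listed above reduce to simple arithmetic. A sketch, assuming hypothetical field names and error-category labels that mirror the list:

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Labels mirror the four local accuracy measures listed above.
ERROR_CATEGORIES = ("syntactic", "morphological", "prepositional", "lexical")


@dataclass
class AccuracyCoding:
    """Hypothetical hand-coded accuracy annotations for one speech sample."""
    words: int
    clauses: int
    error_free_clauses: int
    errors: dict[str, int] = field(default_factory=dict)  # category -> count


def accuracy_measures(c: AccuracyCoding) -> dict[str, float]:
    # Global measure: proportion of clauses without any error.
    result = {"efc_ratio": c.error_free_clauses / c.clauses}
    # Local measures: errors of each category per 100 words.
    for cat in ERROR_CATEGORIES:
        result[f"{cat}_per_100_words"] = 100 * c.errors.get(cat, 0) / c.words
    return result


coding = AccuracyCoding(words=80, clauses=10, error_free_clauses=7,
                        errors={"syntactic": 2, "lexical": 4})
print(accuracy_measures(coding))
```

Normalizing per 100 words keeps error counts comparable across samples of different lengths, whereas the error-free clause ratio is already a proportion.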
f Cohesion
The present study also casts light on discourse features of speaking performance through investigating the coherence of the produced speech samples. Since the coherence of discourse is enhanced by markers of cohesion (Halliday & Hasan, 1976; Suzuki & Kormos, 2020), which are important in speech as they help interlocutors to interpret the conveyed discourse (Tanskanen, 2006), we decided to investigate the use of a number of cohesive devices which are relevant to argumentative texts, adopting Hyland’s (2005) framework on interactive metadiscourse. Research has revealed that the frequency and diversity of the metadiscourse markers below significantly reflect the quality of argumentative texts (Qin & Uccelli, 2016; Uccelli, Dobbs & Scott, 2013). Following Hyland’s procedures, we coded speeches for four types of organizational markers in addition to their diversity of type (see Dobbs, 2014; Qin & Uccelli, 2016; Uccelli et al., 2013) and token (El Majidi et al., 2020):
frame markers: markers that mark the sequence of arguments or counterarguments (e.g. firstly, secondly);
code gloss markers: markers that introduce an example or paraphrase (e.g. for instance, in other words);
transition markers: markers that mark additive, adversative or causal relations between clauses and paragraphs (e.g. besides, because); temporal markers and the coordinating conjunction and were excluded since they are less associated with quality (Dobbs, 2014);
conclusion markers: markers that introduce a summary or conclusions (e.g. to sum up, all in all);
markers diversity token: diversity of markers in terms of token;
markers diversity type: diversity of markers in terms of type.
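The marker coding above can be approximated with a lexicon lookup. The sketch rests on several assumptions. The lexicon contains only the example markers given in the list, whereas a coding scheme based on Hyland’s framework is far larger and was applied by hand with attention to context. Counts are raw rather than normalized. ‘Diversity of type’ and ‘diversity of token’ are read as the number of distinct marker items and the total number of marker occurrences, respectively.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical mini-lexicon built only from the examples in the list above.
MARKERS = {
    "frame": ["firstly", "secondly"],
    "code_gloss": ["for instance", "in other words"],
    "transition": ["besides", "because"],
    "conclusion": ["to sum up", "all in all"],
}


def count_markers(tokens: list[str]) -> dict[str, int]:
    """Count marker tokens per category plus type/token diversity."""
    lookup = {tuple(m.split()): cat for cat, items in MARKERS.items() for m in items}
    max_len = max(len(k) for k in lookup)
    found: Counter[str] = Counter()  # marker item -> occurrences
    per_cat: Counter[str] = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multiword markers are matched whole.
        for n in range(max_len, 0, -1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in lookup:
                found[" ".join(gram)] += 1
                per_cat[lookup[gram]] += 1
                i += n
                break
        else:
            i += 1  # no marker starts at this token
    result = {cat: per_cat[cat] for cat in MARKERS}
    result["diversity_type"] = len(found)            # distinct marker items
    result["diversity_token"] = sum(found.values())  # total marker occurrences
    return result


speech = ("firstly smoking is harmful because it causes cancer secondly it is "
          "expensive for instance a pack costs ten euros to sum up smoking "
          "should be banned because").split()
print(count_markers(speech))
```

In this toy example, ‘because’ occurs twice, so it adds two tokens but only one type.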
2 Statistical analysis
As our participants came from different classes within different schools, our data were hierarchically structured. We therefore applied multilevel linear modelling (MLM), which enabled us to explicitly model possible dependencies in the data. Specifically, we used a two-level hierarchical linear model with students nested within classes. Time and condition were modelled as fixed effects, and random variation across students and classes as random effects.
To establish the effectiveness of the debate intervention, we need to consider the combined effect of both main factors, that is, the time (pretest vs. posttest) × group (intervention vs. control) interaction. We therefore report only the interaction effects.
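At its core, the time × group interaction is a difference in differences: the intervention group’s pretest-to-posttest gain minus the control group’s gain. The sketch below computes that raw contrast from cell means using hypothetical scores. In an ordinary least-squares model with dummy-coded time and group, the interaction coefficient equals exactly this value. The multilevel estimate additionally accounts for students nested in classes, so it need not coincide with the raw contrast.

```python
from statistics import mean


def interaction_contrast(scores: dict) -> float:
    """Difference in differences of cell means for a 2 (time) x 2 (group) design.

    `scores` maps (group, time) to a list of scores, with group in
    {"intervention", "control"} and time in {"pre", "post"}.
    """
    gain_intervention = (mean(scores[("intervention", "post")])
                         - mean(scores[("intervention", "pre")]))
    gain_control = (mean(scores[("control", "post")])
                    - mean(scores[("control", "pre")]))
    return gain_intervention - gain_control


# Hypothetical word counts, not the study's data.
scores = {
    ("intervention", "pre"): [50, 60],
    ("intervention", "post"): [70, 80],
    ("control", "pre"): [52, 58],
    ("control", "post"): [60, 62],
}
print(interaction_contrast(scores))
```

Here the intervention group gains 20 words and the control group 6, so the contrast is 14. A main effect of time alone would mix these two gains and could not isolate the effect of the intervention.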
VI Results
To test our hypothesis, we obtained two scores (pretest and posttest) on each measure for each learner’s speech samples in each condition (intervention and control groups). Table 1 presents the descriptive statistics (means and standard errors) for each measure at pretest and posttest. At first sight, the means on most measures improved at posttest in the predicted direction.
Table 1. Means and standard errors of outcome variables across time and condition.
To test whether the two groups’ performance differed significantly after the intervention, we conducted MLM analyses. Where results were statistically significant (p < .05), Cohen’s d effect sizes (ES) are provided. The MLM results for each measure are presented in Table 2. The results show that the intervention students made significant improvements on both quantity measures, production time (F(1, 277.886) = 31.1; p < .001) and number of words produced (F(1, 279.881) = 33.9; p < .001), with large effect sizes.
Table 2. Multilevel analysis results.
Note. numerator df = 1.
With regard to fluency measures, the intervention students showed a significant improvement in inverse articulation rate (F(1, 217.650) = 4.1; p = .022), with a moderate ES (.52). The other fluency measures did not reach the level of significance.
As to syntactic complexity, no significant differences were observed between the two groups at posttest. However, as to lexical complexity, the intervention group significantly outperformed the control group in terms of word frequency (F(1, 201) = 3.6; p = .030), with a moderate ES (.58). The other two lexical complexity measures, however, fell short of significance.
With regard to accuracy, the intervention students made fewer errors after the intervention relative to the control students. We found significant differences for error-free clauses (F(1, 194.296) = 5.5; p = .020), with a moderate to large ES (.71) and for lexical errors (F(1, 194.987) = 4.8; p = .015), also with moderate to large ES (.65). No significant differences were found for morphological, syntactic and preposition errors. Note that the distribution of error types did not change greatly over the intervention period.
MLM analysis also revealed that the intervention students showed significant improvement in many aspects of cohesion compared to control students. They made significant gains in terms of the use of frame markers (F(1, 194.886) = 15.5; p < .001), with a large ES (1.17), conclusion markers (F(1, 200) = 5.3; p = .011), with a moderate to large ES (.70), diversity of type (F(1, 200) = 20.5; p < .001), with a large ES (1.38) and diversity of token (F(1, 200) = 12.7; p < .001), also with a large ES (1.08).
VII Discussion
The goal of the current research was to provide insights into the effects of debating as an instructional approach on L2 oral proficiency. The main finding was that the debate group outperformed the control group on most variables tested, with many differences reaching significance. These results generally confirm our hypothesis and are in line with previous studies suggesting that debate-based instruction leads to improved speaking performance (e.g. El Majidi et al., 2018; Zare & Othman, 2015). However, those studies were based on debaters’ perceptions and instructors’ impressions, lacked control groups and were conducted in higher education. The current study therefore adds empirical evidence that substantiates their claims. Notably, in a recent empirical study (El Majidi et al., 2020) investigating the effects of debate pedagogy on L2 writing proficiency, we reported results that are broadly comparable to those obtained here. Although that study assessed writing competence, its comparable findings lend further support to the reliability of the present results. We now discuss our findings for each measure.
1 Speech quantity
The intervention students showed substantial growth in quantitative measures of oral performance. The notable increase in overall productivity is an indicator of progress in their language ability as ‘students [become] more certain of their skills and [have] more to say’ (Lys, 2013). This improvement also attests to development in the argumentative competence of debaters. For extensive coverage and discussion of this aspect, see El Majidi, Janssen and de Graaff (2021).
2 Fluency
The participants in the intervention group showed significant improvement in the speed measure of fluency. Several features of the debate-based intervention seem to promote fluency. For example, the ‘performance aspect of actually doing something in real time’ (Schmidt, 1992, p. 359), an emphasis on meaning-making (Gatbonton & Segalowitz, 2005), pre-task planning (e.g. Ellis, 2003), task repetition (e.g. Goh, 2017) and the close interconnection between the written and oral modalities (Blake, 2009) have all been argued to promote fluency. In addition, the intervention enabled sustained practice, which apparently contributed to the learners’ development of a degree of automatization in their performance. Vocabulary gains may also have increased the pace of their oral production: when appropriate lexical chunks are readily available, fewer lexical searches are needed, which accelerates the formulation process and results in more fluid oral production.
Other aspects of fluency, namely breakdown fluency and repair fluency, did not improve. Earlier research has demonstrated that breakdown fluency (e.g. Tavakoli, Campbell & McCormack, 2016) and repair fluency (e.g. Huensch & Tracy-Ventura, 2017) are less sensitive to interventions and that they do not necessarily reflect L2 proficiency (Duran-Karaoz & Tavakoli, 2020). Measures of pausing (De Jong, 2018; Duran-Karaoz & Tavakoli, 2020) and repair (Baker-Smemoe, Dewey, Bown & Martinsen, 2014; Duran-Karaoz & Tavakoli, 2020; Huensch & Tracy-Ventura, 2017) reflect personal speaking styles carried over from the L1 more than they reflect L2 proficiency. According to De Jong (2018), ‘articulation rate (or its inverse average syllable duration) seems to be a measure of fluency that best reflects L2-specific fluency’ (p. 250).
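The relation De Jong describes between articulation rate and average syllable duration can be illustrated with a short Python sketch. It assumes that the number of syllables and the summed duration of silent pauses have already been obtained for a speech sample (e.g. from annotated recordings with an analyst-chosen pause threshold); the `SpeechSample` class and the function names are hypothetical and do not describe the tools or the exact fluency measures used in this study. Speech rate divides syllables by total time, so pausing lowers it; articulation rate divides syllables by phonation time only, and mean syllable duration is its inverse.

```python
from dataclasses import dataclass


@dataclass
class SpeechSample:
    """Timing summary of one speech sample (hypothetical structure)."""
    syllables: int        # number of syllables produced
    total_time_s: float   # total sample duration in seconds, pauses included
    pause_time_s: float   # summed duration of silent pauses in seconds

    @property
    def phonation_time_s(self) -> float:
        # Speaking time once silent pauses are removed
        return self.total_time_s - self.pause_time_s


def speech_rate(s: SpeechSample) -> float:
    """Syllables per second over the whole sample, pauses included."""
    return s.syllables / s.total_time_s


def articulation_rate(s: SpeechSample) -> float:
    """Syllables per second of phonation time (pauses excluded)."""
    return s.syllables / s.phonation_time_s


def mean_syllable_duration(s: SpeechSample) -> float:
    """Average seconds per syllable: the inverse of articulation rate."""
    return s.phonation_time_s / s.syllables


# Illustrative numbers only, not data from this study
sample = SpeechSample(syllables=120, total_time_s=60.0, pause_time_s=12.0)
print(speech_rate(sample))             # 2.0
print(articulation_rate(sample))       # 2.5
print(mean_syllable_duration(sample))  # 0.4
```

Because pause time is excluded, articulation rate separates speed from pausing behaviour, which is consistent with De Jong's (2018) argument that it is less affected by individual speaking styles than pause-based measures.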
3 Syntactic complexity
The debate intervention did not lead to greater syntactic complexity in the produced discourse. In contrast, our recent study (El Majidi et al., 2020) demonstrated that debate pedagogy had some impact on the syntactic complexity of written production. Hwang, Jung and Kim (2020) suggested that L2 learners ‘experience greater processing burdens in spoken than in written production’ (p. 272), which hinders them from producing syntactically complex structures when speaking. The lack of improvement in syntactic complexity could also be due to a trade-off effect: in their oral production, debaters may have focused on fluency, accuracy and lexical complexity at the expense of syntactic complexity (Skehan, 1998). Such a trade-off is consistent with the findings of previous studies (e.g. Hsu, 2019).
4 Lexical complexity
The intervention group outperformed the control group on two measures of lexical complexity, the measure of textual lexical diversity (MTLD) and word frequency, although only the latter difference reached significance. MTLD and word frequency are regarded as important indicators of lexical proficiency, as learners who produce less frequent (Crossley, Salsbury & McNamara, 2015) and more diverse vocabulary (Crossley, Salsbury & McNamara, 2009) are judged to be more lexically proficient. The word frequency measure is also associated with breadth of lexical knowledge, as learners who produce infrequent words are expected to know a greater number of words.
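For readers unfamiliar with MTLD, the following minimal Python sketch shows the bidirectional procedure as it is commonly described: the text is scanned word by word while the type-token ratio (TTR) of the current segment is tracked; each time the TTR falls to a threshold (conventionally 0.72), one "factor" is counted and a new segment begins. A leftover segment contributes a partial factor, and MTLD is the number of tokens divided by the number of factors, averaged over a forward and a backward pass. The function names, the pre-tokenized input and the handling of texts that never reach the threshold are our assumptions; this is not the implementation used to score the speech samples in this study.

```python
from typing import Sequence


def _mtld_one_pass(tokens: Sequence[str], threshold: float = 0.72) -> float:
    """One directional MTLD pass (hypothetical helper)."""
    factors = 0.0
    types: set = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            # Segment TTR has dropped to the threshold: count a full factor
            factors += 1
            types = set()
            count = 0
    if count > 0:
        # Partial factor for the unfinished final segment
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    if factors == 0:
        # Assumption: a text that never lowers its TTR gets an unbounded score
        return float("inf")
    return len(tokens) / factors


def mtld(tokens: Sequence[str], threshold: float = 0.72) -> float:
    """Average of a forward and a backward MTLD pass."""
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(list(reversed(tokens)), threshold)
    return (forward + backward) / 2


# Toy inputs: a repetitive "text" versus one that cycles through ten words
print(mtld(["a"] * 10))             # 2.0
print(mtld(list("abcdefghij") * 3))  # 15.0
```

A speaker who keeps introducing new words sustains a high TTR over longer stretches, which yields fewer factors and therefore a higher MTLD, as the two toy inputs show.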
The positive effect of in-class debates on lexical complexity might be attributable to the lexically rich environment of debates. The students read articles prior to each debate, and this reading seemingly exposed them to words and structures they had not known before. Importantly, case writing created an opportunity to employ the newly learned words and hence consolidated their retention. During the actual debates, the debaters also used some of these words, which further reinforced their grip on these lexical gains. This cyclic lexical process, which recurred in each debate, enabled the debaters to build an extensive, diverse and sophisticated vocabulary incrementally.
5 Accuracy
The debate intervention seems to yield some benefits with respect to accuracy. Although learners in both groups showed some improvement, the intervention group improved significantly more on two of the accuracy indices. This greater improvement may have resulted from the intervention group’s exposure to feedback in the debate-based lessons. The intervention group received feedback on their written cases, and during debates they were instructed to note down the mistakes their classmates made and to correct them. In addition, the instructors occasionally discussed some of the mistakes commonly made during debates. The debaters seem to have benefited from this recursive, cyclic processing of feedback, which recurred in each debate and enabled them to carry over accuracy gains from one debate to the next. In this way, gains accumulated and led to better monitoring of online speech production. Furthermore, the combination of the written and oral production modalities could have heightened debaters’ attention to language forms (see Niu, 2009).
The debate environment also seems to enable meaningful processing of feedback, as accurate language is essential for the persuasiveness of the arguments adduced. In other words, accurate form in this environment is functional. Furthermore, we believe that the presence of an audience (proponents and critical opponents) prompted students to pay extra attention to their language production to avoid making embarrassing grammatical mistakes. Research has shown that audience awareness stimulates learners to attend to the quality of their language output (Chen, 2019).
6 Cohesion
The current study also tracked the effect of the debate intervention on a set of organizational markers that explicitly signal structural organization within oral discourse. These markers help speakers craft a coherent, interconnected discourse by linking their arguments to each other.
In the posttest, the debaters produced speech samples that showed significant growth in several aspects of cohesion. It is important to mention that the students in the intervention group were asked to use connectives in their written cases during the intervention, and it was apparently through this channel that organizational markers found their way into their oral discourse. Importantly, debate prompts debaters to organize their discourse explicitly and to highlight the relationships between the arguments presented. For example, adducing different arguments entails using transition and frame markers to signal progression and a shift from one argument to another. The debate environment presumably instilled in the debaters the conviction that making the relations between arguments explicit is essential for their discourse to be persuasive. Moreover, the engaging and challenging nature of the debate environment may also have contributed to improvement in this area of performance (Cinganotto, 2019; Crosson & Lesaux, 2013). We can therefore conclude that the debate context seems to provide fertile ground for the development of organizational markers.
7 Factors underlying the effectiveness of debate pedagogy
In this section, we discuss the main factors that seemingly render debate pedagogy effective for L2 speaking development. It could be argued that the intervention students in the present study improved their speaking performance more simply because of the extra practice time they received. Although the control group also practised speaking, the intervention group practised more, and this may have honed their oral performance. However, while we acknowledge a possible ‘practice’ effect, we argue that this effect alone does not account for the findings (see Lys, 2013). Several intervention-related factors have arguably given rise to the findings of the current study.
The debate task design and environment have features and involve procedures that have been shown, theoretically and empirically, to affect the processes underlying complex, accurate, coherent and fluent production. One such feature is task repetition, which ‘involves the repetition of the exact same task but may also be a repetition of the same task procedure with different contents’ (Goh, 2017, p. 249). In our intervention, students (or at least many of them) rehearsed their prepared cases at home before delivering them in class, and the debate task procedures were repeated with ‘different content’ (different debate topics) throughout the intervention. When students repeat a task (or task procedure), they free up attentional resources and boost their performance, because repetition facilitates speedy retrieval of words and lets them integrate the knowledge and skills gained from the first performance. As a result, partial automatization of some speech processes takes place, freeing attention for other aspects of speech (Bygate, 1996; Goh, 2017).
Another feature of our debate task design that may have led to the obtained results is the integration of the four language skills, which presumably scaffolded oral production in many ways. Prior to each debate contest, students read preparatory articles on which they drew to write cases, which in turn fed their oral production during debates. This integration of skills, or rather reading-to-write-to-speak pedagogy (where reading serves as input for writing, and writing serves as input for speaking), appears to boost speaking substantially. The combination of reading, writing and speaking is assumed to create a powerful pedagogical configuration that draws more attention to language forms than oral production alone and offers rich language learning opportunities.
We are not aware of an earlier study examining the potential of teaching tools that combine these three skills. However, evidence that this combination yields positive learning outcomes can be drawn from research on reading-to-write and writing-to-speak pedagogies. Hirvela (2016) has pointed out that ‘reading . . . provides writers with essential material to write with and about’ (p. 49). The learners seem to have profited from the recurrent process of reading and writing, which allowed them to employ newly learned words in subsequent cases. Earlier research has demonstrated the effectiveness of L2 reading-to-write pedagogy, as extensive reading helps students to improve their vocabulary (lexical complexity) and to write better (e.g. Hirvela, 2016).
As to writing-to-speak pedagogy, Belcher and Hirvela (2008) have noted that writing involves slow cognitive processing of language, which promotes explicit noticing. Through repeated use of explicit knowledge, implicit knowledge gradually increases, resulting in greater automatic oral fluency and accuracy (Adams & Ross-Feldman, 2008). In addition, ‘writing activities before speaking activities . . . allow learners to apply features of their written production to their spoken production [and help] learners efficiently retrieve more varied and complex syntactic structures from their writing experience’ (Hwang et al., 2020, p. 279). Also, because writing allows offline planning, it encourages learners to try out new forms (Williams, 2012) and facilitates increased monitoring of output (Ellis & Yuan, 2004). Writing thus allows learners to work out and reflect upon the intended message before delivering it in spoken interaction. Having used a particular form in planned writing enhances its retention and thereby increases the likelihood that it will be readily retrieved and used in unplanned, spontaneous oral production (Williams, 2012). In this way, gains in writing can transfer to speaking. The importance of writing for the fluidity of spoken language has been documented in a number of studies (Freed et al., 2004; Rubin & Kang, 2008).
In addition, the interactions promoted by the debate environment harness listening in the service of speaking. By listening critically to their peers’ output, debaters can notice gaps in their own output, and this awareness can induce them to scan future input for remedies for these gaps (Williams, 2012).
VIII Limitations and future directions
Our study has some limitations that are worth noting. The findings were based on a single task type (i.e. opinion tasks). Since performance tends to vary across task types and genres, future studies should employ a wider variety of tasks to gain deeper insights into the observed effects and their transferability. In addition, this study elicited data at only two time points (pre- and posttest). A delayed posttest (i.e. a third time point) would provide insights into the durability of the observed performance patterns.
The present research did not examine the impact of debating on pronunciation, which is an important dimension of speaking performance. Pronunciation clarity, for example, is considered an important feature of comprehensibility (Suzuki & Kormos, 2020). Some debaters in a previous study (El Majidi et al., 2018) reported feeling that their pronunciation had improved after participating in debates. Future research may examine the impact of debating on this construct to provide a more complete picture of the effects of debate pedagogy on speaking proficiency. Last but not least, our sample is not representative of all secondary school students, as it mainly includes relatively high-achieving students. More research with a wider range of proficiency groups is warranted to further broaden our understanding of the potential of debate-based instruction for L2 speaking development.
IX Conclusions
The present study has offered empirical evidence that debate can play a facilitative role in enhancing L2 speaking proficiency. After the intervention, the debating students produced significantly more words at a faster rate, and their speech was more accurate, lexically sophisticated and coherent than that of the students in the control group.
These gains could be attributed to the iterative process of successive waves of input and output that the debaters went through, as well as to the deeper language processing in which the debate context engaged them. Debate pedagogy seems to trigger deeper and more elaborate processing of content and form, which helped the students achieve greater proficiency and automaticity over time. This depth of processing is particularly promoted by the reading-to-write-to-speak pedagogy built into the debate task design. This pedagogy integrates the language skills in a way that exposes learners to rich, authentic L2 input and provides socially oriented opportunities for meaningful, authentic, purposeful and goal-oriented interaction.
We hope that the potential of debate for enhancing speaking proficiency in instructed L2 acquisition will be recognized and that debate will accordingly be included in L2 curricula. We also hope to have made a modest contribution to the current discussion about teaching speaking by providing insights into some of the processes and task features that stimulate and scaffold L2 oral proficiency.
