Abstract
This study investigated what features undergraduate EFL learners perceive as affecting the difficulty of model paragraphs. Four hundred and seventy-five Vietnamese undergraduates participated in a study employing a partial least squares structural equation modeling (PLS-SEM) design. They ranked five paragraphs from easiest to most difficult and responded to a 10-point Likert questionnaire regarding 11 features (titles, paragraph length, vocabulary, vocabulary in context, rhetorical organization, paragraph structure, sentence length, punctuation, signal words, interest, background knowledge). The results showed that eight variables (titles, vocabulary, vocabulary in context, sentence length, rhetorical organization, paragraph structure, interest, background knowledge) had a significant direct effect and four variables (vocabulary, sentence length, rhetorical organization, background knowledge) had mediating effects. The model accounted for 0.508
Keywords
Introduction
Publishers include model paragraphs in textbooks because genre-specific reading has been shown to facilitate students’ writing (Hyland, 2007) in what has been termed the reading-writing relationship (Shanahan & Lomax, 1988): Reading facilitates better writing (Thaiss & Zawacki, 2006). However, students’ opportunity to garner these benefits is hampered if they cannot grasp what they are reading. In short, students cannot learn from what they cannot read (Allington, 2002). As such, educators must consider whether the materials are a good match for intended readers (Baker, 2019), the study of which is known as readability (Gilliland, 1972).
Readability assessment has been widely researched and applied in the past century (DuBay, 2007a) by applying two-factor (semantic, syntactic) quantitative readability formulae to measure texts, as these two features have been shown to be reliable predictors of readability and easily measured (DuBay, 2007b). However, examining only two factors and how they apply to the text, not exploring reader-text interaction, has been repeatedly criticized as overly reductive, as it has become generally accepted that readability assessments should include a consideration of readers’ perceptions of the many features that make up text difficulty (Baker, 2021; Gunning, 2003; Weaver, 2000).
In response, comprehensive lists of features readers perceive as affecting the difficulty of various texts have been offered for use in what has been termed a hybrid procedure: Employing a readability formula is considered a good first step, followed by a second step that includes a subjective consideration of features not measured by readability formulae as this is intended to provide texts that are a good fit for potential readers (Chall & Dale, 1995; Fry, 2002; Gunning, 2003; Meyer, 2003; Weaver, 2000).
Some of these lists have been developed for general texts and native English speakers (NES) (Chall & Dale, 1995; Zakaluk, 1985; Zakaluk & Samuels, 1988). Others have been designed for more genre-specific texts (i.e., model essays) and English as a foreign language (EFL) learners (Baker, 2020). A common thread among these lists is that they are often inspired by previous research that has identified one or more features that contribute to readers’ difficulty with texts.
Literature Review
Second-language (L2) research often finds grounding in first-language (L1) scholarly precedents. Accordingly, L1 discussions began in the early 1900s, and L2 literature followed in the 1970s and 1980s. These theoretical and empirical explorations focused on a single feature or a limited number of primary features and on a small number of conjoined relationships that affect readability. In keeping with this trajectory, the literature is presented feature by feature.
Titles
Theoretical literature on titles (a descriptor at the top of a text) usually references Bartlett’s (1932) work with schema. Following this, landmark empirical explorations in NES contexts showed titles are facilitative as they forecast the topic of a text (Bransford & Johnson, 1973; Dooling & Lachman, 1971). Research that followed in EFL contexts has reported similar results (Carrell, 1983; Noor, 2006), as readers often first look to titles when approaching a text. The title has also been found to mediate the effect of other features, helping readers to increase interest (Mohammed, 2021), activate background knowledge (Ahmadi, 2011), and anticipate text structure (Bock, 1980) and rhetorical organization (Baker, 2020).
Paragraph Length
Discussions of paragraph length (the number of words in a text) began in the 1890s (Earle, 1890; Lewis, 1894) and have historically shown that longer narrative texts are better comprehended and recalled than shorter ones, as additional details (subsidiary sentences) strengthen plots (Keenan et al., 1985; Mandler & Johnson, 1977). Expository explorations, however, have been less conclusive. Some suggest additional details overburden readers’ memory (Reder, 1982; Reder & Anderson, 1980). Others have reported the opposite (Reder et al., 1986). Still others have shown that readers overestimate their comprehension of shorter texts (Commander & Stanwyck, 1997).
EFL explorations have similarly been contradictory. Several have shown a relationship between narrative text difficulty and length (Gopal & Mahmud, 2019), but others failed to establish a relationship (Jalilehvand, 2012). Similarly, several expository studies have found that text length increases difficulty (Freedle & Kostin, 1991, 1992, 1993; Moon, 2019), while others reported no effect (Lee, 1999; Mehrpour & Riazi, 2004). Mediating effects have also been observed in EFL research. Longer texts, for example, have been shown to contain more vocabulary (Hung, 2017), thus negatively affecting comprehension (Bock, 1980) and interest (Baker, 2020). Moreover, vocabulary in context clues have been reported to be helpful in shorter texts (Shokouhi & Askari, 2010). However, shorter passages provide less context, requiring more reliance on background knowledge. Similarly, reduced text structures require increased reader effort to understand the text (Bae & Lee, 2018).
Vocabulary
Vocabulary (i.e., unfamiliar, abstract, figurative, or technical words) has regularly been cited as a feature that affects readers’ understanding since the early 1920s (Pressey & Pressey, 1921) and a contributing factor to efficient, silent reading and other reading skills (Davis, 1944).
Studies with EFL learners have similarly demonstrated that vocabulary plays a role in text difficulty, showing that EFL learners perceive it to be their greatest obstacle to text comprehension and recall (Kameli & Baki, 2013; Kezhen, 2015; Qian, 2002; Salyer, 1990; Yorio, 1971). Vocabulary has also been found to interact with other features. That is, difficult vocabulary can reduce the assistance vocabulary in context clues provide, be more abundant in longer sentences (Haynes & Baker, 1993), reduce interest (Baker, 2021), and hinder recognition of rhetorical organization (Carrell, 1983).
Vocabulary in Context
Discussions of vocabulary in context (how effectively phrases or sentences that surround unknown words aid comprehension) in NES contexts often begin with Ames’s (1966) categorization of textual clues and how readers use these to infer meaning of unfamiliar words, which in turn facilitates comprehension.
Explorations in EFL contexts have also resulted in classification systems (Bengeleil & Paribakht, 2004; Dubin & Olshtain, 1993) and shown that EFL learners use contextual clues to guess the meaning of unknown words (Ahmad et al., 2018; Cooper, 1999). Some clues, however, have proven more assistive than others. Immediate clues, rather than global ones, have been shown to be more conducive to learning (Haynes, 1993), and clues with limited redundancy and limited or ambiguous references have been shown to be less so (Laufer, 1997). Additionally, there is evidence that learners’ abilities to recognize clues play a part (Shen & Wu, 2009) and improve with training (Davoudi & Nafchi, 2016; Rokni & Niknaqsh, 2013). Vocabulary in context has also been shown to have a relationship with other features but one that is affected by them (see vocabulary, background knowledge, and paragraph length sections of this article).
Sentence Length
Discussions of sentence length (the number of words in a sentence) often begin with Sherman (1893), who stressed that readable materials do “not run in long and involved sentences that cannot readily be understood” (p. 327). Subsequent work has shown that longer sentences tend to be more difficult, as they are often more complex (including longer clauses, more abstract nouns, verb nominalizations, adjectives, dependent clauses, and adverbials) and carry more accompanying punctuation (Coleman, 1962; Coleman & Miller, 1968; Glazer, 1974). This complexity can overtax readers’ working memories (McLaughlin, 1969; Mikk, 2008) or generally result in misunderstanding (McElree, 2000) (e.g., by overtaxing the cognitive load needed for the recognition of paragraph structure and rhetorical organization).
EFL literature has provided similar results (Freedle & Kostin, 1992) but has added that readers’ proficiency can play an important role (Nilagupta, 1977), as much as a .64 correlation (Dwaik, 1997). Sentence length has also been shown to interact with other features. For example, shorter sentences can increase interest (Mikk & Kukemelk, 2010).
Rhetorical Organization
Discussions of rhetorical organization (the way ideas are organized in texts to make them flow smoothly) generally begin with Meyer’s (1975) subclassifications and include more modern typologies (e.g., illustration, process, description, narrative, cause/effect, comparison/contrast, and argumentation/persuasion) (Baker, 2021), but one type can occur in another (Spiro & Taylor, 1980). In general, two sources of difficulty have been noted: (a) the complexity of rhetorical classification/subclassification and (b) readers’ formal schemata, and familiarity with the rhetorical organization employed (Carrell, 1987).
Similar discussions have been offered in EFL contexts. That is, some text types are more complex than others, learners’ awareness can affect understanding (Flick & Anderson, 1980), and rhetorical structures are not constant across cultures (Kaplan, 1966, 2005). Nevertheless, taxonomy explorations have attempted to provide as distinctive a picture as possible (see Alkhaleefah, 2017; Amiri et al., 2012; Baker, 2021; Carrell, 1984a; Freedle & Kostin, 1991, 1993; Goh, 1990; Lei, 2010; Meyer & Freedle, 1984; Putra, 2012; Saadatnia et al., 2016; Salmani, 2010; Sharp, 2002; Talbot et al., 1991; Yali & Jiliang, 2007; Zhang, 2008). However, the results have generally been incongruent due mostly to varying text types under study, methodology, and participants’ reading levels (Baker, 2021). Rhetorical organization has also been shown to interact with other features. That is, it influences students’ use of vocabulary in context clues (Baker, 2020).
Structure
Discussions of text structure (how text is organized) began in the 1970s, showing that narrative texts with a well-defined story grammar facilitate understanding better than those without (Thorndyke, 1975). Likewise, readers expect a clear structure for expository texts. For example, they expect an identifiable topic sentence (Lorch & Lorch, 1985) and a conventional development structure aligned with the relevant rhetorical style (Britton & Black, 2017; Kintsch & Yarbrough, 1982). When these expectations are not met, comprehension suffers (Kieras, 1978).
Similar results have been found in EFL contexts. It has been shown that text structure can influence readers’ experiences (Baker, 2020) as learners come with a predisposed schema regarding narrative (Carrell, 1984b) and expository structures (Ritzer, 1994). Thus, how well a text’s structure meets these expectations can affect comprehension (Carrell, 1992). This relationship can also be influenced by readers’ proficiency (Walters & Wolf, 1986) and awareness and knowledge of conventional text structures (Namjoo & Marzban, 2012; Shemshadsara et al., 2019).
Structure has also been shown to have a relationship with other features, but one where structure is mediated by them (see the title section of this article).
Signal Words
Discussions of signal words (words that indicate the flow of information, e.g., first, next, finally, etc.) often begin with Thorndike (1917), who demonstrated that signal words can facilitate NES comprehension, and continue with Miccinati (1975), who organized them into several categories that have become integral to writing studies courses (Baker, 2021; Van Silfhout et al., 2014). Evidence regarding signal words’ effects on comprehension has been mixed. Some early explorations have reported that signal words support reading comprehension (Miccinati, 1975), whereas others have demonstrated negative effects (Roen, 1984), and still others have indicated no effect (Meyer, 1975).
Research with EFL learners begins with Le (1969) and has shown mostly positive results (i.e., signal words facilitate understanding) (Aidinlou & Pandian, 2011; Al-Surmi, 2011), as they support a coherent mental representation of clause relations (Xu et al., 2019). However, learner proficiency (Chung, 2000; Kim & Clariana, 2017) and awareness (Baker, 2021; Quan, 2008) have been shown to play a part. Signal words have also been shown to interact with other factors. That is, signal words can aid recall and recognition of rhetorical organization and structure (Baker, 2022; Lorch & Chen, 1986).
Punctuation
Discussions of punctuation (periods, question marks, exclamation marks, commas, colons, semicolons, dashes/hyphens, ellipsis, etc.) begin with Summey’s (1919) handbook, which detailed modern uses of punctuation and how it aids comprehension (Backscheider, 1972). However, the degree of assistance provided is a balance of the presence of punctuation (Neff, 1932) and readers’ understanding of its purpose and standard usage (Carr, 1978; Carver, 1970; Durkee, 1952), without which sentences can be no more than a jumble of text (Hasbrouck et al., 1999).
Similar arguments have been made in EFL contexts, explaining that punctuation is facilitative, but student awareness plays an important part (Abbott, 2006; Alsubaie, 2014; Benitez-Rivera, 2013; Pathan & Al-Dersi, 2013; Shih, 1992; Suliman et al., 2019). Punctuation has also been shown to have a relationship with other features, but one where it is mediated by other components (see sentence length and background knowledge).
Interest
Discussions of interest (an interest in a topic) begin with Herbart’s work from the 1800s (Dewey, 1913), which explained that interest or the absence thereof affects learning. Later work reinforced this point (Lin et al., 1997; Schraw & Lehman, 2001).
Research in EFL contexts has similarly demonstrated that students who express topic interest (Atamturk, 2018) and those who do not may be interested in learning more (Baker, 2020; Erçetin, 2010). Interest has also been shown to have a relationship with other features, but one where it is mediated by other components (see titles, vocabulary, sentence length, paragraph length, and background knowledge sections of this article).
Background Knowledge
Discussions of background knowledge (how familiar students are with the topic) often begin with Kant’s 1781 treatise on schemata (Scaglia, 2020) and Bartlett’s (1932) schema classifications. EFL explorations additionally reference Carrell’s (1983) content schema, which posits that a text does not provide meaning but only provides guidance filtered by readers’ previous background knowledge (Carrell, 1983, 1987; Chau et al., 2019; Florencio, 2004; Ghorbandordinejad & Bayat, 2014; Ha, 2021; Khataee & Davoudi, 2018; Nelson, 1987; Nguyen, 2012; Steffensen et al., 1979; Thao & Son, 2018).
Background knowledge has also been shown to interact with other features: vocabulary (Johnson, 1981; Sheridan et al., 2019), vocabulary in context (Demir, 2012; Johnson, 1982), sentence length and punctuation (Johnson, 1981), rhetorical organization and structure (Carrell, 1987), and interest (Ay & Bartan, 2012; Bugel & Buunk, 1996; Carrell & Wise, 1998; Kelsen, 2016).
Aim of the Study
A review of the extant literature illustrates that explorations of one or more features’ effects on passage difficulty have been undertaken. However, limitations are similarly present, as these investigations have only explored one or a small number of variables and a limited number of mediating relationships. Additionally, research into what primary and mediating features EFL learners perceive as contributing to the difficulty of model paragraphs is noticeably absent. This study is intended to address this combined gap. Pursuant to this aim, two research questions were posed:
RQ1: What factors do undergraduate EFL learners perceive as affecting the text difficulty of model paragraphs?
RQ2: What mediating factor relationships do undergraduate EFL learners perceive as affecting the text difficulty of model paragraphs?
To explicate these two questions, 11 hypotheses (and relevant sub-hypotheses) were posed using a transmittal approach, testing path relationships based on existing literature (Nitzl et al., 2022). RQ1 is explicated by the main hypotheses (H1-H11), and RQ2 is explicated by the sub-hypotheses.
H1 There is a significant relationship between titles and participants’ perceptions of text difficulty.
H1a Titles mediate the relationship between rhetorical organization and participants’ perceptions of text difficulty.
H1b Titles mediate the relationship between paragraph structure and participants’ perceptions of text difficulty.
H1c Titles mediate the relationship between background knowledge and participants’ perceptions of text difficulty.
H1d Titles mediate the relationship between interest and participants’ perceptions of text difficulty.
H2 There is a significant relationship between paragraph length and participants’ perceptions of text difficulty.
H2a Paragraph length mediates the relationship between vocabulary and participants’ perceptions of text difficulty.
H2b Paragraph length mediates the relationship between vocabulary in context and participants’ perceptions of text difficulty.
H2c Paragraph length mediates the relationship between paragraph structure and participants’ perceptions of text difficulty.
H2d Paragraph length mediates the relationship between interest and participants’ perceptions of text difficulty.
H2e Paragraph length mediates the relationship between background knowledge and participants’ perceptions of text difficulty.
H3 There is a significant relationship between vocabulary and participants’ perceptions of text difficulty.
H3a Vocabulary mediates the relationship between vocabulary in context and participants’ perceptions of text difficulty.
H3b Vocabulary mediates the relationship between sentence length and participants’ perceptions of text difficulty.
H3c Vocabulary mediates the relationship between rhetorical organization and participants’ perceptions of text difficulty.
H3d Vocabulary mediates the relationship between interest and participants’ perceptions of text difficulty.
H4 There is a significant relationship between vocabulary in context and participants’ perceptions of text difficulty.
H5 There is a significant relationship between sentence length and participants’ perceptions of text difficulty.
H5a Sentence length mediates the relationship between rhetorical organization and participants’ perceptions of text difficulty.
H5b Sentence length mediates the relationship between paragraph structure and participants’ perceptions of text difficulty.
H5c Sentence length mediates the relationship between punctuation and participants’ perceptions of text difficulty.
H5d Sentence length mediates the relationship between interest and participants’ perceptions of text difficulty.
H6 There is a significant relationship between rhetorical organization and participants’ perceptions of text difficulty.
H6a Rhetorical organization mediates the relationship between vocabulary in context and participants’ perceptions of text difficulty.
H7 There is a significant relationship between paragraph structure and participants’ perceptions of text difficulty.
H8 There is a significant relationship between signal words and participants’ perceptions of text difficulty.
H8a Signal words mediate the relationship between rhetorical organization and participants’ perceptions of text difficulty.
H8b Signal words mediate the relationship between paragraph structure and participants’ perceptions of text difficulty.
H9 There is a significant relationship between punctuation and participants’ perceptions of text difficulty.
H10 There is a significant relationship between interest and participants’ perceptions of text difficulty.
H11 There is a significant relationship between background knowledge and participants’ perceptions of text difficulty.
H11a Background knowledge mediates the relationship between vocabulary and participants’ perceptions of text difficulty.
H11b Background knowledge mediates the relationship between vocabulary in context and participants’ perceptions of text difficulty.
H11c Background knowledge mediates the relationship between rhetorical organization and participants’ perceptions of text difficulty.
H11d Background knowledge mediates the relationship between paragraph structure and participants’ perceptions of text difficulty.
H11e Background knowledge mediates the relationship between punctuation and participants’ perceptions of text difficulty.
H11f Background knowledge mediates the relationship between interest and participants’ perceptions of text difficulty.
To explore the hypotheses and sub-hypotheses, a partial least squares structural equation modeling (PLS-SEM) model was posed and tested. A PLS-SEM design was employed to address the limitations of previous studies (e.g., investigating a limited number of features and relationships). Whereas single hypotheses are individual conjectures, PLS-SEM enables the investigation of complex models with many constructs and mediating relationships. It takes a causal-predictive approach that emphasizes prediction in estimating statistical models whose structure is designed to provide causal explanations of the relationships among constructs (Hair et al., 2018; Wong, 2019).
Methods
Drawing on the extant literature, a PLS-SEM model was developed to explore the two research questions and related hypotheses. The model contained 11 unobservable independent variables (constructs): Participants’ perceptions regarding how 11 features affected their perceptions of the dependent variable text difficulty (TD): titles (T), paragraph length (PL), vocabulary (V), vocabulary in context (VC), sentence length (SL), rhetorical organization (RO), paragraph structure (PS), signal words (SW), punctuation (P), interest (I), and background knowledge (BK). Drawing further on the literature and resulting hypotheses, indirect mediating relationships were additionally explored.
Setting and Participants
The study was conducted at Van Lang University in Ho Chi Minh City, Vietnam. A nonprobability method was employed to identify the target sample: the entire cohort of Writing II (17 sections), 734 potential participants who had completed Writing 1 and could be expected to be familiar with the type of paragraph genre explored in this study. Due to Covid-19 infections, 259 students were absent, and no follow-up was attempted. Four hundred and seventy-five surveys were collected.
Materials
The paragraphs under study were excerpted from the first-year composition course reference text (Savage & Shafiei, 2012, Effective Academic Writing 1: The Paragraph, 2nd ed.). The text contains 12 model paragraphs, of which five were purposefully chosen. The number is large enough to provide a wide variety of comparative options for participants to make insightful comparisons but small enough for participants to scrutinize them in a reasonable amount of time to collect meaningful data, without undue participant fatigue (Baker, 2020).
The paragraphs were chosen to be near in difficulty (as measured by the DRP formula) so as not to make the ranking obvious. Each paragraph contained the aforementioned characteristics in varying degrees but was not specifically selected as such to avoid influencing the results (Table 1) (Baker, 2020).
Experimental Procedures
After the paragraphs’ identification, a cline-questionnaire was administered (O’Hear et al., 1992). In the cline phase, participants ranked the paragraphs from easiest to most difficult. To facilitate the sort of decision-making process usually used to make such judgments, the paragraphs were provided in random order without ranking criteria (Chall et al., 1996). The Friedman test was used to determine the means and significance of participants’ rankings.
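The cline analysis can be illustrated with a minimal sketch, assuming complete rankings with no ties (each participant assigns the ranks 1 to k exactly once). The function name, the toy rankings, and the closed-form p-value (valid only when k - 1 is even, as with five paragraphs) are illustrative assumptions and not the software or data used in this study.

```python
import math

def friedman_test(rankings):
    """Friedman chi-square for untied rankings.

    rankings: list of rows, one per participant; each row holds the
    ranks 1..k that the participant gave to the k paragraphs.
    Returns (mean_ranks, chi_square, df, p_value).
    """
    n = len(rankings)          # number of participants
    k = len(rankings[0])       # number of paragraphs ranked
    rank_sums = [sum(row[j] for row in rankings) for j in range(k)]
    mean_ranks = [s / n for s in rank_sums]
    # Standard no-ties formula: 12 / (n k (k+1)) * sum(R_j^2) - 3 n (k+1)
    chi2 = 12 * sum(s * s for s in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    df = k - 1
    if df % 2 != 0:
        raise ValueError("closed-form p-value below assumes an even df")
    # Chi-square survival function for even df: exp(-x/2) * sum (x/2)^i / i!
    half = chi2 / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return mean_ranks, chi2, df, p_value

# Hypothetical rankings from four participants for five paragraphs.
toy_rankings = [
    [1, 2, 3, 4, 5],
    [1, 3, 2, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 2, 4, 3, 5],
]

if __name__ == "__main__":
    mean_ranks, chi2, df, p_value = friedman_test(toy_rankings)
    print(mean_ranks, chi2, df, round(p_value, 4))
```

A lower mean rank corresponds to a paragraph perceived as easier, and the test has k - 1 degrees of freedom.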
In the following phase, participants completed a 10-point Likert questionnaire to provide insight into their perceptions of factors contributing to text difficulty. The survey was developed from existing literature (Dörnyei & Taguchi, 2009) and reviewed by field experts (
To attain a high response rate, the cline-questionnaire procedure was conducted during regular class periods (Brown, 2001; Kropf & Blair, 2005). To motivate students to participate, reduce nonresponse bias and missing data, and improve overall response quality, several motivators were addressed (e.g., altruistic motivation and interest) (Singer & Ye, 2013). A token incentive was also provided in the event other motivators were not present (10,000 Vietnam Dong phone card, approximately 40 US cents): Small enough to compensate for time spent and inconvenience while not being unethically compelling (Ripley et al., 2010).
Prior to the study, an exploratory factor analysis (EFA) was conducted with 174 undergraduates with similar demographics and experience with the text. The results showed that each variable loaded well on its factor and that the results were significant (Bartlett’s Test of Sphericity,
Kaiser-Meyer-Olkin Measure.
Bartlett’s Test of Sphericity.
The Cronbach alpha (Cα) for each factor was found to be above .70. This indicated acceptable scale reliability (Table 4).
Cronbach’s alpha.
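As a minimal sketch of how a reliability coefficient of this kind is obtained, Cronbach's alpha can be computed from the item variances and the variance of the summed scale. The function name and the toy responses are hypothetical and are not the pilot data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: list of respondents, each a list of item scores for the scale.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(rows[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 10-point responses from five respondents to a three-item scale.
toy_responses = [
    [8, 7, 8],
    [5, 5, 6],
    [9, 9, 8],
    [3, 4, 3],
    [6, 6, 7],
]

if __name__ == "__main__":
    alpha = cronbach_alpha(toy_responses)
    print(round(alpha, 3), alpha >= 0.70)
```

Values at or above .70 would meet the threshold described above.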
After the completion of the pilot study, the main study was undertaken. Following the data collection, two research assistants independently entered the survey responses into Microsoft Excel CSV files, which were then verified by a third. Several data preparation issues were addressed: (a) missing data (MCAR, expectation-maximization algorithm) (Haziza, 2009), (b) suspicious response patterns (straight-lining or inconsistent answers), (c) outliers, and (d) abnormal data distribution (Hair et al., 2021). Four hundred and forty-one usable responses were retained for analysis, a sample size larger than that suggested by two PLS-SEM specifications: the a priori sample size method (Soper, 2022) and the 10 Item Method (Hair et al., 2017). No participants were excluded based on demographic characteristics. The sample included participants of varying genders, ages, and years of study (Table 5).
Demographics.
The PLS-SEM outer measurement model was then examined using Indicator Reliability, Internal Consistency, and Discriminant Validity (Fornell-Larcker Criterion, Cross Loading, and Heterotrait-Monotrait Ratio—HTMT). The inner structure model was assessed using Collinearity, Path Coefficients, Mediating Relationships, Explanatory Power (Coefficients of Determination,
Results
The Friedman test mean ranks (Table 6) showed that the paragraph My Brother’s Game was perceived to have the lowest average difficulty, followed by The Long Life of My Grandfather’s Car, The Secret to a Successful Vacation, St. Petersburg, and Something Wild.
Paragraph Cline (Rankings).
A significant difference in the ranking of the paragraphs was shown (χ2(4),
Friedman Test.
The assessment of the outer measurement model included several areas: Indicator Reliability, Internal Consistency, and Discriminant Validity (Fornell-Larcker Criterion, Cross Loading, and HTMT).
An examination of the indicator loadings showed 66 of the 72 indicators were above the 0.70 threshold (Table 8) (i.e., six were not). As such, those below the threshold were removed from the measurement scale: PL5 (0.698), T4 (0.626), T5 (0.653), V4 (0.668), V5 (0.657), and VC5 (0.635).
Indicator Reliability.
Removed from the measurement scale because they were below the threshold of 0.70.
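The screening step can be sketched as a simple filter. The six removed indicators and their loadings are those reported above; the two retained entries (X1, X2) are hypothetical placeholders, and treating a loading of exactly 0.70 as acceptable is an assumption.

```python
def screen_indicators(loadings, threshold=0.70):
    """Split indicators into retained and removed sets by outer loading."""
    retained = {name: value for name, value in loadings.items() if value >= threshold}
    removed = {name: value for name, value in loadings.items() if value < threshold}
    return retained, removed

loadings = {
    # Reported in the text as falling below the 0.70 threshold.
    "PL5": 0.698, "T4": 0.626, "T5": 0.653,
    "V4": 0.668, "V5": 0.657, "VC5": 0.635,
    # Hypothetical indicators, included only to show the retained branch.
    "X1": 0.812, "X2": 0.745,
}

if __name__ == "__main__":
    retained, removed = screen_indicators(loadings)
    print(sorted(removed))
```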
Three methods were used to establish reliability. Cronbach’s alpha and Composite Reliability (CR) (Hair et al., 2017) showed all constructs were above the required 0.70 threshold (Hair et al., 2011). The third indicator, rhoA, showed that all values were 0.70 or higher but lower than 1 (Wong, 2019). Finally, convergent validity was established, as the Average Variance Extracted (AVE) for each construct was greater than or equal to the recommended 0.50, indicating that items converged to measure the underlying construct (Table 9).
Internal Consistency.
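For readers unfamiliar with these two indices, the following sketch shows how CR and AVE are commonly computed from a construct's standardized outer loadings. The loadings are hypothetical and the function names are illustrative; the values in Table 9 were produced by the PLS-SEM software, not by this code.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR: (sum of loadings)^2 over itself plus the summed error variances."""
    total = sum(loadings)
    error = sum(1 - l * l for l in loadings)  # error variance of each indicator
    return total * total / (total * total + error)

# Hypothetical standardized loadings for a four-indicator construct.
toy_loadings = [0.80, 0.75, 0.85, 0.70]

if __name__ == "__main__":
    print(round(average_variance_extracted(toy_loadings), 3))
    print(round(composite_reliability(toy_loadings), 3))
```

Under the thresholds cited above, this hypothetical construct would pass both checks (AVE of at least 0.50 and CR of at least 0.70).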
Three methods were used to evaluate discriminant validity: Fornell and Larcker (1981) Criterion, cross-loadings, and HTMT. Fornell-Larcker demonstrated discriminant validity, as the square root of the AVE for each construct was greater than its correlation with all other constructs (Table 10).
Discriminant Validity—Fornell-Larcker Criterion.
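The Fornell-Larcker comparison can be sketched as follows, assuming an AVE value per construct and a set of pairwise construct correlations. All numbers are hypothetical; only the construct abbreviations are borrowed from this study.

```python
import math

def fornell_larcker(ave, correlations):
    """Check the Fornell-Larcker criterion for each construct.

    ave: dict mapping construct name to its AVE.
    correlations: dict mapping (construct_a, construct_b) to their correlation.
    Returns a dict mapping each construct to True when sqrt(AVE) exceeds
    the absolute correlation with every other construct.
    """
    result = {}
    for construct, value in ave.items():
        root = math.sqrt(value)
        related = [abs(r) for pair, r in correlations.items() if construct in pair]
        result[construct] = all(root > r for r in related)
    return result

# Hypothetical AVE values and inter-construct correlations.
toy_ave = {"BK": 0.64, "VC": 0.60, "I": 0.70}
toy_correlations = {("BK", "VC"): 0.55, ("BK", "I"): 0.48, ("VC", "I"): 0.62}

if __name__ == "__main__":
    print(fornell_larcker(toy_ave, toy_correlations))
```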
Cross-loadings indicated that each indicator loaded strongly onto its parent construct and not on other constructs (Hair et al., 2017). This further demonstrated discriminant validity (Table 11).
Discriminant Validity—Cross Loadings.
The HTMT ratio was assessed against a threshold of 0.85 (Kline, 2011). All values were below 0.85 (Table 12), further demonstrating discriminant validity.
Discriminant Validity—HTMT.
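A minimal sketch of the HTMT ratio for a single pair of constructs follows, using its common definition: the mean correlation between items of different constructs divided by the geometric mean of the average within-construct item correlations. The item correlation matrix is hypothetical, and each construct is assumed to have at least two items.

```python
from itertools import combinations, product
from math import sqrt

def htmt(corr, items_a, items_b):
    """HTMT ratio for two constructs.

    corr: symmetric item correlation matrix (list of lists).
    items_a, items_b: row/column indices of each construct's items.
    """
    def mean(values):
        return sum(values) / len(values)

    hetero = [corr[i][j] for i, j in product(items_a, items_b)]    # between constructs
    mono_a = [corr[i][j] for i, j in combinations(items_a, 2)]     # within construct A
    mono_b = [corr[i][j] for i, j in combinations(items_b, 2)]     # within construct B
    return mean(hetero) / sqrt(mean(mono_a) * mean(mono_b))

# Hypothetical correlations among four items: indices 0-1 belong to one
# construct, indices 2-3 to another.
toy_corr = [
    [1.00, 0.80, 0.40, 0.50],
    [0.80, 1.00, 0.45, 0.35],
    [0.40, 0.45, 1.00, 0.70],
    [0.50, 0.35, 0.70, 1.00],
]

if __name__ == "__main__":
    ratio = htmt(toy_corr, [0, 1], [2, 3])
    print(round(ratio, 3), ratio < 0.85)
```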
The inner structural model was evaluated through an examination of several areas: collinearity, significance of the structural model relationship path coefficients, mediation analysis, explanatory power: coefficients of determination (
An examination of collinearity showed a variance inflation factor (VIF) below 5.0 for all constructs (Table 13). As such, no sign of excessive collinearity was found among the predictor constructs (Hair et al., 2021).
Variance Inflation Factor.
Afterward, 11 hypotheses (H1-H11) were examined. These hypotheses queried the direct relationships between 11 independent variables (T, PL, V, VC, SL, RO, PS, P, SW, I, BK) and the dependent variable (TD). This was done to address the first research question: What factors do undergraduate EFL learners perceive as affecting the text difficulty of model paragraphs?
The path relationships were examined using the PLS-SEM two-tailed bootstrap procedure (5,000 samples) using three measures: path coefficients (β),
An examination of the comparative strength of these relationships showed BK had the highest path coefficient, followed by VC, I, RO, PS, SL, V, and T. No significant effect was found for three variables (PL, P, SW). Thus, three hypotheses (H2, H8, H9) were not supported (Table 14).
Direct Relationship Results.
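The logic of bootstrapping a path estimate can be illustrated with a deliberately simplified stand-in: a single standardized path is approximated here by a Pearson correlation, and a percentile confidence interval is formed from 5,000 resamples. This is not the PLS-SEM algorithm used in the study; the data, seed, and function names are hypothetical.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def bootstrap_ci(xs, ys, n_boot=5000, alpha=0.05, seed=0):
    """Two-tailed percentile bootstrap interval for the correlation."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        r = pearson([xs[i] for i in idx], [ys[i] for i in idx])
        if r is not None:                           # skip degenerate resamples
            estimates.append(r)
    estimates.sort()
    m = len(estimates)
    lower = estimates[int((alpha / 2) * m)]
    upper = estimates[min(m - 1, int((1 - alpha / 2) * m))]
    return pearson(xs, ys), lower, upper

# Hypothetical scores for one predictor and the outcome.
toy_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
toy_y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.4, 8.9, 10.2, 10.8, 12.1]

if __name__ == "__main__":
    print(bootstrap_ci(toy_x, toy_y))
```

An interval that excludes zero corresponds to a significant two-tailed effect at the chosen alpha level.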
Mediation Analysis
Mediation was explored to address the second research question: What mediating factor relationships do undergraduate EFL learners perceive as affecting the text difficulty of model paragraphs? Mediation was explored by first assessing the indirect effect of the independent variable through the mediating variable to the dependent variable (X > M > Y) (Nitzl et al., 2022). If significant, mediation was investigated further. Where the direct effect between the independent variable and the dependent variable was significant (X > Y), partial mediation was identified, as both the mediating path (through M) and the direct path contribute. In cases where the direct path was not significant, full mediation was identified, as only the mediating variable (M) had an effect.
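The decision rule described in this paragraph can be expressed as a short sketch. The function name and the Boolean inputs are illustrative; in practice, the significance of each path would come from the bootstrap results.

```python
def classify_mediation(indirect_significant, direct_significant):
    """Apply the decision rule: test the indirect path first, then the direct path."""
    if not indirect_significant:
        return "no mediation"       # X > M > Y is not significant
    if direct_significant:
        return "partial mediation"  # both X > M > Y and X > Y are significant
    return "full mediation"         # only X > M > Y is significant

if __name__ == "__main__":
    # Patterns corresponding to outcomes reported in this article.
    print(classify_mediation(True, True))    # e.g., H3a (VC > V > TD)
    print(classify_mediation(True, False))   # e.g., H5c (P > SL > TD)
    print(classify_mediation(False, True))   # e.g., H1a (RO > T > TD)
```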
Seven sets of sub-hypotheses were explored, 26 sub-hypotheses in total. These explored how seven variables (T, PL, V, SL, RO, SW, BK) acted as mediating variables between seven other variables (VC, RO, PS, P, SL, I, BK) and the dependent variable TD. The significance of the mediating relationships was assessed using two-tailed bootstrapping (5,000 resamples) and the resulting three measures: path coefficients (β),
H1a-d explored the mediating relationships between T and four variables (PS, RO, I, BK) and TD. That is, whether T mediates the relationship between these variables and participants’ perceptions of TD. H1a-d were not supported. It was found that T did not mediate the relationship between each of the variables (RO, PS, I, BK) and TD. That is, the mediating paths (RO > T > TD; PS > T > TD; I > T > TD; BK > T > TD, respectively) were not significant (Table 15). Hence, no mediating relationships were identified.
Sub Hypotheses H1a-d.
H2a-e explored the mediating relationships between five variables (V, VC, PS, I, BK) and PL, that is, how PL mediates the relationship between these variables and participants’ perceptions of TD. H2a-e were not supported, as the results showed that PL did not significantly mediate the relationship between the five variables (V, VC, PS, I, BK) and TD. That is, the mediation paths (V > PL > TD; VC > PL > TD; PS > PL > TD; I > PL > TD; BK > PL > TD, respectively) were not significant (Table 16).
Sub Hypotheses H2a-e.
H3a-d explored the mediating relationships for four variables (VC, SL, RO, I) and V. That is, how V mediates the relationship between these variables and participants’ perceptions of TD. H3a was supported. A partial mediating relationship was identified for variable VC, as the mediating path (VC > V > TD) and the direct path between VC and TD (VC > TD) were found to be significant. No mediation relationships were found between the other three variables (SL, RO, I) and variables V and TD, as the mediation paths (SL > V > TD; RO > V > TD; I > V > TD, respectively) were not significant (Table 17).
Sub Hypotheses H3a-d.
H5a-d explored the mediating relationship between four variables (RO, PS, P, I), SL, and TD, that is, how SL mediates the relationship between these variables and participants’ perceptions of TD. H5a, H5b, and H5d were not supported. For these three variables (RO, PS, I), no mediation relationships were found, as their mediating paths (RO > SL > TD; PS > SL > TD; I > SL > TD) were not significant. However, H5c was supported, as the mediating path (P > SL > TD) was significant and the direct path (P > TD) was not significant. Hence, full mediation was identified (Table 18).
Sub Hypotheses H5a-d.
Sub-hypothesis H6a explored the mediating relationship for one variable (VC) and how RO mediates the relationship between VC and participants’ perceptions of TD. H6a was supported. A partial mediation relationship was identified for RO as both its mediating relationship (VC > RO > TD) and the direct relationship between VC and TD (VC > TD) were significant (Table 19).
Sub Hypothesis H6a.
H8a-b explored the mediating relationship for two variables (RO, PS) and how SW mediates the relationships between these variables and participants’ perceptions of TD. H8a-b were not supported. No mediation relationships were identified for either variable, as the path relationships (RO > SW > TD; PS > SW > TD) were not significant (Table 20).
Sub Hypotheses H8a-b.
H11a-f explored the mediating relationships of six variables (V, VC, RO, PS, P, I) and how BK mediates the relationships between these variables and participants’ perceptions of TD. H11a, H11d, and H11f were supported. Three variables (V, PS, I) showed partial mediation relationships, since the mediating paths (V > BK > TD; PS > BK > TD; I > BK > TD) and the direct paths (V > TD; PS > TD; I > TD) were significant (Table 21). H11e was also supported. A full mediation relationship was identified for one variable (P) as the mediating path (P > BK > TD) was significant, but the direct path (P > TD) was not significant. H11b-c were not supported. No mediation relationships were identified for two variables (VC, RO) as the mediation paths were not significant (VC > BK > TD; RO > BK > TD).
Sub Hypotheses H11a-f.
Explanatory Power: Coefficients of Determination (R²)
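The explanatory power of the PLS-SEM model was assessed using the coefficient of determination (R²).
The explanatory power of the proposed PLS-SEM predictive model was found to be R² = 0.508; that is, the model explained 50.8% of the variance in TD.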

Explanatory Power: Coefficients of Determination (R²).
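For readers less familiar with the coefficient of determination, the following Python sketch shows the generic calculation, R² = 1 − SS_res / SS_tot. It is a minimal illustration under stated assumptions: in PLS-SEM the statistic is computed for the endogenous construct from latent variable scores, whereas this sketch uses plain observed and predicted values, and the function name `r_squared` and the toy numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Share of variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation the predictions fail to capture.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation around the mean of the observed values.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Toy values for illustration only (not the study's data).
observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, observed))               # perfect prediction
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))   # predicting the mean only
print(r_squared(observed, [1.5, 2.5, 2.5, 3.5]))   # partial fit
```

On this scale, a value of 0.508 means that roughly half of the variation in the outcome is accounted for by the model.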
Predictive Relevance (Q²)
To determine predictive relevance (
Discussion
A review of the extant literature showed that previous investigations had only explored one or a small number of variables and a limited number of mediating relationships. Moreover, research on what primary and mediating features EFL learners perceive as contributing to the difficulty of model paragraphs was found to be noticeably absent. To address this, this study investigated what factors undergraduate EFL learners perceive as affecting the difficulty of model paragraphs excerpted from writing coursebooks. Two research questions were set to explore this, with hypotheses related to each. To assess the results, a PLS-SEM model was posed and tested.
Regarding RQ1 (What factors do undergraduate EFL learners perceive as affecting the text difficulty of model paragraphs?), the findings showed that 8 of the 11 direct hypotheses (H1, H3, H4, H5, H6, H7, H10, H11) were supported; that is, eight independent variables (titles, vocabulary, vocabulary in context, sentence length, rhetorical organization, paragraph structure, interest, background knowledge) significantly affected students’ perceptions of text difficulty. Conversely, three hypotheses were not supported (H2, H8, H9). That is, three variables (paragraph length, signal words, punctuation) were not found to have a significant effect.
Regarding RQ2 (What mediating factor relationships do undergraduate EFL learners perceive as affecting the text difficulty of model paragraphs?), seven mediating hypotheses were supported, while 19 were not. That is, four variables were identified as significant mediating variables (vocabulary, sentence length, rhetorical organization, background knowledge), and three were not (titles, paragraph length, signal words).
The PLS-SEM model accounted for 0.508
The findings corroborate, contradict, and further the extant literature. They are organized according to the foci of the research questions (i.e., direct and mediating relationships). Titles, for instance, demonstrated a significant relationship with text difficulty, consistent with previous studies (Bartlett, 1932; Carrell, 1983; Dooling & Lachman, 1971; Noor, 2006). Nevertheless, no significant mediating relationships were shown with other variables (rhetorical organization, paragraph structure, interest, background knowledge), contrary to literature that has demonstrated such effects (Ahmadi, 2011; Baker, 2020; Bartlett, 1932; Bock, 1980).
Paragraph length showed no significant direct relationship with text difficulty, a surprising finding contrary to similar research (Freedle & Kostin, 1991, 1992, 1993; Gopal & Mahmud, 2019; Moon, 2019) but consistent with research showing no relationship (Jalilehvand, 2012; Lee, 1999; Mehrpour & Riazi, 2004). Similarly, no mediating relationships were found, contrary to studies that have illustrated paragraph length’s relationship with other variables (vocabulary, vocabulary in context, paragraph structure, background knowledge) (Baker, 2020; Bock, 1980; Keenan et al., 1985; Mandler & Johnson, 1977; Reder & Anderson, 1980; Shokouhi & Askari, 2010). These findings may be attributable to the similarity in the lengths of the selected paragraphs.
Vocabulary was found to have a significant direct relationship with text difficulty, corroborating literature with similar findings (Baker, 2020; Chou, 2011; Kameli & Baki, 2013; Qian, 2002; Salyer, 1990; Yorio, 1971). Furthermore, a mediating relationship with vocabulary in context was found, consistent with literature indicating such relationships (Baker, 2020, 2021; Guo, 2008; Haynes & Baker, 1993). However, mediating relationships with other variables were not found (sentence length, rhetorical organization, interest), contrary to other works (Alkhaleefah, 2017; Baker, 2020; Carrell, 1983; Guo, 2008).
Vocabulary in context was found to have a significant direct relationship with text difficulty, supporting work that has shown that vocabulary in context clues can affect text difficulty (Ahmad et al., 2018; Bengeleil & Paribakht, 2004; Cooper, 1999; Dubin & Olshtain, 1993; Haynes, 1993).
Sentence length showed a relationship with text difficulty, which is consistent with previous work that has demonstrated this relationship (Coleman, 1962; Coleman & Miller, 1968; Freedle & Kostin, 1992; Glazer, 1974; McElree, 2000; McLaughlin, 1969; Mikk, 2008; Nilagupta, 1977). Contrary to previous research (McElree, 2000; McLaughlin, 1969; Mikk & Kukemelk, 2010), no mediating relationships with rhetorical organization, paragraph structure, or interest were found. However, a relationship with punctuation was identified, supporting literature showing that sentences become more difficult as they become more complex (Coleman, 1962; Coleman & Miller, 1968; Glazer, 1974; Sherman, 1893). That is, longer sentences tend to have a more extensive variety of punctuation (Baker, 2020), and students’ understanding can play a part (Durkee, 1952).
It was found that rhetorical organization and text complexity were associated, which is similar to taxonomy research showing that some texts are more complex than others (Alkhaleefah, 2017; Amiri et al., 2012; Baker, 2021; Carrell, 1984a; Freedle & Kostin, 1991, 1993; Meyer & Freedle, 1984; Putra, 2012; Saadatnia et al., 2016). A mediating relationship with vocabulary in context was also identified, supporting work showing that rhetorical organization assists students in guessing clues provided by vocabulary in context (Baker, 2020).
Paragraph structure was found to be associated with text difficulty, supporting research showing students consider how well the text is structured when attempting to understand it (Baker, 2020; Carrell, 1984a, 1984b; Kieras, 1978; Lorch & Lorch, 1985; Ritzer, 1994; Thorndyke, 1975). This is also consistent with work that has illustrated that students’ awareness of structure plays an important role (Namjoo & Marzban, 2012; Shemshadsara et al., 2019).
Signal words did not significantly affect text difficulty, which is in alignment with research that has found that signal words do not affect students’ reading comprehension (Meyer, 1975), but not with research that argues the importance of signal words (Baker, 2021; Van Silfhout et al., 2014) or research that states that signal words affect students’ understanding of texts (Aidinlou & Pandian, 2011; Al-Surmi, 2011; Miccinati, 1975; Roen, 1984; Xu et al., 2019). Furthermore, signal words did not mediate the relationships between rhetorical organization or paragraph structure and text difficulty, contrary to previous literature that argues signal words assist readers in forming a mental representation of the relationships within a text (Xu et al., 2019). This surprising result may be attributable to the limited number of signal words presented in the texts.
Punctuation was not found to have a direct relationship with text difficulty, contrary to research that has illustrated the importance of punctuation (Alsubaie, 2014; Backscheider, 1972; Carr, 1978; Carver, 1970; Neff, 1932; Pathan & Al-Dersi, 2013; Suliman et al., 2019). It was found, however, to be related to sentence length (see above discussion).
Interest was found to have a significant direct relationship with students’ perceptions of text difficulty, supporting work that has demonstrated interest can impact understanding (Erçetin, 2010). Interest was also shown to have a relationship with other features, but one that was mediated by another feature (see background knowledge).
Background knowledge was found to significantly influence text difficulty, confirming previous research suggesting background knowledge influences readers’ comprehension (Carrell, 1983; Chau et al., 2019; Florencio, 2004; Ghorbandordinejad & Bayat, 2014; Khataee & Davoudi, 2018; Nelson, 1987; Steffensen et al., 1979). Background knowledge was also found to have a mediating relationship with several other features (vocabulary, paragraph structure, punctuation, and interest), consistent with previous literature (Ay & Bartan, 2012; Bugel & Buunk, 1996; Carrell, 1987; Carrell & Wise, 1998; Demir, 2012; Kelsen, 2016; Sheridan et al., 2019). However, no relationship with vocabulary in context or rhetorical organization was found, which is in contrast to Carrell (1983), who argued that background knowledge facilitates inference and can compensate for difficulties with rhetorical organization.
Conclusion
Overall, the study identified eight features that had a significant direct effect on students’ perceptions of difficulty and four mediating variables. Considering these features and relationships as individual hypotheses in isolation, as was done in previous studies, is insightful. However, this investigation, through the methodological lens of PLS-SEM, offers a unique contribution to the literature: the resulting statistical model provides a holistic explanation of the direct and mediating relationships (Hair et al., 2018; Wong, 2019). The model accounted for a moderate to substantial measure (Chin, 1998; Hair et al., 2021) of students’ perceptions regarding what features contribute to the difficulty of model paragraphs, with a moderate to high predictive relevance.
We hope this model will be useful to various stakeholders, as paragraph difficulty (text readability) is an important but much-neglected topic. These findings are potentially useful to teachers and syllabus designers who wish to understand the appropriateness of materials for learners but might otherwise make decisions on intuitive grounds (Fulcher, 1997), resulting in choosing materials that are too difficult and thus damaging to learning and motivation. Members of the publishing industry and material writers may also find the results useful in creating appropriate-level model paragraphs. Additionally, the results have the potential to inform the research community and further the literature by extending our knowledge of what contributes to readability.
Implications and Suggestions for Future Study
The results of this study (the list of features, their direct and mediating relationships, and the accompanying PLS-SEM model) have practical implications, as they have the potential to be of use to teachers of writing, material writers, and members of the publishing industry and to further the readability literature. However, they raise several questions that might be explored in future studies.
First, the model explained 50.8% (
Second, the
Supplemental Material
Supplemental material, sj-docx-1-sgo-10.1177_21582440231211802 for A Partial Least Squares Structural Equation Modeling Exploration of EFL Learners’ Perceptions of What Contributes to the Readability of Model Paragraphs by Tuyen Thanh Nguyen, John R. Baker and Thao Quang Le in SAGE Open
Footnotes
Acknowledgements
We would like to thank the Head Editor, editors, and reviewers of the Sage Open Journal for their suggestions and guidance, Dr. James Gaskin for his expertise with PLS-SEM, Katherine Kurowski for her APA reference suggestions, and Luu Thi Thanh An and Thanh Nguyen for their help as research assistants.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Request for funding has been sent to Ton Duc Thang University, Ho Chi Minh City, Vietnam.
Supplemental Material
Supplemental material for this article is available online.
Data Availability Statement
Questions regarding this research can be sent to the corresponding author, Dr. John R. Baker, Creative Language Center, Ton Duc Thang University, 19 Nguyen Huu Tho St, Tan Phong Ward, Dist. 7, Ho Chi Minh City, Vietnam. Email: drjohnrbaker@tdtu.edu.vn.
References
