Can Syntactic Complexity Distinguish Translator Style?—A Novel Stylometric Analysis of Hongloumeng with Dependency Grammar

Abstract

The present study is likely the first to investigate translation style through the lens of dependency distance. While translator style has been explored using various indicators, such as STTR, LR, MSL, and typical language patterns like reporting verbs and loan words, prior research has rarely accounted for syntactic structures, particularly syntactic complexity, which has been widely discussed in fields like second language acquisition. Dependency distance is considered a valid measure of syntactic complexity, and studies have demonstrated its ability to differentiate between translational and original English. Given this, it can be hypothesized that translator style could also be distinguished by dependency distance. Accordingly, this study examines the translator styles of four English versions of the Chinese classic novel Hongloumeng. The results support the hypothesis, showing that dependency distance can effectively distinguish translator styles. Moreover, the findings suggest that dependency direction should also be considered when analyzing translator style.

Keywords

Translator style, stylometric analysis, syntactic complexity, dependency distance, dependency direction

Introduction

Style, as defined by Leech and Short, refers to “the way in which language is used in a given context, by a given person, for a given purpose” (1981, p. 10), suggesting that style arises from a series of deliberate linguistic choices. Mona Baker expands on this, describing style as “a kind of thumb-print that is expressed in a range of linguistic—as well as non-linguistic—features” (2000, p. 245). In the context of translation, style is often analyzed from two perspectives: translation style and translator style. Regarding translator style, Baker argues that the study “must focus on the manner of expression that is typical of a translator…[and] must attempt to capture the translator’s characteristic use of language, his or her individual profile of linguistic habits, compared to other translators” (2000, p. 245). From Baker’s perspective, analyzing translator style can reveal a translator’s unique linguistic habits, or idiolect, distinguishing them from others and defining their identity as an individual translator. Furthermore, translator style sheds light on how different translators interpret and represent the same author in various translations. As Munday (2007, 2008) has demonstrated, translator style can also reflect the ideology of the translator, a particular community, or even a nation.

Traditionally, translator style has been assessed through intuitive analysis of examples. For example, Zhao and Sun (2004) explored how idiolect translation influences translator style in literary translation, while Yao (2009) examined translator style by analyzing wordplay in two versions of Hongloumeng. While this kind of non-quantitative analysis can help distinguish translators, especially when significant linguistic differences exist, it becomes less effective when the differences are subtle. In such cases, corpus-based stylometric analysis offers distinct advantages.

In corpus-based translation studies (CTS), translator style has been examined using methodologies that rely on corpus statistics such as STTR (standardized type-token ratio), ASL (average sentence length), or MSL (mean sentence length), as seen in Baker (2000). With the development of corpus linguistics, additional statistical indicators have emerged for exploring various topics in the humanities and social sciences, such as MATTR (moving average type-token ratio) by Covington and McFall (2010) and idea density by Covington (2007) and Brown et al. (2008). These indicators, like STTR, ASL, or MATTR, generally focus on surface-level linguistic features. Few studies, however, have investigated translator style through statistical analysis of deeper linguistic features, such as syntactic complexity. Surface-level features like ASL can indeed differentiate styles, but sentences of similar length may exhibit fundamentally different grammatical structures or relations. These differences can affect readers’ interpretations and reveal a translator’s characteristic linguistic choices in subtler ways, forming an implicit “thumb-print” of translator style.

Therefore, this study aims to identify translator style through implicit linguistic indicators by analyzing dependency relations in different translations.

Literature Review

Corpus-based Studies of Translator Style

Since the early 2000s, corpus analysis has emerged as an effective method for examining translator style, largely due to advances in computer technology and natural language processing. Various analytical approaches have been adopted to investigate translator style using indicators that reveal distinct linguistic features.

One approach has focused on lexical features, utilizing indicators such as AWL (average word length), TTR (type-token ratio), STTR, LD (lexical density), LR (lexical richness), and LI (lexical idiosyncrasy). For instance, Baker (2000), one of the earliest corpus-based studies on translator style, proposed a methodology that incorporated TTR and STTR as statistical indices. Numerous subsequent studies followed Baker’s lead, including TTR and STTR in Li et al. (2011); STTR and LD in Wang and Liu (2012); STTR and MSL in Huang and Chu (2014); and TTR and STTR in Liu et al. (2011). These studies reflect early efforts to explore translator style using the then-emerging corpus-based approach. By calculating values for basic lexical-level indicators, scholars were able to uncover linguistic features that are difficult to detect through traditional, intuitive methods.

Another line of analysis explored typical language patterns to uncover a translator’s linguistic habits. For example, Baker (2000) examined the use of reporting structures, specifically the use of the verb “say” in its various forms by different translators. Following Baker’s example, Liu and Yan (2010) analyzed the translation of reporting verbs headed by the Chinese character dao (道). Olohan (2003) investigated the use of contractions such as can’t, don’t, and it’s. Winters (2004a, 2004b, 2007, 2009) included modal particles, loan words, and reporting verbs as indicators of translator style. Mastropierro (2018) suggested that “key clusters” might serve as potential indicators of translator style (p. 256), while Saldanha (2011) examined the use of emphatic italics, foreign words, the connective that, and reporting verbs.

While these studies also examined translator style through lexical-level linguistic features, they focused more on identifying typical or unique language patterns characteristic of a translator rather than relying solely on statistical indices like STTR and LD. This approach expanded the scope of corpus-based analysis, opening up new possibilities for investigating translator style. As more indicators were employed and new ones were developed, this line of inquiry contributed significantly to the evolving understanding of translator style through corpus analysis.

Some studies have included syntactic features as indicators of translator style, such as ASL and SS (syntactical sequence). In pioneering research, Baker (2000) examined ASL alongside other indicators, while Wang and Liu (2012) also incorporated ASL in their exploration of translator style. Ge (2020) extended this line of inquiry by examining syntactic features such as sentence length, paragraph length, sentence type, and the degree of hypotactic conformity as indicators of translator style. These studies expanded the scope of corpus-based analysis beyond the lexical level to the syntactic level, providing a more comprehensive understanding of translator style. However, the syntactic-level analyses have generally been limited to basic corpus indicators, such as average sentence length and paragraph length.

In the studies on translations of Hongloumeng, researchers have also used corpus-based analysis to investigate translator style. For instance, Sun and Liu (2025) investigated the stylistic nuances of narration and dialogue in the Hawkes and Yang translations of Hongloumeng using indexes of syntactic complexity. Their analysis revealed that the Hawkes translation employed longer linguistic units in narration and exhibited a higher frequency of subordinations in dialogue. Similarly, Kwok et al. (2024) examined the translation styles of the Hawkes and Yang versions using the activity index from quantitative linguistics. They found that the Hawkes translation exhibited a higher level of activity compared to the Yang version, displaying greater activity in fictional dialogues and greater descriptivity in fictional narration. Chou and Liu (2024) also explored the stylistic features of speech and narration in the two translations, but adopted a multidimensional approach to capture more nuanced patterns. In a related study, Tan (2024) conducted a multi-perspective stylometric analysis of the joint translation of Hongloumeng, identifying notable stylistic differences between the sections translated by David Hawkes and those by John Minford. Distinct from these stylistic investigations, Hu et al. (2025) applied entropy-based features and machine learning algorithms for translator attribution of the Hawkes and Yang translations.

Further extending this line of inquiry, Liu et al. (2022) analyzed the translation styles of two Hongloumeng translations, focusing on the use of hedges and boosters in fictional dialogues. Their study revealed that the Hawkes version employed more hedges and boosters than the Yang version, suggesting a more nuanced and interpersonally engaged style. Similarly, Liu and Afzaal (2021) conducted a corpus-driven analysis of translator style in the two English versions of Hongloumeng, using three-word and four-word lexical bundles as indicators of stylistic preference. Zhang and Liu (2014) examined Joly’s English translation of Hongloumeng to assess its stylistic consistency, employing both lexical and syntactic indicators such as average sentence length, reporting verbs, and endearing terms. In a related study, Hou (2013) explored the translator styles of Joly and the Yangs by focusing on nominalization, illustrating how each translator developed a distinctive style through this particular syntactic feature.

These studies have significantly enriched our understanding of translator style in Hongloumeng through both lexical and syntactical analyses. However, while lexical features have been widely examined, in-depth exploration of deep syntactic-semantic features—particularly those based on dependency syntax—remains limited.

As mentioned earlier, scholars have employed various types of indicators to reveal translator style, and most studies tend to incorporate both lexical and syntactical indicators rather than focusing on just one. For example, Altamimi (2016), Wang and Liu (2012), and Baker (2000) used a combination of indicators in their analyses.

Among studies on translator style, some have investigated sentence-level linguistic features, such as ASL. While a few have explored syntactic complexity, there remains a notable gap in examining deeper syntactic-semantic relationships as indicators of translator style. Specifically, current research seldom utilizes syntactic perspectives that capture nuanced syntactic-semantic distinctions, which could enrich the analysis of syntactic complexity in translated texts. Dependency grammar, as a robust framework for analyzing such features, offers valuable tools to uncover these underlying patterns. In particular, it holds significant potential for advancing the study of translator style and translation universals (Han et al., 2019, p. 92), especially in nuanced cases such as stylistic variations in translations of Hongloumeng.

Studies of Stylometry

Stylometry, also known as computational stylistics, was first developed in the 19th century (Liu & Xiao, 2020, p. 32). As a quantitative methodology, it has been widely employed in corpus linguistic research. Stylometric analysis, which relies on “the quantifiable features and structures of a text to study the style and author’s writing habits” (Liu & Xiao, 2020, p. 32), differs from traditional stylistic research that often depends on “the reader’s feelings and understanding of the words, sentences, and paragraphs of the text” (Liu & Xiao, 2020, p. 33). By leveraging quantifiable data, stylometric analysis provides a more scientific approach, as long as the linguistic features analyzed are valid and appropriate to the research questions at hand. Due to its robustness, this methodology is used in various fields, including authorial attribution (see Collin et al., 2004; Dooley & Ramirez, 2009; Holmes, 1995; Schöberlein, 2017; Zhu, Lei & Craig, 2020) and style analysis (see Kestemont et al., 2015; Klaussner et al., 2015).

Compared to studies that apply stylometric analysis to investigate original works by examining linguistic features at different levels, from character and word to sentence, paragraph, grammar, and semantics, there are far fewer studies aiming to reveal the stylistic features of translations, address issues of “translator-ship,” distinguish translator styles, or detect textual similarity in translated texts. Nevertheless, scholars are increasingly applying stylometric methodologies in translation studies to examine stylistic features of translated texts. For example, Olohan and Baker (2000) investigated the use of the reporting structure “say/tell + that” in literary texts using data from the Translational English Corpus (TEC), revealing that “that is used more with simpler syntactic constructions in TEC than in the BNC” (p. 157). Winters (2004a, 2007, 2009) explored the translator’s style in German translations of F. Scott Fitzgerald’s novels, while Huang and Chu (2014) analyzed Howard Goldblatt’s translation of Chinese novels, proposing a multiple-complex model for comparing translator styles by considering both source-text and target-text styles. Covington et al. (2015) distinguished 10 English versions of the Bible through stylometric measurements of ASL, vocabulary diversity, idea density, and clustering, concluding that “quantitative stylometry and clustering can produce classifications that reflect or reveal literary history, even when the only literary creativity involved is translation” (p. 325). Brashi (2021) examined style shift in the Arabic translation of Susan Glaspell’s play Trifles, finding that the style shift was demonstrated through “a number of linguistic phenomena, namely contraction, elision, subject-verb agreement, and figurative multi-word expressions” (p. 79).

The use of stylometric methodologies has advanced the study of literary works’ styles, enabling researchers to uncover features and aspects of both original and translated texts that are not easily identifiable through traditional close reading. These studies also provide a framework for the present research on examining the translator style of Hongloumeng through stylometric analysis.

Studies of Dependency Relation

Analyses of dependency relations have been conducted in various contexts. One area of research involves the typological analysis of languages, where dependency distance and/or dependency direction are used as variables (see Bi & Tan, 2024; Jing & Liu, 2017; Liu, 2010; Liu & Xu, 2012; Lei & Jockers, 2018). These studies have shown that dependency distance and direction can serve effectively as quantitative indices for language categorization.

Another line of research explores the cognitive mechanisms involved in language processing, specifically how dependency distance affects cognitive effort in code-switching and interpreting (see Liang et al., 2017; Wang & Liu, 2013, 2016).

A further line of inquiry examines the tendency for changes in dependency distance. Research in this area has found that dependency distance in human languages tends to decrease, a phenomenon labeled as dependency distance minimization or dependency length minimization (see Futrell et al., 2015; Lei & Wen, 2019; Temperley, 2007, 2008).

Other studies have explored how dependency distance and/or direction vary across genres and text types. These investigations reveal that dependency distance displays different distributions depending on genre and text type, suggesting that these factors may influence dependency distance and direction (see Fan & Jiang, 2019; Wang & Liu, 2017).

Despite the application of dependency distance and direction as quantitative indices in various research fields, such as language categorization and cognitive mechanisms, dependency relations have not garnered significant attention in translation studies. However, a few studies have employed dependency distance and/or direction as quantitative variables to examine specific issues in translation and interpreting. For example, Liang et al. (2017) examined how dependency distance varies in different types of interpreting. Fan and Jiang (2019) used dependency relations to differentiate translational English from native English, revealing that “the MDD of the translated texts and native texts is significantly different from each other and the MDD of the translated texts is longer than that of the native texts” (p. 58). Their findings also show that dependency direction can distinguish between translational and native English texts. Nevertheless, in the study of translator style, dependency distance and/or dependency direction have rarely been utilized as variables for quantitative analysis.

From the above literature review, it is evident that corpus-based stylometric analysis offers significant advantages over traditional close reading in examining translator style. Inspired by previous studies on translator style that focus on lexical and syntactical indicators, which predominantly target formal rather than semantic features, we aim to take this research further by examining syntactic indicators—specifically, the semantic relations between sentence constituents, known as dependency relations. As previous studies (e.g., Han et al., 2019) have suggested, dependency relations offer valuable insights into translator style at the syntactic level. We propose that dependency relations, which expose the underlying semantic connections between sentence components, can serve as crucial complementary indicators to the syntactic-level features already explored, such as ASL. This is because sentences of similar length may exhibit entirely different semantic relations among their constituents, resulting in distinct stylistic characteristics at the syntactic level. In light of this, we conducted a preliminary study on the translator style of four English translations of Hongloumeng through corpus-based stylometric analysis, using dependency relations as key indicators. This research seeks to both corroborate the findings of earlier studies and test our hypothesis regarding the role of dependency relations in translator style.

Methodology

Theoretical Basis

In this study, we aimed to explore translator style through a quantitative analysis based on dependency grammar (DG). DG is a theoretical framework that describes language structures through the relationship between their elements, known as dependencies—an asymmetrical relationship between two sentence constituents, typically words, where one functions as the governor or head, and the other as its dependent or modifier (Fraser, 1994). In dependency grammar, two key measures of analysis are dependency distance and dependency direction, which have rarely been used as variables in the quantitative study of translator style.

Dependency distance (Liu, 2007, 2008; Liu et al., 2017), also referred to as dependency length (Futrell et al., 2015; Gildea & Temperley, 2010; Temperley, 2007, 2008; Temperley & Gildea, 2018), is defined as the linear positional difference between two words within a sentence, namely the governor and its dependent (Hudson, 1995, 2010; Liu et al., 2009). Hudson (1995) measures it by counting the number of intervening words between a dependent and its governor. According to Liu et al. (2009), for any dependency relation between two words, Wx and Wy, where Wx is the governor and Wy is the dependent, the dependency distance between them is the difference x - y, with x and y being the positions of the two words in the sentence. By this measure, the dependency distance between adjacent words is 1. When x is greater than y, the dependency distance is positive, meaning the governor follows the dependent; when x is smaller than y, the dependency distance is negative, indicating the governor precedes the dependent. However, mean dependency distance (MDD) is calculated using the absolute values of these distances. The MDD of a sentence is computed using the following formula:

$\mathrm{MDD}(\text{the sentence}) = \frac{1}{n - 1} \sum_{i = 1}^{n - 1} |DD_{i}|$ (see Liu et al., 2009, p. 166)

Formula 1

In this formula, n represents the number of words in the sentence, and DDi refers to the dependency distance of the i-th syntactic relation within the sentence. Typically, each sentence contains one word without a governor, known as the root verb, whose dependency distance is considered to be zero. For a larger sample, such as a chapter or a whole text, the MDD is computed as follows:

$\mathrm{MDD}(\text{the sample}) = \frac{1}{n - s} \sum_{i = 1}^{n - s} |DD_{i}|$ (see Liu et al., 2009, p. 166)

Formula 2

In this formula, n represents the total number of words in the sample, and s denotes the number of sentences in the sample. DDi is the dependency distance of the i-th syntactic relation within the sample.
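Formulas 1 and 2 can be sketched in a few lines of Python. The block below is an illustrative sketch only, not the tool used in this study: the function names and the two toy sentences are hypothetical, and each dependency is assumed to be stored as a (governor position, dependent position) pair with the root and punctuation already excluded.

```python
from typing import List, Tuple

# Assumption: a dependency is a (governor_position, dependent_position) pair,
# 1-based, with the root and punctuation already removed.
Dependency = Tuple[int, int]


def mdd(dependencies: List[Dependency]) -> float:
    """Formula 1: mean of the absolute dependency distances of the relations."""
    if not dependencies:
        raise ValueError("at least one dependency relation is required")
    total = sum(abs(governor - dependent) for governor, dependent in dependencies)
    return total / len(dependencies)


def mdd_sample(sentences: List[List[Dependency]]) -> float:
    """Formula 2: pool the relations of all sentences, then average once.

    With n words and s sentences (one root each), the pool holds n - s relations.
    """
    pooled = [dep for sentence in sentences for dep in sentence]
    return mdd(pooled)


# Hypothetical toy data: a 2-word sentence and a 5-word sentence (root at word 2).
short_sentence = [(2, 1)]
long_sentence = [(2, 1), (2, 5), (5, 3), (5, 4)]

print(mdd(short_sentence))                          # sentence-level MDD
print(mdd(long_sentence))                           # sentence-level MDD
print(mdd_sample([short_sentence, long_sentence]))  # sample-level MDD
```

Because Formula 2 pools all relations before averaging, longer sentences weigh more heavily in the sample-level value, so it generally differs from the plain average of the sentence-level MDDs.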

The following example demonstrates a dependency analysis. Figure 1 illustrates the dependency structure for Example 1, and Table 1 lists the corresponding dependency relations and the dependency distances between word pairs.

Example 1

Original: 不知有何祸事，且听下回分解。

Translation: But if you wish to know who it was, you will have to read the next chapter.

(Last sentence of Chapter 1 in David Hawkes’s translation of Hongloumeng)

Figure 1 illustrates the dependency relations between governors and their dependents in the example sentence. Each syntactically related word pair is connected by a labeled line with an arrow pointing from the governor to the dependent. Labels such as nsubj, det, obj, amod, and punct represent the specific dependency relations between the words.

Figure 1.

Dependency structure of Example 1.

Table 1.

Dependency Relations of Example 1.

Dependent id	Token	Part of speech	Governor id	Dependency relation	Dependency distance
1	But	CC	0	ROOT	/
2	if	IN	4	mark	2
3	you	PRP	4	nsubj	1
4	wish	VBP	1	dep	−3
5	to	TO	6	mark	1
6	know	VB	4	xcomp	−2
7	who	WP	9	obj	2
8	it	PRP	9	nsubj	1
9	was	VBD	6	ccomp	−3
10	,	PUNCT	9	punct	/
11	you	PRP	13	nsubj	2
12	will	MD	13	aux	1
13	have	VB	9	ccomp	−4
14	to	TO	15	mark	1
15	read	VB	13	ccomp	−2
16	the	DT	18	det	2
17	next	JJ	18	amod	1
18	chapter	NN	15	obj	−3
19	.	PUNCT	1	punct	/

According to Formula 1, the MDD of the example, which contains 16 dependency relations once the root and punctuation are excluded, can be calculated as follows:

$\mathrm{MDD}(\text{Example 1}) = \frac{2 + 1 + 3 + 1 + 2 + 2 + 1 + 3 + 2 + 1 + 4 + 1 + 2 + 2 + 1 + 3}{16} = \frac{31}{16} = 1.9375$

As mentioned earlier, the value of dependency distance can be either positive or negative, depending on whether the governor precedes or follows its dependent. If the governor precedes the dependent, the dependency distance is negative, indicating a governor-initial (or head-initial) dependency relation. Conversely, if the governor follows the dependent, the dependency distance is positive, indicating a governor-final (or head-final) dependency relation. This variable is referred to as dependency direction (Fan & Jiang, 2019; Jiang & Liu, 2015; Liu et al., 2009; Liu, 2010; Wang & Liu, 2017). The dependency direction of a sample can be quantified by calculating the percentage of head-initial and head-final relations, using the following formulas:

$\text{percentage of head-final dependencies} = \frac{\text{frequency of head-final dependencies}}{\text{total number of dependencies in the treebank}} \times 100$

Formula 3

$\text{percentage of head-initial dependencies} = \frac{\text{frequency of head-initial dependencies}}{\text{total number of dependencies in the treebank}} \times 100$

Formula 4

(see Liu, 2010, p. 1570)

The dependency direction of Example 1 can be calculated as follows:

Percentage of head-final dependencies in Example 1 = $\frac{10}{16} \times 100 = 62.5\%$

Percentage of head-initial dependencies in Example 1 = $\frac{6}{16} \times 100 = 37.5\%$

As indicated by these percentages, the example sentence has significantly more head-final dependencies than head-initial ones, meaning that in most cases, the dependents precede their governors.
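The figures for Example 1 can be re-derived directly from the dependent and governor ids listed in Table 1. The following is a minimal sketch with hypothetical variable names; it assumes, as Table 1 does, that the root and the punctuation marks carry no dependency distance.

```python
# (dependent id, governor id, relation) copied from Table 1.
TABLE_1 = [
    (1, 0, "ROOT"), (2, 4, "mark"), (3, 4, "nsubj"), (4, 1, "dep"),
    (5, 6, "mark"), (6, 4, "xcomp"), (7, 9, "obj"), (8, 9, "nsubj"),
    (9, 6, "ccomp"), (10, 9, "punct"), (11, 13, "nsubj"), (12, 13, "aux"),
    (13, 9, "ccomp"), (14, 15, "mark"), (15, 13, "ccomp"), (16, 18, "det"),
    (17, 18, "amod"), (18, 15, "obj"), (19, 1, "punct"),
]

# Signed distance = governor position - dependent position;
# the root and punctuation are skipped, as in Table 1.
distances = [
    governor - dependent
    for dependent, governor, relation in TABLE_1
    if relation not in ("ROOT", "punct")
]

mdd_example = sum(abs(d) for d in distances) / len(distances)

# Positive distance: governor follows the dependent (head-final).
# Negative distance: governor precedes the dependent (head-initial).
head_final = sum(1 for d in distances if d > 0)
head_initial = sum(1 for d in distances if d < 0)
pct_head_final = head_final / len(distances) * 100
pct_head_initial = head_initial / len(distances) * 100

print(len(distances), mdd_example)       # 16 relations, MDD 1.9375
print(pct_head_final, pct_head_initial)  # 62.5 and 37.5
```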

Corpus Data

The corpus used in this study is a large dataset comprising four English translations of Hongloumeng, one of the four great classic novels of Chinese literature. The translations come from three works published under different titles: The Dream of the Red Chamber by H. Bencraft Joly (1891), The Story of the Stone by David Hawkes and John Minford (1973–1986), whose two translators are treated separately here, and A Dream of Red Mansions by Yang Hsien-yi (Xianyi) and Gladys Yang (1978–1980).

The dataset was divided into four sub-datasets based on authorship, or more accurately in this context, “translator-ship”. Sub-dataset 1 (The Dream of the Red Chamber, hereafter DRC) consists of the first 56 chapters of Hongloumeng, the only chapters Joly translated. Sub-dataset 2 (The Story of the Stone, volumes 1–3, hereafter SOS 1) contains the first 80 chapters, translated by David Hawkes. Sub-dataset 3 (The Story of the Stone, volumes 4–5, hereafter SOS 2) covers the last 40 chapters, translated by John Minford. Sub-dataset 4 (A Dream of Red Mansions, hereafter DRM) includes all 120 chapters of Hongloumeng, translated by Yang Hsien-yi (Xianyi) and Gladys Yang. Because the division follows the translators’ respective contributions, the four sub-datasets are imbalanced in size, but this imbalance does not negatively affect our analysis. Instead of relying on traditional corpus metrics such as the TTR, which are sensitive to dataset size, we focused on the MDD of each chapter within the four sub-datasets, as well as the overall MDD for the entire corpus. This approach is less affected by the size of the corpus and allows for a more balanced comparison of translator styles.

The four sub-datasets are categorized into two groups: L1 translation (translated into one’s mother tongue or first language) and L2 translation (translated into one’s second language). DRC, SOS 1 and SOS 2 belong to the L1 translation category, while DRM falls into the L2 translation category. This division allows us to examine whether there are notable differences in dependency distance and dependency direction between L1 and L2 translations, as well as among the translations of different translators. For this reason, The Story of the Stone was split into two sub-datasets: SOS 1 and SOS 2. Although The Story of the Stone is a single complete translation of Hongloumeng, SOS 1 was translated by David Hawkes, while SOS 2 was translated by his son-in-law, John Minford, who claimed to have maintained stylistic consistency with Hawkes in the latter part of the translation (Zhu & Minford, 2017, pp. 51–52). The descriptive statistics of the dataset are provided in Table 2, giving insight into the structural characteristics of the different translations and laying the groundwork for analyzing the role of dependency relations in the translators’ styles.

Table 2.

Descriptive Statistics of the Dataset.

Sub-dataset number	Translator	Translation direction	Number of chapters	Number of sentences	Mean sentence count per chapter	Number of words	Mean word count per chapter
Sub-dataset 1 (DRC)	H. Bencraft Joly	L1 translation	56	20,800	358.1	448,858	7982.2
Sub-dataset 2 (SOS 1)	David Hawkes	L1 translation	80	26,890	358.0	578,715	7224.4
Sub-dataset 3 (SOS 2)	John Minford	L1 translation	40	13,288	333.5	251,974	6287.9
Sub-dataset 4 (DRM)	Yang Hsien-yi & Gladys Yang	L2 translation	120	44,500	370.8	628,887	5240.7
Total			296	107,284	362.4	1,908,434	6447.4

Research Questions

This study is guided by the following three questions.

(1) Is there a significant difference in dependency distance among the four English translations of Hongloumeng?

(2) Is there a significant difference in dependency direction among the four English translations of Hongloumeng?

(3) Can dependency distance and dependency direction serve as reliable indicators for distinguishing the translator style of the English versions of Hongloumeng?

Data Processing

The procedure for data processing is outlined as follows:

Data Extracting and Cleaning

The raw data was extracted from a self-built corpus of the English translations of Hongloumeng. The corpus was compiled by scanning printed copies of the works by Cao (1973, 1977, 1980, 1982, 1986, 1994, 2010) and saved as PDF files. These files were processed using OCR (Optical Character Recognition), and the OCR results were saved in Word files for manual error-checking. Following this, data cleaning was performed to ensure accuracy in the subsequent analysis. This step primarily involved removing meta-information to achieve more precise statistical results. All data was stored in UTF-8 encoded files to prevent any compatibility issues with corpus analysis tools.

MDD Computing

The leoDDcalculator (Lei & Jockers, 2018), an R package designed to calculate MDD values, was used to compute the syntactic dependencies of the texts.
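For readers who wish to reproduce this step with a different toolchain, the computation can be sketched independently of any particular parser. The block below is not the leoDDcalculator interface, which is not described here; it is a hypothetical sketch that assumes the parsed text is available in the ten-column, tab-separated CoNLL-U format (token id in column 1, head id in column 7, relation label in column 8) and that root and punctuation relations are excluded, as in Table 1. The function names and the sample sentence are illustrative.

```python
from typing import List


def signed_distances(conllu_text: str) -> List[int]:
    """Collect governor-minus-dependent distances from CoNLL-U formatted text."""
    distances = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines (sentence breaks) and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed rows, multiword ranges, and empty nodes
        token_id, head, relation = int(cols[0]), cols[6], cols[7]
        if not head.isdigit() or int(head) == 0 or relation == "punct":
            continue  # the root and punctuation carry no dependency distance
        distances.append(int(head) - token_id)
    return distances


def mean_dependency_distance(conllu_text: str) -> float:
    """Mean of absolute distances over all retained relations in the text."""
    distances = signed_distances(conllu_text)
    if not distances:
        raise ValueError("no dependency relations found")
    return sum(abs(d) for d in distances) / len(distances)


def row(token_id: int, form: str, upos: str, head: int, relation: str) -> str:
    """Build one CoNLL-U row; unused columns are filled with underscores."""
    return "\t".join(
        [str(token_id), form, "_", upos, "_", "_", str(head), relation, "_", "_"]
    )


# Hypothetical parsed sentence, not taken from the corpus.
SAMPLE = "\n".join([
    "# text = She reads books.",
    row(1, "She", "PRON", 2, "nsubj"),
    row(2, "reads", "VERB", 0, "root"),
    row(3, "books", "NOUN", 2, "obj"),
    row(4, ".", "PUNCT", 2, "punct"),
])

print(signed_distances(SAMPLE))
print(mean_dependency_distance(SAMPLE))
```

Since token ids restart at 1 in every sentence, passing a whole chapter to `mean_dependency_distance` pools the relations of all its sentences, which corresponds to the sample-level MDD of Formula 2.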

Results

In this section, we present the findings related to the translator style of the four English translations of Hongloumeng by examining syntactic complexity through two indicators of dependency grammar: dependency distance and dependency direction.

Dependency Distance (DD) of Chapters in the Sub-datasets

This subsection reports the MDD of the chapters in each of the four sub-datasets. Figure 2 illustrates the evolution of MDD across the chapters of the sub-datasets, shown as individual datapoints (raw MDD data for each sub-dataset are available upon request). In each panel, the vertical axis represents the MDD values, while the horizontal axis denotes the chapters of the respective sub-dataset. The density of datapoints corresponds to the number of chapters for each sub-dataset, meaning that more datapoints indicate more chapters. For instance, the graph in the top left corner contains significantly fewer datapoints than the one in the bottom right corner, as DRC consists of only 56 chapters, while DRM comprises 120 chapters.

Figure 2.

MDD of Chapters in the Sub-datasets.

As shown in Figure 2, the four sub-datasets exhibit distinct patterns of MDD evolution. The scatterplots indicate that, overall, DRM (the sub-dataset at the bottom right corner) has a lower MDD value than the other three sub-datasets. This variation in MDD evolution across the sub-datasets suggests that the sample texts can be differentiated by their MDD values. In other words, MDD can serve as a syntactic-level indicator of the translators’ styles in the case of the four English versions of Hongloumeng analyzed here. The lower MDD value in DRM suggests that, in terms of syntactic features, this translation is less formal, characterized by simpler syntax and greater succinctness compared to the others.

Table 3 presents the descriptive statistics of MDD for the four English translations of Hongloumeng. The data reveal notable differences among the four English texts, with DRM showing the lowest dependency distance. Although the values of dependency distance differ between the translations, the overall patterns of dependency distance evolution across chapters are not as divergent, particularly for SOS 1, SOS 2, and DRM. These three translations exhibit similar patterns, as reflected by the standard deviations (SD) of their MDD values: 0.09, 0.08, and 0.09, respectively. The low SD of MDD for these texts indicates internal consistency in translator style, meaning that these translators maintained a consistent approach to syntactic features throughout their translations. In contrast, DRC stands out from the other three translations in terms of dependency distance evolution, with an SD of MDD at 0.21, more than twice that of the rest. This larger SD suggests a lack of internal consistency in the translator’s style, implying that the translator did not maintain a consistent approach to syntactic features. This finding aligns with Zhang and Liu’s (2014) observation that the first 24 chapters differ markedly in style from the subsequent 32 chapters. As illustrated in Figure 2, the datapoints for DRC are divided into two distinct groups: 24 points in the upper left corner and 32 points in the bottom right corner, corroborating the results of Zhang and Liu’s (2014) study.

Table 3.

Descriptive Statistics of MDD of the Sub-datasets.

Data	Min	Qu_1	Median	Mean	Qu_3	Max	SD	Skew	Kurtosis
Sub-dataset 1 (DRC)	2.33	2.43	2.54	2.60	2.77	3.04	0.21	0.51	−1.01
Sub-dataset 2 (SOS 1)	2.30	2.42	2.47	2.48	2.53	2.80	0.09	0.83	1.07
Sub-dataset 3 (SOS 2)	2.22	2.33	2.38	2.38	2.43	2.59	0.08	0.19	−0.25
Sub-dataset 4 (DRM)	2.07	2.28	2.33	2.33	2.39	2.60	0.09	−0.2	0.35

Table 4 provides the overall MDD values for the four sub-datasets. As shown in the table, sub-dataset 4 (DRM) has an MDD of 2.3158, which is lower than the values for the other three sub-datasets. Compared to the MDD of 2.543 for original English texts (Liu, 2008), sub-dataset 4 (DRM), along with sub-dataset 2 (SOS1) and sub-dataset 3 (SOS2), demonstrates significantly lower MDD values. However, sub-dataset 1 (DRC) presents a higher MDD value than that of the original English texts. At present, there is no substantial research available on the MDD of translated texts using large datasets, leaving it unclear whether the MDD values of sub-dataset 2 (SOS1), sub-dataset 3 (SOS2), and sub-dataset 4 (DRM) are representative of translated English texts. This remains a subject for future investigation. Nevertheless, the results shown in Table 4 suggest that MDD can serve as an indicator of translator style at the syntactic level, at least in this specific case.

Table 4.

MDD of the Sub-datasets.

Index	Sub-dataset 1 (DRC)	Sub-dataset 2 (SOS1)	Sub-dataset 3 (SOS2)	Sub-dataset 4 (DRM)
MDD	2.5582	2.4742	2.3775	2.3158

To test whether the differences in MDD among the four texts are statistically significant, we conducted pairwise independent-samples t-tests (Welch two-sample t-tests) on the MDD values.

Table 5 presents the t-test results for the MDD of the sub-datasets. The results indicate that the four datasets differ significantly from one another: the p-value for every pair of texts is less than .001, well below the significance level of .05. Therefore, it can be stated with confidence that MDD is effective in distinguishing between the translated texts of different translators and can serve as an indicator of translator style for the four English versions of Hongloumeng.

Table 5.

MDD t-test Results.

Welch two sample t-test of MDD
Data	t	df	p-Value
DRC and DRM	9.0927	64.048	<.001
SOS1 and DRM	11.545	165.39	<.001
SOS2 and DRM	3.4607	72.023	<.001
SOS1 and DRC	−3.9094	69.389	<.001
SOS2 and DRC	−6.9229	75.437	<.001
SOS1 and SOS 2	5.9506	86.733	<.001
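
The t and df columns of Table 5 follow from the standard Welch formulas. The sketch below is only an illustration of that computation, not the procedure actually run for this study: the function name `welch_t` and the two short lists of chapter-level MDD values are hypothetical, and the p-value (which requires the Student t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean: sample variance / n
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical chapter-level MDD values, for illustration only
chapters_x = [2.61, 2.43, 2.77, 2.52, 2.90]
chapters_y = [2.33, 2.28, 2.39, 2.30, 2.41]

t_stat, dof = welch_t(chapters_x, chapters_y)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Because the degrees of freedom are estimated from the two sample variances, they are generally non-integer, which is why Table 5 reports values such as 64.048.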

Dependency Direction (DDI) of Chapters of the Sub-datasets

Dependency direction refers to whether the governor in a dependency relation precedes or follows its dependent, determining if a dependency structure is governor-initial or governor-final. Previous studies, such as Liu (2010) and Fan and Jiang (2019), have demonstrated that dependency direction is an effective indicator of linguistic typology, useful for distinguishing translational language from native language and classifying natural languages.

In this study, dependency direction is employed as a factor to assess whether it can differentiate the translator styles of the four English translations of Hongloumeng. Using Formula 3 as defined earlier, the DDI for each sub-dataset has been calculated, with the results presented in Table 6.

Table 6.

DDI of Sub-datasets.

Index	Sub-dataset 1 (DRC)	Sub-dataset 2 (SOS 1)	Sub-dataset 3 (SOS 2)	Sub-dataset 4 (DRM)
Head-initial (HI)	34.06%	34.85%	34.88%	35.14%
Head-final (HF)	65.94%	65.15%	65.12%	64.86%
HI/HF	0.5165	0.5350	0.5356	0.5419

As shown in Table 6, sub-dataset 4 (DRM) exhibits a higher percentage of HI (head-initial) dependency relations compared to the other three sub-datasets. Sub-dataset 2 and sub-dataset 3, which represent the two parts of The Story of the Stone, show a similar percentage of HI dependency relations, albeit slightly lower than sub-dataset 4.

To determine whether the differences in DDI evolution are statistically significant, we performed independent-samples t-tests between sub-dataset 4 (DRM) and the other sub-datasets. For the t-tests, we used the raw DD (dependency direction) data. First, the DD values of each sub-dataset were divided into two groups: negative values, representing head-initial dependency relations, and positive values, representing head-final dependency relations. Next, we entered the negative DD values of DRC and DRM, SOS 1 and DRM, and SOS 2 and DRM into R to conduct the t-tests. The same procedure was repeated for the positive DD values of these three pairs of sub-datasets. The t-test results are presented in Table 7.

Table 7.

DDI t-test Results.

Welch two sample t-test of DDI values
Data	t	df	p-Value
DRC and DRM	−31.078	1,018,691	<.001
SOS1 and DRM	−8.4169	882,887	<.001
SOS2 and DRM	−8.3648	492,324	<.001
SOS1 and DRC	23.451	1,358,491	<.001
SOS2 and DRC	17.851	531,594	<.001
SOS1 and SOS2	−1.451	494,966	.147

As shown in Table 7, five of the six pairs of sub-datasets (all except SOS 1 and SOS 2) exhibit significant differences in their dependency direction evolution, with every p-value below .001 and thus well below the significance level of .05. For SOS 1 and SOS 2, however, the p-value of the t-test is .147, which is above the significance level, indicating that the two texts are not significantly different in their dependency direction evolution. When SOS 1 and SOS 2 are combined as a whole, representing the complete English translation of Hongloumeng, the combined dataset shows significant differences from the other two texts, DRC and DRM, with p-values below .001 in the t-tests of dependency direction. For the individual sub-datasets, however, dependency direction cannot be conclusively considered a reliable indicator for distinguishing the translated texts of different translators, or a definitive indicator of translator style, for the four English translations of Hongloumeng.

Examples of MDD Difference

In the Theoretical Basis section, an example from the sub-dataset SOS 1 was provided, specifically the last sentence of Chapter 1 translated by David Hawkes, which had an MDD of 1.9375. To further illustrate the stylistic differences among the four translated texts, three additional examples from the other sub-datasets are presented. These examples highlight how differences in MDD reflect distinct translator styles. The relevant figures (Figures 3–6) and tables (Tables 8–10) accompanying each example show the corresponding dependency structures and relations.

Figure 3.

Dependency structure of Example 2.

Figure 4.

Dependency structure of Example 3.

Figure 5.

Dependency structure of Example 4.

Figure 6.

Style in a continuum of the degree of succinctness.

Table 8.

Dependency Relations of Example 2.

Dependent id	Token	Part of speech	Governor id	Dependency relation	Dependency distance
3	learn	VB	0	ROOT	/
1	But	CC	3	cc	2
2	to	TO	3	mark	1
4	how	WRB	6	advmod	2
5	he	PRP	6	nsubj	1
6	fared	VBD	3	ccomp	-3
7	the	DT	9	det	2
8	following	VBG	9	amod	1
9	day	NN	6	obl:tmod	-3
10	,	PUNCT	6	punct	/
11	you	PRP	13	nsubj	2
12	must	MD	13	aux	1
13	read	VB	6	parataxis	-7
14	the	DT	16	det	2
15	next	JJ	16	amod	1
16	chapter	NN	13	obj	-3
17	.	PUNCT	3	punct	/

Table 9.

Dependency Relations of Example 3.

Dependent id	Token	Part of speech	Governor id	Dependency relation	Dependency distance
4	impending	VBG	0	ROOT	/
1	What	WDT	2	det	1
2	calamity	NN	4	nsubj	2
3	was	VBD	4	aux	1
5	is	VBZ	9	aux:pass	4
6	not	RB	9	advmod	3
7	as	RB	9	advmod	2
8	yet	RB	9	advmod	1
9	ascertained	VBN	4	dep	-5
10	,	PUNCT	9	punct	/
11	but	CC	15	cc	4
12	,	PUNCT	15	punct	/
13	reader	NN	15	nsubj	2
14	,	PUNCT	15	punct	/
15	listen	VB	9	conj	-6
16	to	IN	18	case	2
17	the	DT	18	det	1
18	explanation	NN	15	obl	-3
19	contained	VBD	18	acl	-1
20	in	IN	23	case	3
21	the	DT	23	det	2
22	next	JJ	23	amod	1
23	chapter	NN	19	obl	-4
24	.	PUNCT	4	punct	/

Table 10.

Dependency Relations of Example 4.

Dependent id	Token	Part of speech	Governor id	Dependency relation	Dependency distance
5	read	VB	0	ROOT	/
1	To	TO	2	mark	1
2	find	VB	5	advcl	3
3	out	RP	2	compound:prt	-1
4	,	PUNCT	5	punct	/
6	the	DT	8	det	2
7	next	JJ	8	amod	1
8	chapter	NN	5	obj	-3
9	.	PUNCT	5	punct	/

Since SOS 1 and SOS 2 were produced by two different translators and together form one complete English translation of Hongloumeng, no single source sentence is available in all four texts for a comparison of MDD. As a compromise, we chose the last sentence of Chapter 1 from DRC, DRM, and SOS 1, and the last sentence of Chapter 81 from SOS 2. This approach works because the author of Hongloumeng ended many chapters with the same sentence pattern, “…且听下回分解,” which roughly means “read the next chapter to find out what is going on.”

Example 2

Original: 欲知明日听解何如，且听下回分解。

Translation: But to learn how he fared the following day, you must read the next chapter.

(This is the last sentence of Chapter 81 in the English translation of Hongloumeng by John Minford.)

Example 3

Original: 不知有何祸事，且听下回分解。

Translation: What calamity was impending is not as yet ascertained, but, reader, listen to the explanation contained in the next chapter.

(This is the last sentence of Chapter 1 in H. Bencraft Joly’s version of Hongloumeng.)

Example 4

Original: 不知有何祸事，且听下回分解。

Translation: To find out, read the next chapter.

(This is the last sentence of Chapter 1 in Yang Hsien-yi (Xianyi) and Gladys Yang’s version of Hongloumeng.)

Based on Formula 1, the MDD of the above three examples can be calculated as follows:

MDD (Example 2_SOS 2) = $\frac{| 2 + 1 + 2 + 1 + 3 + 2 + 1 + 3 + 2 + 1 + 7 + 2 + 1 + 3 |}{14} = 2.2143$

MDD (Example 3_DRC) = $\frac{| 1 + 2 + 1 + 4 + 3 + 2 + 1 + 5 + 4 + 2 + 6 + 2 + 1 + 3 + 1 + 3 + 2 + 1 + 4 |}{19} = 2.5263$

MDD (Example 4_DRM) = $\frac{| 1 + 3 + 1 + 2 + 1 + 3 |}{6} = 1.8333$
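
The three calculations above can be reproduced with a few lines of Python. This is a minimal sketch rather than the tool used in the study: the function name `mean_dependency_distance` is hypothetical, and the lists simply copy the signed dependency distances from Tables 8–10, with the root and punctuation rows (marked "/") left out, as in the calculations above.

```python
def mean_dependency_distance(signed_distances):
    """Mean of the absolute dependency distances of one sentence.

    Root and punctuation tokens are assumed to be excluded already.
    """
    return sum(abs(d) for d in signed_distances) / len(signed_distances)


# Signed distances (governor id minus dependent id) from Tables 8-10
example_2 = [2, 1, 2, 1, -3, 2, 1, -3, 2, 1, -7, 2, 1, -3]          # SOS 2
example_3 = [1, 2, 1, 4, 3, 2, 1, -5, 4, 2, -6, 2, 1, -3, -1, 3,
             2, 1, -4]                                              # DRC
example_4 = [1, 3, -1, 2, 1, -3]                                    # DRM

for label, distances in [("Example 2", example_2),
                         ("Example 3", example_3),
                         ("Example 4", example_4)]:
    print(label, round(mean_dependency_distance(distances), 4))
```

Taking the absolute value of each distance before averaging means that head-initial (negative) and head-final (positive) relations contribute equally to MDD; direction is analyzed separately.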

Drawing on the results from these calculations, along with the MDD of Example 1, we can create a comparative table of the MDD values from the four sub-datasets.

As Table 11 shows, Example 3 from sub-dataset 1 has the highest MDD and Example 4 from sub-dataset 4 has the lowest, with the other two examples falling between them. The results demonstrate considerable variation in dependency distance across the four texts, with MDD values of 2.5263 (DRC), 1.9375 (SOS 1), 2.2143 (SOS 2), and 1.8333 (DRM).

Table 11.

MDD of the examples.

Index	Sub-dataset 1 (DRC)	Sub-dataset 2 (SOS1)	Sub-dataset 3 (SOS2)	Sub-dataset 4 (DRM)
MDD	2.5263	1.9375	2.2143	1.8333

The DDI of the three examples can be calculated as follows, using the previously discussed Formulas 3 and 4:

Percentage of head-final dependencies in Example 2 = $\frac{10}{14} \times 100\% = 71.43\%$

Percentage of head-initial dependencies in Example 2 = $\frac{4}{14} \times 100\% = 28.57\%$

Percentage of head-final dependencies in Example 3 = $\frac{14}{19} \times 100\% = 73.68\%$

Percentage of head-initial dependencies in Example 3 = $\frac{5}{19} \times 100\% = 26.32\%$

Percentage of head-final dependencies in Example 4 = $\frac{4}{6} \times 100\% = 66.67\%$

Percentage of head-initial dependencies in Example 4 = $\frac{2}{6} \times 100\% = 33.33\%$
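
The direction percentages can be checked in the same way. The sketch below assumes the sign convention described earlier for the raw DD data (positive values for head-final relations, negative values for head-initial ones); the function name `direction_shares` is hypothetical, and the lists copy the signed distances from Tables 8–10.

```python
def direction_shares(signed_distances):
    """Return (head-final %, head-initial %) for one sentence."""
    total = len(signed_distances)
    # Positive distance: the governor follows its dependent (head-final)
    head_final = sum(1 for d in signed_distances if d > 0)
    # Negative distance: the governor precedes its dependent (head-initial)
    head_initial = sum(1 for d in signed_distances if d < 0)
    return 100 * head_final / total, 100 * head_initial / total


# Signed distances from Tables 8-10 (root and punctuation rows excluded)
example_2 = [2, 1, 2, 1, -3, 2, 1, -3, 2, 1, -7, 2, 1, -3]
example_3 = [1, 2, 1, 4, 3, 2, 1, -5, 4, 2, -6, 2, 1, -3, -1, 3,
             2, 1, -4]
example_4 = [1, 3, -1, 2, 1, -3]

for label, distances in [("Example 2", example_2),
                         ("Example 3", example_3),
                         ("Example 4", example_4)]:
    hf, hi = direction_shares(distances)
    print(f"{label}: head-final {hf:.2f}%, head-initial {hi:.2f}%")
```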

As Table 12 shows, the four examples also differ in their dependency direction, which suggests that dependency direction may also help distinguish the four English texts of Hongloumeng.

Table 12.

Dependency Direction of the Examples.

Index	Sub-dataset 1(DRC)	Sub-dataset 2(SOS 1)	Sub-dataset 3(SOS 2)	Sub-dataset 4(DRM)
Percentage of head-final dependency	73.68%	62.50%	71.43%	66.67%
Percentage of head-initial dependency	26.32%	37.50%	28.57%	33.33%

To determine whether the four examples differ significantly in dependency relations, we conducted a t-test. The results show that the four sentences are significantly different in both MDD (t = 13.706, p < .001) and DDI (t = 27.474, p < .001). This suggests that dependency relations can indeed distinguish the four sentences in terms of syntactic structure, highlighting distinct translator styles at the syntactic level.

As demonstrated by the dependency analysis of the four examples:

But if you wish to know who it was, you will have to read the next chapter. (SOS 1)

But to learn how he fared the following day, you must read the next chapter. (SOS 2)

What calamity was impending is not as yet ascertained, but, reader, listen to the explanation contained in the next chapter. (DRC)

To find out, read the next chapter. (DRM)

These four sentences are distinguishable in their dependency relations at the syntactic level. Example 4 from DRM has the lowest MDD (1.8333), indicating lower syntactic complexity and simpler dependency relations, which makes the sentence easier for readers to comprehend. Sentences with a lower MDD tend to be more succinct in style, while a higher MDD suggests greater complexity and verbosity. In terms of stylistic succinctness, therefore, Example 4 from DRM is the most succinct and Example 3 from DRC the most verbose, with Example 1 from SOS 1 and Example 2 from SOS 2 falling in between. Placed on a continuum from succinctness to verbosity, the styles of the four examples are related as shown in Figure 6.

In summary, the four examples exhibit distinct styles when evaluated by syntactic complexity. Example 4 from DRM demonstrates the highest degree of succinctness, while Example 3 from DRC reflects the lowest. SOS 1 and SOS 2 fall between DRM and DRC, with SOS 1 displaying greater succinctness than SOS 2. Sentence length points in the same direction: Example 4 from DRM is the shortest and Example 3 from DRC is the longest, with SOS 1 and SOS 2 situated in between. Thus, it can be concluded that dependency relations serve as a useful metric for identifying translator styles, particularly when assessing the degree of succinctness.

Discussion

Previous studies have demonstrated that dependency grammar is an effective tool for typological analysis in translation studies (Fan & Jiang, 2019; Liang et al., 2017). Based on this, we hypothesized that dependency distance could reveal translator style. In the present study, we extended the application of dependency grammar to stylometric analysis, utilizing two key indicators—dependency distance and dependency direction—to differentiate translator styles. The results are discussed below in relation to the three research questions.

First, the findings reveal a statistically significant difference in dependency distance among the English translations of Hongloumeng under examination. As shown by the statistics in the previous section, the MDD values for DRC, SOS 1, SOS 2, and DRM are 2.5582, 2.4742, 2.3775, and 2.3158, respectively. The values for SOS 1, SOS 2, and DRM fall below the MDD of native English texts (2.543, according to Liu, 2008), while that of DRC lies slightly above it. This result answers the first research question and part of the third research question, indicating that there is a significant variation in dependency distance among the four English translations of Hongloumeng. Furthermore, dependency distance can serve as a marker for distinguishing translator styles at the syntactic level, particularly with regard to the degree of succinctness.

Second, we found that, with the exception of the SOS 1 and SOS 2 pair, the other five pairs of English texts of Hongloumeng showed significant differences in dependency direction. This finding answers the second research question only in part, as it cannot be conclusively stated that there is always a significant difference in dependency direction across all four English translations of Hongloumeng. However, it is worth noting that SOS 1 and SOS 2 were expected to have similar styles, given that John Minford, the translator of SOS 2, had studied SOS 1 to ensure stylistic consistency (Zhu & Minford, 2017, pp. 51–52). Despite this effort, the analysis of dependency distance revealed a significant difference between SOS 2 and SOS 1, suggesting that translators inevitably imprint their unique style on their work, even when they attempt to remain invisible.

Among the four English translations of Hongloumeng, three (DRC, SOS 1, and SOS 2) are by native English speakers: H. Bencraft Joly, a British vice-consul in Macao; David Hawkes, a British sinologist and translator; and John Minford, also a British sinologist and translator. The fourth, DRM, was translated by Yang Hsien-yi (Xianyi), a Chinese literary translator. In terms of the translators’ native languages, DRC, SOS 1, and SOS 2 are translations from L2 into L1 (from a second language into the translator’s first), while DRM is a translation from L1 into L2 (from the translator’s first language into a second). The findings of this study show that DRM, translated from L1 to L2, differs significantly from the other three, particularly in terms of dependency distance (DD), with a much shorter mean dependency distance (MDD) than the others. A possible explanation for this sharp difference is the translation direction: DRM was translated into L2, while the others were translated into L1. Translation direction may thus have influenced this variation in dependency distance. Although some scholars, such as Fan and Jiang (2019), have demonstrated that dependency distance can distinguish translated texts from non-translated ones, no research to date has focused on whether dependency distance can distinguish L1 translations (those translated into the translator’s first language) from L2 translations (those translated into a second language). More evidence is needed before we can argue that translation direction can be distinguished through MDD; gathering it will be the goal of our future research.

In addition to translation direction, time may also have influenced the differences in dependency distance. This study finds that DRC, translated in 1891, has a significantly longer MDD than the other three texts, which were translated much later: SOS 1 in 1973 (82 years later), SOS 2 in 1982 (91 years later), and DRM in 1978 (87 years later). A general tendency of decreasing MDD can be observed, from 2.5582 in DRC to 2.3158 in DRM. Previous studies have noted a minimization process in DD over time (Lei & Wen, 2019; Temperley, 2007, 2008). In this context, the decrease in MDD from DRC (2.5582) to SOS 1 (2.4742) could partially be explained by this minimization, as it reflects humans’ tendency to seek the least effort in communication. Thus, it can be hypothesized that time is a factor influencing translator style, as measured by dependency distance. To further explore this, a diachronic corpus would be necessary to investigate the effect of time on translator style.

Additionally, source text accessibility may have played a role. Although we eliminated the influence of the source texts themselves (all four translated texts were based on the same Chinese text, Hongloumeng), we did not consider the accessibility of the source text. Our study did not explore whether a translator would produce translations with similar dependency relations when working from source texts of varying accessibility. To draw a more definitive conclusion about whether dependency relations can serve as indicators of translator style, further research is needed, particularly using a translational corpus that includes different source texts translated by the same translator.

Regardless of whether the differences in MDD among the four English translations of Hongloumeng are due to translation direction, the principle of least effort in human communication, source text accessibility, or other factors such as ideologies or communities of practice, it is clear that the four translations exhibit significant differences in MDD. This suggests that dependency distance can indeed serve as an indicator of translator style—at least for the four English translations of Hongloumeng.

Based on the above analysis and discussion, we propose that in the stylometric analysis of translator style, dependency relations, and dependency distance in particular, can be considered an effective indicator, which we refer to as an implicit measure of translator style. In our view, measures of translator style in the stylometric approach fall into two categories: explicit measures and implicit measures. Explicit measures examine observable indicators that can be easily calculated manually or by computer, such as word length, sentence length, paragraph length, and typical language patterns. These require only one step of calculation. Implicit measures, on the other hand, examine indicators that are not directly observable and are difficult to calculate manually, such as TTR (Type-Token Ratio), STTR (Standardized Type-Token Ratio), lexical density (LD), and MDD. They require at least two steps of calculation. For example, calculating MDD requires an initial step of POS tagging, followed by syntactic parsing, before the final MDD calculation. In most cases, implicit measures rely on natural language processing (NLP). Compared to explicit measures, implicit measures reveal hidden features of translator style that are beyond what can be detected by the naked eye.

Conclusion

Dependency distance is an effective indicator for measuring syntactic complexity (Oya, 2011; Lei & Wen, 2019). This study examined translator style from the perspective of syntactic complexity, using dependency distance and dependency direction as measurement indicators. Based on the analysis and discussion in the preceding sections, it can be confidently concluded that MDD (Mean Dependency Distance) is an effective indicator of translator style. In addition to MDD, DDI (Dependency Direction) can be considered a supplementary indicator for measuring translator style.

However, several limitations should be addressed in future research. First, the data used in this study consists of only four English translations of the Chinese novel Hongloumeng, which may be limited in both quantity and genre. Therefore, caution is required when considering the representativeness of the data and the generalizability of the findings. Future research should include a larger dataset spanning different genres and languages. Second, the dependency relations in this study were automatically annotated using the leoDDcalculator. Due to the large volume of data, only a portion of the parsed data was manually verified. Although some annotation errors were found during manual checks, they likely do not significantly impact the overall findings, given the relatively high accuracy of the calculator (Lei & Jockers, 2018).

Footnotes

Acknowledgements

The author would like to express sincere thanks to the anonymous reviewers and the editor for their thoughtful and detailed comments, which significantly enhanced the clarity and rigor of the paper.

ORCID iD

Hua Tan

Ethical Considerations

This article does not contain any studies with human or animal participants.

Author Contributions

The author is solely responsible for the conception, design, data collection, analysis, interpretation, and writing of this manuscript.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the China Postdoctoral Science Foundation [grant number 2023M730702]; the Center for Translation Studies of Guangdong University of Foreign Studies [grant number CTS202010]; the Humanities and Social Sciences Funds of Department of Education of Hubei Province [grant number 20G012].

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The datasets generated and/or analyzed during the current study are not publicly available but are available from the author on reasonable request.

References

Altamimi

(2016). A corpus-driven investigation of translator style: A study of Humphrey Davies’ Arabic-English Translations of Midaq Alley and The Yacoubian Building [PhD thesis]. University of Leeds.

Baker

(2000). Towards a methodology for investigating the style of a literary translator? Target, 12(2), 241–266. https://doi.org/10.1075/target.12.2.04bak

Tan

(2024). Language transfer in L2 academic writings: A dependency grammar approach. Frontiers in Psychology, 15, 1384629. https://doi.org/10.3389/fpsyg.2024.1384629

Brashi

(2021). Style shift in translation: The case of translating Susan Glaspell’s Trifles into Arabic. Translation and Interpreting, 13(2), 79–91. https://search.informit.org/doi/10.3316/informit.023602307020457

Brown

Snodgrass

Kemper

S. J.

Herman

Covington

M. A.

(2008). Automatic measurement of propositional idea density from part-of-speech tagging. Behavior Research Methods, 40(2), 540–545. https://doi.org/10.3758/BRM.40.2.540

Cao

(1973). Hongloumeng (The Story of the Stone) (Volume 1. Trans. Hawkes

David

). Penguin Books.

Cao

(1977). Hongloumeng (The Story of the Stone) (Volume 2. Trans. Hawkes

David

). Penguin Books.

Cao

(1980). Hongmoumeng (The Story of the Stone) (Volume 3. Trans. Hawkes

David

). Penguin Books.

Cao

(1982). Hongloumeng (The Story of the Stone) (Volume 4. Trans. Minford

John

). Penguin Books.

10.

Cao

1986. Hongloumeng (The Story of the Stone) (Volume 5. Trans. Minford

John

). Penguin Books.

11.

Cao

(1994). Hongloumeng (A Dream of Red Mansions). (Trans. Yang

Hsien-yi

Yang

Gladys

). Foreign Languages Press.

12.

Cao

(2010). Hongloumeng (The Dream of the Red Chamber). (Trans. Joly

Henry Bencraft

). Tuttle Publishing.

13.

Chou

Liu

(2024). Style in speech and narration of two English translations of Hongloumeng: A corpus-based multidimensional study. Target, 36(1), 76–111. https://doi.org/10.1075/target.22020.cho

14.

Collin

, et al. (2004). Detecting Collaborations in text comparing the authors’ rhetorical language choices in the federalist papers. Computers and the Humanities, 38, 15–36. https://doi.org/10.1023/B:CHUM.0000009291.06947.52

15.

Covington

(2007). CPIDR 3 User Manual. CASPR Research Report 2007-03, Artificial Intelligence Center, The University of Georgia. Available, with software, at http://www.ai.uga.edu/caspr.

16.

Covington

McFall

(2010). Cutting the Gordian knot: The Moving-Average Type–Token Ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100. https://doi.org/10.1080/09296171003643098

17.

Covington

Potter

Snodgrass

(2015). Stylometric classification of different translations of the same text into the same language. Digital Scholarship in the Humanities, 30(3), 322–325. https://doi.org/10.1093/llc/fqu008

18.

Dooley

Ramirez

(2009). Who wrote the blonde countess? A Stylometric Analysis of Herbert O. Yardley’s Fiction. Cryptologia, 33(2), 108–117. https://doi.org/10.1080/01611190802653244

19.

Fan

Jiang

(2019). Can dependency distance and direction be used to differentiate translational language from native language? Lingua, https://doi.org/10.1016/j.lingua.2019.03.004

20.

Fraser

(1994). Dependency Grammar. In Asher

(ED), Encyclodepia of language and linguistics (pp. 860–864). Pergamon.

21.

Futrell

Mahowald

Gibson

(2015). Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences of the United States of America, 112(33), 10336–10341. https://doi.org/10.1073/pnas.1502134112

22.

(2020). A corpus-based study on the syntactic features of the English Versions of Shangshu and translator style. Ludong University Journal (Philosophy and Social Sciences Edition), 37(1), 54–61.

23.

Gildea

Temperley

(2010). Do grammars minimize dependency length? Cognitive Science, 34(2), 286–310. https://doi.org/10.1111/j.1551-6709.2009.01073.x

24.

Han

Jiang

Yuan

(2019). Corpus-based study of translator’s style in the era of big data. Foreign Language Education, 40(2), 88–93. https://doi.org/10.16362/j.cnki.cn61-1023/h.2019.02.016

25.

Holmes

(1995). The federalist revisited: New directions in authorship attribution. Literary and Linguistic Computing, 10(2), 111–127. https://doi.org/10.1093/llc/10.2.111

26.

Hou

(2013). A corpus-based study of nominalization as a feature of translator’s style (Based on the English Versions of Hong Lou Meng). Meta, 58(3), 556–573. https://doi.org/10.7202/1025051ar

27.

Wang

Shao

(2025). Translator attribution of Hongloumeng: Using entropy-based features and machining learning algorithm. Digital Scholarship in the Humanities, 40(1), 138–150. https://doi.org/10.1093/llc/fqae074

28.

Huang

Chu

(2014). Translator’s style or translational style? A corpus-based study of style in translated Chinese novels. Asia Pacific Translation and Intercultural Studies, 1(2), 122–141. https://doi.org/10.1080/23306343.2014.883742

29.

Hudson

R. A.

(1995). Measuring syntactic difficulty. Unpublished paper. https://dickhudson.com/wp-content/uploads/2013/07/Difficulty.pdf (2021-10-31)

30.

Hudson

R. A.

(2007). Language networks: The new word grammar. Oxford University Press.

31.

Hudson

R. A.

(2010). An introduction to word grammar. Cambridge University Press.

32.

Jiang

Liu

(2015). The effects of sentence length on dependency distance, dependency direction and the implications–Based on a parallel English–Chinese dependency treebank. Language Sciences, 50, 93–104. https://doi.org/10.1016/j.langsci.2015.04.002

33.

Kestemont

Moens

Deploige

(2015). Collaborative authorship in the twelfth century: A stylometric study of Hildegard of Bingen and Guibert of Gembloux. Digital Scholarship in the Humanities, 30(2), 199–224. https://doi.org/10.1093/llc/fqt063

34.

Klaussner

Nerbonne

Çöltekin

Ç.

(2015). Finding characteristic features in stylometric analysis. Digital Scholarship in the Humanities, 30(1), 114–129. https://doi.org/10.1093/llc/fqv048

35.

Kwok

H. L.

Moratto

Liu

(2024). Activity versus descriptivity: A stylometric analysis of two English translations of Hongloumeng. Glottometrics, 56, 1–21.

36.

Leech

Short

(1981). Style in fiction: A linguistic introduction to English fictional prose. Longman.

37.

Lei

Jockers

M. L.

(2018). Normalized dependency distance: Proposing a new measure. Journal of Quantitative Linguistics, 27(1), 62–79. https://doi.org/10.1080/09296174.2018.1504615

38.

Lei

Wen

(2019). Is dependency distance experiencing a process of minimization? A diachronic study based on the State of the Union addresses. Lingua, 16, 102762. https://doi.org/10.1016/j.lingua.2019.102762

39.

Zhang

Liu

(2011). Translation style and ideology: A corpus-assisted analysis of two English translations of Hongloumeng. Literary and Linguistic Computing, 26(2), 153–166. https://doi.org/10.1093/llc/fqr001

40.

Liang

Fang

Liu

(2017). Dependency distance differences across interpreting types: implications for cognitive demand. Frontiers in Psychology, 8, 2132. https://doi.org/10.3389/fpsyg.2017.02132

41.

Liu

(2007). Probability distribution of dependency distance. Glottometrics, 15, 1–12. http://lingviko.net/glotto15f.pdf

42.

Liu

(2008). Dependency distance as a metric of language comprehension difficulty. Journal of Cognitive Science, 9(2), 159–191. https://doi.org/10.17791/jcs.2008.9.2.159

43.

Liu

(2010). Dependency direction as a means of word-order typology: A method based on dependency treebanks. Lingua, 120(6), 1567–1578. https://doi.org/10.1016/j.lingua.2009.10.001

44.

Liu

Liang

(2017). Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21, 171–193. https://doi.org/10.1016/j.plrev.2017.03.002

45.

Liu

(2012). Quantitative typological analysis of Romance languages. Poznań Studies in Contemporary Linguistics, 48(4), 597–625. https://doi.org/10.1515/psicl-2012-0027

46. Liu, Hudson, & Feng (2009). Using a Chinese treebank to measure dependency distance. Corpus Linguistics and Linguistic Theory, 5(2), 161–174. https://doi.org/10.1515/CLLT.2009.007

47. Liu & Afzaal (2021). Translator’s style through lexical bundles: A corpus-driven analysis of two English translations of Hongloumeng. Frontiers in Psychology, 12, 633422. https://doi.org/10.3389/fpsyg.2021.633422

48. Liu, Kwok, H. L., & Moratto (2022). Hedges and boosters as indicators of translation style: With reference to fictional dialogues in Hongloumeng translations. In Moratto, Liu, & Chao (Eds.), Dream of the Red Chamber: Literary and translation perspectives. Routledge. https://doi.org/10.4324/9781003296812-13

49. Liu & Xiao (2020). A stylistic analysis for Gu Long’s Kung Fu novels. Journal of Quantitative Linguistics, 27(1), 32–61. https://doi.org/10.1080/09296174.2018.1504411

50. Liu & Yan (2010). Choice and style: On the English translation of the reporting verbs headed by Dao in Hong Lou Meng. Journal of PLA University of Foreign Languages, 33(4), 87–92.

51. Liu & Zhu (2011). Translator style of four English versions of Hongloumeng: A corpus-based statistic analysis. Chinese Translators Journal, 1, 60–64.

52. Mastropierro (2018). Key clusters as indicators of translator style. Target: International Journal of Translation Studies, 30(2), 240–259. https://doi.org/10.1075/target.17040.mas

53. Munday (2007). Translation and ideology. The Translator, 13(2), 195–217. https://doi.org/10.1080/13556509.2007.10799238

54. Munday (2008). Style and ideology in translation: Latin American writing in English. Routledge.

55. Olohan & Baker (2000). Reporting that in translated English: Evidence for subconscious processes of explicitation. Across Languages & Cultures, 2, 141–158. https://doi.org/10.1556/acr.1.2000.2.1

56. Olohan (2003). How frequent are the contractions? A study of contracted forms in the Translational English Corpus. Target, 15(1), 59–89. https://doi.org/10.1075/target.15.1.04olo

57. Oya (2011). Syntactic dependency distance as sentence complexity measure. Proceedings of the 16th Conference of Pan-Pacific Association of Applied Linguistics, 313–316.

58. Saldanha (2011). Translator style: Methodological considerations. The Translator, 17(1), 25–50. https://doi.org/10.1080/13556509.2011.10799478

59. Schöberlein (2017). Poe or not Poe? A stylometric analysis of Edgar Allan Poe’s disputed writings. Digital Scholarship in the Humanities, 32(3), 643–659. https://doi.org/10.1093/llc/fqw019

60. Sun & Liu (2025). Stylistic nuances through syntactic complexity: A corpus-assisted study of narration and dialogue in two English translations of Hongloumeng. Applied Corpus Linguistics, 5(2), 100125. https://doi.org/10.1016/j.acorp.2025.100125

61. Tan (2024). Quantifying the style of joint translations: A multi-perspective stylometric analysis of The Story of the Stone by David Hawkes and John Minford. Studia Neophilologica, 97(1), 50–73. https://doi.org/10.1080/00393274.2024.2356093

62. Temperley (2007). Minimization of dependency length in written English. Cognition, 105(2), 300–333. https://doi.org/10.1016/j.cognition.2006.09.011

63. Temperley (2008). Dependency-length minimization in natural and artificial languages. Journal of Quantitative Linguistics, 15(3), 256–282. https://doi.org/10.1080/09296170802159512

64. Temperley & Gildea (2018). Minimizing syntactic dependency lengths: Typological/cognitive universal? Annual Review of Linguistics, 4(1), 1–14. https://doi.org/10.1146/annurev-linguistics-011817-045617

65. Wang & Liu (2012). A corpus-based comparative study of translators’ styles: Focusing on five versions of The Ballad of Mulan. Journal of Guangxi University for Nationalities (Philosophy and Social Science Edition), 34(2), 182–188.

66. Wang & Liu (2013). Syntactic variations in Chinese-English code-switching. Lingua, 123(1), 58–73. https://doi.org/10.1016/j.lingua.2012.10.003

67. Wang & Liu (2016). Syntactic differences of adverbials and attributives in Chinese-English code-switching. Language Sciences, 55(3), 16–35. https://doi.org/10.1016/j.langsci.2016.02.002

68. Wang (2012). Looking for translator’s fingerprints: A corpus-based study on Chinese translation of Ulysses. Literary and Linguistic Computing, 27(1), 81–93. https://doi.org/10.1093/llc/fqr039

69. Wang & Liu (2017). The effects of genre on dependency distance and dependency direction. Language Sciences, 59, 135–147. https://doi.org/10.1016/j.langsci.2016.09.006

70. Winters (2004a). German translations of F. Scott Fitzgerald’s The Beautiful and Damned: A corpus-based study of modal particles as features of translators’ style. In Kemble (Ed.), Using corpora and databases in translation (pp. 71–89). University of Portsmouth.

71. Winters (2004b). F. Scott Fitzgerald’s Die Schönen und Verdammten: A corpus-based study of loan words and code switches as features of translators’ style. Language Matters, Studies in the Languages of Africa, 35(1), 245–258. https://doi.org/10.1080/10228190408566215

72. Winters (2007). F. Scott Fitzgerald’s Die Schönen und Verdammten: A corpus-based study of speech-act report verbs as a feature of translator’s style. Meta, 52(3), 412–425. https://doi.org/10.7202/016728ar

73. Winters (2009). Modal particles explained: How modal particles creep into translations and reveal translators’ styles. Target, 21(1), 74–97. https://doi.org/10.1075/target.21.1.04win

74. Yao (2009). Translation of word games in Hongloumeng—A contrast between Mr. and Mrs. Yang’s and Mr. Hawkes’ translation. Foreign Languages and Their Teaching, 249(12), 50–52, 56.

75. Jing & Liu (2017). Dependency distance motifs in 21 Indo-European languages. In Liu & Liang (Eds.), Motifs in language and text. Mouton De Gruyter.

76. Zhang & Liu (2014). Is this English translation of Hong Lou Meng by Joly himself?—A corpus-based investigation of translator style. Foreign Languages in China, (1), 85–93.

77. Zhao & Sun (2004). Idiolect and translator style in literary translation. Foreign Language Education, 25(3), 64–68.

78. Zhu, Lei, & Craig (2020). Prose, verse and authorship in Dream of the Red Chamber: A stylometric analysis. Journal of Quantitative Linguistics, 28(4), 289–305. https://doi.org/10.1080/09296174.2020.1724677

79. Zhu & Minford (2017). Introducing the “Best China”—An interview with John Minford. East Journal of Translation, 1, 50–56.