Stylistic Fingerprints: A Multi-Layered Computational Analysis of Chinese Translations of Plato’s Republic

Abstract

This study systematically compares four prominent Chinese translations of Plato’s Republic against two authoritative English benchmark translations, addressing a critical gap in existing Platonic scholarship: the lack of objective comparative analysis of its diverse Chinese versions. Using a three-tiered computational stylistic framework comprising macro-stylistic positioning, micro-level syntactic and lexical features, and cognitive-linguistic profiling via LIWC categories, we analyzed a purpose-built parallel corpus of 935,861 tokens. The corpus comprises four influential Chinese translations (Gu Shouguan 2010, Guo Binhe 1986, He Xiangdi 2021, Xie Shanyuan 2016) and two English benchmarks (Allan Bloom 1968, C. D. C. Reeve 2004). Significant stylistic divergences emerged among the Chinese versions and in contrast to the English texts. Chinese translations consistently exhibit a pronounced tendency toward explicitation, marked by significantly higher frequencies of cognitive process words (13.7%–16.4% vs. 11.2%–12.7% in English versions, d = −1.75, p < .001) and elevated frequencies of logical connectives, particularly result and contrast markers. Notable variations in syntactic complexity were also found, ranging from Guo Binhe’s concise style (17.985 tokens per sentence) to Gu Shouguan’s elaborate style (27.419 tokens per sentence), with substantial variation among Chinese translations (η² = 0.18–0.48). These findings underscore that the Chinese Republic translations are not mere linguistic variants but unique intellectual projects shaped by deliberate interpretive strategies and cultural contexts, highlighting translation’s active role in philosophical re-creation and re-localization.

Plain Language Summary

One Book, Four Different Voices: Using Computer Analysis to Uncover the Stylistic Fingerprints of Chinese Translations of Plato’s Republic

Plato’s Republic is a cornerstone of Western philosophy, but how does it read in another language? In China, students and scholars face a “paradox of choice” with over ten different translations available. This study uses computer-based analysis to compare four influential Chinese translations by Gu Shouguan 2010, Guo Binhe 1986, He Xiangdi 2021, and Xie Shanyuan 2016, against two English benchmark versions. We analyzed nearly 936,000 words, examining sentence length, word types, and how translators handle key philosophical terms. The results reveal notable variations. Gu Shouguan’s version tends to use extremely long, complex sentences averaging over 27 words, aiming for a formal style. In contrast, Guo Binhe’s translation tends to use shorter sentences averaging about 18 words, making Plato’s ideas more accessible. The Chinese translations tend to make Plato’s logical arguments more explicit than the English versions, often adding connecting words like “therefore” and “because” to guide readers. They also tend to use more words expressing contrast and result, while English versions rely more on words signaling reasons. These stylistic choices appear to reflect the translators’ unique backgrounds, the historical eras spanning from 1986 to 2021, and their different goals. This research suggests that a single classic can become multiple, distinct works in translation, each offering a different window into Plato’s thought for Chinese readers.

Keywords

computational Stylistics Chinese translations stylometry

Introduction

Plato’s Republic is a foundational text of Western philosophy whose influence has permeated political theory, ethics, metaphysics, and epistemology for over two millennia. Translating such a work is not merely a linguistic exercise but a profound act of cross-cultural intellectual reconstruction, as a translator’s choices actively shape how new readers understand Plato’s complex philosophical system. In China, interest in the Republic (Lǐxiǎngguó [理想国] =Plato’s Republic)¹ is evidenced by at least 10 distinct translations. This abundance, however, has created a “paradox of choice” for students and scholars. However, a review of Chinese academic databases reveals a near-total absence of systematic, comparative scholarship on these versions, with evaluations largely confined to subjective online ratings. The prominence of certain editions, like Guo Binhe’s, may stem more from publisher prestige than from demonstrated stylistic merit.² Existing scholarship suffers from three limitations: analyses are predominantly qualitative and impressionistic; no objective framework exists for evaluating stylistic differences; and the relationship between translator choices and interpretive strategies remains underexplored. This gap is significant, as translation choices in philosophical texts profoundly impact how core concepts are understood across cultures. To address this gap, this study uses a computational stylistic framework to systematically compare four influential Chinese translations of the Republic (by Guo Binhe, Gu Shouguan, He Xiangdi, and Xie Shanyuan) against authoritative English benchmarks.³ The study asks two primary research questions: (1) How do the four Chinese translations differ in quantifiable stylistic features (syntactic complexity, logical structure, cognitive-linguistic tone) when compared against each other and English benchmarks? (2) What strategies do the translators employ for core philosophical concepts (e.g., “justice,” “soul,” “form”), and what does this reveal about their role as active interpreters rather than passive conduits? To answer these questions, this paper employs a multi-layered computational framework moving from macro-stylistic positioning to micro-level features and cognitive-linguistic profiling (Biber, 1988, p. 55). The framework quantitatively profiles the translations by measuring key metrics—sentence length, syntactic tree depth, logical connective frequencies, and LIWC (Linguistic Inquiry and Word Count) word usage—analyzes choices for core terms, and synthesizes these findings to construct distinct “stylistic portraits” for each translation, providing an objective framework for their evaluation.

Literature Review

To situate this study, we synthesize three interconnected domains: computational stylistics as a methodological foundation, its application to Platonic studies as a historical precedent, and its integration with translation studies as a theoretical framework. This review demonstrates how concepts and tools from these domains converge to enable a rigorous, multi-layered analysis of translation style.

The Foundations of Computational Stylistics

Computational stylistics, or stylometry, has evolved from authorship attribution into a comprehensive framework for quantitative textual analysis. Its foundation is the principle that style—“the manner of writing” or “an ensemble of formal features” (Herrmann et al., 2021)—can be measured through statistical analysis of linguistic features. While early applications focused on identifying authors of disputed texts (Stamatatos, 2009), contemporary stylometry now addresses questions of genre, period, register, and translation.

The field’s most robust findings center on function words (prepositions, articles, conjunctions) as stylistic markers. Research consistently shows that the frequencies of the most common function words capture unconscious authorial habits without requiring complex linguistic parsers (Daelemans, 2013; Grieve, 2007). These features are powerful because they operate below the level of conscious choice, making them resistant to manipulation. Methodologically, the field employs various classification approaches, including machine learning and distance-based measures like Burrows’s Delta, a foundational technique for comparing texts based on word frequencies (Burrows, 2002; Eder et al., 2016).

The evolution from authorship attribution to broader stylistic analysis reflects a shift in understanding what stylometric features reveal. For instance, most-frequent-word-based stylometry can reveal patterns beyond authorial identity, such as theme, audience, and stylistic patterning (Hołobut & Rybicki, 2020). This expansion is crucial for translation studies, where patterns may reflect translator choices, genre, or cultural contexts more than an authorial signal. Thus, tools from authorship attribution provide a robust foundation for analyzing translation style, even as the theoretical focus shifts from identifying authors to understanding stylistic variation.

The Tradition of Stylometry in Platonic Studies

The application of quantitative methods to the Platonic corpus is not new; Plato’s dialogs have been a laboratory for stylometric inquiry for over a century. The term “stylometry” was introduced by W. Lutoslawski in his late 19th-century work on the chronological periodization of Plato’s works, marking the field’s origins in classical philology. The advent of computational methods, documented by Brandwood (1990), was a pivotal shift, enabling large-scale analysis of syntax, rhythm, and other complex features of style (Brandwood, 1990, pp. 235–248). This computational turn quickly confirmed the long-suspected division of Plato’s works into early, middle, and late groups.

More recently, the focus has expanded beyond chronology to more nuanced investigations, such as using stylometry to identify different literary registers and “voices” within single dialogs. For instance, M. Johnson and Tarrant (2014) used computational methods to analyze Plato’s Symposium, demonstrating that the Diotima material exhibits a distinct “female voice” similar to other passages where Socrates adopts a female voice, and is grouped with myth-like episodes (M. Johnson & Tarrant, 2014). This approach allows for a deeper understanding of Plato’s literary artistry by revealing stylistic patterns that inform literary interpretation.

Furthermore, the field provides quantitative evidence in debates over the authenticity of specific dialogs. Koentges (2020), for example, used a massive corpus of Greek literature to argue that the Menexenus is a stylistic outlier and likely not of Platonic authorship (Koentges, 2020). This tradition confirms that applying computational methods to Plato’s works is a well-established and increasingly sophisticated scholarly endeavor.

This tradition is relevant for translation studies because it shows that computational methods can reveal systematic patterns in texts reflecting underlying structures like chronology, register, or authorial identity. Applied to translations, these methods can reveal patterns reflecting translator choices, interpretive strategies, and cultural contexts. The tools developed for analyzing Plato’s Greek texts thus provide a proven foundation for analyzing translated texts, even as the research questions shift from authenticity and chronology to style and interpretation.

Bridging Stylistics and Translation Studies

While stylometry has a rich history in classical philology, its integration with translation studies is a more recent, synergistic development. For decades, stylistic analysis was largely excluded from translation theory, partly due to a narrow understanding of stylistics as early structuralist linguistics focused on source-text features. Recent scholarship has bridged this gap through three developments: corpus stylistics, the concept of “translator’s style,” and cognitive approaches to translation.

Corpus stylistics marked a pivotal shift, providing quantifiable evidence for stylistic patterns beyond mere intuition (J. H. Johnson, 2008). This approach was instrumental in developing the concept of “translator’s style,” which Baker (2001) defines as a “thumb-print” of recurring linguistic habits that distinguish one translator from another (Baker, 2001). Saldanha (2011) refines this, specifying that translator style must be recognizable, distinctive, coherent, and motivated (Saldanha, 2011). Corpus methods operationalize this concept, demonstrating how computational tools can reveal systematic patterns in translation choices. For instance, Baker’s analysis of type-token ratios and sentence length provides evidence of consistent stylistic habits (Baker, 2001), while Saldanha’s work on italics and foreign words shows how specific features can function as stylistic markers (Saldanha, 2011).

A complementary “cognitive turn” has shifted focus to the mental processes underlying text production and reception. This led to “translational stylistics,” defined by Malmkjær (2003) as studying “why, given the source text, the translation has been shaped in such a way that it comes to mean what it does” (Malmkjær, 2003). Boase-Beier (2014, 2023) develops this from a cognitive stylistics perspective, viewing style as an expression of mind states and worldviews (Boase-Beier, 2014, pp. 71–110, 2023). Ghazala (2018) frames translation as a cognitive process where style reflects mind (Ghazala, 2018). This perspective operationalizes style by analyzing features like transitivity patterns, which reveal how translators conceptualize meaning (Ghazala, 2018). The framework thus recognizes the translator as a distinct stylistic voice whose choices are motivated by cognitive processes of interpretation.

Recent developments in corpus-based translation studies, such as Meng and Oakes’s (2021) volume, show the field’s evolution from generating numeric attributes to developing sophisticated analytical paradigms integrating lexical, discourse, and stylistic analysis (Meng & Oakes, 2021). Corpus translation studies have expanded from a marginal tactic to a powerful framework permeating literary, cognitive, neural, and socially oriented translation studies (Meng & Oakes, 2021). This reflects a growing recognition that translation style requires analytical schemes connecting quantitative patterns to theoretical frameworks.

Similarly, Holobut and Rybicki’s (2020) work on film dialog shows how stylometric methods can reveal stylistic patterns in non-literary genres (Hołobut & Rybicki, 2020). Applying most-frequent-word-based stylometry to film dialog, his team showed that quantitative methods can reveal patterns of theme, audience, and style beyond authorial identity. This expansion validates applying computational stylistics to translation analysis, as the same tools used in authorship attribution can reveal systematic differences in how translators approach different genres, themes, and source texts.

Beyond translation studies, computational stylistics has been applied to other philosophical contexts. Botz-Bornstein and Mostafa (2017) used stylometric methods to distinguish between analytic and continental philosophy texts (Botz-Bornstein & Mostafa, 2017). More recently, Becker and Culotta (2025) used network analysis and NLP to analyze reference networks across 2,245 philosophical texts, revealing intellectual lineages (Becker & Culotta, 2025). These studies illustrate the broader applicability of computational methods to philosophical texts, validating their use in analyzing translation style.

By synthesizing these three domains—computational stylistics (methodology), Platonic studies (precedent), and translation studies (theoretical framework)—this study develops an analytical approach for the systematic analysis of translation style. Computational tools provide the means for analysis, the Platonic tradition demonstrates their applicability to philosophical texts, and translation studies provides the framework for understanding how stylistic patterns reflect translator agency and interpretive choices (e.g., translator style, cognitive processing). This synthesis enables a rigorous, multi-layered analysis that connects quantitative patterns to theoretical frameworks, illuminating the relationship between translator choices and their motivations.

Methodology

Corpus Selection and Construction

The analysis uses a purpose-built bilingual corpus pairing two benchmark English translations of Plato’s Republic with four Chinese editions. We include Allan Bloom’s (1968) and C. D. C. Reeve’s (2004) translations as a robust English reference axis (Plato, 1991, 2004). The Chinese set spans four decades, capturing translators with distinct affiliations and offering a diachronic view of the Republic’s reception in China (Lǐxiǎngguó, 理想国) (Weng, 2015a, 2015b). Table 1 summarizes the corpus composition.

Table 1.

Composition of the Republic Translation Corpus.

No.	Title	Language	Translator(s)	First published	Publisher	Tokens
1	The Republic of Plato	EN	Allan Bloom	1968	Basic Books	152,985
2	Republic	EN	C. D. C. Reeve	2004	Hackett Publishing	146,650
3	Lǐxiǎngguó (理想国)	ZH	Gu Shouguan (顾寿观)	2010	Yuelu Publishing House	196,533
4	Lǐxiǎngguó (理想国)	ZH	Guo Binhe & Zhang Zhuming (郭斌和/张竹明)	1986	The Commercial Press	134,721
5	Lǐxiǎngguó (理想国)	ZH	He Xiangdi (何祥迪)	2021	Yunnan People’s Publishing House	138,331
6	Lǐxiǎngguó (理想国)	ZH	Xie Shanyuan (谢善元)	2016	Shanghai Translation Publishing House	166,641

The six texts total 935,861 tokens. Including the widely cited 1986 Commercial Press edition and the recent He (2021) revision allows for a comparison between canonical phrasing and contemporary updates. The dual English benchmarks capture differences between a philosophically annotated rendering and a precise modern translation.

Source texts were digitized from print using ABBYY FineReader PDF 16. The raw OCR output underwent a double-pass manual audit: two annotators reviewed 1% to 2% samples, removed paratextual matter (e.g., prefaces, footnotes), and confirmed only the 10 main books remained, yielding 100% coverage. Pre-cleaning, the character error rate (CER) was 0.0758 and word error rate (WER) was 0.0508. Systematic artifacts were corrected using a regex rulebook. A post-cleaning spot-check found zero residual artifacts, and no narrative text was lost.

To harmonize the dataset, each file was saved in UTF-8, standardized with consistent headers, and segmented by book divisions (with supplementary 3,000-token windows where needed). Chinese texts were segmented and parts-of-speech (POS) tagged using Jieba (paddle mode), while English texts were tokenised and tagged with Natural Language Toolkit (NLTK). These decisions follow established digital humanities practices (Baker, 2001; Ball et al., 2024; Meng & Oakes, 2021). The resulting corpus provides the token-level comparability and provenance required to assess stylistic fingerprints across translations.

Analytic Framework and Steps

This study adopts an integrated, multi-layered analytical framework for a comprehensive stylistic comparison. As shown in Figure 1, this framework provides a holistic view of style by moving from broad generic positioning to fine-grained syntactic features and finally to the underlying cognitive-linguistic tone.

Figure 1.

The three-tiered stylistic analysis framework.

Grounded in multi-dimensional analysis, which requires tracking co-occurring linguistic features (Biber, 1988), the three-tier framework moves from macro-style to micro-style and cognitive-style analysis. The first tier sketches the grammatical footprint, the second corroborates with sentence-level structure, and the third examines the cognitive-linguistic texture via psycholinguistic categories (Pennebaker et al., 2015).

Operationalizing the framework required a coordinated toolchain rather than a collection of isolated scripts. All processing ran inside a unified Python 3.7.12 environment so that segmentation, parsing, and statistical routines could share versions and seeds. Jieba 0.42.1 (paddle mode) handled Chinese segmentation and POS tagging, while NLTK 3.8.1 served the English pipeline. Stanza 1.6.1 provided constituency parses for both languages, and custom utilities replicated AntConc-style concordancing for the terminology and connective studies. Cognitive-linguistic profiles drew on the official English and Simplified Chinese LIWC-2015 dictionaries via the LIWC Python package (0.5.0), with NumPy 1.21.6, SciPy 1.7.3, and Scikit-learn 1.0.2 supporting statistical modeling.

Tool validation preceded large-scale processing. Published benchmarks show high accuracy for Stanza and Jieba on relevant corpora (PaddleNLP Contributors, 2021; Qi et al., 2020). Building on this, we created 100-sentence gold samples from our corpus. On this material, Jieba’s F1 score was 0.9956. Stanza achieved Chinese POS accuracy of 0.9314 (UAS/LAS 0.9007/0.8938), and the English pipeline recorded POS accuracy of 0.9978 (UAS/LAS 0.9504/0.9489). Sentence-boundary checks yielded an F1 of 1.0000 for both languages. Residual errors, clustering around philosophical terms and names, were mitigated by a user dictionary and manual review.

The analysis proceeded in three passes mirroring the framework. First, for macro-style analysis, POS-annotated files were summarized to generate grammatical footprints. Second, for micro-style analysis, we computed several metrics: the terminology study retrieved concordance lines for nine core Platonic concepts; we used sentence segmentation and Stanza trees to calculate syntactic complexity (e.g., average sentence length, tree depth); and the logical connective inventory was mapped to Penn Discourse Treebank (PDTB) 3.0 semantic classes. Third, for cognitive-style analysis, segmented corpora were processed with LIWC dictionaries to quantify the prevalence of relevant word categories. All outputs were then consolidated.

Measurement

With the workflow in place, this section clarifies the quantification of each tier. The selected metrics are widely used in stylistic and linguistic research, providing an interpretable basis for comparison.

Across all tiers, features are normalized for comparability. Frequencies are reported per 10,000 tokens, and long works are sliced into blocks. Rare items are additively smoothed or omitted. Family-wise error is controlled through Holm–Bonferroni correction, and significance tests are paired with effect sizes (Cohen’s d; η²). All stochastic components use fixed seeds to ensure reproducibility.

To situate each translation stylistically, this study compares their grammatical profiles against established genres in the Lancaster Corpus of Mandarin Chinese (LCMC) (A. McEnery & Xiao, 2004). As a balanced corpus of texts (e.g., news, fiction, academic), LCMC provides a representative baseline for contemporary Mandarin. The macro-genre position of each translation is determined by quantifying its stylistic similarity to the LCMC genres.

Using POS frequencies as a primary indicator of style is a foundational technique in quantitative stylistics. The principle, established by Biber (1988), is that different genres have distinct and stable distributions of grammatical features (Biber, 1988). For instance, a text heavy in verbs and pronouns may be more narrative, while one dense with nouns and adjectives may be more descriptive. The POS frequency vector thus serves as a “fingerprint” of a text’s fundamental grammatical style.

Long works are split into contiguous blocks of 3,000 tokens. For a POS inventory of size p, let $c_{i}$ be the count of tag i and $T = \sum_{i = 1}^{p} c_{i}$ . We compute L1-normalized, additively smoothed frequencies with $α = 10^{- 3}$ :

x_{i} = \frac{c_{i} + α}{T + α p}, x = {(x_{1}, \dots, x_{p})}^{T}

For genre g, let $μ_{g}$ be the mean POS vector and let the shrinkage covariance be $Σ_{g, λ} = (1 - λ) S_{g} + λ diag (S_{g})$ , with λ estimated from data using the Ledoit–Wolf method (Ledoit & Wolf, 2004). The block–genre distance is calculated using the shrinkage Mahalanobis metric:

D_{g}^{(b)} = (x^{(b)} - μ_{g})^{T} Σ_{g, λ}^{- 1} (x^{(b)} - μ_{g})

Classification is by majority vote across blocks, with ties broken by the smaller mean distance. To report uncertainty, we compute robust z-scores and empirical p-values using the median and median absolute deviation (MAD) as robust estimators. This approach accounts for variance and correlation among POS features more effectively than the standard Euclidean distance (Eder et al., 2016).

A translator’s most critical decisions often concern core philosophical terms. This is not a neutral act but an interpretive choice with philosophical weight. This study analyzes nine foundational concepts from the Republic: justice, just, form, idea, soul, good, city/state, virtue/excellence, and necessity/necessary. These terms were selected for their centrality and the scholarly debate surrounding their translation. For instance, the Greek term δικαιοσύνη (“justice”) has a broader meaning than its modern equivalents, while εἶδος (“form”) and ἰδέα (“idea”) are difficult to render without misrepresenting Plato’s metaphysics (Plato, 2004; Sir Ross, 1976). Analyzing how translators navigate these controversies provides insight into their interpretive strategies.

To quantify these strategies, we measure the consistency and prominence of the Chinese renderings for each term. To assess consistency, we count the Number of Variants Used for each concept, which informs a qualitative Translation Strategy label (consistent or variable). To measure prominence, we calculate the Term Frequency, which is the combined frequency of all Chinese variants for a term, normalized per 10,000 words. The formula is:

F_{term} = \frac{C_{variant}}{N_{total}} \times 10, 000

In this formula, $F_{term}$ is the frequency score, $C_{variant}$ is the total count of all Chinese variants, and $N_{total}$ is the total word count. To ensure reliability, a subset of terms was double-annotated. These metrics provide a detailed, quantitative profile of each translator’s approach to the philosophically significant vocabulary in the Republic.

A text’s syntactic structure is a crucial dimension of authorial and translator style. Features like sentence length and grammatical complexity can reflect a translator’s approach—whether prioritizing readability with simpler sentences or replicating the source text’s elaborate structures. To investigate this, we measure syntactic complexity using two complementary metrics: Average Sentence Length (ASL) and Average Syntactic Tree Depth. These indicators, common in stylistics and computational linguistics, offer a quantitative view of the texts’ structural characteristics (Lu, 2010).

The first metric, ASL gives a surface-level overview of textual complexity. It is a robust indicator calculated by dividing the total number of words ( $N_{words}$ ) by the total number of sentences ( $N_{sentences}$ ), where sentences were delimited using standard terminal punctuation. The formula is as follows:

ASL = \frac{N_{words}}{N_{sentences}}

While useful, ASL cannot distinguish between long, simple sentences and shorter, complex ones. To capture this deeper complexity, we also calculate the Average Syntactic Tree Depth. This metric measures the hierarchical structure within a sentence, moving beyond simple length. Complex sentences are often built through embedding and subordination, creating nested structures (Yngve, 1960). Cognitively, greater syntactic depth increases processing load by requiring readers to hold more incomplete grammatical structures in working memory (Frazier, 1985).

This analysis was operationalized using the Stanza constituency parser, which models sentence grammar as a hierarchical tree of phrasal constituents. The tree’s depth, corresponding to the maximum level of grammatical nesting, is calculated recursively. The final metric, Average Syntactic Tree Depth ( $D_{avg}$ ), is the average of these tree depth values ( $D_{tree} (T_{i})$ ) across all sentences ( $N_{sentences}$ ), providing a robust measure of overall structural complexity. The formula is:

D_{avg} = \frac{1}{N_{sentences}} \sum_{i = 1}^{N_{sentences}} D_{tree} (T_{i})

The Republic’s philosophical weight rests on its intricate argumentative structure. The linguistic markers signaling relationships between propositions—logical or cohesive connectives—are crucial to its rhetorical force, guiding the reader’s reasoning (Halliday & Hasan, 2014). A translator’s use of these connectives is a significant stylistic choice. Explicitly rendering logical links can increase transparency, while a subtler approach may better preserve the original’s dialogic nature. To investigate these patterns, we analyzed the frequency of logical connectives, classified into 11 functional categories from PDTB 3.0: Reason, Result, Contrast, Condition, Purpose, Concession, Alternative, Synchrony, Subsequent, Until, and Free-choice. The high-frequency Addition category (e.g., “and”/“和”) was excluded to avoid distorting cross-linguistic comparisons.

To quantify usage patterns, we calculated the normalized frequency for each connective, ensuring fair comparison across texts. This involved counting the occurrences of each connective ( $C_{conn}$ ) and normalizing this count per 10,000 words ( $N_{total}$ ). The Logical Connective Frequency ( $F_{conn}$ ) is expressed by the formula:

F_{conn} = \frac{C_{conn}}{N_{total}} \times 10, 000

To probe the cognitive and psychological dimensions of the translations, this study uses a dictionary-based text analysis method grounded in the LIWC framework. This approach is built on the principle that the frequency of words in psychologically meaningful categories can provide an objective, quantifiable profile of a text’s character. The analysis measures the aggregate prevalence of different word types, revealing patterns in thinking style, emotional tone, and social focus (Tausczik & Pennebaker, 2010). This method is well-suited for stylistic analysis as it captures the subtle, aggregate effects of lexical choices.

This analysis was operationalized using a word-counting algorithm that matches text against the official English and Simplified Chinese LIWC2015 dictionaries. The process involves segmenting the text, converting tokens to lowercase, and matching them against LIWC categories, handling word stems to group related words. The raw count for each category ( $C_{cat}$ ) is then normalized as a percentage of the total word count ( $N_{total}$ ) for comparability. The formula for any given category ( $P_{cat}$ ) is:

P_{cat} = \frac{C_{cat}}{N_{total}} \times 100

This study focuses on seven LIWC dimensions relevant to philosophical dialog. Function Words are a powerful, topic-independent marker of style, as their usage is largely unconscious (Pennebaker, 2013). Cognitive Processes measures words related to thinking and reasoning, indicating how explicitly a translator signals the text’s logical nature. Prepositions capture structural and spatial-temporal relationships. Positive Emotion, Negative Emotion, and the combined Affective Processes measure emotional content. Social Processes tracks references to social interactions. These metrics allow us to test the assumption of purely rational discourse and determine if translators infuse the text with particular emotional or social valences.

For cross-language comparisons, we report both original and adjusted LIWC scores, excluding non-comparable categories. We identified 71 comparable categories through semantic equivalence assessment. Non-comparable categories, such as language-specific grammatical markers, are excluded from statistical tests to ensure that comparisons reflect genuine semantic differences, not structural ones.

Statistical comparisons between translations employ Welch’s t-tests for pairwise comparisons or Welch’s ANOVA for multi-group comparisons when variances are unequal, which is appropriate when the assumption of homogeneity of variance is violated. When variances are equal, standard t-tests or one-way ANOVA are used. Prior to statistical testing, assumption checks are performed: normality is assessed using Shapiro–Wilk tests, and homogeneity of variance is assessed using Levene’s test. When data violate normality assumptions, appropriate transformations (e.g., Box–Cox or logarithmic) are applied before testing, or non-parametric alternatives are considered. When testing multiple features within a family (e.g., multiple LIWC categories or multiple syntactic metrics), Holm–Bonferroni correction is applied to control the family-wise error rate. Effect sizes are reported using Cohen’s d for pairwise comparisons (with |d| ≥ 0.2, ≥0.5, and ≥0.8 indicating small, medium, and large effects, respectively) and η² for ANOVA models (with η² ≥ 0.01, ≥0.06, and ≥0.14 indicating small, medium, and large effects, respectively). All statistical tests use fixed random seeds (random.seed(42), numpy.random.seed(42), and random_state = 42 for Scikit-learn functions) to ensure reproducibility, and assumption checks are documented for each analysis.

Additional information regarding our methodology, datasets, and supplementary results is available in the Supplementary Material associated with this article.

Results

This chapter presents the empirical findings in the order of the analytical tiers: macro-stylistic positioning, micro-level features, and the cognitive-linguistic profile, directly following the methodology described.

Macro-Stylistic Positioning

To situate the four Chinese translations within modern Chinese writing, their stylistic profiles were compared against 15 LCMC genres using the shrinkage Mahalanobis distance metric from Section 3.3 (Eder et al., 2016; Ledoit & Wolf, 2004; Meng, 2013). Classification uses majority votes across 3,000-token blocks, with uncertainty quantified by robust z-scores and empirical p-values.

The distance matrix in Table 2 reveals divergent genre alignments. Guo Binhe’s translation (ZH Guo) shows the closest statistical match to Science Fiction (LCMC M), with a mean distance of 0.0482 (z = 1.53, p = .127), an alignment that cannot be rejected at conventional significance levels. In contrast, the other three translations (Gu Shouguan, He Xiangdi, and Xie Shanyuan) align most closely with Romantic Fiction (LCMC P). However, these alignments are not statistically significant (Gu: z = 5.86, p < .001; He: z = 7.02, p < .001; Xie: z = 4.74, p < .001). These results indicate that while these translations are nearest to Romantic Fiction, they remain stylistically distinct from all established LCMC genres.

Table 2.

Text-LCMC Genre Distance Matrix.

Code	Genre	ZH Gu	ZH Guo	ZH He	ZH Xie
A	Press: reportage	0.235	0.135	0.236	0.187
B	Press: editorials	0.216	0.115	0.217	0.169
C	Press: reviews	0.257	0.140	0.263	0.205
D	Religion	0.203	0.113	0.210	0.162
E	Skills, trades and hobbies	0.245	0.134	0.236	0.190
F	Popular lore	0.167	0.083	0.169	0.126
G	Biographies and essays	0.151	0.077	0.159	0.121
H	Miscellaneous: reports and official documents	0.469	0.313	0.447	0.389
J	Science: academic prose	0.296	0.168	0.302	0.238
K	General fiction	0.089	0.048	0.099	0.071
L	Mystery and detective fiction	0.141	0.073	0.145	0.111
M	Science fiction	0.094	0.048	0.112	0.077
N	Adventure and martial arts fiction	0.140	0.097	0.135	0.120
P	Romantic fiction	0.077	0.052	0.088	0.065
R	Humor	0.108	0.092	0.094	0.086

The principal component analysis (PCA) visualization in Figure 2 provides a complementary geometric perspective on these relationships, mapping the translations and reference genres into a two-dimensional space that captures the primary dimensions of stylistic variation.

Figure 2.

PCA diagram of the Chinese translations.

The visualization shows the four Chinese translations clustering in a distinct region, confirming their unique stylistic space as philosophical translations. They are positioned closer to fictional genres than to informational or academic ones, aligning with the distance matrix. This positioning reflects the Republic’s dialogic and narrative nature, which blends argumentation with dialog, creating a grammatical profile closer to imaginative prose than expository writing.

Statistical comparisons confirm significant differences: Chinese translations are positioned closer to reference genres than English translations (t = −2.84, p = .006, d = −0.63), and significant variation exists among Chinese translations (F = 2.88, p = .044, η² = 0.13).

These findings should be interpreted with caution. A translation’s “closest” genre does not mean it belongs to that genre. It only indicates that its grammatical structure most closely resembles that genre’s typical patterns. The proximity to Romantic and Science Fiction suggests that philosophical dialogs in Chinese translation share grammatical traits with imaginative narrative genres, likely due to their reliance on dialog and hypothetical scenarios. This distinguishes their style from the immediacy of press reportage or the formality of academic prose, confirming that the translators have captured the Republic’s unique nature as a speculative, argumentative dialog.

Terminology Strategy

This study analyzes translation strategies for nine core philosophical terms from the Republic: justice, just, form, idea, soul, good, city/state, virtue/excellence, and necessity/necessary (Plato, 2004; Sir Ross, 1976). These terms are challenging due to polysemy and the lack of direct equivalents between Greek and Chinese. To ensure reliable identification, disambiguation rules were applied to distinguish part-of-speech variants (e.g., “just” vs. “justice”).

The disambiguation protocol addressed several key ambiguities. For Chinese terms 正义/公正/正当 (just/justice), classification relied on syntactic context and POS tags. When followed by coordinating conjunctions (e.g., 和, 与), classification was based on POS tags. When followed by the particles 的 or 之, the classification depended on the head noun: if followed by abstract nouns (e.g., 定义, 意义, 概念) or location words (e.g., 上, 下, 中), the term was classified as a noun (this rule requires the presence of 的/之); if followed by entity nouns (e.g., 人, 事, 者), it was classified as an adjective regardless of whether 的/之 is present. When the following word was not in these filter lists, classification relied on the base word’s POS tag. The same rules applied to 善/好/好的/善的 (good): when followed by coordinating conjunctions, classification relied on POS tags; when followed by abstract nouns or location words after 的/之, they were classified as nouns (善/好); when followed by entity nouns (with or without 的/之), they were classified as adjectives (好的/善的); when not in filter lists, POS tags determined classification. Adverbial uses of 好 (tagged as adverbs, POS = d) were excluded unless in special token lists (e.g., 最好, 至好, 很好). The term 相 (idea) was distinguished from its adverbial uses (e.g., 相和谐, 相反) by excluding instances tagged as adverbs or adjectives. For English, “just” was identified as an adjective when preceded by determiners and followed by common nouns within two tokens, with adverb uses (RB) excluded; otherwise “justice” was counted as a noun. The term “good” required filtering to exclude plural nouns (goods) and economic usages (public goods), while retaining abstract noun usages (the good, the Good).

To validate this protocol’s reliability, a stratified 2% sample was double-annotated by two independent annotators. Inter-annotator agreement (Cohen’s κ) reached 0.8904 for Chinese and 0.8017 for English, both exceeding the 0.70 threshold for substantial agreement. Disagreements were resolved through third-round adjudication. This rigorous process ensures the reproducibility and consistency of term identification.

The analysis reveals that translators are active interpreters making significant and divergent lexical choices. The data shows a universal tendency toward variability, with no core concept translated with perfect one-to-one consistency. Table 3 provides a comprehensive overview of the translation variants for each core term across the four Chinese translations.

Table 3.

Statistics of Translation Variants of Core Terms.

Text	Term	Variant	Count	Frequency (per 10k)
ZH Gu	Justice	正义	213	13.422
		公道	1	0.063
	Idea	形相	27	1.701
		相	3	0.189
		面貌	1	0.063
		理念	1	0.063
	Just	正义的	123	7.750
		公正的	6	0.378
		正当的	12	0.756
	Form	形式	59	3.718
		种类	13	0.819
		型式	1	0.063
	Soul	灵魂	322	20.290
		心灵	16	1.008
	Good	好的	199	12.539
		好	74	4.663
		善的	15	0.945
		善	48	3.025
	City/state	城邦	663	41.777
		城市	2	0.126
	Virtue/excellence	品德	104	6.553
		德性	1	0.063
	Necessity/necessary	必然	101	6.364
		必需	11	0.693
		必要	21	1.323
		必然性	2	0.126
ZH Guo	Justice	正义	254	22.578
		公正	3	0.267
		公道	3	0.267
	Idea	理念	26	2.311
		相	2	0.178
	Just	正义的	144	12.800
		正当的	6	0.533
		公正的	9	0.800
	Form	形式	25	2.222
		种类	5	0.444
		型	1	0.089
	Soul	心灵	171	15.200
		灵魂	157	13.956
		魂	1	0.089
	Good	好的	221	19.645
		好	123	10.933
		善的	79	7.022
		善	80	7.111
	City/state	国家	311	27.645
		城邦	225	20.000
		城市	3	0.267
		国	6	0.533
	Virtue/excellence	美德	51	4.533
		德性	19	1.689
		品德	1	0.089
	Necessity/necessary	必然	55	4.889
		必要	29	2.578
		必需	5	0.444
		必然性	3	0.267
ZH He	Justice	正义	208	18.792
	Just	正义的	169	15.268
		公正的	1	0.090
		正当的	5	0.452
	Form	种类	28	2.530
		型	15	1.355
		形式	2	0.181
		外型	1	0.090
	Idea	相	18	1.626
		相貌	4	0.361
		形相	2	0.181
	Soul	灵魂	311	28.097
		心灵	2	0.181
	Good	好的	184	16.623
		善	89	8.041
		善的	75	6.776
		好	37	3.343
	City/state	城邦	537	48.515
	Virtue/excellence	德性	94	8.492
	Necessity/necessary	必然	126	11.383
		必然性	14	1.265
		必要	22	1.988
		必需	3	0.271
ZH Xie	Justice	公道	243	17.918
		公正	28	2.065
		正义	8	0.590
	Just	公正的	134	9.881
		正当的	45	3.318
		正义的	1	0.074
	Form	型	25	1.843
		形式	18	1.327
		种类	17	1.254
	Idea	相	26	1.917
		理念	1	0.074
		面貌	2	0.147
	Soul	灵魂	334	24.629
		心灵	24	1.770
	Good	好的	193	14.231
		好	60	4.424
		善	56	4.129
		善的	30	2.212
	City/state	城邦	320	23.596
		国家	223	16.444
		城市	4	0.295
		邦国	2	0.147
		国	2	0.147
	Virtue/excellence	美德	89	6.563
	Necessity/necessary	必然	51	3.761
		必要	13	0.959
		必需	7	0.516
		必然性	4	0.295

The cross-translation summary reveals distinct patterns of variability. The concepts of good and necessity/necessary show the highest average number of variants (4.0 each), indicating translators use multiple terms for different nuances. Justice and its adjectival form just show moderate variability (averaging 2.25 and 3.0 variants), reflecting the complexity of the Greek term δικαιοσύνη. Conversely, virtue/excellence shows the lowest variability (averaging 1.75 variants), suggesting greater consensus, though still without perfect consistency.

The most striking divergence is in the translation of justice (δικαιοσύνη). Gu Shouguan and He Xiangdi consistently use 正义 (zhèngyì, “righteousness/justice”); Guo Binhe primarily uses 正义 but also 公正 (gōngzhèng, “fairness”) and 公道 (gōngdào, “impartiality”); Xie Shanyuan reverses this, using 公道 as the primary term. This divergence reflects different philosophical interpretations: 正义 emphasizes moral righteousness, 公正 foregrounds fairness, and 公道 carries connotations of public reason. For example, Gu’s “正义的人” (the just person) becomes Xie’s “公道的人” (the fair person), shifting the emphasis from moral righteousness to equitable balance.

Beyond consistency, the frequency with which these core terms appear reveals distinct differences in emphasis among the translators, as shown in the heatmap in Figure 3. Across all four Chinese versions, terms related to good and city/state are unsurprisingly the most prominent, confirming their centrality to the dialog.

Figure 3.

Frequency of core terms in each translation.

However, there are significant differences in their usage frequencies. He Xiangdi’s translation, for instance, uses city/state (城邦) with the highest frequency (48.515 per 10,000 words), while Guo Binhe’s version shows a more balanced distribution between 国家 (guójiā, “nation/state,” 27.645) and 城邦 (chéngbāng, “city-state,” 20.000), reflecting different interpretive approaches to Plato’s political terminology. Similarly, the concept of good shows substantial variation: Guo Binhe’s translation uses 好的 (hǎo de, “good”) most frequently (19.645 per 10,000 words), while Gu Shouguan employs a more varied distribution across 好的, 好, 善的, and 善, suggesting a more nuanced approach to rendering the Greek agathon. These frequency differences demonstrate that translators not only vary their terminology for core concepts but also differ in the overall lexical emphasis they place on particular philosophical ideas, choices which inevitably shape the thematic texture and interpretive focus of the final translated work. Because each corpus provides a single aggregate observation per term, the terminology statistics are reported descriptively; assumption checks confirm that standard inferential models would be underpowered in this setting.

Syntactic Complexity

Syntactic complexity analysis reveals systematic differences in structural choices among the Chinese translations, reflecting distinct stylistic philosophies. The resulting metrics focus on comparative patterns rather than methodological detail, following the pipeline from Section 3.3. Table 4 presents five complementary metrics of syntactic complexity. These metrics capture different dimensions of structural complexity.

Table 4.

Syntactic Complexity Metrics of Each Translation.

Text	ASL	Tree depth	MLC	C/S	MATTR	Sentences	Clauses
EN Bloom	23.403	10.346	8.064	2.962	0.383	6,537	19,360
EN Reeve	21.949	10.976	7.661	2.697	0.402	6,682	18,024
ZH Gu	27.419	12.069	6.613	5.301	0.407	7,171	38,011
ZH Guo	17.985	10.369	5.221	3.192	0.45	7,492	23,912
ZH He	20.567	10.941	5.945	3.892	0.424	6,730	26,192
ZH Xie	22.179	11.195	6.526	3.85	0.444	7,517	28,937

Note. ASL = Average Sentence Length (tokens per sentence); Tree Depth = mean syntactic tree depth across sentences; MLC = Mean Length of Clause (tokens per clause); C/S = Clauses per Sentence; MATTR = Moving-Average Type-Token Ratio (window size 500).

A striking finding is the dramatic divergence in sentence length. Gu Shouguan’s translation (ZH Gu) has the highest ASL (27.419), exceeding both English benchmarks and favoring elaborate structures. In contrast, Guo Binhe’s (ZH Guo) has the lowest ASL (17.985), indicating a preference for concise prose. The other two translations, He Xiangdi (20.567) and Xie Shanyuan (22.179), occupy intermediate positions.

Average Syntactic Tree Depth, a more nuanced measure of grammatical complexity, corroborates the ASL findings. Gu Shouguan’s translation has the deepest syntactic structures (12.069), confirming its sentences are structurally complex. Guo Binhe’s has the shallowest tree depth among Chinese translations (10.369), aligning with its concise style. He Xiangdi (10.941) and Xie Shanyuan (11.195) show moderate depth. The English translations fall within a similar range (10.346–10.976), suggesting Chinese translations generally have greater syntactic complexity.

Clause-level metrics reveal additional patterns. The C/S ratio is highest in Gu Shouguan’s translation (5.301), indicating extensive subordination, while Guo Binhe’s is the lowest among Chinese versions (3.192), though still exceeding both English benchmarks. This suggests Chinese translations use more complex clause structures. Conversely, the MLC is higher in English translations than in Chinese ones. This suggests that while Chinese translations have more clauses per sentence, the individual clauses are shorter, reflecting different strategies for information density.

The MATTR metric, measuring lexical diversity, is highest in Guo Binhe’s translation (0.45), indicating greater lexical variety. Gu Shouguan’s version shows lower diversity (0.407). Both values exceed the English benchmarks (0.383 and 0.402), suggesting Chinese translations use a wider lexical range, perhaps reflecting the challenges of rendering Greek philosophical concepts.

Together, these metrics show a clear stylistic divergence. Gu Shouguan’s translation prioritizes structural complexity, creating a dense, formal style. Guo Binhe’s version favors accessibility with shorter sentences, simpler structures, and greater lexical variety. The other two translations occupy intermediate positions. Statistical tests confirm significant differences: Chinese translations show greater syntactic complexity than English (tree depth, C/S, MATTR) but shorter clause length (MLC), with substantial variation among Chinese translations across all metrics and among English translations in most.

Logical Connective Usage Patterns

Following the protocol from Section 3.3, the analysis groups tokens into 11 PDTB 3.0 functional categories, excluding Addition markers. Frequencies are reported per 10,000 tokens for comparability. Coverage of variants combines corpus-driven harvesting with lemma-anchored supplementation, while POS filters guard against non-connective homographs.

Semantically polyvalent markers were routed to their roles via the rule set from Section 3.3. For example, temporal versus causal readings of “since” were distinguished by temporal cues, and contrastive versus concessive uses of “but/however” depended on counter-expectation. Chinese causal-resultive pairs and temporal connectors were also separated based on contextual triggers. The results of this analysis are presented in Table 5.

Table 5.

Frequency of Logical Connectives Classified by Function (per 10k).

Category	EN Bloom	EN Reeve	ZH Gu	ZH Guo	ZH He	ZH Xie
Reason	85.825	89.124	50.532	36.388	58.178	72.022
Result	10.655	5.592	20.835	26.363	19.54	14.992
Contrast	71.903	72.349	115.316	64.31	87.119	68.35
Condition	46.148	51.415	46.075	43.294	58.031	41.855
Purpose	6.341	5.114	7.93	7.649	7.125	6.425
Concession	9.086	6.069	4.198	16.709	10.137	23.62
Alternative	59.941	72.008	38.404	37.799	35.626	35.001
Synchrony	29.742	32.936	7.982	9.58	16.968	21.172
Subsequent	0.392	0.818	0.881	3.713	2.424	5.262
Until	1.111	2.864	1.14	1.337	1.91	1.224
Free-choice	2.157	2.659	0.674	2.748	2.204	1.775

Note. Frequencies are reported per 10,000 tokens (simple proportional normalization); functional labels follow the 11 PDTB 3.0 categories retained for this study; coordination/addition markers are excluded from the inventory.

English translations have higher overall connective densities (323.30 and 340.95 per 10k), driven by a strong reliance on Reason and Alternative markers. Chinese translations redistribute this toward Contrast and Result. Gu Shouguan’s version foregrounds adversative framing (115.316 per 10k), while Guo Binhe and He Xiangdi compensate for sparser Reason markers by amplifying Result and Concession signals to guide readers.

The percentage profiles in Figure 4 clarify these strategic emphases. Gu Shouguan’s translation devotes almost two-fifths of its logical connectives to Contrast, reinforcing the agonistic dialog.

Figure 4.

Percentage stacked area chart of logical connectives.

By contrast, Xie Shanyuan elevates Reason and Concession, a pattern consistent with his preference for explicit signaling. Guo Binhe’s translation is the most balanced, pairing low Contrast usage with the highest Result frequency, yielding a more linear argumentative rhythm.

Temporal coordination also differs. English versions use Synchrony markers (e.g., “when/whenever”) at nearly three times the rate of Gu Shouguan. In contrast, Chinese translators—especially Xie Shanyuan—use more Subsequent markers to choreograph narrative progression. These choices show how translators recalibrate the connective system to maintain cohesion and align with target-language expectations. English foregrounds logical scaffolding, while Chinese versions rely more on contrastive and resultative cues.

The observed density gap echoes documented typological differences and the tendency toward implicitation in Chinese renderings of hypotactic source texts (Baker, 2001; Huang, 2022; T. McEnery & Xiao, 2010). By modulating connective use, translators tailor argumentative flow to target-language conventions while preserving philosophical nuance.

Statistical tests confirm significant cross-linguistic differences with large effect sizes. English translations use more Reason, Alternative, and Synchrony markers, while Chinese versions favor Result, Contrast, Concession, and Subsequent markers. Substantial variation also exists among the Chinese translations (e.g., Contrast: η² = 0.42; Concession: η² = 0.39) and among the English translations (e.g., Result: d = 0.70).

Cognitive and Linguistic Profile

The cognitive-profile analysis follows the LIWC-based workflow from Section 3.3, using official English and Simplified Chinese LIWC2015 dictionaries. Category labels were harmonized to retain only semantically comparable dimensions, excluding language-specific categories. To avoid earlier anomalies, all reported values use adjusted scores with smoothing for sparse counts. Scores are reported as percentages of total tokens, following LIWC convention, which enables direct comparison while acknowledging the risk of culturally specific nuances. Table 6 compares the key cognitive style dimensions.

Table 6.

Comparison of Cognitive Style Dimensions Across Translations (% of tokens).

Text	Function	Cogproc	Prep	Posemo	Negemo	Affect	Social
EN Bloom	47.271	11.245	11.602	3.283	1.371	4.714	10.873
EN Reeve	48.624	12.719	11.398	3.426	1.489	4.981	9.808
ZH Gu	48.049	13.746	4.422	2.214	2.147	5.097	9.909
ZH Guo	45.194	15.754	4.056	2.901	2.884	6.763	9.605
ZH He	43.985	16.434	4.234	2.797	2.780	6.160	11.593
ZH Xie	45.241	16.033	4.562	2.892	2.484	6.208	11.261

Note. Percentages reflect LIWC category coverage relative to total tokens; only categories deemed fully comparable across languages are reported.

Several cross-linguistic contrasts emerge. Function-word usage is broadly similar (45%–48%), indicating comparable reliance on grammatical scaffolding. However, Chinese versions show markedly higher Cognitive Processes coverage (13.7%–16.4% vs. 11.2%–12.7%), suggesting a stronger tendency to foreground reasoning and epistemic vocabulary, especially in He Xiangdi’s translation (16.434%). Conversely, English translations use substantially more prepositions (≈11.5% vs. 4%–4.6%), a predictable reflection of English hypotaxis.

Emotional tone also diverges. Chinese translations use more negative emotion words (2.15%–2.88% vs. 1.37%–1.49%) and fewer positive emotion words (≈2.2%–2.9% vs. ≈3.3%–3.4%) than the English versions. The consolidated affect category is therefore higher in Chinese (5.10%–6.76%) than in English (4.71–4.98%), implying a broader emotional palette. Social-process terms show a split: Bloom’s translation has the highest share among English versions (10.873%), but the Chinese translations by He and Xie peak higher (11.593% and 11.261%), reflecting their emphasis on interpersonal and civic vocabulary.

These tendencies underscore that cognitive and affective framing is not uniform. English versions focus more on structural cohesion (prepositions) and have a brighter affective tone. Chinese translations amplify cognitive and affective cues, potentially compensating for structural differences by signaling reasoning and emotional stakes more overtly. This aligns with the observation that Chinese renditions favor explicitation of argumentative roles, while English versions retain more of the implicit balance of the Greek. Statistical tests confirm significant cross-linguistic differences with large effect sizes in Cognitive Processes, Affect, Prepositions, Function words, and Positive emotion. Substantial variation also exists within both Chinese and English translations across several categories. Despite the inherent limits of dictionary-based proxies, the LIWC framework provides a consistent baseline for tracing how translators reshape the cognitive-emotional profile of the Republic.

Discussion

The quantitative profiles are entry points into a richer intellectual history: four translators re-situating Plato’s Republic within distinct Chinese contexts. The stylistic differences are not random but are legible fingerprints of specific translational projects. Following Weng’s notion of “re-locating” Plato (Weng, 2015a), each project pairs empirical habits with motivations from postscripts. Aligning these allows the discussion to move from description to interpretation, particularly along the axis of foreignization versus domestication (Schleiermacher, 2021). Ricoeur’s idea of translation as “equivalence without adequacy” (Ricoeur, 2006, p. 10) captures this balancing act between fidelity and reader expectations, which the styles in Chapter 4 materialize.

The translation by Gu Shouguan (2010) stands as a compelling embodiment of a foreignizing strategy, one that rigorously bends the target language to the contours of the source text (Venuti, 2018). As Venuti argues, foreignizing translation “Signifies the differences of the foreign text, yet only by disrupting the cultural codes that prevail in the translating language” (Venuti, 2018, p. 15). This strategy seeks to restrain the ethnocentric violence of translation and can serve as a form of resistance against cultural narcissism and imperialism (Venuti, 2018, p. 16). The quantitative data provides the initial outline of this strategy: with an Average Sentence Length of 27.419 tokens and an Average Syntactic Tree Depth of 12.069, Gu’s text is statistically the most complex among all translations analyzed. His translation also exhibits the highest Clauses per Sentence ratio (5.301), indicating extensive subordination and embedding that mirrors the layered structure of the original Greek. This abstract complexity becomes tangible in passages like his rendering of the tripartite soul at [441a]. Where others might simplify, Gu constructs a formidable periodic sentence that meticulously mirrors the layered, subordinate structure of the original Greek: “就像是在城邦里那样, 是有三个属类合起来构成为一个城邦的: 生聚财货的_、协助守卫的_、谋虑决断的…” (“Just as it is in the city-state, there are three classes that combine to form a city-state: that which accumulates wealth, that which assists in defense, that which deliberates and decides…”) (Plato, 2010, p. 198). This is not simply a long sentence; it is a deliberate architectural choice. Gu himself confirms this intent in the postscript to his work, where he notes that his editorial approach was to “Stay as close as possible to the original’s word order and tone, sometimes even at the expense of fluent Chinese expression.”⁴ This explicit statement of purpose provides a direct link between his philosophical motivation and his stylistic execution.

Gu’s foreignizing impulse is not merely linguistic but may be understood within intellectual contexts like the rise of Straussian interpretation in Chinese academia, which provides a historical context for his method (Weng, 2015b). By preserving the original’s difficulty, the translator resists domestication, compelling the reader to journey “over to the author,” as Schleiermacher phrased it (Schleiermacher, 2021). This philosophy extends from syntax to lexicon. In his translation of Plato’s metaphysics at [477a], Gu’s choice of “一个完全地‘是’的东西是完全地可认识的” (“A thing that completely ‘is’ is completely knowable”) exemplifies lexical foreignization (Plato, 2010, p. 258). The slightly awkward, technical adverbial (“完全地”) and the bracketing of “is” (是) forces the reader to treat “being” not as a common verb, but as a specialized, alien philosophical concept. Gu’s approach can be understood through Gadamer’s “fusion of horizons,” where the translator’s horizon encounters the source text’s, creating a new one that preserves the text’s foreignness (Foran, 2018). His stylistic choices reflect this deliberate preservation of distance (Gadamer, 2011, p. 388), rendering Plato as a formidable figure accessible only through intense scholarly labor.

At the opposite end, the translations by Guo Binhe and Zhang Zhuming (1986) and He Xiangdi (2021) exemplify a domesticating strategy, prioritizing clarity for a contemporary Chinese audience. As Venuti observes, this can create an “illusion of transparency” by rewriting the text in discourse aligned with local cultural values (Venuti, 2018, p. 16). Guo Binhe’s version, with the shortest sentences (ASL: 17.985) and simplest syntax (tree depth: 10.369), is a clear example. Its domesticating function is immediately apparent when placed alongside Gu Shouguan’s rendering of the same passage at [477a]. Where Gu uses the technical and literal “是” (is), Guo opts for the far more common and concrete “有” (you, to have/to exist): “完全有的东西是完全可知的” (“A thing that completely exists is completely knowable”) (Plato, 1986, p. 220). This seemingly small substitution is a profound act of cultural translation, replacing an abstract metaphysical concept with a more intuitive notion of existence, thereby lowering the cognitive barrier for a non-specialist reader. This choice aligns with its context: published by the Commercial Press in 1986 during the “Reform and Opening Up” era, its purpose was likely pedagogical.⁵ Guo’s high lexical diversityGuo’s high lexical diversity (MATTR: 0.45) also suggests a strategy of using varied vocabulary to enhance accessibility.

He Xiangdi’s 2021 translation represents a modern evolution of this domesticating impulse. His postscript is a remarkably transparent account of a translator’s process, explicitly stating his wish to re-translate the Republic for a “new era,” as previous versions had “room for improvement” in their accuracy and readability. He details a meticulous four-step methodology, moving from a literal “word-for-word” draft to a polished final version, a process designed to balance scholarly fidelity with modern Chinese fluency. His decision to reformat the text into a clear, turn-by-turn dialog is another powerful domesticating move that prioritizes the reader’s experience (Plato, 2021, pp. 474–476). He’s translation shows moderate syntactic complexity (ASL: 20.567, tree depth: 10.941), positioning it between Guo’s accessible style and Gu’s complex structures, while his cognitive processes score (16.434%) is the highest among all translations, reflecting his emphasis on explicit reasoning markers. This shared domesticating tendency among Guo and He can be understood through Ricoeur’s concept of translation as a “Work of mourning,” where the translator must give up the ideal of perfect translation and accept the task of creating “Equivalence without adequacy” (Ricoeur, 2006, pp. 8, 10). For Ricoeur, this work involves both mourning the loss of perfect equivalence and remembering that the mother tongue is not sacred but one language among many, allowing the translator to welcome the foreign text while making it accessible to the target audience (Ricoeur, 2006, p. 4). The domesticating strategies of Guo and He reflect this balance between faithfulness and accessibility. Their horizons of understanding, shaped by pedagogical goals and historical contexts, fuse with the source text to bring the reader closer to the author rather than maintaining distance.

The analysis of logical connectives reveals a more nuanced pattern that challenges simple assumptions about explicitation in translation. When Addition markers such as “and/和” are excluded, the overall density of logical connectives in Chinese translations (283.70 per 10,000 tokens) is actually lower than in English translations (332.12 per 10,000 tokens), representing approximately 85% of the English density. This finding suggests a pattern of implicitation rather than explicitation, where Chinese translators reduce the explicit marking of logical relationships compared to the English source texts. As Vinay and Darbelnet define it, explicitation is “A stylistic translation technique which consists of making explicit in the target language what remains implicit in the source language because it is apparent from either the context or the situation” (Vinay & Darbelnet, 1995, p. 342). However, the observed pattern in these translations demonstrates the opposite phenomenon: implicitation, where information that is explicit in the source language becomes implicit in the target language. This pattern aligns with fundamental typological differences between the two languages: English is a hypotactic language that favors explicit connectives, while Chinese is a paratactic language that often implies logical relationships through context, word order, and semantic coherence. In the process of English-to-Chinese translation, translators may adopt implicitation strategies, reducing the use of explicit connectives to align with Chinese expression habits and cultural background (Huang, 2022). While explicitation has often been regarded as a distinctive feature of translation, with target texts tending to be longer than source texts (Olohan & Baker, 2000), the observed implicitation pattern in these philosophical translations thus reflects not merely translator choice but the fundamental structural differences between the two language systems. However, this overall pattern of implicitation is nuanced by functional distribution: while Chinese translations use fewer connectives overall, they show particular emphasis on certain functional categories. Most notably, Chinese translations exhibit significantly higher relative frequencies in the Subsequent category (ranging from 0.881 to 5.262 per 10k) compared to English translations (0.392–0.818 per 10k), with the highest Chinese frequency (Xie Shanyuan: 5.262) being approximately 5.07 times higher than the average English frequency. This suggests that while Chinese translators may reduce overall connective density, they strategically increase the use of temporal sequence markers (如 “于是/以致”) to choreograph narrative progression and hypothetical scenarios, reflecting different discourse expectations in Chinese philosophical writing. This pattern can be understood through Blum-Kulka’s theory of shifts in cohesion and coherence, where translation necessarily entails changes in textual relationships, and the choice of which cohesive markers to retain, reduce, or amplify reflects both language-specific norms and translator strategies (Blum-Kulka, 1986). The finding of overall implicitation with selective emphasis on specific functional categories demonstrates that translation strategies are not monolithic but involve complex negotiations between source-text features, target-language conventions, and translator priorities.

Ultimately, the deepest interpretive divergences are in the translation of core philosophical concepts. The treatment of δικαιοσύνη is the most telling case. The term is fractured into distinct concepts in Chinese. Gu Shouguan and He Xiangdi consistently use “正义” (zhèngyì, righteousness), emphasizing moral correctness. Guo Binhe also primarily uses “正义” but occasionally employs “公正” (gōngzhèng, fairness) and “公道” (gōngdào, impartiality). Xie Shanyuan, in contrast, reverses this pattern, using “公道” as the primary term (243 occurrences) and “正义” only rarely (8 occurrences), shifting the emphasis from moral righteousness to public reason and equitable treatment, foregrounding the socio-political dimension of the concept. This choice is evident in passages like Socrates’ concluding words in Book I [354b]: “而如果我不知道‘公道是什么’ _，我就更不知道公道是不是美德_，以及拥有公道的人究竟是不是幸福的了” (“And if I don’t know what justice is, I will hardly know whether it is a virtue or not, and whether the one who has it is happy or not”) (Plato, 2016, p. 59). These are not just different words; they are different philosophical starting points that create four different Republics (Yee, 2010). Each choice reflects a cultural negotiation—mapping Greek concepts onto the Chinese philosophical landscape. As Ricoeur argues, translation creates “equivalence without adequacy,” balancing faithfulness and betrayal (Ricoeur, 2006, p. 10). The divergent choices for δικαιοσύνη exemplify this negotiation. The “critical void” is, in fact, a vibrant space where translators continually reinvent a classic text. Through their divergent choices, informed by their personal histories and eras, these translators have gifted the Chinese-speaking world with four unique philosophical arguments.

Conclusion

This study demonstrated that computational stylistics can reveal how four Chinese translators of Plato’s Republic created distinct interpretive projects. Our multi-layered framework uncovered systematic differences beyond surface-level variation: Gu Shouguan’s translation shows the highest syntactic complexity, reflecting a foreignizing strategy, while Guo Binhe’s is the most accessible, embodying a domesticating approach. The analysis of core philosophical terms reveals that translators are active interpreters; for instance, the Greek term δικαιοσύνη is fractured into four distinct Chinese concepts, each reflecting a different philosophical emphasis. The examination of logical connectives shows that Chinese translations have lower connective densities than English benchmarks, suggesting implicitation, yet with strategic emphasis on categories like Subsequent markers. Cognitive-linguistic profiling confirms these differences, with Chinese translations showing higher cognitive process coverage, indicating a tendency to foreground reasoning vocabulary.

While these findings provide robust evidence, several methodological challenges warrant reflection. Although the NLP tools were validated with high accuracy, their performance on philosophical texts may still introduce subtle biases. The corpus provides valuable diachronic coverage, but its limited size constrains generalizability. Finally, the cross-language LIWC comparison, despite careful mapping, remains subject to the challenge that even aligned categories may encode culturally specific nuances.

Despite these limitations, this framework establishes a reproducible methodology for comparative translation analysis. Future work should expand the corpus to include more translations, apply the framework to other philosophical genres, develop domain-specific NLP tools to reduce bias, and extend the analysis to other languages to explore cross-cultural patterns. Through such extensions, computational stylistics can continue to illuminate how translators, as cultural mediators, shape the reception of canonical philosophical texts.

Supplemental Material

sj-rar-1-sgo-10.1177_21582440261416755 – Supplemental material for Stylistic Fingerprints: A Multi-Layered Computational Analysis of Chinese Translations of Plato’s Republic

Supplemental material, sj-rar-1-sgo-10.1177_21582440261416755 for Stylistic Fingerprints: A Multi-Layered Computational Analysis of Chinese Translations of Plato’s Republic by Lexin Huang and Liangkun Chen in SAGE Open

Footnotes

ORCID iDs

Lexin Huang

Liangkun Chen

Ethical Considerations

The data that were used in the study are open-access texts of journal articles. The author has no ethical issues to report.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported by the MOE (Ministry of Education in China) Liberal Arts and Social Sciences Foundation (22YJC710045).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Supplemental Material

Supplemental material for this article is available online.

Notes

References

Baker

(2001). Towards a methodology for investigating the style of a literary translator. Target, 12(2), 241–266. https://doi.org/10.1075/target.12.2.04bak

Ball

Koliousis

Mohanan

Peacey

(2024). Computational philosophy: Reflections on the PolyGraphs project. Humanities and Social Sciences Communications, 11(1), 186. https://doi.org/10.1057/s41599-024-02619-z

Becker

Culotta

(2025, May 6). A Computational analysis and visualization of in-text reference networks across philosophical texts. https://doi.org/10.48550/arXiv.2504.20065

Biber

(1988). Variation across speech and writing. Cambridge University Press.

Blum-Kulka

(1986). Shifts of cohesion and coherence in translation. In House

Blum-Kulka

(Eds.), Interlingual and Intercultural Communication (pp. 290–305). John Benjamins Pub Co.

Boase-Beier

(2014). Stylistic approaches to translation. Routledge.

Boase-Beier

(Ed.). (2023). Stylistics and translation. In Burke

(Ed.), The Routledge handbook of stylistics (2nd ed., pp. 420–435). Routledge.

Botz-Bornstein

Mostafa

M. M.

(2017). A corpus-based computational analysis of philosophical texts: Comparing analytic and continental philosophy. International Journal of Social and Humanistic Computing, 2(3/4), 230–246. https://doi.org/10.1504/ijshc.2017.084757

Brandwood

(1990). The chronology of Plato’s dialogues. Cambridge University Press.

10.

Burrows

(2002). “Delta”: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3), 267–287. https://doi.org/10.1093/llc/17.3.267

11.

CNKI. (n.d). Lǐxiǎngguó. Chinese Book Citation Statistical Analysis Database. https://cbad.cnki.net/?r=%2FBookInfo%2FBookListIndex%3Fpagecode%3D001

12.

Daelemans

(2013). Explanation in computational stylometry. In Gelbukh

(Ed.), Computational Linguistics and Intelligent Text Processing (pp. 451–462). Springer.

13.

Eder

Rybicki

Kestemont

(2016). Stylometry with R: A package for computational text analysis. The R Journal, 8(1), 107–121. https://doi.org/10.32614/rj-2016-007

14.

Foran

(2018). Gadamer and Ricoeur. In Rawling

Wilson

(Eds.), The Routledge handbook of translation and philosophy (pp. 90–103). Routledge.

15.

Frazier

(1985). Syntactic complexity. In Zwicky

A. M.

Dowty

D. R.

Karttunen

(Eds.), Natural language parsing: Psychological, computational, and theoretical perspectives (pp. 129–189). Cambridge University Press.

16.

Gadamer

H. G.

(2011). Truth and method. Bloomsbury Academic.

17.

Ghazala

H. S.

(2018). The cognitive stylistic translator. Arab World English Journal For Translation and Literary Studies, 2(1), 4–25. https://doi.org/10.24093/awejtls/vol2no1.1

18.

Grieve

(2007). Quantitative authorship attribution: An evaluation of techniques. Literary and Linguistic Computing, 22(3), 251–270. https://doi.org/10.1093/llc/fqm020

19.

Halliday

M. A. K.

Hasan

(2014). Cohesion in English. Routledge. https://doi.org/10.4324/9781315836010

20.

Herrmann

J. B.

Jacobs

A. M.

Piper

(2021). Computational Stylistics. In Kuiken

Jacobs

A. M.

(Eds.), Handbook of empirical literary studies (pp. 451–483). De Gruyter.

21.

Hołobut

Rybicki

(2020). The stylometry of film dialogue: Pros and pitfalls. Digital Humanities Quarterly, 14(4), 1–37. http://www.digitalhumanities.org/dhq/vol/14/4/000498/000498.html

22.

Huang

(2022). A comparative study of the explicitation of conjunctions in Chinese-English translation of different styles. Modern Linguistics, 10(9), 2073–2083. https://doi.org/10.12677/ml.2022.109279

23.

Johnson

J. H.

(2008). Corpus stylistics and translation (Essay 9; p. 15). Centro di Studi Linguistico-Culturali (CeSLiC). https://amsacta.unibo.it/id/eprint/2507/

24.

Johnson

Tarrant

(2014). Fairytales and make-believe, or spinning stories about Poros and Penia in Plato’s symposium: A literary and computational analysis. Phoenix, 68(3/4), 291–312. https://doi.org/10.1353/phx.2014.0043

25.

Koentges

(2020). The un-platonic menexenus : A stylometric analysis with more data. Greek, Roman and Byzantine Studies, 60, 211–241. https://api.semanticscholar.org/CorpusID:219407873

26.

Ledoit

Wolf

(2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411. https://doi.org/10.1016/s0047-259x(03)00096-4

27.

(2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496. https://doi.org/10.1075/ijcl.15.4.02lu

28.

Malmkjaer

(2003). What happened to God and the angels: An exercise in translational stylistics. Target, 15(1), 37–58. https://doi.org/10.1075/target.15.1.03mal

29.

McEnery

Xiao

(2004). The Lancaster Corpus of Mandarin Chinese: A Corpus for Monolingual and Contrastive Language Study. In Lino

M. T.

Xavier

M. F.

Ferreira

Costa

Silva

(Eds.), Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04). European Language Resources Association (ELRA). https://aclanthology.org/L04-1117/

30.

McEnery

Xiao

(2010). Corpus-based contrastive studies of English and Chinese. Routledge.

31.

Meng

(2013). Exploratory statistical techniques for the study of literary translation. RAM-Verlag.

32.

Meng

Oakes

M. P.

(Eds.) (2021). Corpus exploration of Lexis and discourse in translation. Routledge.

33.

Olohan

Baker

(2000). Reporting that in translated English. Evidence for subconscious processes of explicitation? Across Languages and Cultures, 1(2), 141–158. https://doi.org/10.1556/acr.1.2000.2.1

34.

PaddleNLP Contributors

. (2021). PaddleNLP: An easy-to-use and high performance NLP library. GitHub. https://github.com/PaddlePaddle/PaddleNLP

35.

Pennebaker

J. W.

(2013). The secret life of pronouns: What our words say about us (Vol. 352, p. xii). Bloomsbury Press. https://doi.org/10.1016/S0262-4079(11)62167-2

36.

Pennebaker

J. W.

Boyd

R. L.

Jordan

Blackburn

(2015). The development and psychometric properties of LIWC2015. https://doi.org/10.15781/T29G6Z

37.

Plato. (1986). Lǐxiǎngguó ( Guo

Zhang

, Trans.). The Commercial Press.

38.

Plato. (1991). The Republic of Plato (2nd ed.). ( Bloom

, Trans.) Basic Books.

39.

Plato. (2004). Republic. ( Reeve

C. D. C.

, Trans.) Hackett Publishing Company, Inc.

40.

Plato. (2010). Lǐxiǎngguó ( Gu

, Trans.). Yuelu Publishing House.

41.

Plato. (2016). Lǐxiǎngguó ( Xie

, Trans.). Shanghai Translation Publishing House.

42.

Plato. (2021). Lǐxiǎngguó ( He

, Trans.). Yunnan People’s Publishing House.

43.

Zhang

Bolton

Manning

C. D.

(2020). Stanza: A Python natural language processing toolkit for many human languages. In Celikyilmaz

Wen

T.-H.

(Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 101–108). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-demos.14

44.

Ricoeur

(2006). On translation. Routledge.

45.

Saldanha

(2011). Translator style. Translator, 17(1), 25–50. https://doi.org/10.1080/13556509.2011.10799478

46.

Schleiermacher

(2021). On the different methods of translating. In Venuti

(Ed.), The translation studies reader (4th ed., pp. 51–71). Routledge.

47.

Sir Ross

W. D

. (1976). Plato’s theory of ideas. Greenwood Pub Group.

48.

Stamatatos

(2009). A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology, 60(3), 538–556. https://doi.org/10.1002/asi.21001

49.

Tausczik

Y. R.

Pennebaker

J. W.

(2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54. https://doi.org/10.1177/0261927x09351676

50.

Venuti

(2018). The translator’s invisibility: A history of translation. Routledge.

51.

Vinay

J. P.

Darbelnet

(1995). Comparative Stylistics of French and English: A methodology for translation. ( Sager

J. C.

Hamel

M.-J.

, Trans.) John Benjamins Publishing Company.

52.

Weng

(2015a). Re-locating Plato: A Chinese translation and interpretation of Plato’s Symposium. Intertexts, 19(1–2), 67–82. https://doi.org/10.1353/itx.2015.0004

53.

Weng

(2015b). The Straussian reception of Plato and nationalism in China. Comparatist, 39, 313–334. https://www.jstor.org/stable/26254732

54.

Yee

C. D. K.

(2010). On translating the “Republic.” Review of Metaphysics, 64(2), 337–360. https://www.jstor.org/stable/29765377

55.

Yngve

V. H.

(1960). A model and an hypothesis for language structure. Proceedings of the American Philosophical Society, 104(5), 444–466. https://www.jstor.org/stable/985230

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

8.38 MB