Semantic Overlaps Between Chinese Two-Character Words and Constituent Characters: A Normative Study

Abstract

In written Chinese, the graphic units are Chinese characters (CCs). Most commonly used characters combine with others to form two-character words (2C-words) or words of more than two characters; indeed, over 70% of commonly used words are 2C-words. Because almost all characters are meaningful in their own right, 2C-words overlap semantically with their constituent characters. The present study investigated how the normative semantic overlap between 2C-words and their constituent characters (SWC) is influenced by whether the constituent characters are word or word-not characters (Wording) and by whether they are left or right characters (Positioning), and whether SWC can be predicted by ordinary norm features of the constituent characters. The results confirmed earlier work showing that word-not characters are more strongly associated with 2C-words than word characters are, and that right characters are more strongly associated semantically with 2C-words than left characters are. The present study is also the first to provide evidence on the prediction of SWC by the norm features of the constituent characters. Skilled readers’ perception of the semantic, frequency and number-of-word features of the constituent characters may be mediated by Wording and Positioning in 2C-word semantic processing. However, skilled readers are unlikely to perceive the visual features of the constituent characters; rather, they seem to treat the constituent characters, which should be highly familiar to them, as whole units. In a semantic task on 2C-words, skilled readers may process the number of meanings, concreteness, imageability and emotion arousal of the constituent characters, but not their sensory experience arousal. There appears to be a close association in valence between 2C-words and word characters, but not between 2C-words and word-not characters.
These findings strongly support the theoretical argument that both words and characters should be taken as language units in Chinese. However, Wording and Positioning should be taken into account when treating a CC as a language unit. These findings may also be of more general significance for the semantic understanding of compound words.

Keywords

semantic overlaps; Chinese two-character words; constituent characters; normative study

Introduction

In written Chinese, the graphic units are Chinese characters (CCs). Most commonly used characters combine with others to form two-character words (2C-words) or words of more than two characters (mC-words). Because almost all characters are meaningful in their own right, 2C-words overlap semantically with their constituent characters (Tse & Yap, 2018; X. Zhou & Marslen-Wilson, 2000), much as English compound words overlap with their constituent morphemes. For example, “basketball” is composed of “basket” and “ball,” meaning a game in which a ball is thrown through a basket; similarly, “红旗” (red flag) is the semantic combination of its constituent characters “红” (red) and “旗” (flag). In an English two-morpheme compound, the right morpheme is more likely than the left morpheme to be the semantic head (Dronjic, 2011; Juhasz et al., 2015). The same holds for many 2C-words, in which the right constituent character is more likely than the left one to be the semantic head (T. Q. Xu, 2010).

Gagné et al. (2019) conducted a normative study of the semantic associations between over 8,000 English compounds and their constituent morphemes. They asked participants to rate each compound on the degree to which its meaning can be predicted from its constituents, obtained linguistic characteristics that might influence compound processing, and carried out a series of data analyses. Compounds whose meanings were highly predictable from their constituents tended to be processed faster than compounds whose meanings were not. In other words, the normative semantic association between compounds and their constituents is a strong predictor of compound processing. By analogy, the normative semantic overlap between Chinese 2C-words and their constituent characters (SWC) should also be informative about 2C-word processing. Several normative studies have examined the ordinary features of CCs (e.g., Z. G. Cai et al., 2022; Y. Liu et al., 2007; Sze et al., 2014; R. Wang et al., 2020). It is possible that the strength of SWC can be predicted from the norm features of the constituent characters. The purpose of the present study is to test this hypothesis through a normative study.

Research Questions

Wording and Positioning

In English, compound words account for a small portion of the vocabulary (Gao et al., 2022; P. D. Liu & McBride-Chang, 2010), but in Chinese, 2C- and mC-words make up 72% and 22% of commonly used words, respectively (State Language Affairs Commission, 2008), probably because of the special status of CCs as language units. Each CC corresponds to a syllable in spoken Chinese. In their first years of school, children learn that 2C- and mC-words are composed of single characters (Cheng et al., 2018; McBride, 2016). To meet the minimum literacy requirements set by education policy, they must master the 2,500 most commonly used CCs (State Language Affairs Commission, 1988), each of which is estimated to serve as a constituent character in about 20 commonly used 2C-words on average (K. Zhang, 1997).

CCs fall into three categories according to whether they function only as one-character words, only as constituent characters, or both as one-character words and as constituent characters. In the present study, CCs that can be used as constituent characters but not as one-character words are called “word-not characters,” and those that can be used both as one-character words and as constituent characters are called “word characters.” Word characters have more meanings and combine with other CCs to form more 2C-words than word-not characters do (Ge, 2018; J. Zhou, 2019). Word characters might therefore differ from word-not characters in how they overlap semantically with the corresponding 2C-words.

The constituent characters of 2C- and mC-words have fixed relative positions. Those on the left and right side of 2C-words are referred to as left and right characters, respectively. Left characters are potentially different from right characters in their semantic and syntactic contributions (Pan, 2002; T. Q. Xu, 2010). In the present study, Research Question One explores how SWC might be influenced by whether the constituent characters are word or word-not characters (Wording) and by whether they are left or right characters (Positioning).

Ordinary Features

CCs have many ordinary features, including number of strokes, number of components, frequency, familiarity, age of acquisition (AOA), number of meanings, semantic transparency, concreteness, imageability, valence, emotion arousal, sensory experience arousal and the number of 2C-words in which they are used as left or right characters. The more strokes a CC has, the more visually complex it is. CCs may also be divided into single characters and compound characters; over 95% of the commonly used CCs are compound characters consisting of more than one component, and most of the limited set of components originate from single characters. For example, the single character “木” and the compound character “村” have 4 and 7 strokes, respectively; “村” and “树” are considered to consist of two (“木” and “寸”) and three components (“木,” “又,” and “寸”), respectively. Frequency indicates how often a CC is used in everyday language activities. Familiarity indicates the extent to which the reader is familiar with a CC (Juhasz et al., 2015). The AOA of a specific CC refers to the year in which the reader first learned it (Y. Liu et al., 2007). According to the Dictionary of Chinese Characters Information (Science Publishers, 1988), about 53%, 21%, and 19% of CCs have one, two and at least three meanings, respectively. Semantic transparency refers to the degree to which a compound character is semantically associated with its components (Tse & Yap, 2018). Imageability refers to the degree to which a CC arouses the reader’s sensory-experience-based (Song & Li, 2021) or emotional-experience-based mental images (R. Wang et al., 2020). The meanings of meaningful CCs vary widely in how concrete they are. Valence indicates the degree to which a CC is positive or negative in meaning (Yee, 2017). Sensory experience arousal and emotion arousal refer to the extent to which a CC arouses the reader’s sensory experience (Yin & Ye, 2013) and emotional experience (Newcombe et al., 2012), respectively. Number of words (L) and number of words (R) refer to the number of 2C-words in which a CC is used as a left and a right character, respectively.

Several studies have explored the norm features of CCs (e.g., Z. G. Cai et al., 2022; Y. Liu et al., 2007; Sze et al., 2014; R. Wang et al., 2020). For example, the processing efficiency of CCs in lexical decisions can be significantly predicted by features such as number of strokes, frequency, AOA and number of meanings (Sze et al., 2014). The naming time for a CC can be significantly predicted by AOA, frequency, familiarity, concreteness, number of strokes, number of words, imageability and number of components (Y. Liu et al., 2007), or by frequency, familiarity, number of strokes and number of words (Chang et al., 2016). However, few studies have investigated how SWC is predicted by norm features of the constituent characters. In the present study, Research Question Two explores the prediction of SWC by norm features of the constituent characters at each treatment level of Wording by Positioning.

Significance

The present study uses the 24,473 commonly used 2C-words (State Language Affairs Commission, 2008) as sample words. Their constituent characters are all among the 2,478 meaningful, most commonly used CCs (m-CCs), of which 1,936 are word characters and 542 are word-not characters. Whether an m-CC is a word or word-not character was determined according to C. Wang (2017). Given the central status of CCs in Chinese literacy education, the findings should be of considerable practical value.

There has been theoretical debate about whether characters (H. Wang, 2011; T. Q. Xu, 2010), words (Ge, 2018; Lv, 1953) or both characters and words (Chen, 2014; Dong, 2004; J. Zhou, 2010, 2019) can be regarded as the basic structural units of Chinese. In practice, this debate makes it difficult for teachers of Chinese as a second language to decide whether to emphasize (Kang, 2004; C. Lu & Wang, 2006; Lv, 1983; J. Zhou, 2010) or ignore (Ge, 2018; Z. Wu & Wang, 2009; Zhao, 2012) CCs. The present study is expected to further clarify the relationship between 2C-words and their constituent characters.

It is generally accepted that the constituent characters are processed in the early stage of 2C-word recognition (Miwa et al., 2014; Taft, 2003; Tsang & Chen, 2010, 2013a, 2013b). The processing of a 2C-word is influenced by the frequency, number of strokes, number of meanings and number of word formations of its constituent characters (Huang et al., 2006, 2011; Miwa et al., 2014; Peng et al., 1999; Sun et al., 2018; Tsang et al., 2018; Tse et al., 2022; Tse & Yap, 2018; B. Zhang & Peng, 1992). However, few studies have investigated the relative importance of character features in predicting semantic activation of the corresponding 2C-words. The present study aims to fill this gap from a normative perspective.

In addition to providing a rich reference resource for experimental studies on Chinese mental lexicons, the study is also significant for Chinese vocabulary teaching. More generally, given the similarity between English compounds and many 2C-words in syntactic structures, the findings may also enhance understanding of the associations between compound words and their constituent morphemes.

Methods

This section consists of two sub-sections: obtaining ordinary norm feature scores of the m-CCs and evaluating SWC scores for the sample words. The study was approved by the Ethics Committee of Qufu Normal University.

Norm Feature Scores

Objective Feature Scores

The scores in frequency, number of strokes and number of meanings of the m-CCs were obtained with reference to SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles (Q. Cai & Brysbaert, 2010), Xinhua Dictionary (Institute of Linguistics, Chinese Academy of Social Sciences, 2011) and the online version of Handian (https://www.zdic.net/), respectively. The scores of number of words (L) and number of words (R), in which each m-CC was used as the left or right character, respectively, were obtained with reference to Vocabulary Expert (S. Wu, 2000).

Subjective Feature Scores

Subjective feature scores were collected through a series of questionnaire surveys administered to skilled readers of Chinese.

Participants

Participants were 4,158 college students (1,643 males, mean age = 19.21 years, SD = 1.53 years) from Qufu Normal University. They were Chinese native speakers and were blind to the purpose of the study.

Materials

The 2,478 m-CCs were randomly divided into 21 groups, and each group was listed on a single sheet of paper. A seven-point scale (with instructions printed at the top) was used to obtain scores for familiarity, AOA, semantic transparency, concreteness, imageability, sensory experience arousal, emotion arousal and valence; a five-point scale was used for number of components. Taking the familiarity questionnaire as an example, the main instructions were: “There are seven numbers ([1][2][3][4][5][6][7]) printed on the right side of each CC on the list. If you are very familiar with the CC, put a tick (√) next to the largest number [7]; if you are very unfamiliar with the CC, put a tick (√) next to the smallest number [1]. The more familiar you are with the CC, the larger the number you tick (√).”

Procedure

Crossing the 21 character groups with the nine rating scales yielded 189 questionnaires, each of which was copied 22 times. The 189 × 22 = 4,158 questionnaire sheets were randomly distributed to participants, and each participant independently completed one single-sheet questionnaire.
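The counts in this design can be checked with a short sketch. It assumes that the 189 questionnaires arise from crossing the 21 character groups with the nine rating scales listed under Materials, and it uses placeholder character IDs and a fixed random seed. The helper names (`split_evenly`, `build_sheets`) are hypothetical and are not a description of the authors' actual procedure.

```python
import random

N_CHARACTERS = 2478   # m-CCs (1,936 word + 542 word-not characters)
N_GROUPS = 21         # character groups, one list per sheet
N_COPIES = 22         # copies of each questionnaire
# Assumed: eight seven-point scales plus the five-point number-of-components scale.
SCALES = [
    "familiarity", "AOA", "semantic transparency", "concreteness",
    "imageability", "sensory experience arousal", "emotion arousal",
    "valence", "number of components",
]


def split_evenly(items, n_groups):
    """Split items into n_groups consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


def build_sheets(characters, seed=0):
    """Randomly group the characters, cross groups with scales, and copy each questionnaire."""
    rng = random.Random(seed)              # fixed seed so the example is reproducible
    shuffled = list(characters)
    rng.shuffle(shuffled)                  # random division into groups
    groups = split_evenly(shuffled, N_GROUPS)
    questionnaires = [(scale, tuple(group)) for scale in SCALES for group in groups]
    sheets = questionnaires * N_COPIES     # 22 copies of every questionnaire
    rng.shuffle(sheets)                    # random delivery order
    return groups, questionnaires, sheets


chars = [f"char_{i}" for i in range(N_CHARACTERS)]   # placeholder IDs, not real m-CCs
groups, questionnaires, sheets = build_sheets(chars)
print(len(groups), {len(g) for g in groups}, len(questionnaires), len(sheets))
# prints: 21 {118} 189 4158
```

Because 2,478 divides evenly by 21, every group holds 118 characters, and the total of 4,158 sheets matches the reported number of participants, one sheet each.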

SWC Scores

Participants

Participants were 9,024 college students (4,763 males, mean age = 19.54 years, SD = 1.46 years) from Qufu Normal University. They were native speakers of Chinese and had not participated in the questionnaire surveys on the m-CCs.

Materials

Each sample word was paired with its left and its right constituent character. The resulting list of 48,946 word-character pairs (two per sample word) was randomly divided into 411 groups, each containing 119 or 120 pairs printed on a single sheet of paper. To assess the strength of SWC for the sample words, 411 seven-point-scale questionnaires were designed (0 = the meaning of the constituent character in the word has nothing to do with its meaning in isolation; 6 = the constituent character means exactly the same in the word as it does in isolation).
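The pairing and grouping arithmetic can be sketched as follows. The sample words are replaced by placeholder two-character strings, and the function names are hypothetical; the sketch only shows how 24,473 words yield 48,946 pairs and why a near-equal split into 411 groups gives sizes of 119 or 120.

```python
import random
from collections import Counter

N_WORDS = 24473   # commonly used 2C-words (sample words)
N_GROUPS = 411    # one questionnaire per group


def make_pairs(words):
    """Pair each 2C-word with its left and its right constituent character."""
    pairs = []
    for word in words:
        pairs.append((word, word[0], "left"))
        pairs.append((word, word[1], "right"))
    return pairs


def split_evenly(items, n_groups):
    """Split items into n_groups chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


# Placeholder two-character strings standing in for the real sample words.
words = [chr(0x4E00 + i // 200) + chr(0x4E00 + i % 200) for i in range(N_WORDS)]
pairs = make_pairs(words)
random.Random(0).shuffle(pairs)            # random division into groups
groups = split_evenly(pairs, N_GROUPS)
sizes = Counter(len(g) for g in groups)
print(len(pairs), sorted(sizes.items()))   # prints: 48946 [(119, 374), (120, 37)]
```

Since 48,946 = 411 × 119 + 37, an even split produces 37 sheets of 120 pairs and 374 sheets of 119 pairs.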

Procedure

The 411 questionnaires were copied 22 times and delivered to the participants. Each participant responded independently to a one-sheet questionnaire.

Results

A small proportion of the returned questionnaires were invalid and were discarded: 3.5% for the subjective norm features and 1.1% for the SWC scores (X. Xu & Li, 2020). At least 18 valid responses remained for each questionnaire item. Table 1 displays the descriptive statistics. The reliability coefficients for familiarity and semantic transparency scores were relatively low, probably because participants were highly familiar with the m-CCs, so these data were excluded from the follow-up analyses.
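Table 1 reports Cronbach's alpha, but the section does not state how it was computed. A common approach for rating norms treats each respondent as an "item" and each character as a case. The sketch below assumes that layout and a complete matrix with no missing ratings; the function name and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a cases x raters matrix (assumed layout).

    ratings[i][j] is the score rater j gave to character i.
    alpha = k / (k - 1) * (1 - sum of rater variances / variance of case totals)
    """
    k = len(ratings[0])                        # number of raters
    if k < 2:
        raise ValueError("need at least two raters")
    rater_columns = list(zip(*ratings))        # one column of scores per rater
    sum_rater_var = sum(variance(col) for col in rater_columns)
    totals = [sum(row) for row in ratings]     # total score per character
    return k / (k - 1) * (1 - sum_rater_var / variance(totals))


# Toy data: three characters rated by two hypothetical raters.
toy = [[1, 1], [2, 3], [3, 2]]
print(round(cronbach_alpha(toy), 3))  # prints: 0.667
```

Low values such as those for familiarity arise when raters' scores vary little across characters relative to their disagreement, which is consistent with the ceiling-level familiarity means in Table 1.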

Table 1.

Norm Feature Scores and SWC Scores.

		M	SD	Min	Max	Skewness	Kurtosis	Cronbach’s alpha
Word character	Number of meanings	4.31	2.74	1.00	22.00	1.56	4.16	-
	Concreteness	5.43	0.93	2.16	7.00	−0.72	0.12	.83
	Imageability	4.04	0.95	1.60	6.89	0.15	−0.37	.88
	Valence	4.17	0.83	1.70	6.73	−0.19	−0.01	.86
	Emotion arousal	3.00	0.85	1.17	6.06	0.66	0.19	.82
	Sensory experience arousal	3.42	0.89	1.32	6.13	0.09	−0.66	.93
	Semantic transparency	5.69	0.50	3.52	6.91	−0.59	0.64	.44
	Familiarity	6.54	0.29	5.10	7.00	−0.95	1.43	.49
	AOA	1.46	0.31	1.00	2.67	0.75	0.24	.91
	Frequency	3.58	0.80	0.48	6.31	−0.01	0.39	-
	Number of strokes	9.05	3.25	1.00	23.00	0.40	0.26	-
	Number of words (L)	16.09	18.17	0.00	141.00	2.75	10.22	-
	Number of words (R)	16.09	26.43	0.00	756.00	13.12	328.60	-
	Number of components	2.05	0.39	1.00	3.89	0.00	1.01	.87
	SWC scores, left character	4.05	0.49	1.35	5.50	−0.68	2.18	.78
	SWC scores, right character	4.15	0.53	1.00	5.75	−0.77	2.09	.91
Word-not character	Number of meanings	3.09	1.78	1.00	12.00	1.14	2.02	-
	Concreteness	5.25	0.92	2.68	6.90	−0.32	−0.70	.81
	Imageability	3.80	0.83	1.67	6.29	0.25	−0.20	.76
	Valence	4.23	0.51	1.70	5.45	−1.34	3.95	.85
	Emotion arousal	3.04	0.79	1.41	5.35	0.53	−0.27	.76
	Sensory experience arousal	4.27	0.91	1.73	6.20	−0.52	−0.27	.90
	Semantic transparency	3.09	0.74	1.48	5.30	0.42	−0.10	.86
	Familiarity	6.46	0.29	5.50	7.00	−0.77	0.47	.29
	AOA	1.63	0.33	1.00	2.71	0.33	−0.32	.91
	Frequency	3.18	0.68	1.18	5.85	0.07	0.70	-
	Number of strokes	9.53	3.29	2.00	22.00	0.43	0.39	-
	Number of words (L)	7.60	8.05	0.00	56.00	2.52	8.63	-
	Number of words (R)	9.51	11.80	0.00	117.00	3.26	17.37	-
	Number of components	2.12	0.37	1.05	3.74	0.01	1.44	.85
	SWC scores, left character	4.14	0.52	2.25	5.50	−0.84	1.52	.90
	SWC scores, right character	5.43	0.53	3.70	6.73	−0.47	0.16	.46

To answer Research Questions One and Two, a linear mixed model analysis and four multiple linear regression analyses were conducted, respectively.

Linear Mixed Model Analysis

A linear mixed model analysis was conducted on the SWC scores using lme4 (Bates et al., 2011) in R (R Development Core Team, 2012), with the m-CCs as the random factor and Wording and Positioning as the fixed factors. Both main effects were significant. SWC scores were significantly higher for the word-not characters (M = 4.188, SD = 0.517) than for the word characters (M = 4.099, SD = 0.513) (β = .090, SE = 0.019, t = 4.803, p < .0001) and significantly higher for the right characters (M = 4.190, SD = 0.527) than for the left characters (M = 4.100, SD = 0.500) (β = .094, SE = 0.015, t = 6.157, p < .0001). The interaction between Wording and Positioning was not significant (β = −.005, SE = 0.037, t = −0.143, p = .887).
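The M and SD values above appear to be descriptive marginal summaries over word-character pairs arranged in long format (one row per pair). The sketch below shows that layout and those summaries with hypothetical toy records; it does not fit the mixed model itself. In lme4 the model would presumably use a formula such as `SWC ~ Wording * Positioning + (1 | character)`, but random intercepts for m-CCs as the only random effect is an assumption, since the section states only that the m-CCs were the random factor.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy records in long format: one row per word-character pair.
# Field names and values are illustrative, not the study's data.
records = [
    {"char": "A", "wording": "word", "positioning": "left", "swc": 4.0},
    {"char": "B", "wording": "word", "positioning": "right", "swc": 4.5},
    {"char": "C", "wording": "word-not", "positioning": "left", "swc": 4.2},
    {"char": "D", "wording": "word-not", "positioning": "right", "swc": 4.6},
]


def marginal_summaries(rows, factor):
    """Return {level: (mean SWC, SD of SWC)} for one factor, pooling over the other."""
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[factor]].append(row["swc"])
    return {level: (mean(scores), stdev(scores)) for level, scores in sorted(by_level.items())}


for factor in ("wording", "positioning"):
    for level, (m, sd) in marginal_summaries(records, factor).items():
        print(f"{factor}={level}: M = {m:.3f}, SD = {sd:.3f}")
```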

Multiple Linear Regression Analyses

Four multiple linear regression analyses were conducted to estimate how well the norm features of the left-word, right-word, left-word-not and right-word-not characters predicted the SWC scores. As displayed in Table 2, for the left-word characters, number of meanings, concreteness, imageability, valence, emotion arousal and frequency significantly predicted SWC scores (F(12, 1,836), p < .0001, adjusted R² = .23). For the right-word characters, number of meanings, concreteness, imageability, emotion arousal, valence, frequency and number of words (L) were significant predictors (F(12, 1,733), p < .0001, adjusted R² = .25). For the left-word-not characters, number of meanings, concreteness, imageability, emotion arousal and AOA were significant predictors (F(12, 475) = 10.06, p < .0001, adjusted R² = .19). For the right-word-not characters, number of meanings, concreteness, imageability, emotion arousal, frequency and number of words (R) were significant predictors (F(12, 474) = 12.61, p < .0001, adjusted R² = .23).
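Assuming that the β column in Table 2 reports standardized coefficients (an inference; the section does not say so), each regression can be sketched as ordinary least squares on z-scored variables, together with the adjusted R² formula 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below is stdlib-only, omits standard errors and p values, and uses hypothetical function names and toy data.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to mean 0 and SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_ols(predictors, y):
    """Return (standardized betas, R^2, adjusted R^2) for an OLS fit.

    predictors maps a feature name to one value per character; because every
    variable is z-scored, no intercept term is needed.
    """
    names = list(predictors)
    cols = [zscores(predictors[name]) for name in names]
    zy = zscores(y)
    n, p = len(zy), len(cols)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, zy)) for ci in cols]
    betas = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(betas, cols)) for i in range(n)]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(zy, fitted))
    ss_tot = sum(yi ** 2 for yi in zy)               # z-scored y has mean 0
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return dict(zip(names, betas)), r2, adj_r2


# Toy data: SWC is an exact linear function of the first feature only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 * v for v in x1]
betas, r2, adj = standardized_ols({"frequency": x1, "concreteness": x2}, y)
print(betas, r2, adj)
```

In the study each model had p = 12 predictors, so, for example, F(12, 475) implies n = 488 left-word-not characters entering the adjusted R² formula.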

Table 2.

Coefficients of Constituent Character Norm Features in Predicting SWC Scores.

Positioning	Wording	Norm feature	β	SE	t	p
Left	Word	Number of meanings	−0.275	0.005	−10.818	.000
		Concreteness	.066	0.016	2.261	.024
		Imageability	.128	0.017	4.064	.000
		Valence	−0.060	0.013	−2.740	.006
		Emotion arousal	.096	0.015	3.823	.000
		Sensory experience arousal	.017	0.017	0.570	.569
		AOA	.059	0.053	1.833	.067
		Frequency	−0.119	0.020	−3.831	.000
		Number of strokes	.040	0.005	1.276	.202
		Number of components	.030	0.035	1.084	.278
		Number of words (L)	.030	0.001	1.106	.269
		Number of words (R)	.013	0.000	0.554	.579
	Word-not	Number of meanings	−0.166	0.013	−3.645	.000
		Concreteness	.150	0.029	2.904	.004
		Imageability	.181	0.035	3.172	.002
		Valence	−0.052	0.026	−1.138	.256
		Emotion arousal	.108	0.032	2.202	.028
		Sensory experience arousal	.075	0.038	1.384	.167
		AOA	.172	0.102	2.728	.007
		Frequency	−0.033	0.043	−0.602	.548
		Number of strokes	−0.059	0.010	−0.976	.330
		Number of components	.002	0.074	0.030	.976
		Number of words (L)	.060	0.003	1.226	.221
		Number of words (R)	.052	0.002	1.019	.309
Right	Word	Number of meanings	−0.220	0.005	−8.556	.000
		Concreteness	.086	0.017	2.891	.004
		Imageability	.095	0.018	2.950	.003
		Valence	−0.062	0.014	−2.788	.005
		Emotion arousal	.160	0.016	6.230	.000
		Sensory experience arousal	.048	0.018	1.559	.119
		AOA	.031	0.057	0.951	.342
		Frequency	−0.164	0.021	−5.216	.000
		Number of strokes	.035	0.005	1.097	.273
		Number of components	.037	0.038	1.305	.192
		Number of words (L)	.055	0.001	2.016	.044
		Number of words (R)	−0.021	0.000	−0.884	.377
	Word-not	Number of meanings	−0.155	0.013	−3.487	.001
		Concreteness	.116	0.028	2.322	.021
		Imageability	.200	0.035	3.554	.000
		Valence	.040	0.025	0.889	.374
		Emotion arousal	.155	0.032	3.223	.001
		Sensory experience arousal	.080	0.037	1.494	.136
		AOA	.098	0.098	1.581	.115
		Frequency	−0.190	0.041	−3.576	.000
		Number of strokes	−0.106	0.009	−1.811	.071
		Number of components	−0.002	0.072	−0.046	.963
		Number of words (L)	−0.019	0.003	−0.400	.689
		Number of words (R)	.144	0.002	2.912	.004

Discussion

As expected, the results provided clear answers to the research questions. SWC scores were significantly affected by whether the constituent characters were word or word-not characters (Wording) and by whether they were left or right characters (Positioning), and these two effects were independent of each other (the interaction was not significant). The prediction of SWC by the character norm features was mediated by Wording and Positioning, which is particularly important for understanding the relationship between 2C-words and their constituent characters.

Influences of Wording and Positioning on SWC

The results of the mixed model analysis suggested that SWC was stronger for the word-not than for the word characters and stronger for the right than for the left characters. These findings are consistent with the evidence that word-not characters are more strongly associated with 2C-words than word characters are (Gao et al., 2022; Shimomura, 1999; M. Wu, 2008), and that right characters are more strongly associated semantically with 2C-words than left characters are (Fu, 2003; Li, 2019; Z. W. Lu et al., 1957; Yuan & Huang, 1998; J. Zhou, 2006). A word-not character is semantically constrained in everyday language (M. Wu, 2008). For instance, the dictionary definition of the word-not character “身” is “body,” but it appears only in words such as “出身” (family background) and “终身” (lifelong). Because it is not a word in its own right, a reader may only be able to infer its meaning from how it is used within words (N. Wang, 1999). A greater priming effect was observed for word-not character primers than for word character primers in a lexical decision task (Gao et al., 2022; Shimomura, 1999), which can be interpreted within the interactive-activation model (Taft, 1994, 2003). For word-not character primers, nodes were activated only at the character level; for word character primers, nodes were activated at both the character and the word level. This word-level processing may have interfered with recognition of the 2C-word targets and thereby weakened the priming effect of the word character primers. Taken together, this evidence suggests a closer semantic association between word-not characters and 2C-words than between word characters and 2C-words.

Most 2C-words can be divided into five categories according to the syntactic relations between their constituent characters (Ge, 2018). For example, the constituent characters of “飞机” (plane), “窗户” (window), “地震” (earthquake), “开车” (to drive a car) and “提高” (to improve) form a modification structure, a coordination structure, a predication structure, a predicate-object structure and a verb-complement structure, respectively. The largest category consists of 2C-words with a modification structure, in which the left and right characters serve as the modifier and the semantic head, respectively (Fu, 2003; Li, 2019; Z. W. Lu et al., 1957; Yuan & Huang, 1998; J. Zhou, 2006). In these endocentric words, the right characters are more closely associated semantically with the 2C-words than the left characters are (Yan, 2007). This may explain why, in the present study, the right characters were more closely associated semantically with the corresponding 2C-words than the left characters were.

Prediction of SWC by Character Norm Features

Summary of Results

The 12 norm-feature predictors listed in Table 2 can be grouped into four categories (Song & Li, 2021): semantic features (number of meanings, concreteness, imageability, emotion arousal, valence and sensory experience arousal), frequency features (frequency and AOA), visual features (number of strokes and number of components) and number-of-word features (number of words (L) and number of words (R)). As summarized in Table 3, the prediction of SWC by the constituent character norm features was mediated by Wording and Positioning.

Table 3.

Norm Features of the Constituent Characters Significantly Predicting SWC at Each Treatment Level of Wording by Positioning.

	Word	Word-not
Left	Number of meanings, concreteness, imageability, emotion arousal, valence, frequency	Number of meanings, concreteness, imageability, emotion arousal, AOA
Right	Number of meanings, concreteness, imageability, emotion arousal, valence, frequency, number of words (L)	Number of meanings, concreteness, imageability, emotion arousal, frequency, number of words (R)
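Table 3 can be derived mechanically from the p values in Table 2, which makes a useful consistency check. The sketch below transcribes those p values (entering .000 as 0), applies a .05 threshold (an assumption, since the criterion is not stated explicitly), and groups the significant predictors by the four categories listed above. The variable and function names are hypothetical.

```python
FEATURES = [
    "Number of meanings", "Concreteness", "Imageability", "Valence",
    "Emotion arousal", "Sensory experience arousal", "AOA", "Frequency",
    "Number of strokes", "Number of components",
    "Number of words (L)", "Number of words (R)",
]

# p values transcribed from Table 2, in the order of FEATURES.
P_VALUES = {
    ("Left", "Word"):      [0.000, 0.024, 0.000, 0.006, 0.000, 0.569, 0.067, 0.000, 0.202, 0.278, 0.269, 0.579],
    ("Left", "Word-not"):  [0.000, 0.004, 0.002, 0.256, 0.028, 0.167, 0.007, 0.548, 0.330, 0.976, 0.221, 0.309],
    ("Right", "Word"):     [0.000, 0.004, 0.003, 0.005, 0.000, 0.119, 0.342, 0.000, 0.273, 0.192, 0.044, 0.377],
    ("Right", "Word-not"): [0.001, 0.021, 0.000, 0.374, 0.001, 0.136, 0.115, 0.000, 0.071, 0.963, 0.689, 0.004],
}

# Four feature categories (Song & Li, 2021) as listed in the text.
CATEGORY = {
    "Number of meanings": "semantic", "Concreteness": "semantic",
    "Imageability": "semantic", "Valence": "semantic",
    "Emotion arousal": "semantic", "Sensory experience arousal": "semantic",
    "AOA": "frequency", "Frequency": "frequency",
    "Number of strokes": "visual", "Number of components": "visual",
    "Number of words (L)": "number-of-word", "Number of words (R)": "number-of-word",
}


def significant(cell, alpha=0.05):
    """Features whose p value in the given (Positioning, Wording) cell is below alpha."""
    return [f for f, p in zip(FEATURES, P_VALUES[cell]) if p < alpha]


for cell in P_VALUES:
    by_category = {}
    for feature in significant(cell):
        by_category.setdefault(CATEGORY[feature], []).append(feature)
    print(cell, by_category)
```

With this threshold, every cell reproduces the corresponding entry of Table 3, and no visual feature appears in any cell.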

First, four semantic features (number of meanings, concreteness, imageability and emotion arousal) of the constituent characters significantly predicted the strength of SWC. The higher the valence scores of the word characters, the smaller the SWC scores. However, the SWC scores were not significantly predicted by the valence scores of the word-not characters. The scores for sensory experience arousal of the constituent characters did not significantly predict changes in SWC scores.

Second, SWC scores decreased significantly as the frequency scores of the left- and right-word characters increased. Higher frequency scores of the right-word-not characters were also associated with significantly lower SWC scores, but no such relation was found for the left-word-not characters. Higher AOA scores of the left-word-not characters were associated with significantly higher SWC scores, but this was not the case for the right-word-not characters. The AOA scores of neither the left- nor the right-word characters predicted the SWC scores.

Third, SWC scores were significantly predicted by number of words (L) of the right-word characters and by number of words (R) of the right-word-not characters. No other number-of-word scores of the constituent characters significantly predicted SWC.

Fourth, the scores for number of strokes and number of components of the constituent characters did not significantly predict SWC scores.

Implications

2C-word recognition is subject to changes in constituent character features (Miwa et al., 2014; Peng et al., 1999; Sun et al., 2018; Tse et al., 2022; Tse & Yap, 2018; B. Zhang & Peng, 1992), and the relationship between 2C-words and their constituent characters is semantic in nature (Dronjic, 2011; Tse & Yap, 2018; X. Zhou & Marslen-Wilson, 2000). The results of the multiple linear regression analyses allow these theoretical accounts to be specified further.

The essentially semantic relationship between 2C-words and their constituent characters (Dronjic, 2011; Tse & Yap, 2018; X. Zhou & Marslen-Wilson, 2000) is strongly supported by the finding that SWC was significantly predicted by number of meanings, concreteness, imageability and emotion arousal of the constituent characters. Valence and emotion arousal are closely related as two critical dimensions of emotional semantics (Yao et al., 2017; Yee, 2017). However, whereas emotion arousal predicted SWC in all four conditions, valence contributed significantly to the prediction of SWC only for word characters. This may imply that valence is perceivable for word characters but not for word-not characters. The sensory experience arousal scores of the constituent characters did not predict changes in SWC, suggesting that this feature is less perceivable in 2C-word processing than the other five semantic features of the constituent characters.

2C-word recognition is influenced by the frequency of the constituent characters (Peng et al., 1999; Sun et al., 2018; Tse et al., 2022; Tse & Yap, 2018; B. Zhang & Peng, 1992). However, the present findings suggest that this frequency influence may not generalize to left-word-not characters, whose frequency scores did not predict SWC. The influence of constituent-character AOA on 2C-word recognition might also be limited, since SWC was not significantly predicted by the AOA scores of the left-word, right-word and right-word-not characters.

The finding that visual features did not significantly predict changes in SWC has two implications. First, participants should have been highly familiar with the m-CCs, so there may have been a ceiling effect on the influence of visual features on their perception of the constituent characters. Second, participants may have tended to disregard the visual features of the constituent characters when evaluating the strength of SWC. In other words, skilled readers may ignore the visual complexity of the constituent characters when semantically processing 2C-words.

Consistent with the mixed model result that SWC was stronger for the right than for the left characters, number of words (L) of the right-word characters and number of words (R) of the right-word-not characters significantly predicted changes in SWC scores.

Conclusion

The present study has confirmed earlier work showing that word-not characters are more strongly associated with 2C-words than word characters are (Gao et al., 2022; Shimomura, 1999; M. Wu, 2008), and that right characters are more strongly associated semantically with 2C-words than left characters are (Fu, 2003; Li, 2019; Z. W. Lu et al., 1957; Yuan & Huang, 1998; J. Zhou, 2006). The present study is also the first to provide evidence on the prediction of SWC by the norm features of the constituent characters. First, skilled readers’ perception of the semantic, frequency and number-of-word features of the constituent characters may be mediated by Wording and Positioning in 2C-word semantic processing. However, skilled readers are unlikely to perceive the visual features of the constituent characters; rather, they seem to treat the constituent characters, which as m-CCs should be highly familiar to them, as whole units. Second, in a semantic task on 2C-words, skilled readers may process the number of meanings, concreteness, imageability and emotion arousal of the constituent characters, but not their sensory experience arousal. There appears to be a close association in valence between 2C-words and word characters, but not between 2C-words and word-not characters. These findings strongly support the theoretical argument that both words and characters should be taken as language units in Chinese (Chen, 2014; Dong, 2004; J. Zhou, 2010, 2019). However, Wording and Positioning should be taken into account when treating a CC as a language unit. These findings may also be of more general significance for the semantic understanding of compound words.

The study has some limitations. Familiarity and semantic transparency scores did not appear to be valid for the constituent characters of the sample words, probably because participants were very familiar with the m-CCs. Had these scores been valid, the study would likely have achieved a deeper understanding of the semantic overlap between 2C-words and their constituent characters. In addition, participants were college students and may therefore not be representative of how other populations perceive the relationships between 2C-words and their constituent characters. For example, children often confuse the positions of the constituent characters, so further work from a developmental perspective may be warranted.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a research grant awarded to Li Degao by the National Social Science Fund of China (Grant 21AZD139).

ORCID iDs

Lifeng Xue

Degao Li

Wenling Ma

References

Bates, Maechler, & Bolker (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42. http://CRAN.R-project.org/package=lme4

Cai & Brysbaert (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS One, 5(6), e10729.

Cai, Z. G., Huang, & Zhao (2022). Objective ages of acquisition for 3300+ simplified Chinese characters. Behavior Research Methods, 54, 311–323.

Chang, Y. N., Hsu, C. H., Tsai, J. L., Chen, C. L., & Lee, C. Y. (2016). A psycholinguistic database for traditional Chinese character naming. Behavior Research Methods, 48(1), 112–122.

Chen (2014). Necessary condition of grammatical description: Two kinds of grammatical units—A response to Prof. Pan et al. on syntactic units. Essays on Linguistics, 58(1), 280–311. 357.

Cheng & Liu (2018). The developmental trajectories of oral vocabulary knowledge and its influential factors in Chinese primary school students. Acta Psychologica Sinica, 50(2), 206–215.

Dong (2004). Thesaurus and morphology of Chinese. Peking University Press.

Dronjic (2011). Mandarin Chinese compounds, their representation, and processing in the visual modality. Writing Systems Research, 3(1), 5–21.

Fu, A. P. (2003). The way of word formation and the recognition and understanding of compound words in Chinese information processing. Applied Linguistics, 12(4), 25–33.

Gagné, C. L., Spalding, T. L., & Schmidtke (2019). LADEC: The large database of English compounds. Behavior Research Methods, 51, 2152–2179.

Gao, Wang, Zhao, C. G., & Yuan (2022). Word or morpheme? Investigating the representation units of L1 and L2 Chinese compound words in mental lexicon using a repetition priming paradigm. International Journal of Bilingual Education and Bilingualism, 25(7), 2382–2396.

B. Y. (2018). Modern Chinese lexicology. The Commercial Press.

Huang, H. W., Lee, C. Y., Tsai, J. L., Lee, C. L., Hung, D. L., & Tzeng, O. J. (2006). Orthographic neighborhood effects in reading Chinese two-character words. Neuroreport, 17(10), 1061–1065.

Huang, H. W., Lee, C. Y., Tsai, J. L., & Tzeng, O. J. (2011). Sublexical ambiguity effect in reading Chinese disyllabic compounds. Brain and Language, 117(2), 77–87.

Institute of Linguistics, Chinese Academy of Social Sciences (2011). Xinhua Dictionary. The Commercial Press.

Juhasz, B. J., Lai, Y. H., & Woodcock, M. L. (2015). A database of 629 English compound words: Ratings of familiarity, lexeme meaning dominance, semantic transparency, age of acquisition, imageability, and sensory experience. Behavior Research Methods, 47(4), 1004–1019.

Kang (2004). A study of modern Chinese grammar for information processing. Shanghai Lexicographical Publishing House.

Li (2019). The semantic construction and conceptual construal of Chinese pseudo-modification compounds. Language Teaching and Linguistic Studies, 41(1), 80–90.

Liu, P. D., & McBride-Chang (2010). Morphological processing of Chinese compounds from a grammatical view. Applied Psycholinguistics, 31(4), 605–617.

Liu & Shu (2007). Word naming and psycholinguistic norms: Chinese. Behavior Research Methods, 39(2), 192–198.

Wang (2006). The regularized gram-meaning of common grams and equations indicating word-meaning of bigram-words. Chinese Language Learning, 27(2), 3–13.

Lu, Z. W., et al. (1957). Chinese morphology. Science Publishers.

(1953). Handouts on Chinese language grammar. China Youth Publishing House.

(1983). Language teaching and learning. In Essays on Chinese language by Lv Shuxiang. The Commercial Press.

McBride, C. A. (2016). Is Chinese special? Four aspects of Chinese literacy acquisition that might distinguish learning Chinese from learning alphabetic orthographies. Educational Psychology Review, 28(3), 523–549.

Miwa, Libben, Dijkstra, & Baayen (2014). The time-course of lexical activation in Japanese morphographic word recognition: Evidence for a character-driven processing model. The Quarterly Journal of Experimental Psychology, 67(1), 79–113.

Newcombe, P. I., Campbell, Siakaluk, P. D., & Pexman, P. M. (2012). Effects of emotional and sensorimotor knowledge in semantic processing of concrete and abstract nouns. Frontiers in Human Neuroscience, 6(275), 275–315.

Pan, W. G. (2002). Chinese character as the very basic unit of Chinese grammar and studies on the Chinese language. East China Normal University Press.

Peng, Liu, & Wang (1999). How is access representation organized? The relation of polymorphemic words and their morphemes in Chinese. In Wang, Inhoff, A. W., & Chen, H.-C. (Eds.), Reading Chinese script: A cognitive analysis (pp. 65–89). Lawrence Erlbaum.

R Development Core Team (2012). An introduction to R, version 2.15.1. R Foundation for Statistical Computing. Retrieved August 29, 2012, from http://cran.r-project.org/

Science Publishers (1988). Dictionary of Chinese characters information. Science Publishers.

Shimomura (1999). Kanji lexicality effect in partial repetition priming: The relationship between Kanji word and Kanji character processing. Brain and Language, 68, 82–88.

Song, D. G. (2021). Psycholinguistic norms for 3,783 two-character words in simplified Chinese. Sage Open, 1–15.

State Language Affairs Commission (1988). Commonly-used characters in contemporary Chinese. Language & Culture Press.

State Language Affairs Commission (2008). Lexicon of common words in contemporary Chinese. The Commercial Press.

Sun, C. C., Hendrix, & Baayen, R. H. (2018). Chinese lexical database (CLD): A large-scale lexical database for simplified Mandarin Chinese. Behavior Research Methods, 50, 2606–2629.

Sze, W. P., Rickard Liow, S. J., & Yap, M. J. (2014). The Chinese Lexicon Project: A repository of lexical decision behavioral responses for 2,500 Chinese characters. Behavior Research Methods, 46(1), 263–273.

Taft (1994). Interactive-activation as a framework for understanding morphological processing. Language and Cognitive Processes, 9(3), 271–294.

Taft (2003). Morphological representation as a correlation between form and meaning. In Assink & Sandra (Eds.), Reading complex words (pp. 113–138). Kluwer.

Tsang, Y. K., & Chen, H. C. (2010). Morphemic ambiguity resolution in Chinese: Activation of the subordinate meaning with a prior dominant-biased context. Psychonomic Bulletin & Review, 17(6), 875–881.

Tsang, Y. K., & Chen, H. C. (2013a). Early morphological processing is sensitive to morphemic meanings: Evidence from processing ambiguous morphemes. Journal of Memory and Language, 68(3), 223–239.

Tsang, Y. K., & Chen, H. C. (2013b). Morpho-semantic processing in word recognition: Evidence from balanced and biased ambiguous morphemes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(6), 1990–2001.

Tsang, Y. K., Huang, Lui, Xue, Chan, Y. W. F., Wang, & Chen, H. C. (2018). MELD-SCH: A mega-study of lexical decision in simplified Chinese. Behavior Research Methods, 50(5), 1763–1777.

Tse, C. S., & Yap, M. J. (2018). The role of lexical variables in the visual recognition of two-character Chinese compound words: A megastudy analysis. The Quarterly Journal of Experimental Psychology, 71(9), 2022–2038.

Tse, C. S., Chan, Y. L., Yap, M. J., & Tsang, H. C. (2022). The Chinese Lexicon Project II: A megastudy of speeded naming performance for 25,000+ traditional Chinese two-character words. Behavior Research Methods, 49, 1–21.

Wang (2017). Study on the repeated relationship among commonly-used character, one-character morpheme, commonly-used one-character word and character-forming component [Master's thesis]. Ji’nan University.

Wang (2011). Single Chinese character-based study on modern Chinese morphology. The Commercial Press.

Wang (1999). On the historical causes of condensation of the original disyllable compound words. In A collection of classical literature and cultural essays (the second album). Hangzhou University Press.

Wang, Huang, Zhou, & Cai, Z. G. (2020). Chinese character handwriting: A large-scale behavioral study and a database. Behavior Research Methods, 52, 82–96.

Wu, M. (2008). Research on the attributes of high frequency modern Chinese morpheme items: The grades of morpheme items for TCSL [Doctoral dissertation]. Communication University of China.

(2000). Chinese vocabulary expert. Zhejiang Literature and Art Press.

Wang (2009). Summary of modern Chinese vocabulary. Foreign Language Teaching and Research Press.

Xu, T. Q. (2010). Selected writings of Xu Tongqiang. Peking University Press.

(2020). Concreteness/abstractness ratings for two-character Chinese words in MELD-SCH. PLoS One, 15(6), e0232133.

Yan, H. J. (2007). The study on the semantic structure of compound words in contemporary Chinese [Doctoral dissertation]. Capital Normal University.

Yao, Zhang, & Wang (2017). Norms of valence, arousal, concreteness, familiarity, imageability, and context availability for 1,100 Chinese words. Behavior Research Methods, 49, 1374–1385.

Yee, L. T. S. (2017). Valence, arousal, familiarity, concreteness, and imageability ratings for 292 two-character Chinese nouns in Cantonese speakers in Hong Kong. PLoS One, 12(3), e0174569.

Yin, H. S. (2013). Language and situated simulation theory: Concept and outlook. Psychological Exploration, 33(4), 308–314.

Yuan, C. F., & Huang, C. N. (1998). Study on Chinese morphemes and word formation based on the Chinese morpheme data bank. Chinese Teaching in the World, 12(2), 7–12.

Zhang & Peng (1992). Decomposed storage in the Chinese lexicon. In Chen, H.-C., & Tzeng (Eds.), Language processing in Chinese (pp. 131–149). North-Holland.

Zhang (1997). Statistical analysis of basic characters in Chinese word formation. Language Teaching and Linguistic Studies, 19(1), 42–51.

Zhao (2012). Meaning extraction of Chinese characters from words in modern Chinese and the teaching of vocabulary. Chinese Teaching in the World, 26(3), 379–389.

Zhou, J. (2006). Study on the Chinese lexicology and lexicography. The Commercial Press.

Zhou, J. (2010). Analyzing the relationship between Zi and Ci and improving the teaching of Zi and Ci. Applied Linguistics, 1, 97–105.

Zhou, J. (2019). A survey of modern Chinese lexicology. Peking University Press.

Zhou & Marslen-Wilson (2000). Lexical representation of compound words: Cross-linguistic evidence. Psychologia, 43, 47–66.