Abstract
We investigated the effect of part of speech adoption on the utterance length of Mandarin-speaking children. A total of 209 typically developing Taiwanese children aged 3–6 years participated in the study. They included 90 boys and 119 girls recruited from preschools in Miaoli City, New Taipei City, and Taipei City. We collected children’s language samples in four contexts: conversation (school life), story retelling (“Little Red Riding Hood”), free play, and conversation (daily life). One-on-one conversation with participants comprised the main interaction form. Language samples collected from the participants were transcribed and analyzed; all words in the samples were coded with part of speech tagging. Part of speech factors that could predict utterance length were examined using stepwise regression. The results showed that prepositions and conjunctions had positive effects on utterance length while interjections had a negative effect. Increasing utterance length may be a teaching or clinical intervention goal for some children. Educators and clinicians can consider guiding such children to use prepositions and conjunctions to increase their utterance length.
Plain language summary
We investigated the effect of part of speech adoption on the utterance length of Mandarin-speaking children. A total of 209 typically developing Taiwanese children aged 3–6 years participated in the study. They included 90 boys and 119 girls recruited from preschools in Miaoli City, New Taipei City, and Taipei City. Language samples collected from the participants were transcribed and analyzed; all words in the samples were coded with part of speech tagging. Part of speech factors that could predict utterance length were examined using stepwise regression. The results showed that prepositions and conjunctions had positive effects on utterance length while interjections had a negative effect. Increasing utterance length may be a teaching or clinical intervention goal for some children. Educators and clinicians can consider guiding such children to use prepositions and conjunctions to increase their utterance length.
Children’s utterance length increases as they grow and reflects their syntactic development. Words are used to form utterances; thus, children’s word acquisition is fundamental for their syntactic development (Nóro & Mota, 2019). The words used in children’s utterances can be categorized into different classes. The part of speech (POS) system is essential for categorizing words based on their grammatical roles. The English language encompasses eight fundamental POS categories: nouns, pronouns, verbs, adjectives, adverbs, prepositions, conjunctions, and interjections (Atwell, 2008). Previous studies have shown developmental changes from the perspective of POS. Tsai (2009) examined children’s use of prepositions and conjunctions and found that their use increased with age. Y. J. Liu (2011) found that typically developing (TD) 4-year-olds used nouns more frequently and more diversely than 3-year-olds. Moreover, H. M. Liu and Chen (2015) found that in the vocabulary acquisition process, children initially grasp content words like nouns and verbs before mastering function words such as conjunctions and prepositions.
As children age, their language abilities naturally evolve. With this maturation, older children tend to employ words from certain POS categories more often than their younger counterparts. Moreover, the length of their utterances tends to expand as they grow (Rice et al., 2010). In the process of learning to craft sentences, children may specifically choose certain types of words that enable them to create more extended utterances. Previous studies have explored the relationship between the word classes children use in their utterances and the length of their utterances. It has been suggested that certain word classes are particularly important for helping children form longer utterances (Le Normand & Thai-Van, 2022; Le Normand et al., 2013).
Mandarin is often noted for its absence of inflectional morphemes that denote aspects such as number, tense, gender, and case, relying instead on word types and word order to convey semantic shades and grammatical relationships (Erbaugh, 1992; C. N. Li & Thompson, 1981; Shi & Huang, 2016). Given the scarcity of grammatical morphemes, the acquisition of a diverse vocabulary becomes crucial for Mandarin-speaking children to form both accurate and extended utterances (C. N. Li & Thompson, 1981; Yu et al., 2011). These children tend to utilize a variety of word types to construct both correct and increasingly complex sentences. The study of utterance length from the perspective of POS is therefore vital to identify which POS categories are instrumental for these children in creating longer utterances. Despite the importance of this inquiry, existing research on the topic remains sparse. To address this gap, our study investigates the link between POS categories and utterance length among Mandarin-speaking preschoolers in Taiwan.
Linguistic Features of Mandarin
English-speaking children must acquire inflectional morphology to construct grammatical sentences. Thus, children’s acquisition of inflectional morphology has been investigated in various studies. Mandarin, however, contains little inflectional and derivational morphology. Mandarin is commonly described as lacking inflectional morphemes to indicate aspects like number, tense, gender, and case (Erbaugh, 1992; C. N. Li & Thompson, 1981; Shi & Huang, 2016). Instead, semantic nuances and grammatical relationships between words are predominantly conveyed through different types of words and word order (Cheung et al., 2024). As grammatical morphemes are rare in Mandarin, word acquisition is particularly important for children to construct long and correct utterances (C. N. Li & Thompson, 1981; Yu et al., 2011). In place of tense morphemes, Mandarin uses aspect markers such as “le,” “zhe,” and “guo.” Tardif (2006) observed that Mandarin-speaking children begin marking verbs with the aspect marker “le” at age two, followed shortly by “zai” for progressive aspect marking. Acquisition of the other aspectual markers, “guo” for experiential and “zhe” for durative, typically occurs about a year later (Chen & Shirai, 2010; Cheung, 2008). To express the past tense of “eat,” Mandarin-speaking children say “chī le,” and to express the progressive “-ing” form of “eat,” they say “chī zhe.” Instead of adding grammatical morphemes, they combine verbs with other types of words to create correct forms. C.-T. J. Huang et al. (2009) suggested, therefore, that to understand Mandarin grammar, we must understand the POS classification of words and word composition. As POSs and word composition are key for Mandarin-speaking children to form grammatical sentences, POS frequency and composition are particularly important when investigating Mandarin-speaking children.
Since Mandarin lacks grammatical morphemes, mastering words from diverse POS categories becomes crucial for Mandarin-speaking children to develop their language skills effectively. Learning to use words from different POS categories correctly may also enable Mandarin-speaking children to form longer utterances. For instance, the insertion of the modifier “de” between two nouns enables various modifier-modified relationships, such as possessor-possessed. Mandarin-speaking children are able to use the modification structure of “de” extensively by the age of 2 (Erbaugh, 1992). When children use “de,” they may connect two nouns (e.g., Mommy’s hair, mā de tóu fà) and thereby increase utterance length. When children form more complex utterances, they tend to produce longer utterances (Scarborough et al., 1991). Syntactic development extends beyond simple declarative sentences into constructions involving multiple verbs, such as pivotal construction (object control) and serial verb construction. Mandarin-speaking children begin to employ these complex predicates at age two (Xue et al., 2022; Yang & Yang, 2015). Conjunctions may be a type of word that affects utterance length. In Mandarin, although clauses can be juxtaposed without conjunctions to form complex or compound sentences (Erbaugh, 1992; Xue et al., 2022), using conjunctions in children’s production can still increase their utterance length. Children generally use conjunctions to connect two clauses. Conjunctions are vital for cohesion and coherence, playing a pivotal role in sentence construction across various lengths and complexities, ranging from simple to compound sentences (Halliday & Hasan, 1976). As such, there exists a direct correlation between the presence of conjunctions and the length of sentences. The use of words from diverse POS categories is crucial for Mandarin-speaking children in forming utterances, and the types of words they choose can influence the length of their utterances.
Children’s Utterance Length
It has been suggested that children’s utterance length relates to their grammatical ability. The mean length of utterance (MLU) is a developmental indicator of children’s grammatical ability; a longer MLU indicates that a child has better grammatical skills. MLU is positively related to children’s age; some studies have suggested that MLU is sensitive to the age of typically developing (TD) children when MLU measured in morphemes is below 2.5–3.0 (Klee, 1992; Rondal et al., 1987). Other studies have found that MLU remains sensitive to age, even past Brown’s stage V (Brown, 1957). Miller (1981) examined the relationship between child age and MLU measured in morphemes in a sample of 123 children aged 17–59 months and found that MLU increased at an average rate of 1.2 morphemes per year. Rice et al. (2010) analyzed MLU in 1,564 language samples of 306 children aged 2 years and 6 months to 9 years, comprising 170 children with specific language impairments and 136 TD children. They revealed an age progression in MLU up to 6 years. Regarding MLU in Mandarin, Cheung (1998) observed five children aged between 1;6 and 3;6 over a span of 2 years and analyzed 67 one-hour spontaneous speech samples. MLU was significantly correlated with age. Additionally, language samples were collected from 80 children aged 4 to 7. It was observed that MLU scores for 6-year-olds were notably higher than those of 4- and 5-year-olds, while MLU scores of 7-year-olds surpassed those of 4-year-olds significantly. In a subsequent study by Jin and Jin (2009) in Shanghai, language samples from 50 children aged 4 to 6 were analyzed, further confirming a correlation between MLU scores and the children’s ages. R. J. Huang et al. (2016) collected language samples of Mandarin-speaking children aged 3–4 years for MLU analysis. Their results showed that the MLU of 4-year-olds was significantly higher than that of 3-year-olds. Wu (2020) further revealed that the MLU of 6-year-olds was significantly higher than that of 5-year-olds. Tang et al. (2023) also observed that, among Mandarin-speaking children aged 25 to 60 months, younger age groups generally produced shorter utterances. These findings collectively underscore the sensitivity of MLU to age-related language development.
MLU has been applied widely in studies of children’s syntactic performance and development. Xu et al. (2022) examined word order acquisition in Mandarin-speaking typically developing children and children with autism spectrum disorder. They analyzed the correlation between caregivers’ and children’s MLU to study the effects of caregivers’ input. Kuo and Lin (2021) examined the validity of a syntactic measure, the Index of Productive Syntax (IPSyn; Scarborough, 1990), in Mandarin. They reported a significant correlation between MLU and the overall Mandarin IPSyn scores. Cheung et al. (2024) developed a syntactic measure, the Mandarin Assessment of Productive Syntax-Revised (MAPS-R), for evaluating syntactic abilities in Mandarin-speaking children. They analyzed MLU to support the validity of MAPS-R. The results showed significant age-related differences in both MLU and MAPS-R scores. They also found strong correlations between MLU and MAPS-R scores. It was suggested that the results confirmed the validity of MAPS-R as a measure of syntactic development.
MLU has also been broadly used in studies evaluating children with language impairments. Hao et al. (2018) studied the narrative production of Mandarin-speaking children with language impairment (LI) and used MLU as one of the measures. They found that children with LI showed shorter MLU than typically developing children. Wu (2020) analyzed the MLU of Mandarin-speaking children with developmental language disorder (DLD) aged 5 and 6 years and found that children with DLD had lower MLU than typically developing children. These studies showed that MLU is a widely used developmental measure with strong validity.
Importance of POSs in Child Language
The POS system helps categorize words according to their grammatical roles. There are eight basic POSs in English: nouns, pronouns, verbs, adjectives, adverbs, prepositions, conjunctions, and interjections (Atwell, 2008). POS classifications have often been used in theories of sentence structure, such as X-bar theory (Chomsky, 1970) and structural linguistics (Saussure, 1916). In these theories, sentences are described by organizing different POS categories such as nouns, verbs, and adjectives. The investigation of language development and language disorders often requires classifying words into different categories to analyze which kinds of words are used during different developmental stages. The POS classification method is commonly used for exploring children’s use of words, such as verbs and nouns.
The classification of words in Mandarin is based on the functions and characteristics of different syntactic structures. There are several classifications for POSs in Mandarin (Tsai, 2009). Ho (2005) classified 13 POSs: nouns, verbs, adjectives, numerals, quantifiers, pronouns, adverbs, prepositions, conjunctions, auxiliary words, modal particles, interjections, and onomatopoeia. Y. H. Liu et al. (1996), meanwhile, revealed 12 commonly used POSs: nouns, verbs, adjectives, numbers, classifiers, pronouns, adverbs, prepositions, conjunctions, particles, onomatopoeia, and interjections. Wu et al. (2021) adopted 11 of the POS categories from Y. H. Liu et al. (1996), except for onomatopoeia, as it was not considered effective for classifying words. Regarding the 11 POS categories, nouns constitute words that indicate the name of a person, object, or thing; verbs express the actions, behaviors, development, and changes of a person or thing; adjectives describe the shape, characteristics, temperament, and state of a person or thing; numerals indicate a number or sequence; quantifiers represent quantities of people, things, or units; pronouns are used in place of or to demonstrate a noun; adverbs modify verbs, adjectives, or other adverbs; prepositions interpret words related to time, place, or objects; conjunctions connect two words; particles express additional meanings, such as tense (e.g., le, zhe); and interjections express emotions (Ho, 2005).
POSs have been widely used in linguistics and child language research, and previous studies have investigated children’s production of POSs. For example, Qi (2002) and Nelson (1973) studied the vocabulary of children and found that nouns accounted for the largest proportion of words used. Tsai (2009) analyzed the number of prepositions and conjunctions used by children and found that the number increased as children grew. Y. J. Liu (2011) suggested that frequencies and types of nouns used by TD children aged 4 years were significantly higher than those of children aged 3 years. H. M. Liu and Chen (2015) suggested that when children acquire vocabulary, they first learn content words, such as nouns and verbs, and then learn function words, such as conjunctions and prepositions. Zhang and Zhou (2020) analyzed the POS performance of 341 Mandarin-speaking children aged 3–5 years. Their results showed that children aged 5 years performed significantly better than children aged 3–4 years regarding nouns, verbs, classifiers, and prepositions. Regarding adverbs and conjunctions, children aged 5 performed significantly better than children aged 3–4 years, and children aged 4 years performed significantly better than children aged 3 years. There were no significant differences in pronouns and adjectives among all age groups. Accordingly, Zhang and Zhou (2020) suggested that nouns, verbs, adverbs, classifiers, adjectives, conjunctions, and prepositions could be used to differentiate children in different age groups. The aforementioned studies have shown that children of different ages have different POS performances, and POS analysis can be used to observe children’s vocabulary and grammar development.
Regarding children with language disorders, H. M. Liu and Lin (2017) used a standardized measurement (Mandarin-Chinese Communicative Development Inventory of Taiwan) to examine the expressive vocabulary development of 37 children with language delays and 32 TD children, aged 24 and 36 months, respectively. They analyzed the total vocabulary produced by the two groups using four POS categories: common nouns, predicates (verbs and adjectives), closed class (pronouns, prepositions and positions, conjunctions, demonstrative and quantitative words, and interrogative words), and other vocabulary (outdoor products and natural phenomena, games and daily activities, and characters and words related to time). They further analyzed the percentage of each vocabulary type in the total vocabulary size. The results showed that the total produced vocabulary of children with language delays was significantly lower than that of TD children. Even at 36 months, children with language delays could not catch up with the average vocabulary produced by TD children. The vocabulary size analysis for both groups revealed that common nouns were the most developed followed by predicates, other vocabulary, and the closed class.
Le Normand and Chevrie-Muller (1991) explored the differences between French-speaking TD children and children with specific language impairment by classifying words into four types: a lexical type, including nouns, verbs, adjectives, and adverbs; a grammatical type, including prepositions, articles, pronouns, demonstrative pronouns, possessives, and personal pronouns; an exclamative type, including onomatopoeia and interjections; and an interrogative type, including interrogative words (e.g., what, where). The abovementioned studies show that the POS system has been widely used in research on child development and children with language disorders.
Relationship Between POSs and Utterance Length
Preschoolers increase their utterance length gradually as they grow up. They slowly expand their use of different word types, such as nouns, verbs, and adjectives, to increase their sentence length and complexity. Children initially use single nouns to identify objects or people. As their language skills develop, they begin to combine nouns with words from other POS categories to describe or specify objects in more detail. For example, instead of saying “ball,” they may say “red ball” or “big ball.” This addition of descriptive words expands their sentence length. When using verbs, children initially use simple verbs to express basic actions, such as “run,” “eat,” or “play.” As they progress, they incorporate a variety of verbs and other POS categories into their sentences to convey different actions or events. Mandarin-speaking children combine verbs with words from other POS categories to express past (e.g., wán le), progressive (e.g., wán zhe), and future (e.g., xiăng wán) actions. They may increase their sentence length as they learn to combine words from different POS categories.
Based on theories, the POS system has been used to explore and analyze the construction of sentences. From developmental observation, children use different POSs of words to form utterances and increase utterance length gradually. Hence, the relationship between POSs and utterance length is an issue worth exploring. Previous studies have often used POSs to explore children’s sentence components. For example, Le Normand and Chevrie-Muller (1991) examined the relationship between children’s POSs and MLU. They categorized the children based on their MLU (scores ranging from 1.23 to 1.81 = low MLU group; scores ranging from 2.45 to 2.86 = high MLU group). They conducted discriminant analysis to examine which POS categories could distinguish the two groups; their results showed that the children’s MLU was positively correlated with nouns, verbs, adjectives, prepositions, articles, demonstrative pronouns, personal pronouns, adverbs, and possessive numbers. Meanwhile, articles, prepositions, and verbs could distinguish the two groups. Their study indicated that when children used fewer articles, prepositions, and verbs, they were more likely to form shorter utterances and could not form longer ones. Therefore, the use of articles, prepositions, and verbs may affect MLU.
Le Normand et al. (2013) explored the relationship between POSs and MLU in 312 French-speaking children aged 2–4 years and categorized words into lexical, grammatical, and pragmatic types. Their results showed that MLU had no significant correlation with lexical words but had a significant positive correlation with grammatical words. Their results indicated that when children used more conjunctions, negative verbs, determiners, prepositions, and pronouns, they had longer MLU. They also found that pragmatic words had a significant negative correlation with MLU, indicating that when children used more adverbs and onomatopoeia, they had shorter MLU. Among 18 grammatical words, they revealed that pronouns, prepositions, and determiners were the best predictors for estimating MLU.
Szagun and Schramm (2019) collected language samples from children aged 1 year and 8 months to 2 years and 5 months to examine whether lexical and grammatical words are predictors of MLU. Their results showed that the lexical words of children aged 1 year and 8 months were predictors of MLU for children aged 2 years and 1 month, while grammatical words were predictors of MLU for children aged 2 years and 5 months. Among children aged 2 years and 1 month, articles and copula were the strongest predictors of MLU for children aged 2 years and 5 months. Mueller (2020) studied how function words affected the MLU of children aged 3–6 years; they also analyzed the correlation between MLU and five POS categories (i.e., determiners, prepositions, particles, conjunctions, and pronouns). The results showed that determiners, prepositions, and pronouns were positively correlated with MLU, while determiners (42%), prepositions (28%), and pronouns (24%) accounted for the variance in MLU.
Previous studies have indicated that the utterance length of Mandarin-speaking children increases as they grow (Tang et al., 2023; Wu, 2020). Wu et al. (2021) revealed significant differences in 10 POS categories (i.e., verbs, nouns, particles, prepositions, pronouns, classifiers, number words, adjectives, conjunctions, and adverbs) among different age groups; older children had higher word frequencies in these 10 POS categories than younger children. This indicates that when children have better language ability, they use more words in different categories and produce longer utterances. To date, no research has explored the effect of Mandarin-speaking children’s adoption of POS on their utterance length.
Learning various types of words is crucial for Mandarin-speaking children to form utterances since Mandarin is often described as lacking inflectional morphemes. The POS system is a widely used method to categorize different types of words and outline sentence structures. Previous studies, such as those by H. M. Liu and Chen (2015), have demonstrated that children acquire certain POS categories earlier than others. Additionally, research by Tsai (2009) and Wu et al. (2021) has highlighted that the use of some POS categories significantly increases with age. These findings suggest that developmental changes can be observed from the perspective of POS categories. It has also been shown that utterance length increases with age. Previous research (Wu, 2020; Wu et al., 2021) has provided valuable insights into the developmental trajectory of language in Mandarin-speaking children, highlighting how mean length of utterance (MLU) and the frequency of use across ten POS categories evolve with age. These findings underscore a critical aspect of language acquisition: as children grow, not only do their utterances become longer, but their production of words in various POSs also increases. As children learn to use various types of words to construct sentences, they might utilize specific word types to form longer utterances. The observation that older children tend to use certain POS categories more frequently, and concurrently increase their utterance length, prompts the question of how these POS categories may influence the length of children’s utterances.
Despite these advancements in understanding, a significant gap remains in our comprehension of the intricate dynamics between the use of specific POS categories and the resulting impact on the length of children’s utterances. This lacuna in research is particularly pronounced concerning Mandarin-speaking children, for whom the relationship between the utterance length and the diversity of POS usage has yet to be thoroughly investigated.
Therefore, the current study seeks to bridge this gap by meticulously examining how the inclusion of different POS categories influences utterance length in Mandarin-speaking children. To investigate which POS categories children employ most frequently in their utterances, we analyzed the mean frequency of each POS category within utterances across various age groups. This analysis aims to offer fundamental insights into the frequency with which children utilize different POS categories in forming utterances. Furthermore, to assess whether specific POS categories are associated with longer utterances, we categorized utterances based on their inclusion or exclusion of certain POS (e.g., utterances containing nouns vs. those without). We then compared the lengths of utterances in these two groups for each POS category to determine if utterances containing particular POS were notably longer.
Our research aims to uncover underlying patterns and predictors of utterance length. This endeavor is not merely academic; understanding these dynamics has profound implications for educational strategies, speech therapy interventions, and the broader theoretical frameworks of language development.
The aim of the current study was to explore the effect of Mandarin-speaking children’s adoption of POSs on their utterance length. The research questions (RQs) were as follows:
RQ1: What is the mean frequency of each POS category in all utterances of children in each age group?
RQ2: Are the utterances containing a certain POS significantly longer than utterances that do not contain that POS? For example, are utterances with conjunctions significantly longer than utterances without conjunctions?
RQ3: What POS categories are predictors of utterance length?
Method
Participants
We used the language samples of the same participants from Wu et al. (2021). This study’s participants included 209 TD children aged 3–6 years. The participants were divided into four groups (aged 3, 4, 5, and 6 years). The participants were recruited from Miaoli City, New Taipei City, and Taipei City in Taiwan. All participants spoke Mandarin as their dominant language, and some also used Taiwanese. For children aged 5 and 6 years, their parents completed a questionnaire about their personal information, such as dominant language and medical history. The questionnaire results showed that all participants’ native language was Mandarin, with no diagnoses regarding language delay, language disorder, mental retardation, neurological abnormality, paresthesia, psychiatric disorder, or pervasive developmental disorder. The questionnaire also reported the participants’ demographic information. Regarding family financial status, 54% of the families were lower middle class, 31% were middle class, and 15% did not report their status. Regarding their language use, 98% of the reports indicated that Mandarin was the children’s dominant language. According to the questionnaire responses, some children also spoke Taiwanese: one child always spoke Taiwanese, two children often spoke Taiwanese, and four children sometimes spoke Taiwanese. Other information, such as the mother’s educational level and the main caregiver, was addressed in Wu et al. (2021). Most participants in this study had average income and college-level education, common in Taiwanese families. There were no recruitment criteria regarding socioeconomic status or parents’ educational level; therefore, low-income families and low-education parents were a minority in this study. All participants were tested with the Revised Preschool Language Scale and the Revised School-Age Language Scale (Lin et al., 2008). Table 1 shows the participants’ demographics and their T scores on these tests, all of which were above 1 standard deviation below the mean.
Table 1. Participants’ Demographics and Scores.
Note. y = year; m = month.
Research Tools
Child Language Data Exchange System, Manual Transcription Analysis and Encoding System, and Computerized Language Analysis Program
We used MacWhinney’s (2014) Child Language Data Exchange System corpus as an efficient transcription analysis system and shared database of children’s language samples. It contains three system tools. The first is the children’s language sample data. In addition to the English corpus, there are samples in multiple languages provided by researchers in other countries. The second is the Codes for the Human Analysis of Transcripts, which can perform various types of encoding on the text to be analyzed as the basis for subsequent analysis. The third is the Computerized Language Analysis program, which can analyze basic discourse text types, pronunciation and syntax, and so on.
Academia Sinica Balanced Corpus
In Mandarin, each word may consist of one or more characters. Hyphenation is used to identify the combination of words in sequences and for segmentation. The Academia Sinica Balanced Corpus (Academia Sinica Taiwan, 2014) provides automatic word recognition and instant word segmentation services with 99% accuracy, without counting proper names and compound words. The Sinica Corpus divides continuous words into independent words according to the library of 80,000 words in the Academia Sinica dictionary. Words not listed in the dictionary are segmented into characters. We used the Sinica Corpus to complete the initial word segmentation. If the result of the word segmentation was unclear, manual correction was performed. Corrections were based on factors such as language context, sentence content, and Mandarin linguistics.
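The dictionary-based behavior described above (known words are kept whole, unlisted material falls back to single characters) can be illustrated with a toy sketch. This is not the Sinica system's algorithm, which the text does not specify; it swaps in a simple greedy longest-match strategy, and the function name `segment`, the `max_len` parameter, and the two-entry lexicon are all hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Greedy longest-match segmentation with single-character fallback.

    Illustrative only: a stand-in for a dictionary-based segmenter,
    not the procedure used by the Academia Sinica service.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the lexicon matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted (fallback for unlisted words).
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon (hypothetical entries, not taken from the Academia Sinica dictionary).
toy_lexicon = {"頭髮", "的"}
print(segment("媽的頭髮", toy_lexicon))  # mā de tóu fà ("Mommy's hair")
```

In this toy run, the character for "mā" is absent from the lexicon and is therefore emitted as a single character, which mirrors the fallback described in the text and shows why manual correction of the automatic output remains necessary.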
Research Process
We performed language sample collection, transcription, sentence selection, word segmentation, and analysis following the procedures in Wu (2020) and Wu et al. (2021). To ensure reliability, the transcription and analysis personnel were trained for 9 hr before the procedure. The researchers explained the collection of language samples, Codes for the Human Analysis of Transcripts method, use of the Sinica Corpus, and Computerized Language Analysis method based on the instruction manuals. All work was carried out by the researchers, bachelor’s and master’s degree students, and research assistants in speech therapy-related disciplines. To ensure sufficiently representative samples, we collected children’s language samples in four contexts: conversation (school life), story retelling (“Little Red Riding Hood”), free play, and conversation (daily life). One-on-one conversation with participants comprised the main interaction form. The collected language samples were transcribed, and 100 valid sentences were selected as the representative language samples. To test the transcription’s reliability, sentences from 10 children in each age group (aged 3–6 years) from each context were randomly selected and transcribed by a second transcriber. Transcription reliability was calculated as the number of consistent words divided by the sum of consistent and discordant words. The average transcription reliability of the 40 sentences was .96. The researchers then segmented the selected 100 sentences into words. To reduce errors, we used the word segmentation system of the Mandarin Academy of Sciences (Academia Sinica Taiwan, 2014). The automatic word segmentation results were subsequently corrected by manual verification; the word segmentation reliability was .97.
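The transcription reliability formula above is a simple proportion of agreement. A minimal sketch follows; the function name `transcription_agreement` and the word counts are hypothetical and chosen only to illustrate the arithmetic, not taken from the study's data.

```python
def transcription_agreement(consistent, discordant):
    """Proportion of agreement: consistent / (consistent + discordant)."""
    total = consistent + discordant
    if total == 0:
        raise ValueError("at least one word is needed to compute agreement")
    return consistent / total


# Hypothetical counts: 96 words transcribed identically, 4 transcribed differently.
print(round(transcription_agreement(96, 4), 2))  # 0.96
```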
POS Tagging
All words in the transcriptions were tagged for POS. Words were classified according to 11 categories (Wu et al., 2021): noun (N), verb (V), adjective (A), numeral (Neu), quantifier (Nf), pronoun (Nh), adverb (D), preposition (P), conjunction (C), particle (T), and interjection (I). All words were first automatically tagged using the POS categories and then manually reviewed by the researchers. The POS tags assigned by the Academia Sinica word segmentation system were manually converted into the 11 basic POS categories. All POSs were tagged by the research assistants, who had master’s degrees in speech-language pathology or linguistics. The research assistants received week-long training on tagging POSs. The researchers and research assistants attended weekly meetings to recheck POS tagging. Disagreements and questions were resolved through discussion.
In the children’s sentence production, some utterances were long but contained extra words that were grammatically incorrect. Analyzing these grammatically incorrect utterances could have influenced the results for the factors predicting utterance length. To avoid this, the researchers manually reviewed each utterance for grammatical correctness and removed sentences containing grammatical errors. After exclusion, the total number of sentences was 3,800 (3-year-olds), 8,039 (4-year-olds), 5,574 (5-year-olds), and 3,675 (6-year-olds).
Analysis
Three analyses were conducted to explore the relationship between POSs and utterance length. First, the total frequency of each POS was calculated and divided by the total number of utterances in all language samples for each age group (aged 3, 4, 5, and 6 years) to obtain the mean frequency of each POS. The maximum and minimum frequencies of each POS category were also calculated. Second, for each POS, utterances were categorized into two groups: utterances with a particular POS and utterances without that POS. The MLU for the two groups was calculated, and a t-test was used to examine whether the mean lengths of the two groups differed significantly. Cohen’s d (Cohen, 1988) was calculated as the effect size; values of 0.2, 0.5, and 0.8 are considered small, medium, and large, respectively (Cohen, 1988). Finally, forward stepwise linear regression was used to identify possible predictors of utterance length from the following candidate variables: N, V, A, Neu, Nf, Nh, D, P, C, T, and I. At each step, variables were added when p < .05 and excluded when p > .1. Cohen’s f² (Cohen, 1988) was calculated as the effect size. According to Cohen (1992), f² values of 0.02, 0.15, and 0.35 indicate small, medium, and large effects, respectively.
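The entry and removal rule of the stepwise procedure (add when p < .05, exclude when p > .1) can be illustrated with a small, self-contained sketch. It is not the authors' software or data: the counts and utterance lengths are synthetic, all function names are hypothetical, and the p-values use a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import random
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def slope_p_values(columns, y):
    """Fit OLS with an intercept; return one two-sided p-value per predictor."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    sse = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(design, y))
    sigma2 = sse / (n - k)
    normal = NormalDist()  # assumption: normal approximation to the t distribution
    return [2 * (1 - normal.cdf(abs(beta[a]) / (sigma2 * inv[a][a]) ** 0.5))
            for a in range(1, k)]


def forward_stepwise(data, y, p_enter=0.05, p_remove=0.10):
    """Add the best candidate while p < p_enter; drop any variable with p > p_remove."""
    selected = []
    for _ in range(2 * len(data)):  # iteration cap guards against cycling
        trial = {name: slope_p_values([data[c] for c in selected + [name]], y)[-1]
                 for name in data if name not in selected}
        if not trial:
            break
        best = min(trial, key=trial.get)
        if trial[best] >= p_enter:
            break
        selected.append(best)
        p_now = slope_p_values([data[c] for c in selected], y)
        selected = [c for c, p in zip(selected, p_now) if p <= p_remove]
    return selected


# Synthetic data: per-utterance counts of three POS categories and a noisy length.
rng = random.Random(0)
counts = {name: [rng.randint(0, 2) for _ in range(300)] for name in ("P", "C", "I")}
length = [4 + 3 * p + 2 * c - i + rng.gauss(0, 1)
          for p, c, i in zip(counts["P"], counts["C"], counts["I"])]
selected = forward_stepwise(counts, length)
```

In this synthetic setup all three variables have strong simulated effects, so the procedure is expected to retain P, C, and I; with real language samples the selected set depends on the data.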
Ethical Considerations
The parents of all participants provided written informed consent, and the study was reviewed and approved by the University Ethics Committee (201610ES010).
Results
For research question 1, we analyzed the mean, maximum, and minimum frequency of each POS category in all utterances of children in each age group. Tables 2–5 show the results.
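The frequency summary for research question 1 can be sketched as below. The sketch assumes that the maximum and minimum are taken over per-utterance counts, which the source does not state explicitly; the tagged utterances and the function name are hypothetical.

```python
from collections import Counter


def pos_frequency_summary(utterances, categories):
    """Return {POS: (mean, max, min)} of per-utterance tag counts."""
    per_utterance = [Counter(tags) for tags in utterances]
    summary = {}
    for pos in categories:
        counts = [c[pos] for c in per_utterance]  # a missing tag counts as 0
        summary[pos] = (sum(counts) / len(counts), max(counts), min(counts))
    return summary


# Hypothetical POS tag sequences for three utterances.
tagged = [["Nh", "V", "V", "N"], ["N", "P", "Nh", "V"], ["A"]]
summary = pos_frequency_summary(tagged, ["N", "V", "P"])
```

The mean is the total count of a category divided by the number of utterances, matching the definition given in the Analysis section.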
Frequencies of Each Part of Speech for 3-Year-Olds.
Note. V = verb; N = noun; T = particle; P = preposition; Nh = pronoun; Neu = numeral; Nf = quantifier; I = interjection; D = adverb; C = conjunction; A = adjective.
Frequencies of Each Part of Speech for 4-Year-Olds.
Note. V = verb; N = noun; T = particle; P = preposition; Nh = pronoun; Neu = numeral; Nf = quantifier; I = interjection; D = adverb; C = conjunction; A = adjective.
Frequencies of Each Part of Speech for 5-Year-Olds.
Note. V = verb; N = noun; T = particle; P = preposition; Nh = pronoun; Neu = numeral; Nf = quantifier; I = interjection; D = adverb; C = conjunction; A = adjective.
Frequencies of Each Part of Speech for 6-Year-Olds.
Note. V = verb; N = noun; T = particle; P = preposition; Nh = pronoun; Neu = numeral; Nf = quantifier; I = interjection; D = adverb; C = conjunction; A = adjective.
For research question 2, we examined whether utterance length differed significantly between utterances with and without a given POS category. When utterance length was measured in characters, there was no significant difference between utterances that did and did not contain the POS category I for children aged 4, 5, and 6 years: t(855.905) = .370, p = .711; t(477.632) = 1.270, p = .205; and t(489.811) = 1.491, p = .137, respectively. For the remaining 10 POS categories, the differences between utterances with and without the category were significant in all age groups, whether length was measured in characters or in words. Effect sizes were also analyzed. For children aged 3 years, large effect sizes were found for V, N, P, Nh, D, and C POSs; medium effect sizes for T, Nf, and Neu; and a small effect size for A. For children aged 4 years, large effect sizes were found for V, N, P, Nh, D, and C POSs; medium effect sizes for T, Nf, Neu, and A; and a small effect size for I. For children aged 5 years, large effect sizes were found for V, N, T, P, Nh, D, and C POSs; medium effect sizes for Nf and Neu; and small effect sizes for I and A. For children aged 6 years, large effect sizes were found for V, N, T, P, Nh, Nf, D, and C POSs; medium effect sizes for Neu and A; and a small effect size for I. Table 6 shows the results for all groups.
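The fractional degrees of freedom reported above are consistent with an unequal-variance (Welch) t-test, although the source does not name the variant. The sketch below computes a Welch t statistic, its Welch-Satterthwaite degrees of freedom, and Cohen's d. Using the pooled standard deviation for d is an assumption, and the utterance lengths are made up for illustration.

```python
from statistics import mean, variance


def welch_t_and_cohens_d(with_pos, without_pos):
    """Return (t, df, d) comparing utterance lengths of two groups."""
    n1, n2 = len(with_pos), len(without_pos)
    m1, m2 = mean(with_pos), mean(without_pos)
    v1, v2 = variance(with_pos), variance(without_pos)  # sample variances
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / se2 ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    # Assumption: Cohen's d uses the pooled standard deviation.
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    d = (m1 - m2) / pooled_sd
    return t, df, d


# Hypothetical utterance lengths (in words) with and without a given POS.
t_stat, dof, cohens_d = welch_t_and_cohens_d([6, 8, 7, 9, 10], [3, 4, 2, 5, 4, 6])
```

A p-value would additionally require the t distribution's cumulative distribution function, which is omitted here to keep the sketch within the standard library.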
T-Test Results of 11 Part of Speech Categories for Children Aged 3, 4, 5, and 6 Years.
Note. Cohen’s d effect size = 0.20 small, 0.50 medium, and 0.80 large. df = degree of freedom; V = verb; N = noun; T = particle; P = preposition; Nh = pronoun; Neu = numeral; Nf = quantifier; I = interjection; D = adverb; C = conjunction; A = adjective.
Significant at the .01 level.
T-Test Results of the 11 Part of Speech Categories for Children Aged 3, 4, 5, and 6 Years (Cont’d).
Note. Cohen’s d effect size = 0.20 small, 0.50 medium, and 0.80 large. df = degree of freedom; V = verb; N = noun; T = particle; P = preposition; Nh = pronoun; Neu = numeral; Nf = quantifier; I = interjection; D = adverb; C = conjunction; A = adjective.
Significant at the .01 level.
For research question 3, we analyzed which POS categories were predictors of utterance length. Tables 7–10 show the model summaries, unstandardized and standardized coefficients, and significance of independent variables for children aged 3–6 years. Table 11 shows the model coefficient of determination (R²) and Cohen’s f² of the stepwise analysis for children aged 3–6 years. The Cohen’s f² values show that the effect sizes of the stepwise analysis were medium for all four age groups.
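Cohen's f² for a regression model is conventionally derived from the coefficient of determination as R² / (1 − R²). The sketch below applies that formula and the thresholds cited in the Analysis section; the R² value of 0.20 is a made-up example, not a result from this study.

```python
def cohens_f2(r_squared):
    """Cohen's f-squared for a regression model: R^2 / (1 - R^2)."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return r_squared / (1 - r_squared)


def label_effect(f2):
    """Label an f-squared value with the 0.02 / 0.15 / 0.35 thresholds."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"


# Hypothetical R^2, for illustration only.
example_f2 = cohens_f2(0.20)
example_label = label_effect(example_f2)
```

With these thresholds, any model whose R² falls roughly between .13 and .26 would be labeled a medium effect.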
Stepwise Analysis of Utterance Length for 3-Year-Olds.
Note. V = verb; N = noun; T = particle; P = preposition; I = interjection; D = adverb; C = conjunction; A = adjective.
Stepwise Analysis of Utterance Length for 4-Year-Olds.
Note. V = verb; N = noun; P = preposition; Nh = pronoun; Neu = numeral; I = interjection; C = conjunction; A = adjective.
Stepwise Analysis of Utterance Length for 5-Year-Olds.
Note. N = noun; T = particle; P = preposition; I = interjection; D = adverb; C = conjunction; A = adjective.
Stepwise Analysis of Utterance Length for 6-Year-Olds.
Note. N = noun; T = particle; P = preposition; Neu = numeral; Nf = quantifier; I = interjection; C = conjunction; A = adjective.
R² of Stepwise Regression for Children Aged 3–6 Years.
Note. Cohen’s f² effect size = 0.02 small, 0.15 medium, and 0.35 large.
For 3-year-olds, the model showed that N, P, C, A, V, T, I, and D were significant factors that could predict utterance length. Among these factors, N, A, V, and I had a negative effect on utterance length; P, C, T, and D had a positive effect on utterance length.
For 4-year-olds, the model showed that C, N, I, P, A, V, Nh, and Neu were significant factors that could predict utterance length. Among these factors, N, I, A, V, Nh, and Neu had a negative effect on utterance length; P and C had a positive effect on utterance length.
For 5-year-olds, the model showed that C, P, N, I, A, T, and D were significant factors that could predict utterance length. Among these factors, N, I, and A had a negative effect on utterance length; P, C, T, and D had a positive effect on utterance length.
For 6-year-olds, the model showed that C, P, I, N, Nf, T, A, and Neu were significant factors that could predict utterance length. Among these factors, I, N, A, and Neu had a negative effect on utterance length; P, C, Nf, and T had a positive effect on utterance length.
The significant POS predictors of utterance length are summarized in Table 12.
Part of Speech Categories as Significant Predictors of Utterance Length in Four Age Groups.
Note. + indicates a positive effect, − indicates a negative effect, and x indicates an insignificant predictor. V = verb; N = noun; T = particle; P = preposition; Nh = pronoun; Neu = numeral; Nf = quantifier; I = interjection; D = adverb; C = conjunction; A = adjective.
Discussion
On average, children aged 3–6 years used at least one V and one N to form utterances. Their mean frequency increased with age, which indicates that children use more N and V in their utterances as they age. When ranking the mean frequency, the top POS categories were N, V, Nh, and D, in that order; their mean frequency increased with age. This indicates that these categories are important when children aged 3–6 years compose utterances.
The T and C POS categories ranked fifth for children aged 3–4 years and 5–6 years, respectively. This indicates that as children age, they use C more often to form utterances. Previous studies have also shown that children use more C as they age. D. Li et al. (1989) found that complex sentences developed rapidly among Mandarin-speaking children aged 4–5 years. These children continued to develop C and used it maturely until the age of 6. Pence Turnbull and Justice (2016) also found that children produced C by 36 months and could subordinate and coordinate C by 52 months.
Research question 2 considered whether there would be significant differences in length between utterances with and without certain POS categories. The I POS category showed no significant difference in utterance length between the two groups (with and without I). This indicates that I plays a less important role in the formation of long utterances. Le Normand et al. (2013) categorized I as a pragmatic word that is relatively independent of the syntactic system. They revealed that pragmatic words have a significantly negative coefficient, meaning that the greater the number of different pragmatic words, the lower the MLU. This finding is similar to our finding that I did not contribute to utterance length.
Research question 3 considered which POS categories could predict utterance length. For children aged 3 years, utterances may be longer when they contain P, C, T, and D. When the regression coefficients were ranked by magnitude, the three largest belonged to P, A, and C. Among these, P and C had positive coefficients, and A had a negative coefficient. This indicates that P and C have the strongest positive impact on utterance length, and A has the strongest negative impact on utterance length. When children use P and C, they are more likely to construct long utterances; thus, these POS categories may be important for children to form long utterances.
Children aged 4 years may produce longer utterances when they contain P and C. The three POS categories with the highest regression coefficients were P, I, and C. Among these, P and C had positive coefficients, and I had a negative coefficient. These results indicate that P and C had the highest positive impact on utterance length, and I had the highest negative impact on utterance length. When children use P and C, they may produce longer utterances. When children use I, they may produce shorter utterances.
Children aged 5 years had longer utterances when they used P, C, T, and D. The POS categories with the three highest regression coefficients were P, C, and I. P and C had positive coefficients, and I had a negative coefficient, showing that P and C had the highest positive effect on utterance length, and I had the highest negative effect on utterance length. Children may produce longer utterances when using P and C and shorter utterances when they use I.
Children aged 6 years had longer utterances when they used P, C, T, and Nf. The POS categories with the three highest regression coefficients were P, C, and I. P and C had positive coefficients, and I had a negative coefficient, showing that P and C had the highest positive impact on utterance length, and I had the highest negative impact on utterance length.
P and C had the strongest positive impact on utterance length for all age groups. For children aged 4, 5, and 6 years, I had the strongest negative impact on utterance length. These results indicate that among the 11 POS categories, P and C influenced children’s utterance length; using P and C made their utterances longer. Thus, such POS categories may be important learning targets for children to help them learn to generate longer utterances.
Conjunctions have a positive influence on utterance length, which aligns with our hypothesis. Conjunctions are words employed to connect words, phrases, or sentences, serving as bridges within sentences to construct complex information and expressions. They can indicate the logical relationship between parts of a sentence, encompassing parallelism, choice, contrast, cause and effect, and more. The language samples of children revealed the usage of various conjunctions such as “hàn” (and), “rán hòu” (and then), “kě shì” (but), and “suǒ yǐ” (so). For example, in the utterance, “tā bǎ dà yě láng dàng chéng hǎo rén rán hòu (and then) jiù gēn dà yě láng shuō tā de nǎi shēng bìng le” (She thinks Big Bad Wolf is a good guy and then tells Big Bad Wolf that her grandma is sick.), the conjunction “rán hòu” (and then) connects two utterances to convey sequential meaning. In another example, “mā ma méi yǒu kòng suǒ yǐ (so) qǐng xiǎo hóng mào qù bǎ pú táo jiǔ hái yǒu (and) dàn gāo sòng gěi wài pó” (Mom was not free, so she asked Little Red Riding Hood to give the wine and cake to grandma.), the child used the conjunction “suǒ yǐ” (so) to connect two utterances and “hái yǒu” (and) to connect two words. In Mandarin, clauses can be juxtaposed without conjunctions to form complex or compound sentences (Erbaugh, 1992; Xue et al., 2022). However, when children use conjunctions in their utterances, they tend to connect two words, phrases, or sentences, resulting in an increase in utterance length.
Prepositions are like the glue of language; they hold the elements of a sentence together by showing the relationships between different parts. They are connectors or bridges that link nouns, pronouns, or phrases to other words in a sentence. Their primary role is to give additional information to express various relationships such as time, object, place, direction, scope, reason, purpose, tool, and comparison (Ho, 2005). Using P, children are able to connect phrases and therefore increase the utterance length. Upon observing the language samples of children, we noticed that the most frequently used preposition in this study was “gēn.” In Mandarin, “gēn” is commonly employed to denote relationships between people or things. Children in this study frequently utilized “gēn” to indicate the object of an action in the context of story retelling, as well as to convey the meaning of someone telling something to someone else. For example, in the sentence “yǒu yī tiān mā ma gēn (P) xiǎo hóng mào shuō wài pó shēng bìng le kě shì wǒ méi kòng qù tàn wàng tā” (One day, mother told Little Red Riding Hood, “grandma was sick but I didn’t have time to visit her.”), children often generated dialogue spoken by characters in the story following the pattern of “A gēn B shuō” (A told B). By incorporating additional information conveyed by the characters in their utterances, children often produced lengthy utterances of this type. Therefore, when the preposition “gēn” was used in a child’s utterance, the utterance tended to be longer. This may help explain why the presence of prepositions served as a positive predictor for utterance length.
H. M. Liu and Chen (2015) reported that children first acquire content words, such as nouns and verbs, and subsequently acquire function words, including prepositions and conjunctions. As children age, they demonstrate more mature usage of prepositions and conjunctions. Additionally, their utterances become longer over time (R. J. Huang et al., 2016; Wu, 2020). Children aged 5 and 6 years old tended to produce more prepositions and conjunctions and had longer utterances than children aged 3 and 4 years old. This developmental trajectory aligns with our findings, which identify prepositions and conjunctions as positive factors contributing to increased utterance length. Additionally, our results are similar to those of previous studies on the relationship between POS categories and utterance length. For example, Le Normand et al. (2013) found that when children use more conjunctions, negative verbs, determiners, prepositions, and pronouns, they have longer MLU. Our results also showed that when Mandarin-speaking children aged 3–6 years used prepositions and conjunctions, they produced longer utterances. Le Normand et al. (2013) further found that pragmatic words (i.e., interjections) had a negative correlation with MLU. Similarly, we found that interjections had a negative effect on utterance length.
Conclusion
For Mandarin-speaking children, acquiring a diverse vocabulary is crucial to constructing sentences. The part of speech (POS) system categorizes word types and outlines sentence structures effectively. Research, such as that by H. M. Liu and Chen (2015), has revealed that children tend to master certain POS categories earlier than others. Furthermore, studies by Tsai (2009) and Wu et al. (2021) have demonstrated a significant increase in the use of some POS categories with age. These findings suggest that language development can be observed through changes in POS usage. Additionally, it has been observed that the length of utterances tends to increase as children age. This implies that as children become more adept at using various word types for sentence construction, they may employ specific word types to create longer sentences. The tendency of older children to use certain POS categories more frequently, resulting in longer utterances, suggests a potential influence of these POS categories on the development of utterance length.
In the current study, we examined how Mandarin-speaking children use POS to form utterances and how different POS categories can predict utterance length. The results showed that, on average, Mandarin-speaking children aged 3–6 years used at least one V and one N to form an utterance. The mean frequency of V and N in their utterances increased with age; V and N can, therefore, be considered essential elements in children’s ability to form utterances. Utterances with and without I showed no significant difference in utterance length, so I played a less important role in the formation of long utterances. P and C had a positive effect on utterance length, and I had a negative effect. More P and C in utterances may contribute to longer utterances. Increasing utterance length may be a goal for some children in teaching and clinical interventions. Educators and clinicians might, therefore, consider guiding children to use P and C to increase their utterance length.
Limitations and Future Directions
We analyzed 11 basic Mandarin POS categories. These POS categories can be classified into detailed subcategories, which were not explored here. More detailed subcategories of POS should be coded and analyzed in future studies to examine how they affect utterance length. Further, we did not divide utterances into long and short groups; thus, the POS differences in these groups were not compared. Future studies can compare children’s use of P and C in long and short utterances to examine whether lower use of P and C leads to shorter utterances.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by grants from MacKay Medical College (MMC-RD-112-2B-P002 and MMC-RD-113-1C-P005), the National Science and Technology Council (NSTC 111-2314-B-715 -010 -MY2), and Mackay Memorial Hospital (MMH-MM-10806).
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
