Abstract
The onset of literacy marks a significant change in children’s development. Written language is more complex than everyday conversation, and even books targeted at preschoolers contain more varied words and more complex syntax than child-directed speech does. We review the nature and content of children’s book language, focusing on recent large-scale corpus analyses that systematically compared written and spoken language. We argue that exposure to book language provides opportunities for learning words and syntactic constructions that are only rarely encountered in speech and that, in turn, this rich experience drives further developments in language and literacy. Moreover, we speculate that the range, variety, depth, and sophistication of book language provide key input that promotes children’s social and emotional development. Becoming literate changes things, and researchers need to better understand how and why reading experience shapes people’s minds and becomes associated with a range of skills and abilities across the life span.
It is impossible to imagine the world without the written word. Written words fill people’s daily lives, and even within the confines of laboratory experiments in psychological science, the visual word is probably the most ubiquitous stimulus. Written language is predicated on spoken language both in human evolutionary history (spoken language is biologically primary, whereas writing systems are cultural inventions) and in individuals’ developmental history (children’s progress in learning to read and write rests intimately on their underlying language prowess).
Although language is complex, it contains structure via a multitude of probabilistic relationships between multiple levels of correlated features (e.g., Seidenberg & MacDonald, 2018). Several theoretical accounts treat language development as the product of general learning mechanisms operating across this rich input (e.g., Ramscar, 2021). In this view, the transition to adultlike language comprehension and production is a gradual one, reflecting the accumulation of experience with sounds, words, sentences, and their contexts over a number of years.
The onset of literacy marks a significant change in children’s language experience. Most obvious is the acquisition of orthography—visual symbols that allow spoken language to be transported to and from the page. With instruction and practice, children learn how orthography connects with sounds and meaning, and the reading system comes to embody the distributional properties of their writing system (Castles et al., 2018). As Frith (1998) noted, “Language is never the same again” (p. 1011). The acquisition of an orthographic code changes how spoken language is processed, as evidenced by systematic differences between literate and nonliterate people across a range of nonreading tasks, including speech perception and production, and verbal working memory (for a review, see Dehaene et al., 2015).
Understandably enough, how children learn the orthographic code has been a central focus of research (Treiman, 2020). However, written language is different from spoken language in important ways (e.g., Biber, 1988), and learning to read provides children with access to these differences. What are the implications of this for children’s language and literacy development?
Writing and Speaking, Reading and Listening
Speakers and listeners work together to communicate in the moment, whereas written language is traditionally solitary and remote. Incomplete and ambiguous utterances are common in conversations but rarely trouble listeners for long. In the absence of a shared situation and shared cues such as facial expression, intonation, and gesture, written language has a difficult job to do—it has to work hard so that the intended meaning of the writer can be re-created in the mind of the reader. Writing is characterized by greater precision and increased syntactic complexity relative to speech (Roland et al., 2007). There are also differences in lexical richness (Hayes, 1998), as people use more varied words and more sophisticated vocabulary when writing than when speaking. Although factors such as formality and genre influence patterns of language use within each modality (and, as social media demonstrate, both written and spoken language continue to adapt and evolve), written language has greater linguistic variety at its disposal (Biber, 1988). Writers can choose from its repertoire to communicate nuance and complexity effectively.
As children progress through the education system, reading becomes an increasingly important vehicle for learning. To benefit from this, children need to become fluent in the language of the book. This framing resonates with the concept of academic language (e.g., Phillips Galloway et al., 2020). As the name suggests, research on academic language tends to be considered in the context of formal education extending through the middle- and high-school years. However, opportunities to learn about book language begin much earlier.
Early Exposure to Book Language
Montag et al. (2015) examined the vocabulary content of picture books targeted at preschoolers and found that books have greater lexical density and diversity than child-directed speech does; that is, they contain proportionally more content words and more unique words. We (Dawson, Hsiao, et al., 2021) replicated these differences in density and diversity and found “book words” to be more sophisticated than words common in child-directed speech. Book words were more commonly nouns and adjectives, and they tended to be longer and more morphologically complex; they were also more abstract, acquired later in development, and more emotionally arousing. Listening to book language therefore provides exposure to vocabulary that is quantitatively and qualitatively different from that experienced via day-to-day conversation.
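To make the two measures concrete, the following is a minimal sketch of how lexical diversity (unique word types relative to total tokens) and lexical density (the share of tokens that are content words) might be computed for a text sample. It is not the procedure used in the studies cited above: the function-word list, the tokenizer, and both sample texts are invented for illustration, and published corpus analyses typically rely on part-of-speech tagging and on samples matched for length.

```python
import re

# Hypothetical, deliberately small function-word list (an assumption for this
# sketch); real analyses identify content words with a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "to", "of", "in", "on", "at",
    "is", "are", "was", "were", "it", "he", "she", "we", "you", "they",
    "that", "this", "through", "with", "for",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_profile(text):
    """Return token count, type count, type-token ratio, and lexical density."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)
    content_tokens = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens),
        "lexical_density": len(content_tokens) / len(tokens),
    }


# Invented samples, not drawn from any corpus.
speech_sample = "look at the dog look at the dog it is a big dog"
book_sample = "the weary fox crept silently through the frosty meadow"

speech_profile = lexical_profile(speech_sample)
book_profile = lexical_profile(book_sample)

for name, profile in [("speech", speech_profile), ("book", book_profile)]:
    print(name, profile)
```

Because the type-token ratio falls as a sample gets longer, comparisons of this kind are only meaningful when the samples being compared are of similar size.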
A similar conclusion holds for syntactic complexity. Cameron-Faulkner and Noble (2013) analyzed 20 picture books written for 2-year-olds and found that they contained more complex constructions than child-directed speech. Focusing on relative clauses (clauses connected to the main part of a sentence by a pronoun such as “that,” “who,” or “which”; e.g., “She loved the garden that she used to tend”), we compared child-directed speech, picture books targeted at preschoolers for shared reading, and reading books for older children’s independent reading. Relative clauses were rare in speech relative to both samples of book language (Hsiao et al., 2022; see also Montag, 2019).
The findings from these studies, summarized in Figure 1, show that learning about book language can begin before children can read themselves. Children who hear written language in the context of shared reading will experience words and syntactic structures that are systematically different from those they experience in speech. Given that language and literacy build from distributional information in the input, children who experience less book language will likely be at a learning and achievement disadvantage. Although the importance of shared reading has been long recognized (Noble et al., 2019), theoretical accounts of its contribution have emphasized factors such as dialogic exchange and talk about or around the text. Demir-Lira et al. (2019) found that parents used a broader range of vocabulary and more complex sentence structures during shared book-reading activities with 1- and 2-year-olds than they did in other interaction contexts, a finding consistent with book language itself being central to the importance of shared reading. Part of this effect was driven by the quality of the language parents used when talking around the text (e.g., providing descriptions of pictures), but it was mainly due to the complexity and diversity of the text itself.

Fig. 1. Infographic summarizing key differences in the nature of language input children experience via books and spoken language. The outer ring contrasts features and properties for the two registers, and example words are shown in the inner ring (for data and annotated examples, see Dawson, Hsiao, et al., 2021; Hsiao et al., 2022).
Many factors are likely to be relevant in research on the complex relationship between shared reading and children’s language and literacy outcomes, including parental education level and family risk for language and literacy difficulties. It is also important to address issues of causality. Parents do not just create an environment for their child—genes are also at play, as language and literacy are heritable traits (Hart et al., 2021). This means that the shared-reading environment parents provide might reflect their own capabilities, passed to their children via biology as well as culture. The fact that differences between people in language proficiency are associated with genetic differences does not undermine the need to understand the nature of experience, however. The hypothesis that exposure to particular patterns of language brings about language development, and that book experience provides a particular means for this, should be testable experimentally by directly manipulating exposure and relating this to patterns of learning. Such research has potential to inform intervention as well as theory. To date, attempts to improve language and literacy outcomes via general shared-reading interventions have generated only small effects (for a meta-analysis, see Noble et al., 2019). Greater gains might arise if specific aspects of book language were targeted in a focused manner (e.g., Dawson, Brockbank, & Nation, 2021).
Reading Experience and Variation in Print Exposure
As children become independent readers, their opportunities to learn from books increase, and by adulthood, skilled readers have accumulated vast knowledge of written words. Reading experience is typically quantified via proxy measures, such as the number of authors’ names a person can recognize or the number of books in the home. Substantial evidence indicates a close association between print exposure and a range of outcomes, including individual differences in reading and vocabulary across the life span (Mol & Bus, 2011). This is not surprising given that text is lexically richer than speech and is therefore the primary supplier of new words, once children can read. Print exposure has beneficial effects beyond those for vocabulary, however. It also correlates with how well adults deal with spoken language in tasks tapping sentence comprehension and grammaticality judgment (e.g., Dąbrowska, 2018; Favier & Huettig, 2021).
In line with usage-based accounts, these findings support the idea that reading experience shapes language development and leaves a legacy that is evident in how well adults deal with language. Questions remain—not least about specificity and the type of experience that text provides, as distinct from language experience more generally. The importance of text is supported by findings that the contextual history of individual words in written language (derived by tracking usage across a large corpus of children’s books) is associated with how well children process the same words in laboratory tasks, such as those requiring decisions about lexicality or meaning (Hsiao et al., 2020).
Arnold et al. (2019) reported a correlation between children’s print exposure and their comprehension of ambiguous pronouns: More avid readers were more likely to show the adult bias of linking a pronoun with a grammatical subject (e.g., linking “he” with “Panda Bear” in “Panda Bear is having lunch with Puppy. He wants a pizza slice”). Arnold et al. speculated that this relationship is driven by exposure to literate language: In the absence of social cues such as the speaker’s gaze, written language needs to convey who did what to whom, and by attending to this while reading, children gradually develop adultlike patterns in pronoun comprehension.
Focusing on relative clauses, Montag and MacDonald (2015) found that children and adults who read more (as captured by measures of print exposure) were more likely to use passive relative clauses (e.g., “the book that was carried by the woman”) in their own speech, presumably a reflection of the observation that passive relative clauses are more common in text than conversation. Building on this, we (Hsiao et al., 2022) separated the content of a large corpus of books written for 5- to 16-year-olds according to the targeted age groups and found that as the targeted reading age increased, so too did the frequency of relative clauses. Furthermore, and as in speech, different types of relative-clause structures in children’s books co-occurred in predictable ways with specific lexical properties. More work is needed to relate input statistics from large and developmentally sensitive written-language corpora to children’s performance on a range of tasks. Nevertheless, it is worth noting that distributional patterns we identified in book language align with the ease with which adults process lexical-syntactic combinations in laboratory tasks. Directly manipulating children’s exposure to specific forms in book language and then tracking consequences for language processing will be particularly valuable in helping research in this area move beyond correlational evidence.
Beyond Language and Literacy
Although differences between written and spoken language have been recognized for a long time, it is only more recently that these differences have been systematically described and quantified for young children’s language experience (Fig. 1). The cumulative effect of reduced exposure to book language is likely to be large. Logan et al. (2019) estimated that by the time children are 5 years old, those who have been read five books a day will have experienced an additional 1.4 million words, compared with children who have not been read to. And it is not just the number of words that matters; also important are the nature of the words (Dawson, Hsiao, et al., 2021) and the type of syntactic structures in which they appear, relative to speech (Hsiao et al., 2022; Montag, 2019). Reading experience shapes learning of the orthographic code (Castles et al., 2018), but it also has a wider influence in that text provides a rich substrate for learning words, sentence structures, and discourse patterns that are rare in speech. In turn, this knowledge is available to support growth in reading comprehension and to be used by children in their own writing.
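The arithmetic behind a cumulative-exposure estimate of this kind can be sketched in a few lines. The sketch below is a back-of-envelope illustration only: the words-per-book value and the reading rates are hypothetical placeholders rather than figures from Logan et al. (2019), and the function name is invented here. It also assumes a constant reading rate and book length across the preschool years.

```python
def cumulative_book_words(books_per_week, words_per_book, years=5, weeks_per_year=52):
    """Total words heard from books over `years`, assuming a constant reading rate."""
    return books_per_week * words_per_book * weeks_per_year * years


# Hypothetical average text length of a picture book (an assumption for this sketch).
WORDS_PER_BOOK = 150

# A child who is never read to serves as the comparison point.
baseline = cumulative_book_words(0, WORDS_PER_BOOK)

# Illustrative reading rates, expressed as books per week.
for label, rate in [("one book a week", 1), ("one book a day", 7), ("two books a day", 14)]:
    extra = cumulative_book_words(rate, WORDS_PER_BOOK) - baseline
    print(f"{label}: {extra:,} additional words by age 5")
```

The estimate grows linearly with reading frequency, so even modest differences in how often children are read to compound into large differences in exposure over several years.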
Taking a closer look at the content of children’s books also points to more distal consequences of variation in experience with book language. In adult corpora, fiction contains more complex emotion words (e.g., “despair,” “relief,” “irritation,” “pride”) than both nonfiction books and everyday speech, and adults who read more are more adept at recognizing complex emotions (Schwering et al., 2021). Written language may therefore provide unique linguistic input that propels the formation of emotional categories over time, driven by the need for readers to be able to derive emotional situations from text. Our finding that even books written for preschoolers contain more emotion words than child-directed speech does (Dawson, Hsiao, et al., 2021) once again suggests that opportunities for learning start early. This observation also invites speculation regarding outcomes for children who do not experience much book language, or those who are less able to access it fully because of language-learning difficulties. In a longitudinal study, Griffiths et al. (2020) found that language skills at age 5 to 6 years predicted emotion recognition at age 10 to 12 years, and that children with language disorders were poor at recognizing facial and vocal emotion cues. Causal relations cannot be inferred from these data, but language has been posited as a “critical ingredient” in the perception and experience of emotion (e.g., Lindquist, 2017). Perhaps the wide-ranging, varied, deep, nuanced, and sophisticated language needed to serve this function is the language that is most representative of the book.
Similar arguments can be made about syntax. Children’s grasp of theory of mind (the ability to attribute mental states to oneself and others) is intimately linked with language development and, in particular, the acquisition of complex syntax (for a recent overview, see Kaltefleiter et al., 2021). This makes sense. English sentences that express another person’s mental state are complex, as they contain an embedded element: In “Mary thinks that the sky is green,” the subordinate element (“the sky is green”) is false, but the overall sentence may be true. Explicit training with this type of sentence structure improves children’s performance on theory-of-mind tasks (Hale & Tager-Flusberg, 2003), which suggests a relationship that is at least partially causal. Although day-to-day exposure to such utterances is grounded in social interaction and conversation, we note again the utility of books not only for providing situations that naturally invite such conversational exchange, but also for providing greater exposure to the rich linguistic forms themselves. Dyer et al. (2000) concluded that the text of books for 3- to 6-year-olds is laden with mental-state language: emotional, cognitive, and evaluative content words used in complex sentences with verbs such as “feel,” “think,” “want,” and “know.” Over time, this rich input provides opportunity to build and communicate knowledge about the psychological world; 9- to 12-year-olds who use more mental-state terms in their own narratives have stronger language skills (Hamilton et al., 2021), and they also read more fiction. These findings complement the various associations between fiction reading, emotion processing, empathy and theory-of-mind processing reported in the adult literature (e.g., Oatley, 2016; Schwering et al., 2021). Discussion has tended to focus on the role of fiction in promoting imagination and emotional experience. Plausibly, book language itself may play a more direct role, and although patterns of causality remain to be established and are likely to be complex, it is clear that books provide access to situations and characters beyond the everyday, and that the language needed to communicate these is different from everyday language.
Concluding Remarks
Exposure to book language provides opportunities to experience words and sentences that are rarely encountered in conversations. These systematic differences start early in life and are evident in the books children hear in the context of shared reading. From infancy onward, children have opportunity to learn from this input. Such learning establishes the foundations for more advanced language development as well as literacy, and establishes the distributional properties that will become inherent in their adult language systems. Becoming literate changes things, and researchers need to better understand how and why reading experience shapes people’s minds and becomes associated with a range of skills and abilities across the life span. Progress toward this goal will come from further consideration of the nature of book language itself; how it builds and changes over time as children develop; how it is used by caregivers, teachers, and children; and how reading experience varies across individuals.
Recommended Reading
Dawson, N., Hsiao, Y., Tan, A. W. M., Banerji, N., & Nation, K. (2021). (See References). Provides an in-depth analysis of the lexical content of books targeted at preschool children, comparing it with child-directed speech.
Montag, J. L., & MacDonald, M. C. (2015). (See References). Details the syntactic complexity of children’s book language relative to child-directed speech, focusing on different types of relative clauses.
Seidenberg, M. S., & MacDonald, M. C. (2018). (See References). Reviews language and literacy development from the perspective of statistical learning and considers how variability in the input (i.e., language and reading experience) influences language development and language processing.
