Abstract
This paper integrates studies from Intercultural Communication and Multimodal Discourse Analysis to reveal how Chinese English textbooks conduct ideological education. Drawing on studies of cultural products and practices, together with the system of interactive meanings, it develops a new approach to evaluating cultural representations at the macro level and their interaction with learners through images at the micro level. An analysis of 253 multimodal texts from a series of Chinese English textbooks shows that the distribution of cultural products and practices echoes the curriculum’s requirement of building cultural self-confidence. The occurrences of cultural practices reflect a shift from brief expressions to the manipulation of cultural products. Moreover, images that interact with learners tend to offer cultural information rather than demand it, guide learners to keep a suitable social distance and create a sense of equality for them. The framework and findings can be used to uncover the strategies of ideological education and contribute to foreign language textbook compilation.
Plain Language Summary
This paper aims to reveal how Chinese English textbooks conduct ideological education through multimodal texts that combine written language and pictures. It focuses on the distribution of the cultural products and practices that represent cultural content, as well as on the strategies of using images to interact with language learners. The findings show that (1) cultural products from the Chinese side outnumber those from foreign contexts, which echoes the official curriculum’s requirement to build cultural self-confidence; (2) cultural practices show a similar distribution and reflect a shift from brief expressions (e.g., greetings) to the manipulation of cultural products (e.g., how to participate in cultural practices); (3) images used to interact with learners tend to offer information, guide learners to keep a suitable social distance and maintain equality between the cultural content and those who view the images. The study contributes to textbook compilation and to ideological education in foreign language teaching.
Introduction
Ideology is commonly defined as a systematic body of concepts about human life and culture, or as a set of beliefs held by a particular group, and it is shaped by a number of social and contextual factors. Beyond this definition, many studies and theories hold that ideology is closely bound up with its functions of sustaining class rule or maintaining power in society (Althusser, 1971; Fairclough, 1992; Marx & Engels, 1978). This study, however, does not pursue the political dimension, because the language learning materials for young learners under review tend to portray a world that is safe, clean, harmonious, benevolent and undisturbed (Lavrenteva & Orland-Barak, 2023) and reflect a broader trend of depoliticization.
Numerous studies have investigated how ideology is manifested through diverse modes, such as images, videos, rituals and music (e.g., Long & Liu, 2018; Luo, 2021; Ren, 2023; Wei & Miao, 2021; Yang & Bai, 2023). However, these studies discuss this manifestation at a macro level, which does little to help students acquire the skills of interpreting or expressing ideology verbally or non-verbally. Among the carriers of ideology, textbooks, which combine verbal and visual descriptions, can be viewed as a vehicle that manifests ideology. We therefore begin our exploration with Chinese English textbooks for primary school, because they play an important role in cultivating young students’ understanding of domestic and foreign cultures and thereby imparting the mainstream or dominant ideology to them.
A number of micro-level studies have explored the discursive strategies used to represent ideology in Chinese English textbooks (Chen, 2010; Feng, 2019; Guo & Feng, 2015; Xiong & Qian, 2012), showing that social values are infused into textbooks through multimodal texts. A series of macro-level studies has revealed the characteristics of cultural representations in Chinese foreign language textbooks (Ge, 2022; Tan & Zhang, 2022; Zhang & Li, 2022a, 2022b). Together, these studies show that ideology is embedded in cultural representations and scattered throughout the development of textbooks.
However, few studies have examined how social values are integrated with cultural practices, or how multimodal texts transmit ideological education (Feng, 2019; Liu, 2005). Moreover, the strategies that English textbooks use to interact with their learners for the purpose of ideological education remain unexplained.
With this in mind, we explore how ideological education is woven into foreign language textbooks. Specifically, we address two issues that have not yet been fully explored: how ideology is coded into cultural practices, and how textbooks interact discursively with language learners.
In what follows, we briefly review cultural analysis of Chinese English textbooks in two fields and propose a new framework that combines cultural analysis at the micro and macro levels. We then introduce the methodology of this study, followed by the findings and discussion.
Theoretical Background
Multimodal Analysis of Language Textbooks
Current research on multimodal teaching materials approaches cultural analysis from two perspectives. Both seek to reveal the mechanisms of cultural representation in foreign language textbooks, and both advocate high-quality representation of cultural content.
The first research orientation conceptualizes English textbooks as cultural artefacts infused with moral or social values (Feng, 2019; Liu, 2005; Liu & Qu, 2014; Yuen, 2011). Accordingly, all multimodal texts in language textbooks are examined with reference to social values and commonsense meanings, and are treated as repositories that showcase cultural content, images, behaviors and so on. This strand follows a socio-semiotic approach that originated in systemic-functional linguistics (Halliday, 1978, 1994) and was developed in the grammar of visual design (Kress & van Leeuwen, 2021). Studies in this strand tend to employ qualitative methods; they not only analyze meanings across modalities (e.g., language, tone and gesture, as discussed in Martin & Zappavigna, 2019) but also examine the intersemiotic relations between verbal and non-verbal modes (Martinec & Salway, 2005; Unsworth, 2006). Recent studies show how textbooks transmit cultural values through the co-construction of verbal texts and images (Liu & Qu, 2014; Weninger & Kiss, 2013), and stress the interactive nature of the relationship between learners and textbooks from the perspective of interpersonal meanings. Weninger (2021) divides multimodal discourse analysis in this strand into two groups. The first treats meaning as representation and examines how language textbooks construe meanings or depict activities in the actual world. The second sees meaning as engagement, or interaction between learners and textbooks, and holds that textbooks enact interpersonal meanings and thereby influence learners.
The second research strand takes the perspective of intercultural communication, viewing culture in language textbooks as a holistic body of diverse components and language textbooks as representations of culture (Li et al., 2023; Moran, 2009; Tan & Zhang, 2022; Zhang & Li, 2022a, 2022b). Studies in this tradition focus on the intensity or distribution of cultural components in language textbooks, such as products, practices, perspectives, communities and persons, which reveal the characteristics of target and native cultures along with the ways in which cultures are presented. These studies use quantitative methods to evaluate overall cultural representation, cultural categories and forms of representation. They find that, in foreign language coursebooks, content related to the target culture accounts for the highest percentage, cultural practices occur more frequently than other cultural components, and the forms of representation tend to emphasize understanding different cultures rather than expressing or exercising them.
Cultural Practices, Curriculum Standards and Textbooks
Culture and society are integrated, and each can only be described from the perspective of the other (Bauman, 2009, p. 214). In this sense, cultural practices and social practices converge, as the development of Western philosophical thought shows. Williams (1977, 2018), for instance, developed Karl Marx’s idea of social practice and argued that social practices encompass cultural practices. In Williams’ view, culture is a whole way of life and plays an important role in the social superstructure. His view has been taken up by many countries in constructing their cultural systems or transmitting ideologies, and China is one example.
China’s annual government work report frequently emphasizes social practice as important guidance for realizing cultural self-confidence. China’s President has likewise stressed the necessity of upholding cultural self-confidence for the prosperity of socialist culture, the importance of strengthening cultural practices in the transmission of knowledge, and the significance of cultural practices for fostering citizens and developing Chinese society. Similarly, the English Curriculum Standards for compulsory education issued by the Ministry of Education of China explicitly give prominence to cultural awareness in students’ education. Their basic rationale is that foreign language learning benefits not only individuals’ all-round development but also their participation in cultural practices (Cheng & Gong, 2005). The Standards aim to build students’ knowledge, social values and cultural awareness (Wang, 2013) and to cultivate adolescents’ personality, patriotism, cultural literacy and sense of social responsibility.
To achieve this aim, the English Curriculum Standards (Ministry of Education of the People’s Republic of China, 2022, p. 24) set out the basic components of students’ cultural knowledge and divide them into material and non-material types. The former includes knowledge about food and drink, clothing, architecture, transportation and so on, while the latter covers knowledge of philosophy, science, history, language, literature, art, education, social values, moral education, aesthetic sentiment, labor consciousness, social norms, customs and so forth. Although the Standards already set criteria for compiling language coursebooks, two problems remain: whether material knowledge occupies more or fewer pages than non-material knowledge, and how non-material knowledge is presented to, and interacts with, learners.
The first problem can be handled either automatically, for instance with Google Cloud Vision, or by manual counting. Both methods can show the characteristics and distribution of cultural representations in multimodal texts (Baker & Collins, 2023; Zhang & Li, 2022a, 2022b). However, the automatic method can only recognize broad image categories, such as food, clothes and people, while the manual method is of limited use for large image corpora.
The second problem can be addressed with a mixed method, in which qualitative analysis is grounded in an applicable theory and quantitative analysis reveals the mechanisms lying behind the data. The relevant studies show that cultural knowledge is transmitted through transitivity processes (Guo & Feng, 2015), linguistic choices and grammatical metaphor (Xiong & Qian, 2012), and appraisal elements (Chen, 2010; Feng, 2019).
Taken together, both problems concern the ways in which ideology or cultural values are represented in coursebooks and interact with learners, so they need not be separated and can be treated as a whole. The apparent gap in multimodal textbook analysis is that ideology has been investigated through the categories and distribution of cultural representations on the one hand, and through the discursive strategies of cultural practices on the other. There is, however, no explicit model of how ideology is simultaneously presented to and interacts with learners. To address this gap, the current study proposes a new approach to the systematic understanding of ideological education in English textbooks and reveals the ontogenetic change of ideological content across primary school levels.
Towards a New Approach
To explore ideological education in the Chinese context, we examine what strategies are adopted when Western and Chinese cultures meet in English textbooks. We narrow the scope of cultural representations by drawing on Moran’s (2009) classification of cultural components. We further use the conceptual framework of social semiotics proposed by Kress and van Leeuwen (2021) to uncover strategies of ideological education from the macro level to the micro level.
At the macro level, ideology is reflected in culture and transmitted through language and other meaning-making resources. The evaluation of culture thus reflects how ideology works in a society. This evaluation can be achieved by analyzing five cultural components: cultural products, practices, perspectives, communities and persons (Moran, 2009).
Cultural products are the gateway through which people enter a new culture. They may be visible or invisible, such as food, buildings, education, recreation, music and dance. They may also be unique to a specific culture (such as Nigerian wooden masks and Chinese dragon boats) or common to all humankind. This study focuses on products unique to English or Chinese culture in order to identify the strategies the textbooks adopt, in the Chinese context, for spreading English or Chinese culture. Cultural products comprise four sub-components: artifacts, places, institutions and art forms. They encourage people to participate in cultural practices and are recognizable at first sight in many language textbooks. For this reason, we take cultural products as the entry point for analyzing the unfolding cultural practices.
Cultural practices encompass all the actions carried out by members of a culture, whether solitary activities, interactive exchanges or collective tasks. They concern how people use language or other products to engage in cultural communication; in other words, how language constructs people’s experiential meanings through the manipulation of cultural products. The analysis of cultural practices in intercultural communication can therefore be seen as broadly corresponding to the study of experiential meanings through transitivity processes in systemic-functional linguistics, although it is simplified to reveal “what is happening between people” in visual representations (Guo & Feng, 2015).
Cultural practices contain four sub-components: acts, operations, scenarios and lives (Moran, 2009, pp. 54–70). Acts typically consist of brief utterances or responses, comprising established language expressions and non-verbal communication, such as expressing enjoyment, asking for information and greeting. Operations are practices that involve the use of cultural artifacts. Scenarios are practices enacted in specific situations, involving operations, acts and other sets of specific practices. Lives represent the stories of members of the culture. The relationship among these components is illustrated in Figure 1. A wedding is an illustrative example. The ceremony involves scenarios associated with the guests’ participation: meeting, blessing, having dinner, toasting and leaving. Within these scenarios, guests perform acts (e.g., proposing a toast) and operations (e.g., raising their wineglasses).

Figure 1. The components of cultural practices.
Cultural perspectives are the implicit or explicit beliefs and values shared by members of a culture, which are manifested in its products and practices. Cultural communities consist of specific cultural groups in which people carry out practices in specific social settings through different interpersonal relationships. Cultural persons are individuals who are celebrated within their own culture.
Moran’s framework shows that cultural practices occur in social situations, and it captures our understanding of how ideology is transmitted. We argue that it can be used to describe the cultural practices that appear frequently in Chinese contexts, English contexts or both, and to guide scholars in examining how ideology is encoded in foreign language textbooks.
The crucial point, however, is that although the analysis of cultural practices can display ideological transmission, it cannot reveal how that transmission engages language learners. This study therefore turns to the interpersonal analysis of images (see Figure 2). Although images can also depict what is happening in cultural practices, here they are examined for their auxiliary, interpersonal role.

Figure 2. A simplified system of interactive meanings in images.
At the micro level, ideology is explored through language and other multimodal resources, because they describe people’s experiential and attitudinal meanings in cultural practices. When a coursebook interacts with its learners, attitudes, social values and ideology are all expressed through linguistic or extralinguistic resources.
The interactive functions of language are realized in what is exchanged between speaker and listener: either information, or goods and services. This system is termed speech function (Halliday, 1978, 1994). When information is exchanged, we demand it with a “question” and give it with a “statement.” When goods and services are exchanged, we demand them with a “command” and give them with an “offer.”
The interactive acts of images resemble the speech functions of language but differ in how giving and demanding are realized (Kress & van Leeuwen, 2021). When images “demand,” the represented participants gaze or point at the viewers, addressing them with something like an invitation, a warning or a summons. When images “offer,” the participants do not address the viewers directly; smiles, gestures, puzzled expressions or emoticons are made available to the viewers as information to contemplate.
The interactive functions of language and image reveal the dialogic nature of human communication, in which attitudes, social values and ideology are exchanged among interlocutors, or between a textbook and its learners.
Interaction may create intimacy or increase distance between interlocutors in contexts ranging from local to global, depending on their interpersonal relationships. The same holds for interaction between a textbook and its learners: images shown in short, medium or long shots realize personal, social and impersonal distance, respectively (Kress & van Leeuwen, 2021).
Interaction may also express subjective or objective attitudes. Speakers express individual attitudes through appraisal elements (Martin & White, 2005), while images express attitudes through angles and perspectives. Images that present subjective attitudes have a central perspective, viewed from a particular angle such as a high or frontal one, so that viewers see the scene from a single built-in point of view. In contrast, images that present objective attitudes, such as diagrams, maps and charts, neutralize the central perspective (Kress & van Leeuwen, 2021).
In sum, this section has proposed a novel approach that combines macro-level and micro-level theories in cultural analysis. Its aim is to explain the strategies of ideological education used in English textbooks, thereby outlining the multimodal choices available for representing ideology and interacting with potential learners. Together, the macro-level and micro-level theories describe the underlying mechanism of ideological education in textbooks and help identify which cultural practices, along with their discursive features, are appropriate for teachers to act upon. In the following sections, we turn to a quantitative analysis and examine what dominant ideology permeates English textbooks at different school levels.
Research Design and Methodology
Textbooks and Their Texts
The texts for this study are selected from a series of Chinese English textbooks authorized by the Ministry of Education of the People’s Republic of China. These textbooks are widely used in primary schools across numerous provinces in China. Specifically designed for students in grades 3 to 6, they have been recognized as National Excellent Textbooks. Published by Foreign Language Teaching and Research Press between 2012 and 2014, the series consists of 8 textbooks that have been reprinted over 110 times.
The entire series of English textbooks is well crafted and features a large number of images that contribute to the development of students’ language and visual literacies. Each textbook follows a task-based approach and comprises 11 independent modules. Each module is further divided into two units, each with an overarching theme and a variety of tasks, such as reading, speaking and writing. Within each task, cultural products may appear that are associated with corresponding cultural practices across different topics.
Texts were selected according to the presence of cultural products (Moran, 2009). In this study, we define cultural products as tangible or intangible items that represent specific cultural practices, and we divide them into two groups: those representing English culture and those representing Chinese culture. For example, the textbooks introduce learners to cultural products related to the Spring Festival in China and Thanksgiving in America, along with the cultural practices people engage in during these festivals.
Through the text collection process, we identified 253 multimodal texts related to cultural products. Among these texts, there are a total of 602 images. However, not all images are relevant to our study. Some images are solely used to introduce the leading characters who engage in conversations or monologues within a given topic. Additionally, some images, such as those containing only English or French letters, do not display any cultural products. After excluding these non-relevant images, we counted a total of 424 relevant images that are associated with artifacts, places, institutions and art forms. These images provide valuable visual representations of cultural products and practices, which will be further analyzed in our study.
Data Analysis Procedures
Initial Text Selection and Cultural Product Categorization
At the initial stage, all texts were manually extracted from the series of English textbooks, and a total of 253 texts, regardless of length, were identified. The selection criterion was whether a text displayed culturally loaded, recognizable products from either Chinese or English culture, such as mooncakes in Chinese culture and sandwiches in English culture.
The reasons for this criterion are as follows. First, the series contains numerous compiler-designed social activities, so taking activities as the starting point for text selection would be diverse and arbitrary. Second, cultural practices related to festival customs are too few to reveal the strategies of ideological education in language coursebooks. Third, the frequent appearance of certain cultural products in the textbooks is important for understanding how cultural practices are implemented.
We categorized cultural products into four types: artifacts, places, institutions and art forms. Then, we examined features and distribution of each type to identify the prominent type of cultural practices. The statistical analysis of cultural products was conducted manually to reveal their distribution across different school levels. Specifically, the four types of cultural products, along with their domestic or foreign origins, were categorized and counted in a table for subsequent data visualization.
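The manual tallying described above amounts to a cross-tabulation of coded products by school level, product type and origin. The following Python sketch shows one way such a table might be organized before visualization. The record fields, the function name `tabulate_products` and the toy records are hypothetical illustrations, not the study’s data or tooling.

```python
from collections import Counter
from dataclasses import dataclass

# The four product types used in the study (after Moran, 2009)
PRODUCT_TYPES = ("artifact", "place", "institution", "art form")
ORIGINS = ("domestic", "foreign")


@dataclass(frozen=True)
class CodedProduct:
    """One manually coded cultural product (hypothetical record format)."""
    name: str
    level: int          # school level (grades 3 to 6)
    product_type: str   # one of PRODUCT_TYPES
    origin: str         # "domestic" (Chinese) or "foreign" (English)


def tabulate_products(records):
    """Return {(level, product_type): (domestic_count, foreign_count)}."""
    for r in records:
        if r.product_type not in PRODUCT_TYPES or r.origin not in ORIGINS:
            raise ValueError(f"unexpected code on record: {r}")
    counts = Counter((r.level, r.product_type, r.origin) for r in records)
    levels = sorted({r.level for r in records})
    # Counter returns 0 for missing cells, so empty categories still appear
    return {
        (level, ptype): tuple(counts[(level, ptype, o)] for o in ORIGINS)
        for level in levels
        for ptype in PRODUCT_TYPES
    }


# Toy records for illustration only, not counts from the textbooks
records = [
    CodedProduct("dumplings", 3, "artifact", "domestic"),
    CodedProduct("sandwich", 3, "artifact", "foreign"),
    CodedProduct("Shanghai", 4, "place", "domestic"),
    CodedProduct("London", 4, "place", "foreign"),
    CodedProduct("taijiquan", 4, "institution", "domestic"),
]
table = tabulate_products(records)
for (level, ptype), (dom, foreign) in sorted(table.items()):
    print(f"level {level:<2} {ptype:<12} domestic={dom} foreign={foreign}")
```

Keeping empty cells (zero counts) in the table matters here, because an absent category, such as cultural places at a given level, is itself a finding.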
Encoding Cultural Practices Based on Moran’s Classification
In the subsequent stage, we coded the prominent cultural practices into four types according to Moran’s classification, in order to ascertain what typical experiences are transmitted across different levels of the English textbooks. Disagreements between coders (fewer than 10% of cases) were resolved through discussion.
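The reported disagreement rate can be read as the share of images that two coders labelled differently. The sketch below assumes two coders who each assigned one of Moran’s four practice types to the same sequence of images, and it computes simple percentage disagreement; the text reports only that disagreements fell below 10%, so the statistic, the function names and the labels here are hypothetical.

```python
PRACTICE_TYPES = {"act", "operation", "scenario", "life"}


def disagreement_rate(coder_a, coder_b):
    """Percentage of images on which two coders assigned different practice types."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty list of images")
    for label in (*coder_a, *coder_b):
        if label not in PRACTICE_TYPES:
            raise ValueError(f"unknown practice type: {label}")
    disagreements = sum(a != b for a, b in zip(coder_a, coder_b))
    return 100 * disagreements / len(coder_a)


def items_to_discuss(coder_a, coder_b):
    """Indices of images whose labels differ and must be resolved by discussion."""
    return [i for i, (a, b) in enumerate(zip(coder_a, coder_b)) if a != b]


# Hypothetical labels for 12 images (not the study's data)
coder_a = ["act", "act", "operation", "scenario", "operation", "life",
           "act", "operation", "operation", "scenario", "act", "operation"]
coder_b = ["act", "act", "operation", "operation", "operation", "life",
           "act", "operation", "operation", "scenario", "act", "operation"]
print(f"{disagreement_rate(coder_a, coder_b):.1f}%")  # 8.3%
print(items_to_discuss(coder_a, coder_b))             # [3]
```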
This stage depicts the experiences represented in cultural practices, which is similar to the analysis of ideational meanings in systemic-functional linguistics (Halliday, 1978, 1994). However, it simplifies complex procedures into an analysis of the categories and occurrences of cultural practices. The key task at this stage is to recognize and quantify cultural practices, using a coding scheme that distinguishes cultural acts, operations, scenarios and lives (Moran, 2009).
We manually collected statistics for the four types of practices from 424 images. For example, when an image represents the handling or use of certain cultural products, we label it as an instance of cultural operations. If an image lacks signs of conveying experiences in cultural practices or is merely a symbol of cultural products, it is not counted.
Analyzing Interactive Meanings Based on Visual Grammar
The next stage focuses on interactive meanings within the framework of visual grammar (Kress & van Leeuwen, 2021). The goal is to determine how textbooks establish interpersonal relationships with learners through images. We concentrate on the systems of contact, social distance and attitude by examining elements such as smiles, gazes, angles, shots and perspectives in the images.
During the data analysis, we treat images and the verbal text as a unified whole and attempt to locate them in the multimodal text to assess whether they depict cultural practices. Here is a detailed example to illustrate the analysis process. Suppose there is a multimodal text with an image of a girl drinking coffee accompanied by written sentences like “I like coffee. I like tea.”
We analyze the image’s elements related to interactive meanings. The girl’s smile and cozy gesture create a sense of happiness or joy, which can be seen as a form of contact (Offer) with the reader. The short shot (Personal) and the central perspective with a frontal angle (Subjectivity) contribute to a close social distance and a subjective attitude, making the reader feel more involved in the scene. The written sentences provide information about the girl’s preferences, while the image visually reinforces and enriches this information through her smile and gesture. By combining image and text, we can better understand how the textbook establishes an interpersonal relationship with learners.
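The coding decisions in this example follow the simplified system of interactive meanings: gaze or pointing toward the viewer realizes demand (otherwise offer), shot size realizes social distance, and the presence or neutralization of a central perspective realizes subjective or objective attitude. A minimal Python sketch of that mapping is given below. The class and field names are hypothetical, the visual features are assumed to be judged manually rather than detected automatically, and the girl is assumed not to look at the viewer, which is what the Offer coding implies.

```python
from dataclasses import dataclass

# Shot size -> social distance (Kress & van Leeuwen, 2021), as simplified here
SHOT_TO_DISTANCE = {"short": "personal", "medium": "social", "long": "impersonal"}


@dataclass(frozen=True)
class ImageFeatures:
    """Manually judged visual features of one image (hypothetical record format)."""
    addresses_viewer: bool     # participant gazes or points at the viewer
    shot: str                  # "short", "medium" or "long"
    central_perspective: bool  # False for maps, diagrams and charts


def code_interaction(img):
    """Return (contact, social distance, attitude) for one image."""
    contact = "demand" if img.addresses_viewer else "offer"
    distance = SHOT_TO_DISTANCE[img.shot]
    attitude = "subjective" if img.central_perspective else "objective"
    return contact, distance, attitude


# The coffee-drinking girl (assumed not to look at the viewer), short shot, frontal view
print(code_interaction(ImageFeatures(False, "short", True)))
# -> ('offer', 'personal', 'subjective')

# A map of landmarks: long shot with the central perspective neutralized
print(code_interaction(ImageFeatures(False, "long", False)))
# -> ('offer', 'impersonal', 'objective')
```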
Images that do not describe cultural practices are excluded from this analysis. For instance, if an image only shows a static cultural product without any human interaction or contextual information that could convey an experience in cultural practices, it will not be considered in the analysis of image interaction. This exclusion may lead to a decrease in the number of images relevant to this part of the analysis.
Findings: An Integrated Analysis
To reveal the strategies of ideological education in the Chinese context, this study examines cultural products and practices, along with image interaction, in a series of Chinese English textbooks. We first present the results of the textbook analysis based on theories of intercultural communication, which show the distributions of cultural products and practices across school levels. We then present the evaluation of textbook images based on social semiotics, which shows how images tend to interact with potential language learners.
Figure 3 shows that cultural artifacts (blue bars) and cultural places (orange bars) are the most prominent categories: artifacts are the most numerous, followed by places. This may be because English textbooks for primary students guide them to know Chinese and foreign cultural artifacts, such as dumplings and sandwiches, and to recognize places in different countries, such as London and Shanghai. Artifacts are used in many cultural practices, while places indicate the physical features of the natural or man-made environments in which cultural practices occur.

Figure 3. Distribution of cultural products across school levels.
This study also finds that domestic artifacts outnumber foreign ones in the textbooks, except at level 6. This reflects the requirement of building cultural self-confidence set by official bodies and curricula. The distribution of cultural places is similar: domestic places are considerably more numerous than foreign ones, although level 4 shows a different trend. No cultural places appear in the level 3 textbooks, possibly because pupils at this level are not yet familiar with them and the compilers chose not to display any.
Other findings are less obvious but still relevant to cultural practices. Among institutions, which deal with the business of living (Moran, 2009, p. 50), recreation and leisure activities such as taijiquan and chess are frequently mentioned throughout the series. Art forms, including dance, music and clothing styles, are mentioned less frequently, but they are still used to reflect cultural differences, as in a lion dance in Chinatown and a little foreign girl playing the piano. These examples illustrate how cultural practices vary, even within leisure activities.
Building on these findings, we next explore which cultural practices are related to the cultural products. As shown in Figure 4, the number of cultural acts increases gradually from the first level and peaks at the point where cultural operations are most frequently used to represent cultural practices.

Figure 4. Distribution of cultural practices across school levels.
On the one hand, at the first two levels, language learners are exposed to more cultural acts than cultural operations. Learners are taught brief language expressions accompanied by pictures, such as “Do you like rice? Does your mother like noodles?” Operations are then illustrated with images, for instance, “In spring, Daming flies a kite in the park.” Learners thus encounter cultural acts more often than cultural operations. This may follow from the commonsense principle of “knowing before doing”: the textbooks first familiarize learners with brief expressions and responses in a foreign language.
The frequency of both Chinese and foreign cultural acts decreases in the following two levels compared with the first two, but Chinese cultural acts rise unexpectedly at level 6.
On the other hand, cultural operations are depicted more frequently than cultural acts in the last two levels of the textbooks. This change reflects how learners are trained, as their school levels rise, to use or operate cultural products through language. For example, a product (e.g., a kite) is frequently mentioned with an act (e.g., flying) at lower grades but with more operations (e.g., drawing, cutting and making) at higher grades. With respect to cultural operations, Figure 4 also reveals a contrast: compared with foreign culture, Chinese culture features more prominently at level 5 but less so at level 6.
The analysis above examines ideological transmission through cultural products and practices at the macro level. However, the multimodal texts analyzed should also be related to the evaluation of image interaction at the micro level. The following analysis of image interaction reveals a tendency in how textbook compilers select images to engage potential learners.
Figure 5 displays two striking characteristics. One concerns the distribution of interactive meanings realized by images in the English textbooks. The other involves the changes in cultural practices in both Chinese and English contexts across school levels. Taken together, these two features indicate how cultural practices tend to be negotiated with young language learners through images.

Figure 5. Inclination of image interaction across school levels.
The textbooks use images to engage language learners in three ways simultaneously. In terms of contact, images are used more often to offer information about cultural practices than to demand that learners express messages about them. In terms of the social distance between images and their viewers, images most frequently depict cultural practices in medium shots, creating a setting that learners can relate to. Such a setting is social, for example a school, a family home or a supermarket, where the cultural practices take place. A number of images show personal distance through close shots, mainly depicting the manipulation of cultural products. Images that show impersonal distance through long shots depict places, buildings and areas, regardless of the country in which they are located. In terms of subjective or objective attitude, the images in the textbooks are all hand-drawn by the compilers and generally adopt a central perspective and a frontal angle to make the cultural practices or products prominent. When maps, places or landmark buildings are shown, however, the images neutralize the central perspective and use long shots to represent cultural practices, such as visiting places of interest in China or the British Museum in London.
The number of image interactions related to cultural practices increases as the school level rises. Specifically, the increase is most marked for the main options of image interaction, namely the major way of making contact with learners (Offer), the chief way of maintaining social distance (Social) and the principal way of conveying attitude (Subjectivity). This suggests that multimodal texts containing cultural practices become increasingly prevalent in the English textbooks as the school level increases. Furthermore, each school level features more domestic than foreign cultural practices when images are used to engage learners, which may reflect the need to build learners’ cultural confidence even as they learn a foreign language.
Discussion
Culture Representation and Its Focus in the Textbooks
Previous studies of cultural representation have been concerned with whether domestic and foreign cultures are presented to learners in a balanced proportion. Such studies (e.g., Tan & Zhang, 2022; Zhang & Li, 2022b) take a holistic approach to examine the intensity and proportionality of diverse cultures in language coursebooks. They conclude that the textbooks investigated present cultures in appropriate proportions, but they do not explain why cultural products, cultural persons and cultural practices outnumber the other categories of cultural components.
The current study does not investigate the proportionality of cultural components in textbooks but focuses on the distribution of cultural products and practices across school levels. The underlying assumption is that examining cultural products and practices can reveal the compilers’ attitudes in guiding learners’ participation in domestic or foreign culture. On this basis, we take an ontogenetic approach to examine what the series of textbooks sets out to transmit in terms of cultural products and practices.
The findings can be summarized in two points. First, cultural artifacts and places tend to recur many times across school levels. All the cultural artifacts are man-made objects whose country of origin is easy to identify. The example of a Chinese kite shows how learners are taught to describe the manipulation of an artifact as their English develops. Examples (1–6) cover several verb tenses and are provided for learners to express the uses of this artifact.
(1) She’s got a kite. It’s a nice kite. (3BM09U21)
(2) There’s a boy. He’s flying a kite. (4AM07U13)
(3) Amy can fly her kite. (4BM03U11)
(4) Will you take your kite on Saturday? Yes, I will take my kite and my football on Saturday. (4BM04U13)
(5) How do you make a dragon kite? (5BM08U23)
(6) What did they do last week? They flew their kites last week. (The answer is given in a picture, and learners must practice saying it in English.) (5BM09U13)
The same applies to cultural places, institutions and art forms. For instance, a place such as the Great Wall (or the city of London) and an institution of recreation such as chess are mentioned many times in the series of English textbooks, as shown in Examples (7–9) and (10–12), respectively. Overall, Chinese products occur more often than foreign products, but the margin is small. In this way, the textbooks introduce foreign cultural products while giving slight emphasis to expressing Chinese cultural products in English.
(7) Did you visit the Great Wall? Yes, I did. (4BM09U23)
(8) Daming and his father went to the Great Wall at the weekend. (5AM03U21)
(9) I want to visit China. I want to visit the Great Wall. (6AM09U11)
(10) What are they doing? They’re playing chess. (4AM03U13)
(11) There are two girls. They’re rowing a boat. No, they’re playing chess. (4AM07U23)
(12) We are having a party. Xiaofei and I are playing chess. (6BM05U23)
Second, cultural acts and operations are the most frequent means by which cultural practices are foregrounded for primary school students. Cultural acts teach learners to produce brief utterances and to respond with common expressions, as shown in Examples (13–17). Cultural acts such as greetings or expressing thanks can be performed by people from any culture. However, in the multimodal texts selected for the current study, all the cultural acts are related to cultural products. This is because we used cultural products as the criterion for selecting multimodal texts, in order to uncover the differences between Chinese and English cultures. The findings also show that learners are presented with more Chinese cultural acts than foreign ones. This may give learners more opportunities to experience Chinese cultural acts in another language while they are still young.
(13) Happy birthday, Sam. Here’s your cake. (3AM06U12)
(14) Do you want some noodles? Yes, please. (4AM04U13)
(15) It’s a big city. What is it? It’s London. (4BM02U23)
(16) I won a chess game last week. Now I feel happy. (5AM09U21)
(17) Where is Shanghai? It’s in the east of China. (6AM01U22)
Cultural operations involve the manipulation of cultural artifacts, for instance, flying, taking or making a kite in diverse contexts. A multimodal text in the textbook about making a dragon kite shows how a Chinese cultural artifact is manipulated. It consists of the sequence of acts in Example (18), accompanied by relevant images, and it portrays the little girls’ participation in the cultural practice of making a dragon kite. Another multimodal text in the same unit requires learners to engage in similar cultural practices and, at the same time, to describe in English how to complete these acts. By repeatedly featuring the use of the same cultural artifacts, the textbooks are designed to develop learners’ ability to express their own culture and to enhance their cultural awareness.
(18) I drew a dragon. I cut the paper. I made a kite for my sister. (5BM08U21)
Cultural scenarios are fewer than cultural acts and operations. One likely reason is that cultural acts and operations are embedded within cultural scenarios, while cultural operations can also be carried out independently of scenarios (Moran, 2009). The textbooks investigated reduce language density by incorporating images that serve as backgrounds for cultural operations. With their rich detail, the images capture more cultural acts and operations than the written language expresses. In other words, images and verbal language are combined to display cultural scenarios to learners. For example, the scenarios of a family reunion during the Spring Festival (Example 19) and of Thanksgiving Day (Example 20) present cultural acts in written language and cultural operations in images (e.g., a grandpa sitting on a sofa giving red envelopes to the children, or family members sitting around the dinner table saying grace). The images of these scenarios are reused across several tasks; the language content changes, but the emphasis on the cultural practices remains.
(19) At the Spring Festival, we have a big family dinner. We say, “Happy New Year!” (4AM10U13)
(20) A: What do you do on Thanksgiving Day?
B: We always have a special meal. We say “thank you” for our food, family and friends. (6AM04U13)
In sum, ideological education is achieved through repeatedly foregrounding the manipulation of cultural products and participation in cultural practices. This reiteration may benefit learners’ ideological education in two respects. First, the occurrences of products and practices from Chinese and foreign cultures show no dramatic gap but a similar distribution, with artifacts and places, as well as acts and operations, standing out. This echoes the commonsense view of foreign language textbooks as a harmonious and undisturbed world (Lavrenteva & Orland-Barak, 2023). Second, the textbooks avoid a sharp increase in the products and practices of any one culture. Such an increase, whichever culture it favored, would signal an emphasis on or inclination toward certain cultures and interfere with the construction of a harmonious world; by prioritizing some cultures over others, it would fail to support ideological education in a balanced way.
Image Interaction and Its Enforcement in Textbooks
By examining cultural representations and their focus in the textbooks at the macro level, this study uncovers how Chinese English textbooks address ideological education. Cultural representations are conveyed through multimodal texts that use images to depict the manipulation of cultural products and participation in cultural practices. The images play an auxiliary role in displaying the cultural features of China and English-speaking countries. Beyond this auxiliary role, images also play a major role in engaging learners in the ways described below, which can be viewed as their interpersonal function at the micro level.
Images tend to offer information in reading and listening tasks (e.g., Examples 21, 23) and to demand practice in speaking or writing tasks (e.g., Examples 22, 24). This suggests that the series of English textbooks for primary students teaches cultural information by providing more language expressions and images, thereby facilitating coursebook-learner interaction.
(21) Look at that man. What is he doing? He’s making noodles. (4AM04U12)
(22) Do you want some rice? Yes, please. (The image, here a bowl of rice, presents a task that asks learners to produce a similar exchange about the cultural product.) (4AM04U13)
(23) Can I help you? I want a hamburger, please. (6BM01U13)
(24) Can I help you? I want a cola, please. (The images in the same task, such as a cola and a hot dog, require learners to perform a dialogue about products from foreign cultures.) (6BM01U13)
When images represent the cultures of different countries, the choice of shot affects how the images interact with potential learners. When cultural products are displayed, images present artifacts in close shots, places in long shots, and institutions and art forms in medium shots. Cultural practices are mainly depicted in medium shots, owing to the auxiliary nature of images and the difficulty of capturing all ongoing practices within a limited space. As shown in Figure 5, medium shots are used frequently to guide learners to maintain an appropriate social distance when cultural products and practices are presented to them in images. Overall, the series of textbooks tends to create a social distance between the cultural representations and the learners that is neither overly personal nor overly impersonal.
Images also engage learners through subjective or objective attitudes (Kress & van Leeuwen, 2021). On the one hand, images in the textbooks tend to be subjective, adopting a central perspective that places their subjects at the center and an eye-level angle that signifies equality with learners. For instance, conversations among friends, among family members or between teachers and students take place in settings such as parks, living rooms or classrooms, and are depicted from a central perspective with an eye-level or frontal angle. A typical example is a conversation in which a mother explains to her children what she is doing while making dumplings; the image portrays her cultural practice from a central perspective and at eye level. On the other hand, a number of images decentralize the perspective to create a sense of objectivity, regardless of image size. This occurs in the introduction of cultural places, such as discussions of major cities and their weather (e.g., London) or of famous places of interest in the UK or China (e.g., the British Museum and Huangshan Mountain). The relevant images use a top-down view to neutralize the central perspective and give learners an objective depiction of the world as it is. This avoids politicizing or giving preferential treatment to any cultural place, whether domestic or foreign.
In conclusion, images in Chinese English textbooks tend to engage learners by providing cultural information, maintaining a suitable social distance and adding subjectivity. These are the discursive strategies employed by images, and they bear on ideological education because of the interactive nature of images. They guide learners to obtain information about cultural products and practices and encourage them to view the images at eye level and, most often, from a central perspective. As a result, textbooks can enhance the interaction between learners and the cultural representations depicted in images, and ultimately foster learners’ cultural awareness and their understanding of cultural practices in another language.
Ideological Education and Multimodal Texts across School Levels
Ideological education is embedded in, or scattered throughout, foreign language textbooks. It may be reflected in direct quotations from sages or in their perspectives. However, in English textbooks from any country, the primary goal is to develop learners’ language proficiency as well as their cultural awareness. With regard to raising cultural awareness (Feng, 2015; Zhang & Li, 2022b), textbooks should be carefully designed to avoid stereotypes or cultural conflicts and to create a harmonious and undisturbed world.
The series of Chinese English textbooks investigated repeats cultural components across school levels and links them with varying linguistic knowledge. The same cultural products or practices recur frequently, but the grammar associated with them keeps evolving. For instance, the grammar shifts among the present tense, past tense, future tense and present continuous tense in relation to the uses of the same cultural artifact (e.g., Examples 1–6). In this way, repeating the same products and practices in tasks that use different verb tenses to describe kite-related practices not only develops learners’ language proficiency but also raises their cultural awareness.
Some studies have conducted a holistic analysis of cultural representations in English coursebooks and emphasized that the percentage of cultural components from certain countries is crucial in textbook design (Tan & Zhang, 2022; Zhang & Li, 2022b). This study, however, takes an ontogenetic view, examining cultural representations in multimodal texts as language knowledge develops across school levels. It holds that combining linguistic knowledge with cultural exhibition is significant, because doing so could help learners improve their ability to express or discuss cultural practices and cultural products in a foreign language.
The Chinese English textbooks use multimodal texts to guide learners’ interaction with cultures from China and English-speaking countries at the same time. In particular, the images in multimodal texts tend to offer learners information about the manipulation of cultural products and participation in cultural practices. They also help learners establish a suitable social distance in viewing the products and practices, avoiding the extremes of being too personal or too impersonal. Finally, they adopt a central perspective and an eye-level angle, creating equality between image content and image viewers.
An interesting finding is that foreign cultures are discussed within Chinese settings (e.g., in a park, classroom or family home), while Chinese culture is likewise presented in English-speaking contexts. For example, a conversation about Chinatown in a New York family setting and an English TV interview about the life changes of an elderly Chinese woman reflect the interactive nature of cultural communication. However, this point lies beyond the scope of the present study, which concentrates on identifying cultural components rather than the mixed settings in which they are mentioned.
Conclusion
This study treats multimodal texts as a means of representing cultural components and interacting with language learners. It proposes a new approach to exploring strategies of ideological education in English textbooks for Chinese learners, even when cultural perspectives or beliefs are not explicitly stated.
At the macro level, our analysis reveals that the distribution of cultural products and practices in the textbooks aligns with the requirements set forth in administrative reports and curriculum guidelines. At the micro level, we observe that images in the textbooks tend to convey cultural messages rather than demand a response. These images are often presented in medium shots, maintaining a distance that is neither overly personal nor overly impersonal. They also adopt a central perspective and an eye-level angle, which may make the learning experience more engaging and relatable.
Our findings suggest that integrating cultural analysis at the macro level with image interaction at the micro level can uncover the general tactics of ideological education in the textbooks, including how cultural representations are presented to learners and how images interact with viewers. We have also identified another strategy: the repetition of cultural products or practices combined with language knowledge. This approach may enhance language proficiency and cultivate cultural awareness at the same time.
However, this study has certain limitations. First, readers may confuse cultural learning through language education with ideological education. This paper does not conflate the two. Instead, it examines how cultural learning through language education reflects or manifests the enactment of ideological education for learners. Specifically, we analyze the manifestation of ideological education through instances of cultural products and practices (Moran, 2009) and the ways in which images interact with learners (Kress & van Leeuwen, 2021). This focus may cause the research interest to oscillate between ideological education that is intentionally infused into foreign language education and ideological elements that emerge naturally and become embedded in language textbooks. Our study does not distinguish between these two aspects at the outset, which may not do full justice to either.
Second, the argument may lack coherence from the outset. It draws on complex fields such as ideological education, intercultural communication, multimodal discourse analysis, language education, cultural learning and curriculum standards, some of which are reviewed only superficially. The research centers on ideology in English textbooks, but ideology is a broad term that cannot easily be quantified. Our study indicates that the manifestation of ideology is closely related to cultural practices (Williams, 2018), which are in turn connected with cultural products (Moran, 2009). Evaluating ideology in textbooks also involves the requirements of curriculum standards and the multimodal representation of cultural materials. These various concerns may divert readers’ attention somewhat.
Based on our findings and discussion, we propose the following directions for future research. First, future studies could undertake synchronic and diachronic analyses, since textbook features vary not only among publishers but also over time. For example, a synchronic study could compare the representation of ideology in textbooks from different publishers, while a diachronic study could examine how the representation of ideology in textbooks has changed over the past few decades. Second, future research could investigate the strategies used in co-selecting different modes to support learners’ development of disciplinary literacy (Martin & Unsworth, 2024; Zhou et al., 2025) or social identity (van Leeuwen, 2021). For instance, researchers could examine how different combinations of visual and textual modes enhance learners’ disciplinary literacy or their sense of belonging to a particular social group.
Acknowledgements
We appreciate the suggestions and comments from the anonymous reviewers. This work would not have been publishable without their valuable insights.
Ethical Considerations
Accessing the textbooks involved no ethical violations. Furthermore, since this study did not involve human participants, approval from an institutional review board was not required.
Author Contributions
Jinyou Zhou and Lin Wang developed the research framework, conducted data analysis and completed the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Humanities and Social Science Fund of the Ministry of Education of China (Grant No. 23XJC740011), the Education Reform Project at Southwest Medical University (Grant No. YJG202240), the Research Center for the International Dissemination of Ba Shu Culture (Grant No. 2024YB09), and the Special Support Program for Young Scientists and Technologists at Southwest Medical University (Grant No. 11/00031723; 11/00270618).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The complete corpus data and all relevant retrievals used in this study are available upon reasonable request from the authors via email.
