Abstract
As an emerging business model, live broadcast e-commerce has grown rapidly in China in recent years and has prompted the emergence of network anchors, who attract customers to buy a variety of goods in their live broadcasts. In this paper, we take Li Jiaqi, one of the most popular live streaming anchors in China, as an example. We collect and transcribe the speech data of his live broadcasts to build a live broadcast speech-text corpus, and then conduct a corpus-based analysis of its linguistic characteristics in terms of phonetics, the usage of various language units, and rhetorical devices, using methods and tools from Phonetics, Natural Language Processing, and Corpus Linguistics. Based on the linguistic analysis, we propose the hypothesis that these features of the live broadcasts affect customers’ purchase intention, and we test it with a questionnaire survey.
Introduction
The year 2019 is generally recognized as the “first year of live broadcast e-commerce in China” (Cunningham et al., 2019). Live broadcast (streaming) e-commerce refers to the form of e-commerce that uses live broadcast as a channel to achieve marketing purposes, and it consists of three basic parts: people, goods, and venues (Cai et al., 2018). Live streaming is a synchronous social medium with unique features such as simultaneity and authenticity, which distinguish it from asynchronous social media such as Weibo and Twitter. Li and Ku (2018) argued that the growth and popularity of social commerce in recent years are increasingly transforming e-commerce from a product-oriented environment into a customer-driven and social-centered one. As a result, live broadcast e-commerce has the attributes of both social media and e-commerce: it can provide a dynamic display of products and enable direct interactive communication between sellers and customers (Park & Lin, 2020).
Since 2020, due to the impact of COVID-19, both the marketing and consumption environments have undergone tremendous changes (Gupta et al., 2021). As social media can track consumers’ digital clues more accurately and offer further insight into the purchase process (Wang & Lee, 2020), most traditional offline sales have been forced to move online, accelerating the digital transformation of new e-commerce businesses. The trend that “everyone can broadcast and sell anything” has become more and more prominent in China.
According to the China Live Streaming E-commerce Industry Research Report (36Kr Research, 2020), live broadcast e-commerce has its own industrial chain structure. It relies on the underlying infrastructure, including mobile payment, logistics, e-commerce operation, and live broadcast services. The upstream of live broadcast e-commerce consists of the merchants (including brand owners, distributors, and manufacturers), and the downstream consists of the customers. The platforms (traditional e-commerce platforms represented by Taobao, entertainment content platforms like Douyin, and shopping guide community platforms like Mogujie) and Multi-Channel Networks (MCNs) are in the middle, linking merchants and customers to meet the demands of both parties while creating a closed loop of industrial ecology.
The new phenomenal live commerce model has prompted the emergence of many network anchors and live streamers. Li Jiaqi is without doubt one of the most famous top-tier network anchors in China. Known as the “King of Lipstick” in China, he held more than 240 live broadcasts in 2020, and the products he sold covered various categories such as beauty makeup, clothing, food, daily necessities, and home appliances. On the first day of pre-sales leading up to Alibaba’s annual shopping festival in November 2021, he sold 439 products in his live room, and the final sales reached US$1.9 billion.
After watching Li’s broadcasts several times, the authors became curious about why he can so successfully arouse customers’ strong purchase intentions in his live broadcast rooms and guide them to buy various products.
Previous research has studied customers’ motivations in live broadcasts from various aspects in recent years. Cai and Wohn (2019) explored the complex affective and cognitive factors that influence customers’ buying and pointed out that product quality is a key factor influencing repurchase intentions. Fu et al. (2020) studied the relative importance of informational social influence, normative social influence, and perceived information quality for consumers’ social shopping intention from the information processing perspective. Meng et al. (2020) constructed a theoretical framework of the influence of live broadcasting celebrities on consumer decision-making by combining qualitative and quantitative research, verified the psychological mechanism by which live broadcasting celebrities influence purchase intention, and argued that the most notable features that appeal to customers are credibility, professionalism, and interactivity. Based on signaling theory and uncertainty, Lu and Chen (2021) proposed that broadcasters’ physical characteristics conveyed through vicarious product trials and values shared via instant interaction are two signals that can help reduce product uncertainty and cultivate customers’ trust. Zhou et al. (2021) focused on the psychological factors and demographic variables that affect Chinese customers’ adoption of live e-commerce shopping and performed a cross-sectional survey of e-commerce customers.
Many factors lie behind the success of live streaming. As for Li Jiaqi himself, his down-to-earth personality has captured the hearts of many fans. By presenting himself as a friend, he sells in a way that does not seem forceful, while helping people navigate the maze of available merchandise by offering his own personal opinions. In addition, Li Jiaqi is keen on public welfare undertakings such as poverty alleviation. For example, in 2020, he worked with China Central Television (CCTV) to promote the sales of products and stimulate consumption in some provinces in China. This has established a good personal image for him and attracts more audience members and customers.
We believe that the characteristics of anchors’ speech also contribute to the success of live streaming, as live streams are fundamentally language-centered and chat-centered social shows. When watching his broadcasts, we can clearly feel that Li Jiaqi’s speaking style in the live broadcasts is very different from that in daily scenes. In this paper, we study the linguistic features used in the broadcasts and explore the following two research questions, based on rich examples in our self-built corpus and using methods and tools from the fields of Phonetics, Natural Language Processing (NLP), and Corpus Linguistics.
(1) What are the characteristics of Li Jiaqi’s speech in the live broadcasts?
(2) Do these characteristics have effects on customers’ purchase intention?
As spoken language is composed of sound and meaning, the analysis will be divided into two parts: phonetic features (section 4) and language unit features (section 5), where the latter includes lexical, phrasal, grammatical and rhetorical features.
Our contributions in this paper are mainly in three aspects: First, we built a large-scale speech-text corpus of live broadcast e-commerce in China, which can provide an essential resource for some specific research tasks.
Second, we comprehensively analyzed the characteristics of a live streamer’s speech from the perspective of linguistics, deepening the understanding of the features of live broadcasts.
Third, we tested our hypothesis that speech features in the live broadcasts have some effects on customers’ purchase intention.
The remainder of this paper is organized as follows: section 2 introduces the process of building the speech-text corpus; section 3 discusses the basic pattern of Li’s broadcasts; section 4 presents the analysis of voice and intonation; section 5 discusses the analysis of linguistic units; section 6 reports the questionnaire survey; and the last section is the conclusion.
Building the Live Broadcast Speech-Text Corpus
In China, live broadcast e-commerce is mainly supported by online shopping platforms. Each network anchor has his or her own broadcast room(s) on different streaming platforms. Li Jiaqi’s broadcast room is on Taobao APP, which is one of the largest online shopping platforms in China and is owned by Alibaba.
Figure 1 shows the basic steps for building the speech-text corpus. We first enter the “Live Broadcasting” channel on Taobao APP, then search for and choose Li’s room. In his room, we can see the list of replays of live broadcasts, from the latest to the earlier ones; there is basically a live broadcast every day. After clicking any replay, we can enter the broadcast and see the playback video. We use a voice recorder to record the playback video and transcribe the recording into text, and then proofread the transcribed text in detail to ensure its accuracy as far as possible. Finally, we obtain the speech-text corpus after essential text preprocessing, including word and sentence segmentation and data cleaning.

Flowchart of building the speech-text corpus.
We randomly recorded 30 live broadcasts, each with an audience of more than 10 million, from January 2021 to October 2021. The average duration of each broadcast was 2 to 2.5 hours, sometimes even longer, giving a total recording time of over 70 hours. The categories of products in the live broadcasts include but are not limited to “popular snacks,” “fashionable brands,” “daily necessities,” “big brand beauty,” “mother and baby specials,” “national products specials,” and “home improvement items,” indicating the broad scope of live commerce. In the live broadcasts, in addition to Li Jiaqi himself, there are also assistants and other staff. When recording and transcribing the data, we retained their speech as well, but the corpus as a whole is still dominated by Li Jiaqi.
Basic Pattern of Li Jiaqi’s Live Broadcasts
Figure 2 shows a screenshot of Li Jiaqi’s live room on Taobao APP. There are several parts and zones with different functions, marked with boxes and Arabic numerals. In this section, we introduce the patterns in his broadcasts with reference to the screenshot and the speech-text corpus, which is helpful for understanding the characteristics discussed in the following sections.

Screenshot of Li Jiaqi’s live room on Taobao APP.
While Li and his staff introduce the products at the desk, they can interact with the audience in real time, as interactive live videos are essential to attract customers’ attention and increase their purchase intention (Doong, 2021). At the bottom of the live room is the interactive zone (1), where the audience can enter and send text messages. The messages appear in the comment zone (3) immediately, and once the host reads them, he interacts with the senders or answers their questions as far as possible. The audience can also click the arrow icon to share the live broadcast to other social platforms or with their friends, or click the heart icon on the far right to show their support for the broadcast. Zone (2) shows the purchase link, which customers can click to buy the product; zone (4) is the information dashboard, where viewers can learn about the live broadcast preview and other information; the upper left zone (5) displays the name of the live broadcast room and the number of people currently watching the broadcast (10.182 million people here).
Li Jiaqi’s live broadcasts usually follow a relatively fixed time, pattern, and rhythm. Most of the broadcasts start in prime time, around 7:30 to 8:00 pm every night, which makes it easier to attract more viewers. The content of Li’s opening remarks for each live broadcast is basically unchanged and has already become his iconic and exclusive symbol. Once we hear “Hello! Hello! Here we come. Our live broadcast will start!”, we know he is coming and the broadcast will begin soon.
In the several minutes before the official start of the live broadcast, he quickly makes a preview of the products that will appear in the broadcast to stimulate the audience’s desire to watch the live broadcast and their expectations for the products.
During the broadcast, Li and his team introduce the advantages and features of the products in detail, with the help of many examples and their own experience and feelings. Then they give customers the purchase link and the price of the item, which is always much lower than that in the flagship stores, along with lots of freebies, gifts, and sample packs. The main purpose is to stimulate the audience’s purchase intention as much as possible. It is worth noting that the order in which the products appear in the live broadcast and the explanations of the products follow a carefully designed and formatted arrangement.
In addition, Li Jiaqi often interacts with the audience in various forms, such as answering questions and comments raised by the audience, and even using funny body language and expressions. After every three to five products, there is a lucky draw event, in which the prizes are generally limited-edition products. When a live broadcast is coming to an end, Li also introduces in advance the products that will appear in the next broadcast.
From the pattern of Li’s broadcasts, we can compare the differences between the popular live e-commerce and traditional e-commerce, as shown in Table 1, indicating live e-commerce has more advantages.
Comparison between Live E-Commerce and Traditional E-Commerce.
Analysis of Phonetic Features
Most of the audience in China who have watched Li Jiaqi’s live broadcasts have a consensus: he treats each live broadcast with passion from beginning to end, with loud voice, rising tone, and high spirits.
To further understand the acoustic features behind his voice, we used Praat (Boersma & Weenink, 2001) to analyze selected segments of his speech. Praat is a well-known, free software package for speech analysis in phonetics. It can be used to describe and analyze the four elements of voice described below, as well as for labeling and segmentation, and the results can be displayed graphically.
Before starting the analysis, let us first introduce some basic phonetic terminology: pitch, intensity, duration, and timbre, which are the four elements of voice (Ladefoged & Johnstone, 2018). Pitch refers to the highness or lowness of the voice, determined by the frequency of vibration of the vocal cords: the quicker they vibrate, the greater the frequency of the sound they produce. Intensity refers to the amount of energy used in producing a speech sound, determined by the amplitude of vibration of the vocal cords: the greater the amplitude, the louder the sound. Duration can be defined as a feature of sound that describes its length or quantity, determined by the time for which the vocal cords vibrate. Timbre can be defined as the characteristic quality of a sound determined by resonance, which is the most essential feature that distinguishes one sound from another. Generally, these four factors work in combination, but experimental work has shown that they are not equally important: the strongest effect is produced by pitch, and length is also a powerful factor.
We analyzed a 20-second clip of the opening remarks of a live broadcast. The speech text is as follows: “Hello (Hello! Hello! Here we come. Our live broadcast has started. Hello, here we come. Good evening, everybody!
The Praat analysis of voice fragment “

Analysis of voice fragment.
Figure 3 clearly shows that the duration of the modal particle “
In addition to the modal particle, the duration of the adverb “

Another example of the analysis of a voice fragment.
We further calculated some acoustic features of each Chinese syllable in this sentence, including duration, intensity, and maximum fundamental frequency (F0). F0 is defined as the average number of oscillations of the vocal folds per second and is expressed in Hertz (Hz). It is related to pitch and serves as an important acoustic cue for tone, lexical stress, and intonation.
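As a rough illustration of this per-syllable calculation, the following minimal sketch derives duration and maximum F0 from a pitch track and a set of syllable boundaries. It does not reproduce the Praat workflow used in the paper: the `Syllable` class, the `syllable_features` helper, and all numeric values are hypothetical, and the pitch track is assumed to have been exported beforehand as (time, F0) pairs with `None` for unvoiced frames.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    label: str    # placeholder label for the syllable
    start: float  # start time in seconds
    end: float    # end time in seconds


def syllable_features(syllables, f0_track):
    """Return duration (ms) and maximum F0 (Hz) for each syllable.

    f0_track is a list of (time_s, f0_hz) pairs; f0_hz is None when the
    frame is unvoiced.
    """
    rows = []
    for s in syllables:
        # Keep only voiced frames that fall inside the syllable interval.
        voiced = [f0 for t, f0 in f0_track
                  if s.start <= t < s.end and f0 is not None]
        rows.append({
            "label": s.label,
            "duration_ms": round((s.end - s.start) * 1000, 2),
            "max_f0_hz": max(voiced) if voiced else None,
        })
    return rows


# Hypothetical boundaries and pitch values, for illustration only.
syllables = [Syllable("syl1", 0.00, 0.20), Syllable("syl2", 0.20, 0.62)]
f0_track = [(0.05, 210.0), (0.15, 225.0), (0.25, 300.0),
            (0.40, 340.0), (0.55, None)]
features = syllable_features(syllables, f0_track)
```

Intensity could be aggregated in the same way from an exported intensity contour.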
As shown in Figure 5, both the duration (422.16 ms) and F0 (343.89 Hz) of the modal particle “

Acoustic features of each Chinese syllable in the example sentence: (a) duration (ms), (b) intensity (dB), and (c) fundamental frequency (Hz).
A closer look at the acoustic features further reveals that Li Jiaqi is very good at controlling his speech rate and intonation in live broadcasts to achieve the effect of interacting with the audience. A previous empirical study (Xu et al., 2020) also found that both cognitive assimilation and emotional state are key drivers of consumer behavior, which can bring emotional, pleasant, or esthetic experiences and directly influence impulsive consumption, hedonic consumption, and social sharing behavior.
Analysis of Language Units Features
As mentioned earlier, Li’s live broadcasts involve numerous categories of products such as beauty, food, and daily necessities. According to the characteristics of each category, he uses different marketing skills and targeted verbal techniques to attract customers of different ages with different needs. Based on our self-built corpus, we found that there are many commonly used language units in the broadcasts, from the word level to the phrase level to the sentence level, which support the smooth running of Li’s live streaming.
Lexical Features
Different words with different parts of speech (POS) are frequently used during Li’s live broadcasts, including but not limited to pronouns, nouns, adverbs, and modal particles.
Pronoun
The first-person pronouns “
Live streams are interactive activities with the audience, dominated by the anchor from the first-person perspective. The anchor and the customers are not on an equal footing. In fact, all the broadcasts can be simplified into two stages: “I/We (anchors) recommend” and “You (customers) buy.”
Particularly, the first-person pronoun “
(1) “
(2) “
On the other hand, in the promotion stage, the scope of the first-person pronoun “
(3) “
In addition, whether in the recommendation stage or the promotion stage, the usage of first-person pronouns could inspire customers’ sense of trust. For example:
(4) “
(5) “
These two examples give the audience the feeling that, regarding all aspects of the products, the anchor personally looks out for the audience and tries the products out in advance, doing his best to give customers the best shopping experience along with an authentic and reliable product introduction and guarantee. Unconsciously, psychological distance and perceived uncertainty are decreased as customers obtain more concrete product information through in-depth interactions (Zhang et al., 2019), and their trust in the anchor and the products increases, thereby increasing their desire to buy. Note that the authentic and reliable product introduction is also in accordance with the “Maxim of Quality” in the famous pragmatic theory of the Cooperative Principle, proposed by the linguist Paul Grice (Hu, 2019), which holds that in conversation and communication we should try to provide truthful and justified information.
Noun
The customer group of Li’s live broadcasts is mainly female, and he employs various kinds of nouns in his selling. After introducing a product, he first informs the audience of its price on Tianmao Mall APP and its price in his live stream, and then presents the purchase link. The price in the live room is much lower than that on Tianmao Mall. In addition to the price concession, there are also other special offers, such as coupons, vouchers, and gifts.
As shown in Figure 6, we used open-source Python word cloud tools to draw a word cloud of the high-frequency nouns in the corpus. The most commonly used nouns in Li’s live streams include but are not limited to:

Word cloud of the commonly used nouns in Li’s live broadcasts. The word with the largest font is “girl,” and the word below it is “Tmall.”
Taking the words “same style” as an example, many products, especially beauty and clothing products, have celebrity spokespersons. Li Jiaqi also often invites celebrities as guests in his live room. When introducing the products, he specifically emphasizes that they are the same style as that used by certain star(s). With the help of the star’s halo effect, sometimes reinforced by descriptions and explanations from the guests themselves, many audience members, especially the fans, are easily and spontaneously attracted to buy the products.
It is worth noting that, most of the time under such conditions, customers or fans buy the products simply because of their love for the stars rather than the utility of the products themselves. This is in accordance with previous findings that online shopping in the social commerce setting in China is driven more by hedonic motivations than by utilitarian motivations (Akram et al., 2021), and that the hedonic counterpart relates to the emotive and multisensory aspects of the shopping experience, so that customers’ emotional buy-in can only be attained through the presence of hedonic consumption activities (Liu et al., 2021). In fact, the very eye-catching slogan “Rational consumption and happy shopping” on the wall of Li Jiaqi’s live broadcast room (shown in Figure 2, just below the words “LIVE SHOW”) shows that the live broadcast has been trying to create a hedonistic atmosphere.
Adverbs
In each live broadcast, Li Jiaqi used adverbs very frequently, especially adverbs of degree. When using one or more adverbs of degree, such as “
Such adverbs can collocate with other adjectives, forming the common “Adverb+Adjective” structures. The same structure can be used in various categories of products. For example, for beauty products, “
We randomly selected the speech text of a 2-hour broadcast from our corpus and conducted simple frequency statistics of the adverbs that appeared in the broadcast. Table 2 shows the ten most commonly used adverbs.
Frequency of the Top 10 Commonly Used Adverbs.
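Simple frequency statistics of this kind can be sketched as follows. This is an illustrative outline rather than the authors’ actual script: it assumes the transcript has already been segmented and POS-tagged into (word, tag) pairs, that adverbs carry the tag `d` (an assumption about the tagset), and the English tokens are hypothetical stand-ins for the Chinese words in the corpus.

```python
from collections import Counter


def top_adverbs(tagged_tokens, adverb_tag="d", n=10):
    """Count tokens whose POS tag marks them as adverbs; return the n most frequent."""
    counts = Counter(word for word, tag in tagged_tokens if tag == adverb_tag)
    return counts.most_common(n)


# Hypothetical, already-tagged tokens standing in for one broadcast transcript.
tagged = [
    ("really", "d"), ("good", "a"), ("very", "d"), ("really", "d"),
    ("lipstick", "n"), ("super", "d"), ("really", "d"), ("very", "d"),
]
top = top_adverbs(tagged, n=3)
```

Applied to a full 2-hour transcript with `n=10`, the same function would yield a ranking in the shape of Table 2.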
Modal Particles
Modal particles are function words used at the end of the sentences to express the speaker’s emotion and reflect the speaker’s will, attitude, and evaluation. Many special modal particles in Mandarin Chinese such as “
The use of these modal particles can soften the anchor’s tone of voice and narrow the distance between him and the audience, thereby further improving the audience’s acceptance of the anchor’s promotion.
Here are some examples:
(6) “
(7) “
(8) “
Although these four types of words are all representative of live broadcasts, they occur with different frequencies. According to our preliminary statistics, pronouns have the highest frequency while modal particles have the lowest, as shown in Table 3. Note that we use five-pointed stars to indicate relative frequency here instead of providing exact counts.
Frequency of the Four Types of Words.
Type-Token Ratios
Besides the above words with different parts of speech (POS), we also calculated the Type-Token Ratio (TTR) with the Python NLTK library. TTR is the total number of unique words (types) divided by the total number of words (tokens) in a given text, and it indicates lexical richness, or variety in vocabulary (Fikri et al., 2021). The closer the TTR is to 1, the greater the lexical richness of the text.
We randomly chose 10 word-segmented texts from our self-built corpus to calculate the TTR of each text and then obtained the average TTR across all 10 texts. The average is only 0.123 (3,221 average types / 26,230 average tokens).
The results show that the vocabulary in Li’s broadcasts is relatively small and lacks richness and variation. From another perspective, this also shows that Li Jiaqi’s expressions are highly similar across live broadcasts: he can rely on a small set of common words in many broadcasts and still successfully attract enough audience members and customers.
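The TTR computation can be illustrated with a few lines of plain Python. This is a minimal sketch, not the NLTK-based script used in the study: the function names and the toy token list are hypothetical, and only the two averages (3,221 types and 26,230 tokens) come from the text.

```python
def type_token_ratio(tokens):
    """TTR = number of unique tokens (types) / total number of tokens."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty text")
    return len(set(tokens)) / len(tokens)


def mean_ttr(texts):
    """Average the per-text TTR over a list of tokenized texts."""
    return sum(type_token_ratio(t) for t in texts) / len(texts)


# Toy example: 4 types over 6 tokens.
sample = ["buy", "it", "buy", "it", "now", "girls"]
ttr_sample = type_token_ratio(sample)

# Ratio of the averages reported in the text.
reported = 3221 / 26230
print(f"sample TTR = {ttr_sample:.3f}, reported ratio = {reported:.3f}")
```

Note that dividing average types by average tokens, as in the parenthesized figure, can differ slightly from averaging the ten per-text ratios (what `mean_ttr` computes), and that TTR tends to fall as texts get longer, so values are most comparable between texts of similar length.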
Phrasal Features
The phrases used in the live broadcasts are mainly some relatively fixed terms exclusive to Li Jiaqi, and these phrases are highly recognizable and serve as Li’s labels.
The phrases “
“
“
Grammatical and Sentence Features
In addition to the commonality at the level of words and phrases, Li Jiaqi’s sentence expressions in the live broadcast also have certain commonalities.
First, he often uses short sentences that are simple but informative. Such statements can attract the attention of the audiences and also introduce as much information as possible in a limited time. In this way, circumstantial evidence is also provided to show the anchor’s professionalism since the more information he introduces, the more he seems to be fully aware of the products. Here are some examples:
(9) “
(10) “
Second, he often uses imperative and exclamatory sentences, which express his strong appreciation and fondness of the products without any reservation. These imperative sentences with commanding tone are intensely inflammatory and encouraging, unconsciously stimulating audiences’ purchase intentions.
(11) “
(12) “
(13) “
The aforementioned adverbs of degree and modal particles (such as “
(14) “
(15) “
Third, Li often uses expressions of hunger marketing to push customers into impulsive decisions through product scarcity or discounts, as the following examples show:
(16) “
(17) “
Finally, during the live broadcasts, Li often repeats parts of the sentence in a targeted manner, reminding the audience to pay attention to the focus of the information, thereby increasing the audience’s attention to him.
(18) “
Besides, we also found that Li Jiaqi often uses some pet phrases in the live broadcasts. Apart from his highly iconic slogan “Oh my God!”, another typical example is: “
We use AntConc, a powerful freeware corpus analysis toolkit in the field of Corpus Linguistics, to investigate the key words “

KWIC in the corpus.
We can see that the words appear with high frequency (33 times here) and with the same subject, the first-person pronoun “I,” followed by some adverbs of degree such as “
Li can convince the customers to believe in his words and purchase the products.
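A keyword-in-context (KWIC) view of the kind AntConc produces can be approximated with a short function. The sketch below is a simplified stand-in, not AntConc itself; the `kwic` helper, the window size, and the English keyword and tokens are hypothetical placeholders for the segmented Chinese corpus.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Hypothetical segmented text; the real corpus is Chinese.
tokens = "I really recommend it and I truly recommend this one".split()
hits = kwic(tokens, "recommend", window=2)
for left, key, right in hits:
    print(f"{left:>12} | {key} | {right}")
```

Counting the hits and inspecting the left-hand context is enough to check how often the keyword occurs and which subject and adverbs precede it.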
Rhetorical Features
Li Jiaqi prefers to use various rhetoric to describe the products in his live broadcasts to make the audience have a more intuitive impression of the products or consumption experience. Exaggeration and metaphor are among the more common rhetorical devices, as shown in the following two examples.
(19) “
(20) “
Other rhetoric such as personification is also used, as shown in the following sentence.
(21) “
Hearing these vivid descriptions, coupled with the anchor’s gestures and body language, the audience will have an intuitive and deeper understanding of the usage and feeling of the products, and will buy them, if possible.
Linguistic Features and Purchase Intention
The well-known Stimulus-Organism-Response (S-O-R) theory (Jacoby, 2002; Mehrabian & Russell, 1974) is commonly used in online buying studies. According to the theory, various stimulating cues (such as streamer attractiveness and information quality) can trigger a customer’s (organism’s) emotional and cognitive processes, resulting in approach behaviors, that is, buying the products. After analyzing the phonetic and semantic features in the above sections, we hypothesize that these features can also act as stimuli and have some effects on customers’ purchase intention. In this section, we test this hypothesis by means of a questionnaire survey.
We conducted an online survey and initially received 546 responses. After removing incomplete or irregular questionnaires, 524 valid responses remained for further analysis. Table 4 shows the demographics of the sample. More than 60% of the respondents were female, indicating that women are the main consumers and audience of live broadcasts, and most respondents are young people aged between 21 and 30 with a higher education background (undergraduate or postgraduate).
Sample Demographics (N = 524).
Table 5 lists the items in our questionnaire. Each item is measured on a 5-point Likert scale ranging from “strongly disagree” to “strongly agree.” We designed 11 questions based on the phonetic and semantic features, of which Q3 and Q8 concern the relations between the features and purchase intention.
Question Items in the Questionnaire.
For each question, most of the responses are positive. In particular, we present the results of Q3 and Q8 (Figure 8) here. About 61% (320/524) of the respondents agree or strongly agree that phonetic features stimulate purchase intention, and about 71% (374/524) agree or strongly agree that the language unit features have positive impacts on purchase intention, which supports our hypothesis.

Statistics of responses to Q3 and Q8.
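The reported shares follow directly from the response counts, as the short check below shows. The counts 320, 374, and 524 come from the text; the helper names and the five-response toy list are hypothetical, and the coding of “agree” and “strongly agree” as 4 and 5 is an assumption about how the Likert scale was scored.

```python
def agree_share(agree_count, total):
    """Percentage of respondents who chose 'agree' or 'strongly agree'."""
    return 100 * agree_count / total


def top_two_box(responses, agree_levels=(4, 5)):
    """Share of Likert responses (coded 1-5) that fall in the two agreeing levels."""
    return sum(r in agree_levels for r in responses) / len(responses)


N = 524                     # valid responses
q3 = agree_share(320, N)    # phonetic features
q8 = agree_share(374, N)    # language unit features
print(f"Q3: {q3:.1f}%  Q8: {q8:.1f}%")

# Toy list of raw responses showing how such a count would be obtained.
toy_share = top_two_box([5, 4, 3, 2, 4])
```

These are descriptive proportions only; testing whether they differ from a neutral baseline would require an additional statistical test that the paper does not report.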
Conclusion
This paper studies and analyzes the characteristics of anchors’ speech in live broadcasts in China, taking the famous anchor Li Jiaqi as an example. We first built a large-scale speech-text corpus by recording Li’s live broadcasts, and then employed methods and tools from the fields of Phonetics, NLP, and Corpus Linguistics to analyze his speech features with rich examples, covering phonetic features, various language units, and rhetorical features. We also conducted a preliminary questionnaire survey to test our hypothesis that these characteristics of live broadcasts have effects on customers’ purchase intention, thus prompting them to buy various products in the live room. Our findings on the linguistic characteristics can be summarized as follows:
(i) Li usually speaks very fast in the live broadcasts, with rich intonation, full of passion and exaggerated expression, and he will deliberately increase the length of some voices at the same time to express special emotions and attract the attention of the audience.
(ii) Li is good at using easy-to-understand colloquial expressions and personalized label words at different levels of language units to attract potential audiences, with the help of witty rhetorical expressions and funny body language, to enhance the consumption experience and emotional value outside of shopping.
(iii) Li can use accurate pragmatic communication strategies to achieve high-quality language expression in a limited time, thus narrowing the distance between the consumer and him, and increasing the trust in him.
In the future, we will continue to collect Li’s live broadcast data to expand the scale of the corpus, and conduct further studies with word embedding and sentiment analysis approaches. We will also collect live broadcast data from female anchors in China in order to compare and analyze the differences in characteristics between male and female anchors.
Acknowledgements
We would like to thank the anonymous reviewers.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This paper was supported by Natural Science Foundation of China (NSFC, Grant No. 61902024) and Beijing Institute of Technology Research Fund Program for Young Scholars.
Ethical Approval
This paper does not involve any animal or human studies, so we will not provide the ethics statement.
