Decoding the Subtleties: Speech Voice Cues and Their Impacts on Viewer In-Consumption Engagement in Travel Live Streaming

Abstract

Through a mixed-methods approach underpinned by signaling theory, this research examines how speech voice cues influence viewer in-consumption engagement in travel live streaming. Study 1 examines the impacts of speech voice cues on in-consumption engagement using real-world travel live streaming data. Study 2 employs semi-structured interviews to reveal the mechanisms through which speech voice cues influence viewer in-consumption engagement. Findings show that speech rate had a significant inverted U-shaped effect, whereas loudness and pitch had significant negative effects on in-consumption engagement. Piecemeal habits, voice-destination consistency, and perceived credibility were identified as the mechanisms underlying these effects. This research extends signaling theory to a dynamic travel live streaming context, empirically explaining how the speech voice cues of travel live streamers influence viewer in-consumption engagement.

Keywords

travel live streaming; voice analytics; speech voice cues; in-consumption engagement; TikTok

Highlights

This study explores the role of speech voice cues in signaling theory.

Piecemeal habits, voice-destination consistency, and perceived credibility are the underlying mechanisms that underpin speech voice cues.

Speech voice cues significantly affect viewer in-consumption engagement.

These findings contribute to voice analytics in tourism engagement literature.

Introduction

The voice of speech, like the beat of a drum, can sway the soul.

— Chinese Proverb

With the growing popularity of travel live streaming, an increasing number of tourism destinations are choosing to promote themselves through this medium (M. Li et al., 2023; Liang et al., 2024; Lv et al., 2022). This popularity is attributed to the raw and authentic visual content that characterizes travel live streaming, as well as the active engagement and immersive experience it provides to viewers via digital media (Liang et al., 2024), allowing viewers to engage with tourism anytime and anywhere.

Live streamers and viewers lack the personal interaction common in traditional face-to-face tourism exchanges (Deng et al., 2021). Signals are therefore vital for conveying credibility, establishing connection, and increasing engagement (Mavlanova et al., 2016). According to signaling theory, the contributor (an individual or entity) chooses attributes to convey information to recipients, particularly in contexts where information is scarce, one-sided, and emanates from the contributor (X. Li et al., 2019). In travel live streaming, the live streamer acts as the primary content contributor and the principal source of signals (M. Li et al., 2023; Lu & Chen, 2021). Because live streaming occurs in real time, there is no pre-recorded content or post-production editing (Deng et al., 2021). In this dynamic setting, live streamers project observable signals (Filieri et al., 2021), such as linguistic style (M. Li et al., 2023), facial emotion (M. Li et al., 2025), and facial attractiveness (Guo et al., 2022), to enhance engagement.

A live streamer’s linguistic style encompasses speech voice cues, such as pace, pitch, and loudness (M. Li et al., 2023). We contend that speech voice cues function as signals that influence viewer in-consumption engagement with travel live streaming. This is because the real-time setting of live streaming compels viewers to be more discerning and more reliant on signals that convey raw and authentic experiences (X. Wang et al., 2023). Viewers depend on the quality of these signals to make inferences and form connections that shape their real-time decisions about in-consumption engagement. We argue that if signals such as the speech pace, pitch, and loudness of travel live streamers are perceived as authentic and credible, they can significantly enhance viewer in-consumption engagement.

A review of the work on travel live streaming reveals three key gaps. First, little is known about the role that speech voice cues play in travel live streaming. Earlier research on travel live streamers has mainly focused on the relationship between viewers and streamers, such as the para-social relationships that influence viewer assessments of streamers’ trustworthiness and professionalism (Deng et al., 2021; Guo et al., 2022). However, there is scant research on the linguistic styles, particularly the speech voice cues, adopted by travel live streamers. Second, previous research on travel live streaming has paid little attention to viewer in-consumption engagement. Viewer in-consumption engagement describes how viewers are engaged on a moment-to-moment basis when watching content on social media platforms in real time (J. Zhu et al., 2025). This gap leaves room for research on viewer in-consumption engagement with dynamic social media content (Q. Zhang et al., 2020). Third, and stemming from the first two gaps, there is no relevant and precise method for investigating viewer in-consumption engagement. Traditional post-event measures, such as counts of “likes,” “comments,” and “shares,” are ineffective for analyzing the dynamic travel live streaming context and the extent of viewer in-consumption engagement (J. Zhu et al., 2025), because travel live streaming content is spontaneous and fluid, changing continuously in real time (Q. Zhang et al., 2020).

Against this backdrop, and drawing on signaling theory, this research empirically examines the influence of travel live streamers’ speech voice cues on viewer in-consumption engagement and explores the underlying mechanisms that explain how and why these effects occur.

Conceptual Background and Hypothesis Development

Signaling Theory and Travel Live Streaming

Signaling theory considers how individuals or entities communicate or convey information to others through signals, usually in situations where there is asymmetric information (X. Li et al., 2019). Signals are typically regarded as attributes of an entity that the signaler can adjust according to their inclinations, facilitating the transmission of concealed or scarce information from the signaler to another party (Kirmani & Rao, 2000). Within a buyer–seller relationship, information asymmetry implies that the consumer possesses less pertinent information than the seller (Fan et al., 2021; Mavlanova et al., 2016). This imbalance makes it difficult for consumers to respond appropriately, such as by making a purchase or leaving a comment (H. K. Cheng et al., 2020; Lu & Chen, 2021).

Signaling theory has been used to study online viewer engagement across various domains to understand how subtle cues influence viewer perceptions and behaviors (X. Chen et al., 2023; Lu & Chen, 2021; Y. Wang, Yang, et al., 2024). For example, Lu and Chen (2021) drew on signaling theory to identify streamer physical characteristics as signals that persuade viewers with similar physical characteristics to purchase clothes and cosmetics. However, their study did not identify specific physical characteristics, making it difficult for subsequent studies to examine physical characteristics as signals. Drawing on signaling theory, X. Chen et al. (2023) highlighted the importance of consistent signaling cues, such as self-product, anchor, live content, and danmaku content, in reducing product uncertainty in live streaming e-commerce. Nonetheless, their study acknowledged a methodological limitation: the data were collected post-consumption and therefore did not capture real-time viewer in-consumption engagement. Y. Wang, Yang, et al. (2024) adopted signaling theory to identify the influence of interpreter voices on tourist purchasing intentions. Such studies primarily focused on a unidirectional communication process flowing only from the broadcaster to the consumer (X. Li et al., 2021; Y. Wang, Yang, et al., 2024; H. Zhu & Wang, 2022). This focus fails to capture the dynamic, multidirectional communication of the immediate and ongoing real-time interactions in travel live streaming.

These gaps present an opportunity to adopt signaling theory as a theoretical basis for examining a less explored domain of speech voice cues—specifically, pace, pitch, and loudness—and their underlying mechanisms. These cues and underlying mechanisms are investigated for their influence on real-time viewer in-consumption engagement during travel live streaming.

Viewer In-Consumption Engagement

Viewer in-consumption engagement describes how viewers are involved on a moment-to-moment basis while watching media content (Q. Zhang et al., 2020; J. Zhu et al., 2025). Conventionally, viewer engagement with online media has been measured by the number of “likes,” “comments,” and “shares” (J. Zhu et al., 2025). This approach has limited value for assessing the extent of viewer engagement during visual consumption because live streaming is dynamic and its content can change at any moment (Q. Zhang et al., 2020). Measuring viewer in-consumption engagement overcomes this limitation by capturing viewers’ real-time reactions and linking them synchronously to specific live streaming content (Deng et al., 2022). Higher levels of viewer in-consumption engagement indicate that a travel live stream is more popular and attracts a larger viewership (Y. Chen et al., 2025).

The literature on viewer engagement in live streaming focuses on three key stakeholder perspectives. The first perspective explores the live streaming platform. For example, Lin et al. (2021) considered the quality of live streaming attributes, such as the positive atmosphere created by the live streamer, and found that these elements positively influence viewer engagement. The second perspective scrutinizes the viewer, centering on the trust they place in live streamers (e.g., Hilvert-Bruce et al., 2018). The third perspective examines travel live streamers, focusing on their physical attractiveness (e.g., Y. Zhang & Prebensen, 2025) and linguistic style (e.g., M. Li et al., 2023) in engaging viewers. While studies from the third perspective have highlighted the importance of speech voice cues in influencing viewers (M. Li et al., 2023), little is known about how speech voice cues in dynamic travel live streaming affect viewers’ real-time in-consumption engagement. Addressing this gap requires a methodology that captures engagement as travel live streaming unfolds and theoretically unpacks the speech voice cues of pace, pitch, and loudness, and their underlying mechanisms, to determine their impact on viewer in-consumption engagement.

Speech Voice Cues Features

Non-verbal communication extends beyond mere linguistic content to encapsulate a broader scope of communicative elements (Hall et al., 2019; S. Liu et al., 2022; X. Wang et al., 2023). While verbal communication forms the bedrock of human interaction, the potency of non-verbal communication can surpass the spoken word (Islam & Kirillova, 2020; Jung & Yoon, 2011). Speech voice cues are fundamental to non-verbal communication (Naderi Varandi et al., 2023; Y. Wang, Ruan, et al., 2024).

Sundaram and Webster (2000) proposed a multi-dimensional framework of non-verbal communication that provides a robust analytical lens for assessing interactions. Their framework identifies the following dimensions: (1) appearance, such as the implicit messages transmitted via facial aesthetics, attire, and hairstyle; (2) kinesics, encompassing body movements, such as sustained eye contact or gestures, that communicate messages; (3) proxemics, which considers the significance of physical space and touch in conveying sentiments; and, notably, (4) paralanguage, which takes into account the auditory nuances of pace (speech rate), pitch, and loudness.

Research has consistently highlighted the critical role that paralanguage plays in facilitating (1) information transmission, (2) communication effectiveness, and (3) emotional expression (Naderi Varandi et al., 2023; X. Wang et al., 2023). Speech voice cues are a prominent dimension of paralanguage (M. Li et al., 2023) and primarily include pace (speech rate), loudness, and pitch (Hall et al., 2019; Y. H. Lee & Lim, 2010; Y. Wang, Ruan et al., 2024). This research selected pace (speech rate) for its ability to reflect the speaker’s reliability, expressiveness, and persuasiveness (Y. Wang, Ruan et al., 2024). Pitch was chosen because of its strong correlation with positive and negative emotions (Mas et al., 2020). Loudness was chosen because variations in volume significantly increase viewer attention (Y. Wang, Ruan et al., 2024).

Pace

Pace refers to speech rate (Rodero, 2016), the speed at which verbal communication occurs. Precisely, speech rate gauges the velocity of vocal delivery (Rodero, 2020) and is linked to the amount of information conveyed (S. Liu et al., 2020). Variations in pace, whether rapid or slow, elicit distinct effects on listeners. Previous research emphasizes the positive effects of a rapid speech rate. A rapid speech rate has been found to capture listeners’ attention, inducing them to invest more effort in processing information (Chattopadhyay et al., 2003; Rodero, 2020). Moreover, rapid speakers tend to convey more credibility than those who speak more languidly (Chebat et al., 2007; X. Wang et al., 2023). A quicker speech rate is also linked to audience satisfaction, as it alleviates boredom (S. Liu et al., 2022; Y. Wang, Yang et al., 2024). Although earlier studies extol the merits of quickened speech, it is also important to maintain a moderate speed that viewers can fully understand and engage with (S. Liu et al., 2020; Y. Wang, Yang et al., 2024). Speaking too fast may interfere with information processing, comprehension, and interaction (De Waele et al., 2019; Rodero, 2020). Together, these findings suggest an inverted U-shaped relationship: increases in speech rate boost viewer engagement up to a certain threshold, beyond which the relationship shifts from positive to negative. In travel live streaming, signaling theory points to the live streamer’s speech rate as a signal in delivering information. A consistent, moderate speech rate that keeps speech understandable builds connections with viewers by demonstrating the live streamer’s credibility in delivering content. Such perceptions of trustworthiness and expertise strongly shape a viewer’s interest in and trust toward the travel live streamer, thereby influencing their in-consumption engagement. Thus, we propose:

H1: The speech rate of travel live streamers has an inverted U-shaped relationship with viewer in-consumption engagement.
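As a general illustration of what an inverted U-shape means in a regression setting (not the exact specification estimated in Study 1), suppose the log of expected engagement, η, depends on speech rate x through a linear term and a squared term with hypothetical coefficients β_a and β_b. The turning point follows from setting the first derivative to zero:

```latex
\eta(x) = \beta_0 + \beta_a x + \beta_b x^{2}, \qquad
\frac{d\eta}{dx} = \beta_a + 2\beta_b x = 0
\;\Longrightarrow\; x^{*} = -\frac{\beta_a}{2\beta_b}, \qquad
\frac{d^{2}\eta}{dx^{2}} = 2\beta_b < 0 \ \text{(maximum)}
```

The relationship peaks at x* only when β_b < 0, and the inverted U is observable in the data only if x* lies within the observed range of speech rates.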

Pitch

Pitch, determined by the frequency of sound waves and typically measured in hertz (Hz), plays a pivotal role in shaping consumer interpretations (Hagtvedt & Brasel, 2016). Prior literature commonly acknowledges that pitch variance significantly affects engagement and purchase intention (Y. Wang, Yang et al., 2024). For instance, a higher pitch is frequently associated with diminished credibility, increased apprehension, a lack of self-assuredness, and reduced assertiveness (Jiang & Pell, 2017). The political campaign literature draws similar conclusions: candidates with deeper, lower-pitched voices are often perceived as more formidable contenders, a perception that can sway electoral outcomes in their favor (Tigue et al., 2012). Lower voices influence listeners’ perceptions of a speaker’s stature and warmth, shaping their overall attitudes and behavior toward the speaker (Barnes, 2024). In travel live streaming, signaling theory draws attention to the live streamer’s pitch in conveying information (Y. Wang, Ruan et al., 2024). For example, travel live streamers are likely to speak at a higher pitch when they are excited, anxious, or surprised (Hilvert-Bruce et al., 2018), and may shift to a lower pitch when they are serious or relaxed. As tourism is primarily motivated by a desire to relax and reset (Mannell & Iso-Ahola, 1987), a sharp or jarring high pitch may have a negative impact on viewer in-consumption engagement. Thus, we propose:

H2: A high voice pitch in travel live streamers is associated with lower viewer in-consumption engagement.

Loudness

Loudness, a defining attribute of speech voice cues, is primarily measured by amplitude, expressed in decibels (dB; S. Liu et al., 2022; Y. Wang, Yang et al., 2024). Adequate loudness is key to ensuring that a message is clear and understood (Y. Wang, Yang et al., 2024). Louder, more confident voices captivate and retain listener attention more successfully, acting as auditory focal points (Zougkou et al., 2017). However, excessive loudness may signal extreme emotion, potentially distancing or alienating listeners (X. Wang et al., 2021). In travel live streaming, signaling theory highlights the role of the live streamer’s loudness in articulating their message clearly and accurately. For example, travel live streamers are likely to speak loudly when they are confident or passionate about their subject (M. Li et al., 2023), or may lower their voice to sound calm and collected. Again, as relaxation and invigoration are motives for travel (Mannell & Iso-Ahola, 1987), an overly loud and strident voice may have a negative impact on viewer in-consumption engagement. Thus, it is proposed:

H3: Increased voice loudness in travel live streamers is associated with lower viewer in-consumption engagement.

Research Design

Overview of the Studies

This research adopts a pragmatist paradigm, which embraces methodological and paradigmatic flexibility within a single study to better address complex research questions (Morgan, 2014). In line with this paradigm, this research employs a mixed-methods approach across two studies. Study 1 acquires and confirms knowledge through empirical verification (Goodson & Phillimore, 2004), quantitatively assessing the specific impact of speech voice cues on viewer in-consumption engagement in travel live streaming. Study 2 adopts interpretivism to explore interviewees’ subjective experiences. This approach interprets phenomena from interviewees’ perspectives, acknowledging multiple realities constructed through individual experiences and social interactions. In Study 2, the researchers qualitatively sought to identify the underlying reasons, contexts, and processes behind the quantitative findings of Study 1, enabling a richer, more nuanced explanation of complex social phenomena (Ivankova et al., 2006).

Whether the findings of Studies 1 and 2 apply to other contexts or samples is a question of transferability (Considine et al., 2005). Transferability refers to the extent to which readers can judge whether the findings may be transferred to other contexts or interviewees (Lincoln & Guba, 1985). To maximize potential transferability, the research follows the protocols of Lincoln and Guba (1985), which advocate systematic, comprehensive, and clear descriptions of the research context, the characteristics of live streaming, data collection, and analytical processes. Additionally, direct quotations were used to capture interviewee perspectives in detail (Creswell & Creswell, 2003; Kruger & Saayman, 2015).

Study 1 used voice analytics, an automated technique for extracting insights from unstructured auditory data (S. Liu et al., 2022), to examine the inverted U-shaped relationship between pace (speech rate) and viewer in-consumption engagement (H1). It also tested the negative relationships of pitch and loudness with viewer in-consumption engagement (H2 and H3). Study 1 draws on real-time data from travel live streaming on Chinese TikTok, which was selected because it is China’s most renowned live streaming platform. In 2023, Chinese TikTok surpassed 1 billion daily active users, with over 130 million individuals engaged in live streaming activities, making it a popular choice for live streamers (Gao et al., 2023). On Chinese TikTok, travel live streamers showcase tourism destinations, attractions, and products, actively connecting with viewers and enhancing viewer in-consumption engagement.

Study 2 used semi-structured interviews to identify the mechanisms that underpin the impacts of speech voice cues on viewer in-consumption engagement. The rich, descriptive data acquired from semi-structured interviews provide more comprehensive insights into the psychological triggers involved in viewer in-consumption engagement during travel live streaming (Miles & Huberman, 1994). Interviewees for Study 2 were recruited through purposive sampling, which allows for the deliberate selection of the most suitable interviewees or data sources, ensuring that the collected information is highly relevant and directly addresses the specific research aims (M. Li et al., 2023).

Study 1: Voice Analytics

Data collection

Active travel live streaming sessions on Chinese TikTok, broadcast daily between 8 am and 10 pm from December 5 to December 30, 2023, were collected. These included 12 travel live streaming sessions in China and four overseas (i.e., Europe; Bali, Indonesia; and Kuala Lumpur, Malaysia). The sessions focused on outdoor activities and did not reveal the travel live streamer’s face. A total of 16 distinct travel live streaming sessions were analyzed, with durations ranging from 1 to 6 hours. Most of these sessions ranged from 30 minutes to 2 hours.

When selecting travel live streaming sessions, a primary consideration was variability, that is, the diversity of the chosen sample (Eisenhardt, 1989; J. Zhu et al., 2025). The selected content encapsulated a broad spectrum of contexts (Eisenhardt, 1989; J. Zhu et al., 2025), from varied tourism attractions to tourism products presented by diverse live streamers, with audiences ranging from 100 to 20,000 viewers. The chosen travel live streamers were aged between 20 and 55 years and included both men and women.

The study was conducted at second-level granularity, meaning that data were captured on a per-second basis (Lin et al., 2021). For each second of every travel live stream, the study extracted structured and unstructured data, including the number of real-time live comments from viewers and voice data from travel live streamers. Viewers’ live comments were chosen because they provide a real-time, direct measure of viewer interaction and engagement during travel live streaming (M. Li et al., 2025; J. Zhu & Cheng, 2025). Unlike “likes,” followers, or virtual gifts, which can be influenced by pre-existing popularity or external promotional activities, live comments reflect spontaneous viewer responses to the specific content presented during the live stream. Live comments effectively capture the dynamic and interactive nature of live streaming on platforms such as Chinese TikTok, where immediate viewer feedback is crucial for assessing engagement levels (J. Zhu et al., 2025). As shown in Figure 1, the final data included two distinct components: (1) the voice segments and their corresponding live comments; and (2) the demographics of each travel live streamer.
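As an illustration of second-level alignment, live-comment timestamps can be binned into per-second counts so that each second of audio has a matching engagement value. This is a minimal sketch assuming each comment carries a timestamp in seconds; the function name and input format are hypothetical, not the authors’ pipeline.

```python
from collections import Counter
from math import floor


def comments_per_second(comment_times, stream_start, stream_length_s):
    """Count live comments in each 1-second bin of a stream (hypothetical helper).

    comment_times: comment timestamps in seconds (assumed format, e.g. epoch floats).
    stream_start: timestamp of the first second of the stream.
    stream_length_s: number of whole seconds in the stream.
    Returns a list whose index t holds the comment count for second t.
    """
    counts = Counter()
    for ts in comment_times:
        second = floor(ts - stream_start)
        if 0 <= second < stream_length_s:  # ignore comments outside the stream window
            counts[second] += 1
    return [counts.get(t, 0) for t in range(stream_length_s)]


# Two comments in second 0, one in second 1, one in second 4; 5.0 falls outside a 5-second stream.
print(comments_per_second([0.2, 0.9, 1.5, 4.99, 5.0], 0.0, 5))
```

Producing a dense list (zeros included) keeps every second of audio paired with a count, which simplifies later aggregation.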

Figure 1.

Live Comments During Travel Live Streaming.

Measures

The independent variables, the speech voice cues, were measured by pace (speech rate), pitch (in hertz, Hz), and loudness (in decibels, dB). The dependent variable, viewer in-consumption engagement, was measured by the number of live comments, with more comments indicating higher engagement. The control variables were (1) gender, a dummy variable coded 1 for male and 0 for female; and (2) follower numbers on Chinese TikTok. As the number of followers on this platform ranged from 5,000 to 200,000, follower numbers were coded 1 for more popular travel live streamers (over 200,000 followers) and 0 for less popular travel live streamers (under 200,000 followers).

Voice analytics

As shown in Figure 2, the audio of each travel live streaming session on Chinese TikTok was saved in MP3 format. The AudioSegment class from the third-party Python library pydub was then used to load the MP3 files as AudioSegment objects.
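Once the audio is loaded, it is cut into 1-second pieces and silent stretches are discarded. In pydub itself, slicing an AudioSegment by milliseconds (e.g., `audio[i * 1000:(i + 1) * 1000]`) yields 1-second chunks; the sketch below reproduces the same logic on a plain NumPy sample array so it runs without audio files. The RMS silence threshold is an assumed value, and slicing before dropping silent seconds (so each kept segment retains its position in the stream) is a simplification, not necessarily the authors’ exact order of operations.

```python
import numpy as np


def split_into_seconds(samples, sample_rate):
    """Split a mono signal into consecutive 1-second segments (a trailing partial second is dropped)."""
    n_full = len(samples) // sample_rate
    return [samples[i * sample_rate:(i + 1) * sample_rate] for i in range(n_full)]


def drop_silent(segments, rms_threshold=1e-3):
    """Keep (second_index, segment) pairs whose RMS amplitude exceeds an assumed threshold."""
    kept = []
    for i, seg in enumerate(segments):
        rms = np.sqrt(np.mean(np.square(seg, dtype=np.float64)))
        if rms > rms_threshold:
            kept.append((i, seg))  # keep the index so the second can be matched to live comments
    return kept
```

Keeping the second index alongside each segment preserves the link to the per-second comment counts after silent seconds are removed.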

Figure 2.

Voice Analytics Process.

Based on the duration of each travel live stream, measured in seconds, a Python “for loop” was implemented to segment the AudioSegment object into individual 1-second segments. Silent parts of the audio, in which the travel live streamer was not speaking, were removed, and the remaining audio was sliced by second into WAV files. This process yielded 20 hours of audio. Framing and windowing techniques were then applied to pre-process the audio signals. After a short-time Fourier transform was applied to the pre-processed signals, the pitch (in Hz) and amplitude of each audio segment were extracted. The amplitude was then converted to dB-scaled loudness using Formula 1, with the human auditory threshold (2 × 10⁻⁵ Pa) as the reference. Finally, the mean pace, pitch, and loudness across all utterances within each audio file were calculated.
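The framing, windowing, and short-time Fourier transform step can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the authors’ implementation: pitch is approximated by the strongest spectral peak between 60 and 400 Hz (an assumed voice band), averaged over frames; amplitude is approximated by the root-mean-square of the segment; and the frame length (2048 samples) and hop (512 samples) are assumed values. Dedicated fundamental-frequency trackers (for example, `librosa.pyin`) are generally more robust than a raw spectral peak.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Cut a 1-D signal into overlapping frames (one per row) and apply a Hann window."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)


def pitch_and_amplitude(segment, sample_rate, frame_len=2048, hop=512, fmin=60.0, fmax=400.0):
    """Crude pitch (Hz) and RMS amplitude for one segment (assumed simplification).

    Requires len(segment) >= frame_len.
    """
    x = np.asarray(segment, dtype=np.float64)
    frames = frame_signal(x, frame_len, hop)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))            # magnitude STFT, frames x bins
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)   # bin centre frequencies in Hz
    band = (freqs >= fmin) & (freqs <= fmax)                   # assumed speech pitch band
    peak_freqs = freqs[band][np.argmax(spectrum[:, band], axis=1)]
    pitch_hz = float(np.mean(peak_freqs))                      # average peak across frames
    amplitude = float(np.sqrt(np.mean(np.square(x))))          # RMS amplitude of the segment
    return pitch_hz, amplitude
```

With a 2048-sample frame at 16 kHz, frequency bins are 7.8125 Hz apart, so the pitch estimate is only as precise as that spacing; a longer frame or interpolation between bins would refine it.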

Further, to calculate the pace (speech rate) of each travel live streamer, the audio was transcribed into text using iFlytek. Each audio file was loaded as an audio signal using the Python library librosa, and the duration of each utterance (each sentence, in seconds) was determined. The speech rate for sentence i was calculated by dividing its word count by its duration, as shown in Formula 2:

\mathrm{Loudness}_i = 20 \times \log_{10} \frac{\mathrm{Amplitude}_i}{\mathrm{reference}}

(1)

\mathrm{SpeechRate}_i = \frac{\mathrm{WordCount}_i}{\mathrm{Duration}_i}

(2)
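Formulas 1 and 2 translate directly into code. In this minimal sketch the reference value 2 × 10⁻⁵ follows the text; note that Formula 1 yields a sound pressure level only if amplitude is expressed in pascals, whereas digital audio samples are usually on an arbitrary full-scale range, so treating them as pascals is an assumption. With duration in seconds, Formula 2 gives words per second; the text does not state the time unit behind the pace values reported in Table 1.

```python
import math

REFERENCE = 2e-5  # human auditory threshold used as the dB reference (2 x 10^-5 Pa), as in the text


def loudness_db(amplitude, reference=REFERENCE):
    """Formula 1: Loudness_i = 20 * log10(Amplitude_i / reference)."""
    if amplitude <= 0:
        raise ValueError("amplitude must be positive on a logarithmic scale")
    return 20.0 * math.log10(amplitude / reference)


def speech_rate(word_count, duration_s):
    """Formula 2: SpeechRate_i = WordCount_i / Duration_i (words per second when duration is in seconds)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return word_count / duration_s


# An amplitude 100 times the reference corresponds to 40 dB; 30 words in 12 seconds is 2.5 words per second.
print(loudness_db(2e-3), speech_rate(30, 12))
```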

Poisson regression was used for hypothesis testing. Y represents viewer in-consumption engagement, measured by the number of live comments. X1 represents pitch, measured by the mean pitch (Hz); X2 represents loudness, measured by the mean loudness (dB); X3 represents pace, measured by the mean speech rate and entered as a squared term; and β0 is the intercept. To account for the time lag between viewers receiving voice signals and posting live comments, the second-by-second data were aggregated into half-minute (30-second) intervals for analysis. A total of 71,920 audio clips of travel live streaming were used in the analysis. The estimated model is shown in Formula 3:

\ln \mathbb{E}[y_i] = \beta_{0} + \beta_{1} x_{1i} + \beta_{2} x_{2i} + \beta_{3} x_{3i}^{2} + \pi \, \mathrm{Control}_i, \qquad y_i = \text{number of live comments}

(3)
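The aggregation of second-by-second records into 30-second intervals, described before Formula 3, can be sketched as follows. This assumes one record per second with hypothetical field names; live comments are summed and voice cues are averaged over the seconds present in each window. It illustrates the described procedure and is not the authors’ code.

```python
from statistics import mean


def aggregate_windows(per_second_rows, window_s=30):
    """Aggregate per-second records into fixed windows of window_s seconds.

    per_second_rows: list of dicts with keys 'second', 'comments', 'pace', 'pitch',
    'loudness' (hypothetical field names). Comments are summed; voice cues are averaged.
    """
    windows = {}
    for row in per_second_rows:
        windows.setdefault(row["second"] // window_s, []).append(row)
    result = []
    for w in sorted(windows):
        rows = windows[w]
        result.append({
            "window": w,
            "comments": sum(r["comments"] for r in rows),  # engagement per window
            "pace": mean(r["pace"] for r in rows),
            "pitch": mean(r["pitch"] for r in rows),
            "loudness": mean(r["loudness"] for r in rows),
        })
    return result
```

Averaging only over the seconds present means windows with removed silent seconds are still summarized by the speech they contain.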

Descriptive analysis

The descriptive statistics are shown in Table 1. For the dependent variable, the average number of live comments was 12.14 per 30-second interval. For the independent variables, the average pace (speech rate) was 153.81, pitch was 162.75 Hz, and loudness was 66.55 dB, each averaged within 30-second intervals.

Table 1.

Descriptive Statistics of Variables.

Variable Type	Variables	Mean	SD	Min	Max
Dependent Variable	Viewer in-consumption engagement (# of live comments)	12.14	15.07	0	112
Independent Variables	Pace (speech rate)	153.81	28.63	33	243.64
	Pitch (Hz)	162.75	25.59	122.02	226.64
	Loudness (dB)	66.55	5.1	52.98	86.15
Control Variables	Gender (1 = male)	0.75	0.43	0.00	1.00
	Follower numbers (1 = over 200,000)	0.16	0.37	0.00	1.00

Empirical analysis

The Poisson regression results are shown in Table 2 and reveal the distinct impacts of the speech voice cues on viewer in-consumption engagement. Importantly, the negative coefficient on squared speech rate pointed to an inverted U-shaped relationship between speech rate and viewer in-consumption engagement, supporting H1 (p < .05). Further, pitch was negatively related to viewer in-consumption engagement, supporting H2 (p < .01), as was loudness, supporting H3 (p < .01).
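For readers who want to see how a model such as Formula 3 is estimated, the sketch below fits a Poisson regression with a log link by iteratively reweighted least squares (IRLS). It is a generic illustration, not the authors’ estimation code; in practice a statistical package (for example, the statsmodels GLM with a Poisson family) would typically be used. The design matrix is assumed to contain an intercept column followed by predictors such as pitch, loudness, squared speech rate, and controls; rescaling large predictors such as squared speech rate improves numerical conditioning.

```python
import numpy as np


def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link, ln E[y] = X @ beta, by IRLS.

    X: (n, k) design matrix whose first column is all ones (intercept).
    y: (n,) non-negative counts. Returns the estimated coefficient vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta                  # linear predictor (log scale)
        mu = np.exp(eta)                # expected counts
        z = eta + (y - mu) / mu         # working response
        XtW = X.T * mu                  # X^T W with W = diag(mu)
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

On simulated counts with known coefficients, the estimates should land close to the true values, which is a quick sanity check before applying the routine to real data.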

Table 2.

Poisson Regression.

Variables	Estimate	SE	Z Value	P Value
Constant	11.2508	0.113	99.633	.000***
Pace² (squared speech rate)	−1.566e-06	7.56e-07	−2.072	.038**
Pitch	−0.0190	0.000	−39.801	.000***
Loudness	−0.0871	0.001	−74.458	.000***
Gender	−0.3025	0.030	−9.991	.000***
Follower number	0.7652	0.017	46.262	.000***

***p < .01. **p < .05.
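Because Poisson coefficients are on the log-count scale, exponentiating them gives incidence rate ratios, the multiplicative change in expected live comments per one-unit increase in a predictor. The sketch below applies this standard transformation to the estimates reported in Table 2; it adds no new results, only a rescaling of the reported values.

```python
import math

# Poisson coefficients as reported in Table 2 (log-count scale).
table2 = {
    "Pace² (per unit of squared speech rate)": -1.566e-06,
    "Pitch (per Hz)": -0.0190,
    "Loudness (per dB)": -0.0871,
    "Gender (male = 1)": -0.3025,
    "Follower number (over 200,000 = 1)": 0.7652,
}


def incidence_rate_ratio(coef):
    """exp(beta): multiplicative change in expected comments per one-unit increase."""
    return math.exp(coef)


for name, coef in table2.items():
    irr = incidence_rate_ratio(coef)
    print(f"{name}: IRR = {irr:.4f} ({(irr - 1) * 100:+.2f}% expected comments)")
```

Read this way, and holding the other variables constant, each additional hertz of pitch multiplies expected comments per 30-second interval by about 0.981 (roughly a 1.9% decrease), and each additional decibel by about 0.917 (roughly an 8.3% decrease).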

Robustness check

A robustness check was performed by employing a negative binomial regression model, due to the over-dispersion observed in the count-dependent variable. As shown in Table 3, the findings reaffirmed the hypothesized inverted U-shaped relationship between pace (speech rate) and viewer in-consumption engagement (β3 = −7.095e-06, p < .01). Pitch (p < .01) and loudness (p < .01) also produced significant negative effects, which further demonstrated the robustness of the results.
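A quick way to see the over-dispersion that motivates the negative binomial model is the variance-to-mean ratio of the dependent variable, which is about 1 for a Poisson variable. The sketch below computes it from the Table 1 values. This is only a marginal (unconditional) check; formal assessment would use the fitted Poisson model, for example its Pearson chi-square statistic divided by the residual degrees of freedom.

```python
# Table 1 values for the dependent variable (live comments per 30-second interval).
mean_comments = 12.14
sd_comments = 15.07


def dispersion_index(mu, sd):
    """Variance-to-mean ratio; approximately 1 for Poisson-distributed counts."""
    return sd ** 2 / mu


ratio = dispersion_index(mean_comments, sd_comments)
print(f"variance / mean = {ratio:.2f}")
```

A ratio far above 1, as here, indicates that the variance of live comments greatly exceeds its mean, the pattern the negative binomial model accommodates through its extra dispersion parameter.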

Table 3.

Negative Binomial Regression.

Variables	Estimate	SE	Z Value	P Value
Constant	11.7476	0.424	27.719	.000***
Pace² (squared speech rate)	−7.095e-06	2.6e-06	−2.734	.006***
Pitch	−0.0277	0.002	−17.694	.000***
Loudness	−0.0668	0.004	−15.513	.000***
Gender	−0.7034	0.097	−7.229	.000***
Follower number	0.8070	0.063	12.884	.000***

***p < .01.

In summary, the findings support an inverted U-shaped relationship between pace (speech rate) and viewer in-consumption engagement (H1). While a fast speech rate may capture attention and encourage interaction, an overly rapid pace may overwhelm viewers and thereby diminish engagement (Y. Wang, Yang et al., 2024). Additionally, the pitch and loudness of travel live streamers were both negatively associated with viewer in-consumption engagement (H2 and H3, respectively). This suggests that in travel live streaming, within the range audible to humans (approximately 20–20,000 Hz), viewers prefer a calmer and less strident speaking voice (X. Wang et al., 2021). A lower-pitched voice is perceived as more pleasant and calming, fostering an inviting and engaging real-time environment for viewers (Guyer et al., 2019; Y. Wang, Yang et al., 2024). Similarly, the negative effect of loudness suggests a preference for a softer, more soothing voice, which sustains viewer in-consumption engagement.

Study 2

Study 1 revealed broad trends and patterns through real-time data analysis. Study 2 set out to explore the underlying mechanisms through which speech voice cues influence viewer in-consumption engagement, probing the empirical insights identified in Study 1. To this end, semi-structured interviews were used to collect qualitative data that complement the quantitative data acquired in Study 1.

Data collection

The interviewees were viewers who watched travel live streaming on Chinese TikTok for more than 2 hours each month. Interviewees were first asked to recount their most recent experience of watching travel live streaming and were then invited to comment on how the speech voice cues of live streamers affected their engagement. In total, 16 interviews were conducted with eight men and eight women aged 20 to 55 years (see Table 4). Interviewing stopped when saturation was reached, that is, when no new information emerged (M. Cheng & Wong, 2014). Each interview lasted approximately 20 minutes.

Table 4.

Demographic Profile of Interviewees.

	Number	Percentage (%)
Gender
Male	8	50
Female	8	50
Age
20–35	11	69
Over 35	5	31
Education
Bachelor’s degree or above	10	63
Below bachelor’s degree	6	37

Thematic analysis

A primary consideration of research rigor in qualitative content analysis is trustworthiness (Graneheim & Lundman, 2004), which comprises four key components: (1) credibility; (2) transferability; (3) dependability; and (4) confirmability (Lincoln & Guba, 1985). Each component entails specific procedures and strategies required to produce credible findings. A standard thematic analysis was used. Following initial coding, preliminary themes were generated, as shown in Appendix A. The derived themes were then checked against the codes and the original data, defined, and formally named (Braun & Clarke, 2006). This reflects Braun and Clarke’s (2006) observation that the analysis of interview data is a recursive, iterative process rather than a linear one. Adopting this approach meant that data were continuously compared and analyzed at various stages of the study. As suggested by Corbin (1998), the coding process began with the development of internal codes, and similar sequences of codes were eventually organized into higher-level themes. The interpretation of the results was then expanded and refined, and the findings were established through multiple iterations. This process reduces researcher bias and helps ensure that interpretations are grounded in the travel live streaming speech voice cues and the phenomena under investigation (M. Cheng & Wong, 2014).

Three underlying themes emerged from the thematic analysis: piecemeal habits, voice-destination consistency, and perceived credibility. Pace, pitch, and loudness appear to interact in a complex and dynamic way, each potentially altering viewers’ perceptions and their subsequent commentary during travel live streaming (Appendix B).

Findings

Piecemeal habits

Piecemeal habits describe the practice of addressing tasks or issues in small, incremental segments rather than in a comprehensive or continuous manner (Janiszewski & Laran, 2024). Given the fast pace of life in China, viewers tend to watch travel live streaming in short, intermittent viewing periods. Individuals increasingly expect swift access to information (X. Liu et al., 2022), which translates into a desire for rapid information delivery in travel live streaming. This need for prompt access to information requires travel live streamers to carefully balance their speech rate. Interviewees reported a preference for such a balance: a soft, moderately paced delivery facilitates efficient information transfer, whereas an excessively rapid speech rate hinders viewer comprehension and information retention. Conversely, a slow speech rate does not align with viewers’ desire to acquire knowledge about tourism destinations quickly.

I like to watch travel live streaming in my intermittent viewing periods. . . . I hope to receive information quickly, but if it is too fast, I find it hard to understand the content. (Interviewee 12, F, 29)

I do not want to spend much time planning my trips, so I prefer to gather information and ask travel questions from travel live streamers during my intermittent viewing periods. This requires travel live streamers to speak at a reasonable pace, and speaking too loudly can give people headaches. (Interviewee 2, M, 25)

Voice-destination consistency

Voice-destination consistency holds that when the voice projected by the travel live streamer is congruent with the tourism destination, this congruence reflects the genuineness and sincerity of the content (J. A. Lee & Eastin, 2021). Interviewees contended that an overly loud, high-pitched voice creates a dissonance between how viewers perceive the tourism destination and how they perceive the travel live streamer. Because most outdoor travel live streaming focuses on nature, a peaceful and relaxing voice tends to be most appealing to viewers. By contrast, excessive volume and harsh sounds can feel incongruent with the natural scenery. This mismatch prevents viewers from integrating the voice and visual content cohesively, ultimately diminishing their willingness to engage further.

Notably, distinctive contrasts were observed between live e-commerce and travel live streaming in relation to viewer in-consumption engagement. While live e-commerce often employs a rapid, heightened voice to incite purchase intention (Hilvert-Bruce et al., 2018; Meng et al., 2021), such an approach appears invasive, jarring, and less effective in travel live streaming. This difference underscores the unique context of travel live streaming in destination marketing. Interviewees mentioned that watching travel live streaming affords an immersive experience, inviting viewers to learn more about tourism destinations and to gain insight into their authenticity. This highlights the need for travel live streamers to use more nuanced speech voice cues in tourism settings to stimulate viewer in-consumption engagement.

If I am shopping, I find that when live streamers speak fast and introduce products quickly, it makes me feel like buying impulsively. . . . But for travel live streaming, I am seeking an experience, wanting to see some real natural scenery. A low pitch, calming voice aligns well with the beauty of mountains and rivers. (Interviewee 1, M, 27)

Travel live streaming is meant to create a more immersive experience, and making it feel real is the most important part. I enjoy watching the live streaming of wildlife. . . . Animals are often very scared of loud noises. If a travel live streamer speaks in a very loud voice, I feel like the live streaming is generated by AI. So, it is important to maintain consistency between the scene and the speech voice cues. (Interviewee 3, M, 24)

Perceived credibility

Perceived credibility refers to the trustworthiness, reliability, and honesty of a source (Filieri et al., 2023). Interviewees cited a preference for a deeper-pitched voice, associating a lower pitch with greater credibility and assurance. This supports prior literature showing that a high-pitched voice is perceived as indicative of apprehension and a lack of confidence (e.g., Guyer et al., 2019). Further, interviewees noted that excessive loudness in the travel live streamer’s voice detracts from the viewing experience, making it difficult to immerse themselves fully in the virtual exploration of the travel destination. Interviewees felt that a modulated, softer voice is more conducive because it enables viewers to immerse themselves in the experience quickly. However, they conceded that the voice must remain loud enough to be audible, which makes the content more accessible and easier to process. Overall, the findings suggest that a relatively low pitch and loudness enhance the credibility and appeal of the content, shaping viewers’ ability to immerse themselves in the virtual tourism experience.

There is a travel live streamer . . . who has this deep, booming voice, which I really like. It makes me believe that the places he talks about are real. But later, when he started selling tickets, his speaking speed and volume suddenly increased. I felt like I was being tricked. (Interviewee 11, F, 29)

If the voice is sharp and loud, I always feel like it is covering something up, like all the information is fake. (Interviewee 12, F, 29)

Discussion

This research empirically examined the impact of speech voice cues, namely pace, pitch, and loudness, on viewer in-consumption engagement in travel live streaming, using voice analytics on Chinese TikTok data and semi-structured interviews. In Study 1, voice analytics revealed an inverted U-shaped relationship between pace (speech rate) and real-time viewer in-consumption engagement (the number of live comments in travel live streaming). Both excessively rapid and overly slow speech rates had negative effects on viewer in-consumption engagement, suggesting that a moderately paced speech rate facilitates efficient information transfer. Study 1 also found negative relationships of pitch and loudness with real-time viewer in-consumption engagement: extremely high pitch and marked loudness in travel live streamers’ voices had negative impacts on engagement. This suggests that, within the range of sounds audible to viewers, voices with lower pitch and modulated loudness are more effective in triggering viewer in-consumption engagement. In Study 2, semi-structured interviews qualitatively corroborated the findings of Study 1 and identified three underlying mechanisms: piecemeal habits, voice-destination consistency, and perceived credibility.
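An inverted U-shaped effect of the kind reported for speech rate is commonly tested by adding a squared term to a regression. The sketch below is a minimal illustration under stated assumptions, not the Study 1 model (whose specification is not reproduced in this section): it fits engagement = b0 + b1·rate + b2·rate² by ordinary least squares and computes the peak. An inverted U requires b2 < 0, and the engagement-maximizing rate is -b1 / (2 * b2). The function names, the unit "syllables per second", and the data are hypothetical and synthetic, not results from this study.

```python
from typing import List, Sequence, Tuple


def fit_quadratic(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of y = b0 + b1*x + b2*x**2 by solving the 3x3 normal equations."""
    # s[k] = sum of x^k (k = 0..4); t[k] = sum of x^k * y (k = 0..2)
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum((xi ** k) * yi for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented normal-equation matrix [X'X | X'y]
    m: List[List[float]] = [
        [float(s[i + j]) for j in range(3)] + [float(t[i])] for i in range(3)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    b = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        b[r] = (m[r][3] - sum(m[r][c] * b[c] for c in range(r + 1, 3))) / m[r][r]
    return b[0], b[1], b[2]


def peak_rate(b1: float, b2: float) -> float:
    """Speech rate at which predicted engagement peaks; only defined for an inverted U (b2 < 0)."""
    if b2 >= 0:
        raise ValueError("b2 >= 0: the fitted curve is not an inverted U")
    return -b1 / (2 * b2)


# Hypothetical, noise-free data: speech rate vs. live comments per minute
rates = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
comments = [10 + 4 * r - 0.5 * r ** 2 for r in rates]  # inverted U peaking at 4.0
b0, b1, b2 = fit_quadratic(rates, comments)
print(f"b2 = {b2:.3f}, peak at {peak_rate(b1, b2):.2f}")  # prints "b2 = -0.500, peak at 4.00"
```

Plain Python keeps the example self-contained; with real, noisy data one would instead use a regression library that reports standard errors, include control variables, and check that the estimated peak falls inside the observed range of speech rates.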

Theoretical Implications

This research advances existing tourism knowledge in two important ways. First, it extends signaling theory from traditional, static marketing contexts (e.g., C. Li et al., 2017; Smith & Font, 2014) into the dynamic, real-time environment of travel live streaming. Unlike pre-recorded advertisements or promotional videos, live streaming requires viewers to interpret speech voice cues instantaneously, highlighting the unique real-time interplay between streamers and viewers. Specifically, this research identifies speech pace (through an inverted U-shaped relationship) and pitch and loudness (through negative relationships) as influential signals affecting viewer in-consumption engagement. This offers deeper theoretical insight into how subtle voice variations can strategically influence viewer behavior in the immediacy of a live environment.

Second, this research offers a novel theoretical perspective by clarifying the distinctions between speech voice cues used in travel live streaming and those used in live e-commerce. While e-commerce streamers typically employ rapid, loud speech to stimulate immediate purchasing behaviors (Lin et al., 2021; L. Liu et al., 2023), travel live streamers engage viewers differently. Travel viewers seek immersive and authentic experiences rather than rapid sales pitches, prompting streamers to adopt a moderate pace, softer loudness, and deeper pitch to build credibility, maintain viewer attention, and enhance immersion (M. Li et al., 2023). By examining the underlying mechanisms, namely piecemeal habits, voice-destination consistency, and perceived credibility, this study helps explain how and why these vocal strategies enhance real-time viewer engagement, further enriching theoretical understanding of travel live streaming.

Methodological Implications

This research is methodologically innovative in its analysis of the dynamics of travel live streaming. By considering real-time data on the speech voice cues of travel live streamers alongside the number of live comments from viewers, this approach allows for a more nuanced understanding of the immediacy and fluidity inherent in travel live streaming. Traditional methods commonly rely on post-event data, which often fail to capture real-time interactions (Barnes, 2024). In contrast, the novel use of voice analytics in conjunction with viewer live comments offers unique, dynamic insights into the real-time impacts of travel live streamers. Moreover, the detailed methodological process for voice analytics described in this research provides a baseline for subsequent dynamic voice research and points the way toward the use of dynamic, multimodal data in future tourism research (M. Cheng, 2025).
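Voice analytics of this kind rests on frame-level acoustic features. The toolkit and settings used in Study 1 are not restated in this section, so the following is only a generic sketch under stated assumptions: loudness is approximated as the RMS level of a frame in dB relative to full scale (dBFS), and pitch (fundamental frequency) is estimated by picking the autocorrelation peak within an assumed speaking range of 75 to 400 Hz. The function names, frame length, and frequency bounds are hypothetical choices; dedicated tools such as Praat use more robust pitch trackers, and speech rate (pace) would additionally require syllable or word timing, which is not shown here.

```python
import math
from typing import List, Optional


def rms_dbfs(frame: List[float]) -> float:
    """Loudness proxy: RMS level in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def autocorr_pitch(frame: List[float], sample_rate: int,
                   fmin: float = 75.0, fmax: float = 400.0) -> Optional[float]:
    """Pitch proxy: frequency of the lag with the strongest autocorrelation in [fmin, fmax]."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with itself shifted by `lag` samples
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # None when no positive correlation is found (e.g., a silent frame)
    return sample_rate / best_lag if best_lag else None


# Synthetic 40 ms frame: a 200 Hz tone at half amplitude, sampled at 16 kHz
sr = 16000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(640)]
print(f"{rms_dbfs(frame):.2f} dBFS, {autocorr_pitch(frame, sr):.1f} Hz")  # prints "-9.03 dBFS, 200.0 Hz"
```

On real recordings, such features would typically be computed over many short, overlapping frames and then aggregated per time window before being related to live-comment counts; how Study 1 aligned voice features with comments is part of its design and is not reconstructed here.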

Practical Implications

This research highlights three important practical implications. Given the significance of voice dynamics in capturing viewer in-consumption engagement, the research indicates how voice detection technology on live streaming platforms could be refined and more fully integrated. First, for travel live streamers, such technology could offer more precise guidance, enabling them to adjust the pace, pitch, and loudness of their speech in real time. By moderating their speech voice cues, travel live streamers may be better able to attract and maintain viewer in-consumption engagement.

Second, for platform operators, precise analysis of speech voice cues provides critical insight into which travel live streaming sessions are most effective at engaging viewers. This enables platform operators to strategically select and support sessions that enhance the overall viewer experience. By directing resources toward developing and promoting popular sessions, platforms can expect to increase user engagement.

Third, destination marketing organizations could use the research findings to reframe and refine their promotion strategies. A first step may be recruiting and collaborating with travel live streamers. As viewer in-consumption engagement is indicative of popularity (Guo et al., 2022; Holiday et al., 2023), advanced voice detection technology can provide destination managers with criteria for selecting the travel live streamers who most effectively engage viewers. The findings also have implications beyond travel live streaming: destination managers may wish to consider speech voice cues and their underlying mechanisms in travel promotion more broadly, to guide the vocal design of their travel advertisements and videos.

Limitations and Future Directions

This research is not without limitations. Non-verbal communication encompasses a multi-dimensional framework that includes appearance, kinesics, proxemics, and paralanguage (Sundaram & Webster, 2000). This research focuses on the paralanguage of pace, pitch, and loudness because these are the three fundamental indicators of the human speaking voice. Future research should take into account the interplay between these indicators and other non-verbal cues to provide a more comprehensive understanding of communication dynamics.

While the influence of voice qualities has been substantiated in Mandarin (S. Liu et al., 2020), Dutch (De Waele et al., 2019), and English-speaking regions (Rodero, 2020), cultural norms are likely to shape the perception and interpretation of non-verbal signals (Islam & Kirillova, 2020). Further studies may therefore extend beyond Chinese social media platforms to other Western and Eastern live streaming platforms. Such an expansion would allow a critical examination of the role that vocal attributes play in viewer in-consumption engagement across diverse cultural and digital environments, and of how cultural norms shape the perception and interpretation of non-verbal cues.

Methodologically, although this research used voice analytics to identify the prevalent voice features of pace, loudness, and pitch, it excluded other vocal attributes, such as dialect. Because word choice (diction) may be interpreted differently across dialects, dialect has the potential to skew viewer in-consumption engagement (X. Wang et al., 2023). Further, potential interactions between different voice characteristics, such as tone, dialect, and pauses, were not explored (Van Zant & Berger, 2020). Future research could adopt a more inclusive approach by incorporating a wider array of dialect variations and nuanced prosodic elements (e.g., intonation, stress, rhythm, and tempo). This would provide a more comprehensive understanding of the multifaceted nature of speech voice cues and their impacts on viewer in-consumption engagement. Moreover, exploring the synergistic effects of various voice features could reveal intricate interactions that shape how viewers respond to speech voice cues.

In focusing on the influence of speech voice cues on viewer in-consumption engagement in travel live streaming, this research did not control for the attractiveness of tourism destinations or travel live streamers. Prior research suggests that attractiveness factors significantly influence the viewer in-consumption experience, affecting perceptions of content and engagement levels (e.g., Zhang & Prebensen, 2025). This constraint limits the generalizability of the findings. Future research may examine the interactions between speech voice cues and attractiveness factors, investigating how they influence viewer in-consumption engagement across various types of content and among different viewer demographics. Moreover, experimental designs or longitudinal studies would help establish causal relationships and capture dynamic changes in viewer in-consumption engagement over time (Zhu & Cheng, 2024).

Supplemental Material

Supplemental material, sj-docx-1-jht-10.1177_10963480251352244 for Decoding the Subtleties: Speech Voice Cues and Their Impacts on Viewer In-Consumption Engagement in Travel Live Streaming by Mengfan Li, Mingming Cheng and Vanessa Quintal in Journal of Hospitality & Tourism Research

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

Author Biographies

Mengfan Li (e-mail: mengfan.li@curtin.edu.au) is a PhD student at the Social Media Research Lab, School of Management and Marketing, Curtin University, Bentley, WA, Australia. Her research interests focus on travel, live streaming, and social media marketing.

Dr. Mingming Cheng (e-mail: mingming.cheng@curtin.edu.au) is a Professor in Digital Marketing and Director of the Social Media Research Lab in the School of Management and Marketing at Curtin University, Bentley, WA, Australia.

Dr. Vanessa Quintal (e-mail: vanessa.quintal@cbs.curtin.edu.au) is an Associate Professor in Marketing at Curtin University, Bentley, WA, Australia. She has practitioner marketing experience in the meetings, hotels, food, fashion, entertainment, travel, and education sectors in Asia-Pacific and Europe. Vanessa’s research interests lie in place personality, voluntourism motivation, and tourist well-being.

References

Barnes, S. J. (2024). Smooth talking and fast music: Understanding the importance of voice and music in travel and tourism ads via acoustic analytics. Journal of Travel Research, 63(5), 1070–1085. https://doi.org/10.1177/00472875231185882

Braun & Clarke (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa

Chattopadhyay, Dahl, D. W., Ritchie, R. J., & Shahin, K. N. (2003). Hearing voices: The impact of announcer speech characteristics on consumer response to broadcast advertising. Journal of Consumer Psychology, 13(3), 198–204. https://doi.org/10.1207/S15327663JCP1303_02

Chebat, J.-C., Hedhli, K. E., Gélinas-Chebat, & Boivin (2007). Voice and persuasion in a banking telemarketing context. Perceptual and Motor Skills, 104(2), 419–437. https://doi.org/10.2466/PMS.104.2.419-437

Chen, Shen, & Wei (2023). What reduces product uncertainty in live streaming e-commerce? From a signal consistency perspective. Journal of Retailing and Consumer Services, 74, Article 103441. https://doi.org/10.1016/j.jretconser.2023.103441

Chen, Tao, Zheng, & Yang (2025). What drives viewers’ engagement in travel live streaming: A mixed-methods study from perceived value perspective. International Journal of Contemporary Hospitality Management, 37(2), 418–443. https://doi.org/10.1108/IJCHM-01-2024-0115

Cheng, H. K., Fan, Guo, Huang, & Qiu (2020). Can “gold medal” online sellers earn gold? The impact of reputation badges on sales. Journal of Management Information Systems, 37(4), 1099–1127. https://doi.org/10.1080/07421222.2020.1831776

Cheng (2025). Social media and tourism geographies: Mapping future research agenda. Tourism Geographies, 27(3–4), 579–588. https://doi.org/10.1080/14616688.2024.2304782

Cheng & Wong (2014). Tourism and Chinese popular nationalism. Journal of Tourism and Cultural Change, 12(4), 307–319. https://doi.org/10.1080/14766825.2014.914948

Considine, Botti, & Thomas (2005). Design, format, validity and reliability of multiple choice questions for use in nursing research and education. Collegian, 12(1), 19–24. https://doi.org/10.1016/S1322-7696(08)60478-3

Corbin, J. M. (1998). The Corbin and Strauss chronic illness trajectory model: An update. Research and Theory for Nursing Practice, 12(1), 33–47.

Creswell, J. W., & Creswell, J. D. (2003). Research design: Qualitative, quantitative, and mixed methods approaches (2nd ed.). Sage Publications. https://books.google.com/books/about/Research_Design.html?id=nSVxmN2KWeYC

De Waele, Claeys, A.-S., & Cauberghe (2019). The organizational voice: The importance of voice pitch and speech rate in organizational crisis communication. Communication Research, 46(7), 1026–1049. https://doi.org/10.1177/0093650217692911

Deng, Benckendorff, & Wang (2021). Travel live streaming: An affordance perspective. Information Technology & Tourism, 23(2), 189–207. https://doi.org/10.1007/s40558-021-00199-1

Deng, Benckendorff, & Wang (2022). From interaction to relationship: Rethinking parasocial phenomena in travel live streaming. Tourism Management, 93, Article 104583. https://doi.org/10.1016/j.tourman.2022.104583

Eisenhardt, K. M. (1989). Building theories from case study research. Academy of Management Review, 14(4), 532–550. https://doi.org/10.5465/amr.1989.4308385

Fan, Zhang, & Rai (2021). When should star power and eWOM be responsible for the box office performance? An empirical study based on signaling theory. Journal of Retailing and Consumer Services, 62, Article 102591. https://doi.org/10.1016/j.jretconser.2021.102591

Filieri & Acikgoz (2023). Electronic word-of-mouth from video bloggers: The role of content quality and source homophily across hedonic and utilitarian products. Journal of Business Research, 160, Article 113774. https://doi.org/10.1016/j.jbusres.2023.113774

Filieri, Raguseo, & Vitari (2021). Extremely negative ratings and online consumer review helpfulness: The moderating role of product quality signals. Journal of Travel Research, 60(4), 699–717. https://doi.org/10.1177/0047287520916785

Goodson & Phillimore (2004). The inquiry paradigm in qualitative tourism research. In Phillimore & Goodson (Eds.), Qualitative research in tourism: Ontologies, epistemologies and methodologies (pp. 48–63). Routledge. https://www.taylorfrancis.com/chapters/edit/10.4324/9780203642986-12

Graneheim, U. H., & Lundman (2004). Qualitative content analysis in nursing research: Concepts, procedures and measures to achieve trustworthiness. Nurse Education Today, 24(2), 105–112. https://doi.org/10.1016/j.nedt.2003.10.001

Guo, Zhang, & Wang (2022). Way to success: Understanding top streamers’ popularity and influence from the perspective of source characteristics. Journal of Retailing and Consumer Services, 64, Article 102778. https://doi.org/10.1016/j.jretconser.2021.102778

Guyer, J. J., Fabrigar, L. R., & Vaughan-Johnston, T. I. (2019). Speech rate, intonation, and pitch: Investigating the bias and cue effects of vocal confidence on persuasion. Personality and Social Psychology Bulletin, 45(3), 389–405. https://doi.org/10.1177/0146167218787805

Hagtvedt & Brasel, S. A. (2016). Cross-modal communication: Sound frequency influences consumer responses to color lightness. Journal of Marketing Research, 53(4), 551–562. https://doi.org/10.1509/jmr.14.0414

Hall, J. A., Horgan, T. G., & Murphy, N. A. (2019). Nonverbal communication. Annual Review of Psychology, 70, 271–294. https://doi.org/10.1146/annurev-psych-010418-103145

Hilvert-Bruce, Neill, J. T., Sjöblom, & Hamari (2018). Social motivations of live-streaming viewer engagement on Twitch. Computers in Human Behavior, 84, 58–67. https://doi.org/10.1016/j.chb.2018.02.013

Holiday, Hayes, J. L., Park, Lyu, & Zhou (2023). A multimodal emotion perspective on social media influencer marketing: The effectiveness of influencer emotions, network size, and branding on consumer brand engagement using facial expression and linguistic analysis. Journal of Interactive Marketing, 58(4), 414–439. https://doi.org/10.1177/10949968231171104

Islam, M. S., & Kirillova (2020). Non-verbal communication in hospitality: At the intersection of religion and gender. International Journal of Hospitality Management, 84, Article 102326. https://doi.org/10.1016/j.ijhm.2019.102326

Ivankova, N. V., Creswell, J. W., & Stick, S. L. (2006). Using mixed-methods sequential explanatory design: From theory to practice. Field Methods, 18(1), 3–20. https://doi.org/10.1177/1525822X05282260

Janiszewski & Laran (2024). A behaviorist perspective on how to address negative consumer behaviors. Consumer Psychology Review, 7(1), 98–115. https://doi.org/10.1002/arcp.1097

Jiang & Pell, M. D. (2017). The sound of confidence and doubt. Speech Communication, 88, 106–126. https://doi.org/10.1016/j.specom.2017.01.011

Jung, H. S., & Yoon, H. H. (2011). The effects of nonverbal communication of employees in the family restaurant upon customers’ emotional responses and customer satisfaction. International Journal of Hospitality Management, 30(3), 542–550. https://doi.org/10.1016/j.ijhm.2010.09.005

Kirmani & Rao, A. R. (2000). No pain, no gain: A critical review of the literature on signaling unobservable product quality. Journal of Marketing, 64(2), 66–79. https://doi.org/10.1509/jmkg.64.2.66.18000

Kruger & Saayman (2015). Consumer preferences of Generation Y: Evidence from live music tourism event performances in South Africa. Journal of Vacation Marketing, 21(4), 366–382. https://doi.org/10.1177/1356766715585903

Lee, J. A., & Eastin, M. S. (2021). Perceived authenticity of social media influencers: Scale development and validation. Journal of Research in Interactive Marketing, 15(4), 822–841. https://doi.org/10.1108/JRIM-06-2020-0112

Lee, Y. H., & Lim, E. A. C. (2010). When good cheer goes unrequited: How emotional receptivity affects evaluation of expressed emotion. Journal of Marketing Research, 47(6), 1151–1161. https://doi.org/10.1509/jmkr.47.6.1151

Cui & Peng (2017). The signaling effect of management response in engaging customers: A study of the hotel industry. Tourism Management, 62, 42–53. https://doi.org/10.1016/j.tourman.2017.03.009

Cheng, Quintal, & Cheah (2023). From live streamer to viewer: Exploring travel live streamer persuasive linguistic styles and their impacts on travel intentions. Journal of Travel & Tourism Marketing, 40(8), 764–777. https://doi.org/10.1080/10548408.2023.2294071

Cheng, Quintal, & Cheah (2025). Facial emotional expressions and real-time viewership in cycling travel live streaming: A mixed-methods approach. Journal of Hospitality and Tourism Management, 63, 223–235. https://doi.org/10.1016/j.jhtm.2025.04.006

Gong, Gao, & Yuan (2021). Impacts of COVID-19 on tourists’ destination preferences: Evidence from China. Annals of Tourism Research, 90, Article 103258. https://doi.org/10.1016/j.annals.2021.103258

Zhuang & Chen (2019). A multi-stage hidden Markov model of customer repurchase motivation in online shopping. Decision Support Systems, 120, 72–80. https://doi.org/10.1016/j.dss.2019.03.005

Liang, Huo, & Luo (2024). What drives impulsive travel intention in tourism live streaming? A chain mediation model based on SOR framework. Journal of Travel & Tourism Marketing, 41(2), 169–185. https://doi.org/10.1080/10548408.2024.2310175

Lin, Yao, & Chen (2021). Happiness begets money: Emotion and engagement in live streaming. Journal of Marketing Research, 58(3), 417–438. https://doi.org/10.1177/00222437211002477

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Sage Publications. https://uk.sagepub.com/en-gb/eur/naturalistic-inquiry/book842

Liu, Fang, Yang, Han, Hossin, M. A., & Wen (2023). The power of talk: Exploring the effects of streamers’ linguistic styles on sales performance in B2B livestreaming commerce. Information Processing & Management, 60(3), Article 103259. https://doi.org/10.1016/j.ipm.2022.103259

Liu & Gao (2022). Which voice are you satisfied with? Understanding the physician–patient voice interactions on online health platforms. Decision Support Systems, 157, Article 113754. https://doi.org/10.1016/j.dss.2022.113754

Liu, Zhang, Gao, & Jiang (2020). Physician voice characteristics and patient satisfaction in online health consultation. Information & Management, 57(5), Article 103233. https://doi.org/10.1016/j.im.2019.103233

Liu, Zeng, & Huang (2022). Understanding consumers’ motivations to view travel live streaming: Scale development and validation. Tourism Management Perspectives, 44, Article 101027. https://doi.org/10.1016/j.tmp.2022.101027

Chen (2021). Live streaming commerce and consumers’ purchase intention: An uncertainty reduction perspective. Information & Management, 58(7), Article 103509. https://doi.org/10.1016/j.im.2021.103509

Zhang & Yang (2022). Exploring how live streaming affects immediate buying behavior and continuous watching intention: A multigroup analysis. Journal of Travel & Tourism Marketing, 39(1), 109–135. https://doi.org/10.1080/10548408.2022.2052227

Mannell, R. C., & Iso-Ahola, S. E. (1987). Psychological nature of leisure and tourism experience. Annals of Tourism Research, 14(3), 314–331. https://doi.org/10.1016/0160-7383(87)90105-8

Mas, Bolls, Rodero, Barreda-Ángeles, & Churchill (2020). The impact of the sonic logo’s acoustic features on orienting responses, emotions and brand personality transmission. Journal of Product & Brand Management, 30(5), 740–753. https://doi.org/10.1108/JPBM-05-2019-2370

Mavlanova, Benbunan-Fich, & Lang (2016). The role of external and internal signals in e-commerce. Decision Support Systems, 87, 59–68. https://doi.org/10.1016/j.dss.2016.04.009

Meng, L. M., Duan, Zhao, Lü, & Chen (2021). The impact of online celebrity in livestreaming e-commerce on purchase intention from the perspective of emotional contagion. Journal of Retailing and Consumer Services, 63, Article 102733. https://doi.org/10.1016/j.jretconser.2021.102733

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Sage Publications. https://vivauniversity.files.wordpress.com/2013/11/milesandhuberman1994.pdf

Morgan, D. L. (2014). Pragmatism as a paradigm for social research. Qualitative Inquiry, 20(8), 1045–1053. https://doi.org/10.1177/1077800413513733

Naderi Varandi, Shaghaghi, Foroudi, Raghibdoust, & Taghavifard (2023). Memorable repetition: The role of letter repetition in brand names and social media marketing. New Media Studies, 9(33), 247–223. https://doi.org/10.22054/nms.2022.61553.1230

Rodero (2016). Influence of speech rate and information density on recognition: The moderate dynamic mechanism. Media Psychology, 19(2), 224–242. https://doi.org/10.1080/15213269.2014.1002942

Rodero (2020). Do your ads talk too fast to your audio audience? How speech rates of audio commercials influence cognitive and physiological outcomes. Journal of Advertising Research, 60(3), 337–349. https://doi.org/10.2501/JAR-2019-038

Smith, V. L., & Font (2014). Volunteer tourism, greenwashing and understanding responsible marketing using market signalling theory. Journal of Sustainable Tourism, 22(6), 942–963. https://doi.org/10.1080/09669582.2013.871021

Sundaram, D. S., & Webster (2000). The role of nonverbal communication in service encounters. Journal of Services Marketing, 14(5), 378–391. https://doi.org/10.1108/08876040010341008

Tigue, C. C., Borak, D. J., O’Connor, J. J., Schandl, & Feinberg, D. R. (2012). Voice pitch influences voting behavior. Evolution and Human Behavior, 33(3), 210–216. https://doi.org/10.1016/j.evolhumbehav.2011.09.004

Gao, Zhang, L. Y., & Yang, J. Y. (2023). The major development trends of China’s short video market in the next ten years. TVOVO.Com. https://www.tvoao.com/a/216236.aspx#

Van Zant, A. B., & Berger (2020). How the voice persuades. Journal of Personality and Social Psychology, 118(4), 661–682. https://doi.org/10.1037/pspi0000193

Wang, Cheng, & Jiang (2023). The interaction effect of emoji and social media content on consumer engagement: A mixed approach on peer-to-peer accommodation brands. Tourism Management, 96, Article 104696. https://doi.org/10.1016/j.tourman.2022.104696

Wang, Khamitov, & Bendle (2021). Audio mining: The role of vocal tone in persuasion. Journal of Consumer Research, 48(2), 189–211. https://doi.org/10.1093/jcr/ucab031

Wang, Ruan, Yang, Qiu, & Zhou (2024). An auditory data analysis framework for tourism and hospitality research. Current Issues in Tourism, 27(6), 854–863. https://doi.org/10.1080/13683500.2023.2259571

Wang, Yang, Wang, Zheng, & Peng (2024). How do voice characteristics affect tourism interpretation purchases? An empirical study based on voice mining. Journal of Travel Research, 63(2), 481–495. https://doi.org/10.1177/00472875221151070

Zhang, Wang, & Chen (2020). Frontiers: In-consumption social listening with moment-to-moment unstructured data: The case of movie appreciation and live comments. Marketing Science, 39(2), 285–295. https://doi.org/10.1287/mksc.2019.1215

Zhang & Prebensen, N. K. (2025). Value co-creation in tourism live shopping. Journal of Business Research, 186, Article 114964. https://doi.org/10.1016/j.jbusres.2024.114964

Zhu & Wang (2022). How does government microblog affect tourism market value? The perspective of signaling theory. Information Processing & Management, 59(4), Article 102991. https://doi.org/10.1016/j.ipm.2022.102991

Zhu & Cheng (2025). What holds engagement: Conceptualisation of in-consumption engagement in pro-environmental tourism videos. Current Issues in Tourism, 28(5), 512–530. https://doi.org/10.1080/13683500.2025.2496330

Zhu, Cheng, & Wang (2025). Viewer in-consumption engagement in pro-environmental tourism videos: A video analytics approach. Journal of Travel Research, 64(3), 716–735. https://doi.org/10.1177/00472875231219634

Zhu & Cheng (2024). Automatic videos analytics in tourism: A methodological review. Annals of Tourism Research, 108, Article 103800.

Zougkou, Weinstein, & Paulmann (2017). ERP correlates of motivating voices: Quality of motivation and time-course matters. Social Cognitive and Affective Neuroscience, 12(10), 1687–1700. https://doi.org/10.1093/scan/nsx064
(2017). ERP correlates of motivating voices: Quality of motivation and time-course matters. Social Cognitive and Affective Neuroscience, 12(10), 1687–1700. https://doi.org/10.1093/scan/nsx064