
Abstract

Travel and tourism advertising is critical in developing positive associations to attract visitor patronage and build sustainable post-pandemic tourism. An important part of an advertising message is delivered through voice and other aspects of audio, for example, music. However, how audio features impact viewers of travel and tourism advertisements remains unexplored in the research. This study implements advanced audio analytics to test how various features impact upon viewers. The results show that voice quality is important in developing positive affect; advertisement viewers prefer speakers with quieter voices (less mean intensity) that have a higher level of clarity (higher harmonics-to-noise ratio). This can be explained via the heuristic route of the heuristic-systematic model. Music tempo was found to be important in stimulating reactions from advertisements, with faster music being associated with a higher level of positive affect. The paper concludes with practical and theoretical implications, limitations, and suggestions for future research.

Keywords

audio analytics, video analytics, machine learning, voice quality, music tempo

Introduction

“The tongue can paint what the eyes can’t see.” Chinese Proverb.

Our voices are a powerful instrument for communication and persuasion. Of course, a large element of this is in communication using language. However, the voice is not limited to merely verbal communication; non-verbal communication in voices is also an important way to convey meaning in a message and persuade listeners (Van Zant & Berger, 2020). How a speaker says something may be as relevant as what is being said (Tracy et al., 2011). Since the human voice is one of the most important carriers of human emotions, such paralanguage is an essential element in affective communication. Individuals rely on nonverbal cues such as pitch, loudness, and aspects of voice quality to appraise a speaker’s emotion (Juslin & Laukka, 2003). In advertising, voice and music are pivotal in creating a persuasive message for radio listeners, television viewers, and of course, online viewers (Chattopadhyay et al., 2003; Dubey et al., 2018; Martín-Santana, Muela-Molina, et al., 2015; Rodero et al., 2017; Wang et al., 2021). However, voice has been neglected as an area of research in marketing and management disciplines (Dubey et al., 2018; Gélinas-Chebat & Chebat, 1996; Hildebrand et al., 2020), partly due to the difficulty in analyzing voices and their effects in a realistic, scalable, and reliable way.

Travel and tourism advertising plays an important role in developing a sustainable tourism economy, and this is particularly pertinent in the post-pandemic world as economies seek to rebuild. In the new e-tourism environment that is being created there is the potential to transform tourism (Fuchs, 2022; Gretzel et al., 2020) and refocus marketing more on priorities of sustainability than of profit (Jamrozy, 2007; Truong & Hall, 2017). Ideally, goals of business value and socio-economic value should be combined in a triple bottom line of people, planet and profit (Bhattacharya, 2016). Information and communications technologies in tourism firms are ubiquitous, but they have suffered from an ICT Productivity Paradox and new approaches are needed (Fuchs & Sigala, 2022). Evidence suggests that the pandemic has shifted travel and tourism advertising to digital, with a 70% digital advertising spend in 2023, high online sales of 32% (vs. 20% for retail as a whole), and advertising in 2023 moving back to pre-pandemic levels of around $19.6 billion in the US alone (Zenith, 2021). With the increasing dominance of digital advertisements, their value for transformative e-tourism is stronger than ever (Gretzel et al., 2020). There is the potential to develop more responsible, social and ecological tourism (Bandyopadhyay & Ray, 2019; Higgins-Desbiolles, 2020). The corollary is that ads must be more effective than ever to enable the desired attitudinal and behavioral change for transformative e-tourism. The good news is that digital advertising and its effects can be captured as data in a variety of places, opening up the opportunity to analyze and improve offerings using new and more advanced methods.

Current advances in data analytics techniques, including those based on artificial intelligence and machine learning, have provided new impetus to the investigation of audio features and their effect on consumers (Hildebrand et al., 2020). Existing research based on human coding has recognized its limitations and there have been calls for more research utilizing machine coding of video content (Tellis et al., 2019). Notwithstanding, although heralded with enormous potential (Dawar, 2018), audio analytics is rarely used in travel and tourism research. Ngai and Wu (2022) go as far as to state that audio analytics are rarely used in the marketing discipline as a whole, and that available research is very limited, citing only two papers. Existing audio research is bedded in the use of experiments and field studies (Chattopadhyay et al., 2003; Huang & Labroo, 2020; Martín-Santana, Muela-Molina, et al., 2015; Martín-Santana, Reinares-Lara, et al., 2015; Rodero et al., 2017), rather than the examination of audio features from real, unstructured data and their relationships with actual human behavior. As yet, no known research has examined the relationships between the audio features of travel and tourism advertisements and individual outcomes using data analytics methods (J. Li et al., 2018).

This study creates and applies a novel methodology that combines machine learning algorithms, pre-trained deep learning models, and other statistical data analytics methods to test a research model for a number of existing and new theoretical relationships between voice, music, and positive affect from travel and tourism advertisements. In this research, based on the heuristic-systematic model and the musicscape model, a research model is developed and tested that examines a number of features of voice quality and of music in travel and tourism advertisements, while controlling for a variety of additional video features. This research uses YouTube video data from the monthly top-5 ads on AdForum since it began in 2013. The findings support a number of the theorized hypotheses and contribute directly to nascent theory regarding the impact of audio on individual behavior in travel and tourism research. From a practical perspective, the research identifies specific voice and music features that will have a greater impact on net positive affect among consumers from travel and tourism ads. Such features offer the opportunity to tourism marketers for developing more effective advertisements in the future. The contribution of the paper is discussed in more detail in the discussion and conclusions section.

The structure of this article is as follows. The next section examines the theoretical background for the study, introducing the research model and hypotheses. The third section explains the original research process employed in the investigation to extract and analyze the audio and other features examined. The penultimate section provides the results of testing the research model and hypotheses. The final section discusses the results and explores their implications for research and practice, noting study limitations, and making overarching conclusions on the study.

Literature Review and Conceptual Model

Heuristic-Systematic Model/Elaboration Likelihood Model

The elaboration likelihood model (ELM) posits that the persuasion of a consumer can occur through two distinct routes, the central route and the peripheral route (Petty & Cacioppo, 1986; Petty et al., 1983). The central route requires considerable cognitive effort on the part of the consumer to process the information contained in an advertisement, drawing on previous knowledge and experience, and elaborating on the information in the message conveyed. Processing may be either objective (evaluating a message on its actual merits) or biased (prior opinion influences attitude to be consistent, via support arguments, or counter arguments if not). In the peripheral route, when elaboration likelihood is low, individuals do not process message content to any great extent and instead seek aspects of a message that are non-content based. Such aspects can include music, voice, expertise, likability, quality, design elements, attractiveness, and so on.

The heuristic-systematic model (HSM) of persuasion is almost identical to the ELM but was developed slightly earlier. It is a dual-processing model whereby individuals process messages through either systematic processing or heuristic processing routes (Chaiken, 1980). Systematic processing involves cognitive effort via careful and deliberate processing of messages, similar to the central route in ELM. It values message content and source reliability in cognitive processing. This involves analysis and judgment from systematic processing. Heuristic processing relies on simple decision rules or heuristics to rapidly assess message content. This is equivalent to peripheral processing in ELM. Since people tend to minimize the use of their cognitive resources, this route can be quite popular. In video ads, this might include heuristics for source attractiveness, credibility, video quality, length of video, music characteristics, voice characteristics, video production characteristics, and presence of various objects in a video, such as faces, animals, or brand logos. This research adopts HSM as its underlying theory. In particular, it is used to explain the importance of voice characteristics and music features, through the heuristic route, in influencing positive affect on viewers of advertisements.

Mehrabian-Russell and Musicscape Models

In addition to HSM, the research is influenced by the Mehrabian-Russell model and Musicscape model. The Mehrabian-Russell model (Mehrabian & Russell, 1974) is a classic model in environmental psychology and is better known as the Stimulus-Organism-Response (S-O-R) model. The three structures in the model, stimulus, organism, and response together determine behavioral outcomes for individuals. Stimulus represents the influence on an individual—external forces in the external environment that impact upon a person, including sensory, perceptual, affective, cognitive and physiological stimuli. Organism refers to the internal state of an individual, the emotional and psychological stability that may be influenced by a stimulus. Finally, response is the change in behavior exhibited as a result of the change in internal state of the person, the action that is taken as a result. Together the model explains how elements in an advertisement, such as music, may influence cognitive and affective outcomes for consumers of ads in the tourism industry. Indeed, Trompeta et al. (2022) recently conducted a substantive meta-analysis and found significant cognitive and affective influences of music (and different elements therein) on individuals in tourism and hospitality settings.

Bitner (1992) developed the servicescape model based on the S-O-R model. The “servicescape” refers to tangible and intangible elements of an environment with which employees and customers interact. Bitner’s (1992) servicescape model includes various characteristics in a consumption environment (stimuli) that impact upon consumers’ emotional and psychological state, subsequently determining their behavior, classified as approach or avoidance. The musicscape model transfers the notion of the servicescape into a music-centric environment. Music can have a significant effect on affective responses to advertising (North et al., 2004; Oakes & North, 2006). Oakes’ (2000) original model suggests an interaction between music compositional attributes (e.g., volume, tempo, and style) and moderators (e.g., age, gender, and social class), which in turn influence cognitive (e.g., expectations), affective (e.g., positive or negative emotion), and behavioral outcomes (e.g., spending, return intention, and length of stay). Further research has identified a larger range of attributes of music that may influence consumers in advertising settings, including, for example, Oakes’ (2007) categorization into mood, genre, repetition, score, association, tempo, valence, image, semantics, and timbre. A considerable body of research has supported the influence of music on consumers in various settings, including retail (Garlin & Owen, 2006; Roschk et al., 2017; Turley & Milliman, 2000), hotels (Magnini & Parker, 2009), and restaurants (Harrington et al., 2015; Milliman, 1986; North et al., 2003; Novak et al., 2010; Wilson, 2003). More recently, Trompeta et al. (2022) conducted a comprehensive meta-analysis that confirmed the affective, cognitive, and behavioral responses of consumers to characteristics of music in tourism and hospitality settings.

Voice Analytics

Voice analytics is a part of the broader field of audio analytics. Audio analytics examines soundwaves, which vary along four principal dimensions: time, frequency, amplitude, and spectrum (Sueur, 2018). Time refers to soundwave duration. Frequency (in Hertz) or pitch measures oscillations per second of soundwaves, ranging from high to low pitch. Humans are typically limited in their aural capabilities and tend to hear pitches from 20 Hz up to 20 kHz, varying over time with age. Amplitude evaluates sound intensity, sometimes also referred to as power or loudness, typically calculated in decibels (dB). Lastly, spectrum assesses acoustic perturbance or instability, measured using a variety of variables, which includes the evaluation of sound quality (e.g., harmonics-to-noise ratio (HNR), shimmer, jitter, and entropy, e.g., see Sueur, 2018). Each of these aspects may be measured using digital data from a video or audio advertisement. In this study, elements of each of these dimensions are measured and used in the analysis, focusing particularly on pitch, intensity, HNR, and duration (a control variable).
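To make these dimensions concrete, the following minimal Python sketch measures duration, intensity (as root mean square amplitude), and pitch (via a basic autocorrelation search) for a synthetic 200 Hz tone. It is illustrative only: the function names, the 75 to 500 Hz search range, and the synthetic signal are assumptions made for this example, and it does not reproduce the procedure of the software applied in the study.

```python
import math

def duration_seconds(samples, sample_rate):
    """Time dimension: length of the signal in seconds."""
    return len(samples) / sample_rate

def rms_amplitude(samples):
    """Amplitude dimension: root mean square amplitude of a frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def autocorr_pitch(samples, sample_rate, fmin=75.0, fmax=500.0):
    """Frequency dimension: pitch (Hz) from the best-matching lag.

    The search range (fmin, fmax) is an assumed range for speech.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw autocorrelation: similarity of the signal with a shifted copy.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic test tone: 0.1 s of a 200 Hz sine wave sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

print(duration_seconds(tone, SAMPLE_RATE))
print(round(rms_amplitude(tone), 4))
print(autocorr_pitch(tone, SAMPLE_RATE))
```

Production tools typically add windowing, normalization, and voicing decisions on top of this basic idea, which is why a dedicated package is preferable for real recordings.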

Hypotheses and Research Model

Voice Features

Pitch has been found to instill significant psychological reactions in people. Pitch in marketing communications can influence consumer perception (Hagtvedt & Brasel, 2016). Research has further shown that high-pitched voices are typically judged as less trustworthy, more nervous, less confident (Jiang & Pell, 2017), and less forceful or potent (Apple et al., 1979). On the other hand, in the political context, politicians with lower-pitched voices have an advantage in election campaigns (Tigue et al., 2012). This is typically because lower-pitched voices are perceived as more persuasive, trustworthy, and competent (Guyer et al., 2019; Oleszkiewicz et al., 2017). Similarly, low-pitched, dull, and smooth voices, which are often associated with males, tend to be more persuasive, significantly driving perceptions of speaker warmth, attitude toward the speaker, and behavioral intentions (Zoghaib, 2019). Low-pitched voices also tend to induce distinctive, positive associations, attitudes, and higher brand recall, regardless of gender (Zoghaib, 2017). Based on the above, it is posited that:

H1. The mean pitch of voiced frames in an ad will be negatively related to positive affect.

It is important to mention that in this research we use the likes of a video as a measurable indicator of the positive affect of the video on the viewer. The neuropsychology literature states that “‘Liking’ refers to the hedonic impact of positive events, with underlying mechanisms that include hedonic hotspots in limbic brain structures that amplify ‘liking’ reactions” and the authors “use the term ‘liking’ to refer to positive hedonic impact or core affective process within pleasure that can be objectively measured” (Nguyen et al., 2021). Thus, following this reasoning, we use likes as an indicator of positive affect from viewing the tourism advertisement.

In addition to mean pitch, another aspect of the pitch of a speaker that has been found to provide insight into the emotion and personality of a speaker is pitch variability, which is typically measured using the standard deviation of pitch. High pitch variability has been found to show consistently a relationship with high arousal emotions (Hildebrand et al., 2020). High pitch variability is associated empirically with many positive emotions such as happiness (Belin et al., 2008; Breitenstein et al., 2001; Jiang et al., 2015; Whiteside, 1999a, 1999b), joy (van Bezooijen, 1984; Waaramaa et al., 2010), elation (Whiteside, 1999a, 1999b), pleasure (Belin et al., 2008), and interest (van Bezooijen, 1984; Whiteside, 1999a, 1999b) in a speaker. This finding has been demonstrated consistently over many studies (Juslin & Laukka, 2003; Kamiloğlu et al., 2020). In some studies, pitch variability has shown associations with positive personality characteristics such as sociability (Burgoon et al., 1990), persuasiveness (Burgoon et al., 1990), and tenderness (Juslin & Laukka, 2003; Waaramaa et al., 2010). Notwithstanding, there are various studies that have found that some negative emotions can also be attributed through pitch variation, such as anger and fear (Breitenstein et al., 2001; Johnstone & Scherer, 2000; K. Scherer, 2003; K. R. Scherer & Giles, 1979). However, such emotions are unlikely to appear in travel and tourism advertisements. Thus, in this research it is hypothesized that:

H2. The standard deviation of pitch of voiced frames in an ad will be positively related to positive affect.

In terms of the psychological impact of voice, the acoustic power of a sound has an important effect on the establishment of consumer preferences and influences individuals’ emotional states. Biswas et al. (2019) find that ambient music that is less loud is likely to increase relaxation and lead to a greater predilection for healthy foods. Conversely, sounds that are increasing in intensity tend to be perceived as approaching objects that demand increased vigilance due to the prospect of threat or danger (Bannister, 2020). These findings appear to align with those of Leongómez et al. (2021), whose results suggest that persons may modulate their voices strategically to convey nonverbal communication and individualities such as formidability and threat. More generally, research suggests that louder voices tend to be associated with aggression and anger, which is likely to evoke negative emotions in the receiver (Juslin & Laukka, 2003). Indeed, voices with greater sound intensity are construed as having greater extraversion and exerting greater dominance (K. R. Scherer & Giles, 1979; Tusing & Dillard, 2000). Thus, we would expect voice intensity to influence individuals’ affective response to an ad. The following is posited:

H3. The mean intensity of voiced frames in an ad will be negatively related to positive affect.

Voice quality has been measured in many ways in the scientific acoustic literature. One of the most common ways of measuring voice quality is the harmonics-to-noise ratio (HNR). In technical terms, HNR refers to the ratio between a speaker’s periodic (harmonic) and non-periodic (noise) parts of speech (Fernandes et al., 2018). The general idea is that a voice with more harmonic characteristics and less noise is characterized as being of a higher vocal quality. A meta-analytic study by Maryn et al. (2009) found that not only is the HNR the most commonly investigated parameter for voice quality in the literature, but it also explained a high degree of perceptual judgments of voice quality (ranging from r = .45 to r = .87, depending on the type of measures used in studies of sustained vowels). However, there is only a small amount of research that examines the relationship between HNR and emotions. A recent literature review by Kamiloğlu et al. (2020) found that higher HNR measured in a number of previous studies is associated with positive emotions of happiness, pleasure and interest. Based on previous research, it is hypothesized that:
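As an illustration of this ratio, one widely used estimator expresses HNR in decibels from the peak of the normalized autocorrelation of a voiced frame, treating the peak value r as the harmonic share of signal energy and 1 − r as the noise share. The sketch below assumes that formulation; the text does not state which estimator was applied in the study, and the function name is hypothetical.

```python
import math

def hnr_db(r_max):
    """HNR in dB from the normalized autocorrelation peak r_max (0 < r_max < 1)."""
    if not 0.0 < r_max < 1.0:
        raise ValueError("r_max must lie strictly between 0 and 1")
    # Harmonic energy share divided by noise energy share, on a decibel scale.
    return 10.0 * math.log10(r_max / (1.0 - r_max))

for r in (0.5, 0.9, 0.99):
    print(r, round(hnr_db(r), 2))
```

Under this definition, equal harmonic and noise energy gives 0 dB, and a frame in which 90% of the energy is periodic gives roughly 9.5 dB, so higher values indicate a clearer voice.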

H4. The mean harmonics-to-noise ratio of voiced frames in an ad will be positively related to positive affect.

Music Features

Different features of music have been shown to have both direct and mediated effects on individual affective, cognitive, and behavioral outcomes in a variety of contexts (Trompeta et al., 2022). The tempo of background music in a store or in an ad provides a classic example of the use of the heuristic route of HSM in customer persuasion via sensory cues (Zhu & Meyers-Levy, 2005). Tempo refers to the beats per minute (BPM) of music. Physiologically, faster music has been found to have an impact on the human body; for example, it can lead to increases in heart rate, breathing rate, and blood pressure (Lundin, 1985). Research in a retail setting has shown that the tempo of background music influences one’s affective states and stimulates a larger positive affective response (Spangenberg et al., 2005). However, this is not always the case (Trompeta et al., 2022). Oakes (2003) finds that in a festival event context, faster tempo music induces greater arousal and pleasure in contrast to slower music, which, as a result, can lead to a higher degree of impulse purchasing. Spence et al. (2019) reviewed the impact of music tempo on food and drink consumption and concluded that tempo of background music in bars and restaurants exhibits a form of “entrainment.” Indeed, Milliman (1986) found that faster music resulted in a 30% increase in spending as compared to slower music. Further studies have also confirmed this, and research has even found that faster music can alleviate some negative elements in the environmental context (Knoeferle et al., 2017; Oakes & North, 2008). Pantoja and Borges (2021) examined video restaurant advertisements and found that faster music creates positive arousal which evokes positive taste expectations and greater purchase intentions in consumers. In sum, it is believed that tempo will exert a positive influence on positive affect. Therefore, it is stated that:

H5. The tempo of music during an ad will be positively related to positive affect.

Music tempo is only one feature of music that may have an association for individuals on an affective level (e.g., Caldwell & Hibbert, 2002; Feng et al., 2014). The effectiveness of music may be linked to the fit with the advertising message in other ways (Martín-Santana, Muela-Molina, et al., 2015; Martín-Santana, Reinares-Lara, et al., 2015; Zander, 2006). Research has demonstrated that musical genre can induce different types of emotions, including joy, amazement, power, and sadness (Zentner et al., 2008). One aspect in which there may be an important influence on positive affect is the key in which a piece of music is written. At the simplest level, we may be able to distinguish music that is written in major keys and minor keys. Kellaris and Kent (1992) found that time appears to move more slowly in music with a major key, which would appear to support the notion of slow tourism perceptions (Oh et al., 2016). Minor keys tend to be associated with negative emotion, particularly since they have learned associations with funerals and other grief-stricken contexts, dysphoria, and sad imagery (Handayani & Korstanje, 2018). On the other hand, major keys tend to be associated with positive emotions such as joy, happiness, hope, and triumph (Hevner, 1935). Therefore, it is posited that, in the context of travel and tourism advertisements:

H6. The major (as opposed to minor) key of music during an ad will be positively related to positive affect.

The hypotheses are combined into the research model shown in Figure 1. The hypotheses follow the heuristic route in the HSM model. The music features are further couched in the Musicscape Model. In addition to the specific hypotheses, the research model controls for several variables that might have an impact on positive affect, including the duration of each advertisement, the quality of the video (e.g., lighting and blurriness), the consistency of scenes in the ad, and socialness. All of these variables have been found to be significantly related to positive affect in previous advertising research.

Figure 1.

Research model.

Methodology

The specific steps undertaken in the research are delineated in Figure 2. The steps include: identifying the sample of ads for investigation, extracting video features, extracting audio files, pre-processing audio files for further analysis, extracting voice and music features, combining data sets, data preparation and calculation of the dependent variable, power analysis and examination of outliers, testing the research model via regression and examination of model diagnostics, and formal testing of the research hypotheses. We now detail each of the steps in the research process in turn.

Figure 2.

Summary of Research Process.

Sample Selection and Collection of Data

The initial step in the research process involved selecting a sample of travel and tourism advertisements for the study. The videos in the sample were considered best practice advertisements for travel and tourism and were sourced from AdForum (adforum.com). The potential sample included all ads in the database from October 2013 to November 2021 (n = 395). Videos were matched to YouTube via their title and content. A total of 291 matching videos were identified for the sample. Video data (e.g., likes and views) were downloaded using YT Tracker (as of the 4th of February, 2022). No videos had dislikes. Additional video characteristics (for the control variables) were analyzed using the Google Colab Notebook developed by Schwenzow et al. (2021). This additional analysis measured a number of features including: the quality of the videos via the focus of the frames using a Laplace filter to detect edges, which is related to proper lighting and blurriness (Pech-Pacheco et al., 2000); the proportion of frames with faces in the video measured using multi-task cascaded convolutional networks (Zhang et al., 2016); and mean scene similarity (e.g., colors, shapes, objects, textures, sizes and locations), based on the middle frame in each scene via the Siamese network approach and ResNet-152, a pretrained deep convolutional neural network based on ImageNet (He et al., 2016).
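The focus measure can be illustrated with a small, dependency-free sketch. It assumes the common "variance of the Laplacian" formulation applied to a grayscale frame stored as a list of rows; the 4-neighbour kernel, the function name, and the toy images are assumptions for illustration and do not reproduce the notebook used in the study.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    Sharp frames have strong edges, so the responses spread widely (high
    variance); blurred or flat frames give responses close to zero.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# Toy frames: the same vertical edge, once sharp and once smoothed.
sharp = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
blurred = [[0, 0, 64, 191, 255, 255] for _ in range(5)]
flat = [[128] * 6 for _ in range(5)]

print(laplacian_variance(sharp), laplacian_variance(blurred), laplacian_variance(flat))
```

A per-video score could then be formed by averaging the frame-level values, although the exact aggregation used in the study is not described here.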

Data Pre-processing: Audio Data Format and Splitting

WAV files have far superior audio quality to MP3 files and are considered truer and more accurate to sound sources, although involving much larger file sizes. Therefore, the MP3 audio files were converted to WAV format to enable analysis of a richer set of audio data. Each audio sample was also converted from stereo to mono since the processing software applied in the study (Soundgen; Anikin, 2019) requires the audio files to be in a single-channel format. Voice and instrumental (music) characteristics of each ad were split using Ultimate Vocal Remover (Takahashi & Mitsufuji, 2017) and the UVR GUI package (https://github.com/Anjok07/ultimatevocalremovergui). Two videos were removed at this stage due to identification of a duplicate video or due to being too large for processing (n = 289).

Data Processing: Extraction of Audio Features

The tempo (in beats per minute or BPM) and the key (major or minor) of the music (non-voice) for each ad were calculated using Tunebat (http://tunebat.com/). Tunebat is used by many music professionals due to its high accuracy. The company also curates a large database of processed music files from Spotify as a source of reference.

Audio features were extracted using Soundgen (Anikin, 2019). Spectrograms of voice and music for a sample ad (“100% Pure New Zealand: Where One Journey Leads to Another”) are shown in Figures 3 and 4. The ad is 2 min long, as shown on the x-axis, Time. The pitch is shown in the y-axis (Frequency in kHz), from low to high. Voice frequency (Figure 3) has been capped at 4 kHz, below which all significant features appear, to show details more clearly. The upper (colorful) portion of each figure displays pitch, while the lower portion (black) details amplitude or loudness. The spectrogram illustrates important audio features via colors and symbols. The overall measured audio signal is shown by the colored pixels in the upper portion of the figure, with loudness ranging from blue (lowest) to red (highest). The green dots show pitch calculated via autocorrelation, whilst the red crosses denote the lowest dominant frequency. The final calculated pitch is indicated by the black dotted line.

Figure 3.

Spectrogram of voice in an ad.

Figure 4.

Spectrogram of music/accompaniment in an ad.

The spectrogram for the music of the ad is illustrated via the same coded features. As illustrated in the amplitude graph in the lower section of Figure 4, music appears throughout the whole ad, with very little variation in loudness.

Combining Data Sources, Data Preparation and Calculation of Dependent Variable

Each advertisement was coded with a unique identity number that was used to merge and join the various data sources. Thus, data on the voice features, music features, and other features were merged into a single data file.

The calculation of positive affect considered the total number of views of an ad, as well as the number of ad likes and dislikes (positive and negative votes). An examination of the data set found that the videos only had positive votes (likes), so consideration of negative votes was unnecessary. Since the distribution of the resulting variable was found to be highly positively skewed, a natural log transformation was also used. Positive affect for a given ad i is calculated as:

$$PosAff_{i} = \ln\left(\frac{λ_{i}}{γ_{i}}\right) \tag{1}$$

where $λ_{i}$ refers to the number of like votes for video i, and $γ_{i}$ refers to the number of views of the video since it was posted.
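A minimal sketch of this calculation in Python follows. The function name is hypothetical, and returning `None` when the ratio is undefined (zero likes or zero views) is an assumption made for illustration, since the logarithm does not exist in those cases.

```python
import math

def positive_affect(likes, views):
    """Natural log of the like-to-view ratio for one ad.

    Returns None when the value is undefined (no likes or no views);
    this handling is an assumption, not a rule taken from the study.
    """
    if likes <= 0 or views <= 0:
        return None
    return math.log(likes / views)

# Hypothetical ad with 5,000 likes and 1,000,000 views.
print(positive_affect(5_000, 1_000_000))
print(positive_affect(0, 1_000))
```

Because likes are almost always far fewer than views, the ratio lies between 0 and 1 and the resulting values are negative; an ad liked by 0.5% of its viewers scores about −5.3.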

Descriptive statistics for the overall sample are shown in Table 1. This reports the raw data before standardization and data after standardization. The percentage of voiced frames ranged from 1.8% to 69.4%, with an average of 28.4% of voiced frames in an ad (SD = 14.7%). The mean intensity of voice was 0.07 (SD = 0.04), with a mean harmonics-to-noise ratio of 9.29 (SD = 1.73). Mean pitch was 308.93 Hz (SD = 178.55 Hz), while mean standard deviation of pitch was 166.55 Hz (SD = 130.08 Hz). Regarding music features, the calculated tempo varied considerably from a minimum of 50 BPM to a maximum of 156 BPM, with a mean of 99.04 BPM (SD = 26.65 BPM), whilst 63% of music was in a major key. The length of videos ranged between 14.49 and 382.69 s, with a mean of 92.36 s or roughly one-and-a-half minutes (SD = 64.16 s). The proportion of frames with faces ranged between 0% and 96.2%, with a mean of 33.7% (SD = 21.6%). Video quality was quite variable, with a mean of 158.23 and a standard deviation of 118.81, while mean scene similarity was quite static (M = 0.727, SD = 0.023). After standardization of the independent variables, all variable means are 0 and all standard deviations are 1.

Table 1.

Descriptive Statistics of Sample.

Variable	N	Min.	Max.	Mean	Std. Dev.
Descriptive variables
Views	289	2	65,438,449	1,465,472.88	6,506,429.88
Likes	281	0	579,190	5,484.69	40,843.96
Proportion of voice frames	287	0.018	0.694	0.284	0.147
Dependent variable
Positive affect (Natural Log)	262	−12.392	−3.374	−5.619	1.592
Positive affect (Transformed)	289	−2.040	2.315	0.000	1.000
Raw variables
Pitch (mean)	287	131.658	1,774.821	308.926	178.548
Pitch (standard deviation)	287	25.608	1,023.258	166.552	130.082
Intensity (mean)*	287	0.000	0.204	0.065	0.043
Harmonics-to-noise-ratio (mean)	287	4.808	16.950	9.285	1.733
BPM	289	50	156	99.040	26.650
Key (Major/Minor)	289	0	1	0.630	0.484
Length of video	287	14.490	382.690	92.363	64.159
Video quality	287	13.287	662.513	158.233	118.812
Scenes similarity (mean)	275	0.576	0.806	0.727	0.023
Proportion of frames with faces	287	0	0.962	0.337	0.216
Standardized variables
Pitch (mean)	287	−0.993	8.210	0	1
Pitch (standard deviation)	287	−1.084	6.586	0	1
Intensity (mean)	287	−1.500	3.244	0	1
Harmonics-to-noise-ratio (mean)	287	−2.584	4.423	0	1
BPM	289	−1.840	2.137	0	1
Key (Major/Minor)	289	−1	1	0	1
Length of video	287	−1.214	4.525	0	1
Video quality	287	−1.220	4.244	0	1
Scenes Similarity (mean)	275	−7.324	2.461	0	1
Proportion of frames with faces	287	−1.558	2.891	0	1

*Root mean square amplitude per frame.

The mean view count for the ads was 1.47 million (SD = 6.51 million), with a mean like count of 5,484.69 (SD = 40,843.96). The natural log of net positive affect had a mean of −5.62 (SD = 1.59). The transformed dependent variable has a mean of 0 and a standard deviation of 1. The distribution of the regression standardized residual of the dependent variable is shown in Figure 5. The results are approximately normal in terms of skewness and kurtosis.

Figure 5.

Histogram of net positive affect (natural log) variable (outliers included).

The independent variables were standardized before use in the regression model to avoid any bias from the different units used. As a further safeguard for normality, the dependent variable was transformed using a Box-Cox transformation procedure with negatives allowed (Hawkins & Weisberg, 2017), and missing values were imputed with the mean, since fewer than 10% of values were missing (Peng et al., 2006).
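The standardization step can be illustrated with a minimal Python sketch. It assumes the usual z-score, (x − mean) / SD, with the sample standard deviation, and it assumes that mean imputation is applied before standardizing; the paper does not state either detail, and the function names are illustrative only. Under those assumptions, the raw minimum and maximum BPM in Table 1 map onto the standardized minimum and maximum reported there.

```python
from statistics import mean, stdev


def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Return z-scores using the sample standard deviation (assumed here)."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


def z_score(x, centre, spread):
    """Z-score of a single value given a known mean and standard deviation."""
    return (x - centre) / spread


# Check against Table 1: BPM has mean 99.040 and SD 26.650.
bpm_min_z = z_score(50, 99.040, 26.650)   # raw minimum of 50 BPM
bpm_max_z = z_score(156, 99.040, 26.650)  # raw maximum of 156 BPM
print(round(bpm_min_z, 3), round(bpm_max_z, 3))
```

The two printed values can be compared with the standardized BPM row of Table 1.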

Outliers and Power Analysis

Regression results may be vulnerable to bias from outliers, so it is prudent to conduct analysis with and without outliers and consider their impact on the research results (Barnett & Lewis, 1994). To examine outliers, Mahalanobis (1936) distance was calculated for the regression. A total of eight observations had a Mahalanobis distance with a probability less than .001 (J. Hair et al., 1998; Tabachnick & Fidell, 1996). Robustness tests were conducted further with and without the detected outliers.
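As an illustration of this screening step, the following sketch computes squared Mahalanobis distances from the sample centroid in plain Python and flags rows above a chi-square critical value. The data, the function names, and the choice of cutoff are assumptions for illustration; the paper does not report the degrees of freedom used. A conventional choice is the chi-square critical value at p = .001 with degrees of freedom equal to the number of predictors (approximately 29.59 for 10 degrees of freedom).

```python
from statistics import mean


def covariance_matrix(rows):
    """Return the centroid and sample covariance matrix (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    centre = [mean(row[j] for row in rows) for j in range(p)]
    cov = [[sum((r[a] - centre[a]) * (r[b] - centre[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return centre, cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - known) / aug[i][i]
    return x


def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of every row from the sample centroid."""
    centre, cov = covariance_matrix(rows)
    distances = []
    for row in rows:
        diff = [value - m for value, m in zip(row, centre)]
        distances.append(sum(d * s for d, s in zip(diff, solve(cov, diff))))
    return distances


def flag_outliers(rows, critical_value):
    """Indices of rows whose squared distance exceeds the chosen cutoff."""
    return [i for i, d2 in enumerate(mahalanobis_sq(rows)) if d2 > critical_value]


# Hypothetical two-variable data; the last row is deliberately extreme.
demo_rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [20, -10]]
demo_d2 = mahalanobis_sq(demo_rows)
```

With very small samples no row can exceed a strict cutoff, because each squared distance is bounded by (n − 1)² / n, so the demo data only show which row is furthest from the centroid.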

A power analysis for t-tests in G*Power 3.1.9.7 (α = .05, power = 0.95) found that a sample of 265 (outliers removed) can detect an effect size of 0.041, whilst a sample of 273 (outliers not removed) can detect an effect size of 0.040. Thus, the sample is sufficient to detect small to moderate effect sizes for the variables in the model. An F-test power analysis using the same parameters found detectable effect sizes of 0.093 (outliers not removed) and 0.095 (outliers removed), again in the low to moderate range, indicating an acceptable sample size for testing the research model.

Examination of Model Diagnostics

Standard model diagnostics were applied to scrutinize the robustness of the final regression model. A normal P-P Plot of the regression standardized residual for positive affect (natural log) for the model shows that the residuals closely follow the diagonal line, with no noticeable pattern of strong deviation (see Figure 6a). Second, a scatterplot of the standardized residual against the standardized predicted value reveals no pattern in the residuals, indicating homoscedasticity and suggesting that the regression model is a good fit to the data (see Figure 6b).

Figure 6.

Model Diagnostics (outliers included). (a) Normal P-P Plot of Regression Standardized Residual of ln(Net Positive Affect). (b) Scatterplot of Regression Standardized Residual against Regression Standardized Predicted Value.

To assess possible multicollinearity problems in the data set, the variance inflation factor (VIF) was calculated and examined (Tables 3 and 4). In all cases, the VIF is well below 10, suggesting that multicollinearity is not a problem. As a further check for multicollinearity, the condition indices of all variables in the model were determined. J. F. Hair et al. (2013) state that any condition index above 30 must be inspected to determine whether any pairs of variance decomposition proportions (VDP) appear above 0.9. No variables met this criterion, with the highest condition index in the model being 3.187 (outliers included) and 2.750 (outliers removed).
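To make the VIF threshold concrete, the sketch below uses the standard definition VIF = 1 / (1 − R²), where R² comes from regressing one predictor on all the others. The function names are illustrative. It converts the largest VIF reported in Table 3 (2.708, for mean pitch) into the share of that predictor's variance explained by the other predictors.

```python
def vif_from_r_squared(r_squared):
    """VIF for a predictor whose regression on the other predictors gives r_squared."""
    return 1.0 / (1.0 - r_squared)


def implied_r_squared(vif):
    """Invert the VIF definition to recover the underlying R-squared."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# Largest VIF in Table 3 (mean pitch, outliers not removed).
largest_vif = 2.708
shared_variance = implied_r_squared(largest_vif)
print(round(shared_variance, 3))
```

By the same definition, the cutoff of 10 corresponds to an R² of 0.9 between one predictor and the rest.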

Results

In line with good practice, the regression analysis was performed with and without outliers to test the robustness of the results. ANOVA tests and the coefficients of determination for models with and without outliers are shown in Table 2. In the model with outliers included, 12.6% of the variance in PosAff is explained by the independent variables in the research model from Figure 1 (n = 273, F = 3.773, p < .001). The model without outliers had very similar results, with 12.5% of variance in the dependent variable explained (n = 265, F = 3.629, p < .001).

Table 2.

Model Comparison (With and Without Outliers).

Model	R-Squared	F-value	p-Value	n
1. Outliers not removed	.126	3.773	<.001	273
2. Outliers removed	.125	3.629	<.001	265
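The F-values in Table 2 can be approximately recovered from the reported R-squared and sample size. The sketch below is a consistency check, not the author's procedure. It assumes an ordinary least squares model with an intercept and k = 10 predictors (the number of rows in Tables 3 and 4), and the function name is illustrative. Small differences from the published values are expected because R-squared is reported to three decimals.

```python
def f_statistic(r_squared, n, k):
    """Overall F-statistic for a regression with k predictors plus an intercept."""
    df_model = k
    df_residual = n - k - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)


# Values taken from Table 2; k = 10 is an assumption based on Tables 3 and 4.
f_with_outliers = f_statistic(0.126, n=273, k=10)
f_without_outliers = f_statistic(0.125, n=265, k=10)
print(round(f_with_outliers, 3), round(f_without_outliers, 3))
```

Under these assumptions the residual degrees of freedom are 262 and 254 for the two models, respectively.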

The results of testing the research model on the full sample are examined in blocks of variables for voice, music, and other features (Table 3). In terms of voice, two of the core variables tested in the research model, the mean pitch and the standard deviation of pitch, had non-significant relationships with positive affect (β = −.011, p = .908 and β = .046, p = .620, respectively). Notwithstanding, the two key voice quality variables, mean voice intensity and mean voice harmonics-to-noise ratio (HNR), are both significant determinants in the model: mean voice intensity is significantly negatively related to PosAff at the 1% level (β = −.214, p = .001), while the mean voice HNR is significantly positively related to PosAff at the 5% level (β = .150, p = .015). In terms of music, tempo was found to be significantly related to PosAff at the 5% level (β = .125, p = .039), although the specific key used was non-significant in the model. Finally, with respect to other features used as control variables, the length of video was found to be positively and significantly related to PosAff (β = .186, p = .003). However, video quality, mean scene similarity, and the proportion of frames with faces were not found to have a significant relationship with positive affect.

Table 3.

Results of Model Testing (Outliers Not Removed).

Variables	β	Std. Error	Std. β	t-Value	p-Value	VIF
Constant	−.014	0.059		−0.244	.807
Voice
Pitch (mean)	−.011	0.096	−.011	−0.116	.908	2.708
Pitch (standard deviation)	.046	0.093	.046	0.497	.620	2.551
Intensity (mean)	−.214	0.064	−.207	−3.329	.001	1.158
Harmonics-to-noise-ratio (mean)	.150	0.062	.148	2.437	.015	1.111
Music
BPM	.125	0.060	.123	2.074	.039	1.061
Key (Major/Minor)	.108	0.060	.107	1.786	.075	1.066
Other features
Length of video	.186	0.062	.184	2.980	.003	1.144
Video quality	.073	0.060	.073	1.216	.225	1.087
Scenes similarity (mean)	.068	0.063	.067	1.079	.282	1.172
Proportion of frames with faces	.010	0.062	.010	0.161	.872	1.081
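The t-values in Table 3 are the ratio of each coefficient to its standard error. The sketch below recomputes one of them and attaches an approximate two-sided p-value. It uses a normal approximation to the t distribution, which is an assumption made to keep the example dependency-free and which is reasonable only because the residual degrees of freedom are large. Because the table reports rounded coefficients and standard errors, the recomputed values only approximate the published ones. The function names are illustrative.

```python
import math


def t_value(coefficient, std_error):
    """Test statistic for a single regression coefficient."""
    return coefficient / std_error


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# BPM row of Table 3: coefficient .125, standard error 0.060, reported t = 2.074.
bpm_t = t_value(0.125, 0.060)
bpm_p = two_sided_p_normal(2.074)
print(round(bpm_t, 2), round(bpm_p, 3))
```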

The research model is re-examined with the outliers removed (see Table 4). The significance of voice variables in the model remains unchanged and the relationships are significant at similar levels: mean intensity (β = −.210, p = .002) and mean voice HNR (β = .166, p = .010). Tempo remains significant in the model (β = .128, p = .039). Mean pitch and standard deviation of pitch remain non-significant in the model. In terms of control variables, key, video quality, mean scenes similarity and the proportion of frames with faces remain non-significant. Length of the video remains significant in the model (β = .179, p = .005).

Table 4.

Results of Model Testing (Outliers Removed).

Variables	β	Std. error	Std. β	t-Value	p-Value	VIF
Constant	−.013	0.061		−0.209	.834
Voice
Pitch (mean)	−.012	0.147	−.007	−0.081	.936	2.068
Pitch (standard deviation)	.053	0.119	.036	0.447	.655	1.922
Intensity (mean)	−.210	0.067	−.202	−3.127	.002	1.213
Harmonics-to-noise-ratio (mean)	.166	0.064	.160	2.593	.010	1.102
Music
BPM	.128	0.062	.126	2.070	.039	1.072
Key (Major/Minor)	.101	0.062	.099	1.630	.104	1.063
Other features
Length of video	.179	0.063	.176	2.816	.005	1.137
Video quality	.078	0.062	.076	1.251	.212	1.081
Scenes similarity (mean)	.100	0.082	.079	1.214	.226	1.231
Proportion of frames with faces	−.006	0.068	−.005	−0.086	.932	1.184

Overall, the results of the analysis offer support for H3 (the mean intensity of voiced frames in an ad will be negatively related to positive affect), H4 (the mean harmonics-to-noise ratio of voiced frames in an ad will be positively related to positive affect), and H5 (the tempo of music during an ad will be positively related to positive affect), but not for H1 (the mean pitch of voiced frames in an ad will be negatively related to positive affect), H2 (the standard deviation of pitch of voiced frames in an ad will be negatively related to positive affect), or H6 (the major key of music during an ad will be positively related to positive affect). Positive affect is determined by key voice quality characteristics, mean voice intensity and mean HNR, but not by pitch. The implications of this finding are examined in the next section.

Gender was not considered a variable of interest in the study; rather, the focus is on the mean pitch of an advertisement. However, as an additional robustness test, the moderating effect of the gender of voice on the relationship between mean pitch and positive affect was examined. Latinus and Taylor (2012) suggest that individuals are able to process and discriminate the gender of voice. To examine this moderating effect, every ad was coded according to whether it had a distinct single gender for voice. Ads with no voice (music only) or an indistinct or mixed voice were coded as missing. The test did not find a significant moderating effect of gender on the relationship between mean pitch and positive affect (full sample: R² = .02, moderator: β = .05, p = .899; without outliers: R² = .02, moderator: β = −.08, p = .850). As a result, gender was not considered further in the analysis.

Discussion and Conclusions

Audio advertising has become a vital part of marketing strategies for travel and tourism companies. In the post-pandemic world, successful advertising has the potential to transform tourism (Fuchs, 2022; Gretzel et al., 2020) and combine goals of business value and socio-economic value, focusing on a triple bottom line of people, planet and profit (Bhattacharya, 2016). The ultimate aim must be to create more responsible, social and ecological tourism (Bandyopadhyay & Ray, 2019; Higgins-Desbiolles, 2020).

From the perspective of voice analytics, each video comes with a combination of various voice features, music background, and varied content. Travel and tourism brands leverage specific combinations of features to shape their unique identities. Notwithstanding, there is a dearth of studies exploring the efficacy of audio advertising features that employ advanced data analytics methods. This research provides empirical evidence to support theoretical relationships that have not been previously tested via actual travel and tourism video advertising data (or advertising video data more broadly). The findings affirm the beneficial outcomes that speakers with quieter voices (less intensity) and higher voice clarity (higher harmonics-to-noise ratio) bring to businesses through their positive affect from travel and tourism advertisements. The research has also demonstrated that ads with higher tempo music also tend to stimulate a greater level of positive affect from consumers. Notwithstanding, voice pitch (mean and standard deviation) and the major or minor key of music in an ad did not appear to have an impact on the positive affect of consumers.

Implications for Theory and Research

This research makes several theoretical and research contributions. First, this research contributes via the establishment of a novel methodology for conducting audio analytics research in travel and tourism, combining machine learning algorithms, pre-trained deep learning models, and other statistical methods. The comprehensive research process specified may be used in future research examining features of voice and audio in travel and tourism advertising. Contemporary developments in data analytics have brought considerable new opportunities for the analysis of a wide variety of types of unstructured data (Balducci & Marinova, 2018). Recent decades have seen a plethora of research examining textual data in marketing (Berger et al., 2020; Herhausen et al., 2020; Villarroel Ordenes et al., 2019). Notwithstanding, there is a paucity of research examining a broader range of media and modes—including audio, image, and video as well as text and numeric heuristics—to understand and predict consumers’ behaviors (Grewal et al., 2022). Several recent studies have begun to push boundaries by applying data analytics methods to videos in marketing (X. Li et al., 2019; Liu et al., 2018). However, in a recent review of machine learning research in marketing, Ngai and Wu (2022) go as far as to state that audio analytics are rarely used in the marketing discipline and that available research is very limited, citing only two papers. This research answers such recent calls for new audio analytics research to provide novel evidence of the effect of voice quality on net positive affect in travel and tourism advertisements and marketing more generally. It provides a clear blueprint for future studies of audio analytics in travel and tourism research.

Second, the research findings provide supporting evidence for the critical role that acoustic features of voices (e.g., intensity and harmonic-to-noise ratio) play in influencing consumer responses (Biswas et al., 2019; Hagtvedt & Brasel, 2016). Such features influence consumers via the heuristic route in the HSM model. We do so by analyzing the pitch, intensity, and quality of the actual audio in travel and tourism advertisements, and by examining data on consumer interactions with video ads. Louder voices (with greater sound intensity) tend to be associated more with negative emotions and perceptions in the literature. For example, greater sound intensity tends to be associated with aggression and anger (Juslin & Laukka, 2003; Moriyama & Ozawa, 2001), and has been linked to perceptions of greater dominance (K. R. Scherer & Giles, 1979; Tusing & Dillard, 2000). Louder voices are not a suitable element for inclusion in travel and tourism advertisements. This research has found that louder voices stimulate negative responses to travel and tourism ads. Research has shown that smoother voices are associated with more positive emotional responses in individuals. Kamiloğlu et al. (2020) found that higher HNR is associated with positive emotions of happiness, pleasure and interest. This research has found that smoother voices can contribute to developing a more positive response from a tourism advertisement. Overall, this study shows that travel advertisements with a lower intensity (less loudness) and greater harmonics-to-noise ratio produce more positive affect. This is an original contribution to knowledge. The corollary of this is that such advertisements are also likely to have further influences on behavior.

Third, the research demonstrates that the tempo of music can have an influence on consumer responses to advertisements via the musicscape model (and the heuristic route in the HSM model). Prior research acknowledges the impacts of music tempo on advertisement effectiveness. In addition to physiological effects (such as breathing and heart rates) (Lundin, 1985), faster music influences an individual’s affective states and stimulates a larger positive affective response (Spangenberg et al., 2005). The greater levels of arousal and pleasure in faster music can encourage a higher degree of expectations, consumption, purchasing, and impulse purchasing, via a form of “entrainment” (Pantoja & Borges, 2021; Spence et al., 2019). Moreover, music with a higher tempo can alleviate some negative elements in the environmental context (Knoeferle et al., 2017; Oakes & North, 2008). This research supports prior research that has found a significant relationship between higher tempo and positive affect.

Implications for Practice

The findings of this research enable some strong practical recommendations for advertisers in the travel and tourism industry. First, the findings of the research have identified the important role of specific voice features related to the quality of the speaker in an advertisement, and these features enable advertisers to design more impactful advertisements that customers favor. For instance, a lower intensity, quieter voice, and a clearer voice with a higher harmonics-to-noise ratio will contribute positively to positive affect. Consumers tend to prefer less loudness and greater harmonics-to-noise ratio. These findings are useful to practitioners who may increase the positive affect among consumers by selecting voice actors for travel and tourism advertisements that are quieter and that have a better-quality voice with less additive noise, that is, very clear and relaxed speakers.

Second, this research established a relationship between music tempo and net positive affect. The higher the number of beats per minute for a travel and tourism advertisement, the greater the net positive affect from consumers. Faster music creates physiological effects in consumers (such as heart rate and breathing rate) and can generate more pleasure and arousal as compared to slower music and can stimulate an increased positive affective response (Spangenberg et al., 2005). This in turn can influence purchase intentions (Pantoja & Borges, 2021). When advertisers are trying to identify the most appropriate music to use in an ad, it is important to consider the use of faster music at an appropriate tempo. Trial and error may be able to identify the specific tempo of music that is best matched to particular brands and types of travel and tourism advertisement.

Ultimately, the hope is that powerful advertising underpinned by new analytics methods can be a tool for transformative advertising (Fuchs, 2022; Gretzel et al., 2020) that combines business value and socio-economic value (Bhattacharya, 2016). Advanced approaches to advertising may be able to break the ICT Productivity Paradox (Fuchs & Sigala, 2022) and build a more social, ecologically-mindful and responsible form of future tourism (Bandyopadhyay & Ray, 2019; Higgins-Desbiolles, 2020). For instance, by designing advertisements that include particular types of voices and music, it may be possible to persuade consumers to choose travel and tourism products and services that are more in tune with transformative environmental and social trends. For example, an ad with a smooth, quiet, and clear speaker with faster music is likely to generate positive affect that may encourage consumers to choose vacations that immerse them within the arousal and excitement of a particular culture, such as understanding the worldviews of residents in favelas in Brazil or promoting an understanding of Buddhism in a pilgrimage to Sri Lanka, pushing tourists out of their traditional “comfort zone” and enhancing the understanding of different cultures. Similarly, designing ads with the requisite characteristics may be able to entice visitors to make choices that are more ecologically friendly, such as the choice of a virtual reality experience of the Great Barrier Reef or Machu Picchu as opposed to a physical visit to the site.

Limitations and Future Research

Even though this research has established meaningful practical and theoretical implications, there is significant scope for future research to explore further the benefits of applying audio analytics methods. Several limitations of the current research are noteworthy. First, this study focused on travel and tourism advertisements as a holistic unit. Weng et al. (2021) have examined the differential effects of destination types (cultural vs. natural destinations) and the moderating influence of advertising format, such as video. The sample in this research was not large enough for such an analysis. Future research may examine whether the research findings are applicable for various subtypes of travel and tourism advertising, and if the effectiveness of voice quality in videos varies according to the nature of the travel and tourism advertisement.

This research focuses on specific acoustic features of voice. Many more features of voice can be measured using voice analytics and the methods outlined in this paper. Thus, future research is encouraged to investigate the effects of other dimensions of soundwaves. These studies may focus on other features of time, such as the mean length of an utterance (e.g., word), further features of spectrum, such as jitter and shimmer (Sueur, 2018), and additional potential features of voice quality, including cepstral peak prominence or relative mean perturbation (Maryn et al., 2009). More broadly, the combinations of different voice features may stimulate various images and perceptions in the mind of the consumer that may be linked to particular brands (Kim et al., 2021). Examining the gender of the speakers may also be an interesting avenue for future research (Latinus & Taylor, 2012). The corollary is that the creation of a unique “brand voiceprint” may be able to kindle consumer preferences toward the brand/product. This is likely to be a fruitful area for future research and it is recommended to explore the extent to which brand voiceprints that are developed are able to segment consumers and predict their preferences toward different product categories.

Footnotes

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Stuart J. Barnes

Author Biography

Stuart J. Barnes is a Professor of Marketing at Newcastle University Business School, Newcastle University in the United Kingdom. His research spans academic disciplines, particularly those that combine technology, analytics, marketing, tourism, and/or sustainability. His work has been published more than 200 times, including in leading outlets such as the Journal of Travel Research, Tourism Management, Social Science and Medicine, Annals of Tourism Research, and the Information Systems Journal, among many others. Professor Barnes is a Fellow of the Royal Society of the Arts and the Royal Statistical Society. Stuart is rated in the top Business & Management researchers according to Research.com (#66 in UK, #405 in the World). He is a reviewer for more than 10 international grant awarding bodies.

References

1. Anikin (2019). Soundgen: An open-source tool for synthesizing nonverbal vocalizations. Behavior Research Methods, 51(2), 778–792.

2. Apple, Streeter, L. A., & Krauss, R. M. (1979). Effects of pitch and speech rate on personal attributions. Journal of Personality and Social Psychology, 37(5), 715–727.

3. Balducci & Marinova (2018). Unstructured data in marketing. Journal of the Academy of Marketing Science, 46(4), 557–590.

4. Bandyopadhyay & Ray (2019). Responsible marketing: Can social enterprises show the way? Journal of Nonprofit & Public Sector Marketing, 31(2), 164–183.

5. Bannister (2020). A vigilance explanation of musical chills? Effects of loudness and brightness manipulations. Music and Science, 3, 1–7. https://doi.org/10.1177/2059204320915654

6. Barnett & Lewis (1994). Outliers in statistical data. John Wiley and Sons.

7. Belin, Fillion-Bilodeau, & Gosselin (2008). The Montreal Affective Voices: A validated set of nonverbal affect bursts for research on auditory affective processing. Behavior Research Methods, 40, 531–539.

8. Berger, Humphreys, Ludwig, Moe, W. W., Netzer, & Schweidel, D. A. (2020). Uniting the tribes: Using text for marketing insight. Journal of Marketing, 84(1), 1–25.

9. Bhattacharya, C. B. (2016). Responsible marketing: Doing well by doing good. GfK Marketing Intelligence Review, 8(1), 8–17.

10. Biswas, Lund, & Szocs (2019). Sounds like a healthy retail atmospheric strategy: Effects of ambient music and background noise on food sales. Journal of the Academy of Marketing Science, 47(1), 37–55.

11. Bitner, M. J. (1992). Servicescapes: The impact of physical surroundings on customers and employees. Journal of Marketing, 56(2), 57–71.

12. Breitenstein, Lancker, D. V., & Daum (2001). The contribution of speech rate and pitch variation to the perception of vocal emotions in a German and an American sample. Cognition and Emotion, 15(1), 57–79.

13. Burgoon, J. K., Birk, & Pfau (1990). Nonverbal behaviors, persuasion, and credibility. Human Communication Research, 17(1), 140–169.

14. Caldwell & Hibbert, S. A. (2002). The influence of music tempo and musical preference on restaurant patrons' behavior. Psychology and Marketing, 19(11), 895–917.

15. Chaiken (1980). Heuristic versus systematic information processing and the use of source versus message cues in persuasion. Journal of Personality and Social Psychology, 39(5), 752–766.

16. Chattopadhyay, Dahl, D. W., Ritchie, R. J. B., & Shahin, K. N. (2003). Hearing voices: The impact of announcer speech characteristics on consumer response to broadcast advertising. Journal of Consumer Psychology, 13(3), 198–204.

17. Dawar (2018). Marketing in the age of Alexa. Harvard Business Review, 96(3), 80–86.

18. Dubey, Farrell, & Ang (2018). How accent and pitch affect persuasiveness in radio advertising. In Cauberghe, Hudders, & Eisend (Eds.), Advances in advertising research IX (pp. 117–130). European Advertising Academy. Springer Gabler.

19. Feng, Suri, & Bell (2014). Does classical music relieve math anxiety? Role of tempo on price computation avoidance. Psychology and Marketing, 31(7), 489–499.

20. Fernandes, Teixeira, Guedes, Junior, & Teixeira, J. P. (2018). Harmonic to noise ratio measurement - selection of window and length. Procedia Computer Science, 138, 280–285.

21. Fuchs (2022). Reflecting on the ideology of a mechanist economic science: From ontological inconsistencies towards a transformative tourism science. In Femenia-Serra, Alzua-Sorzabal, & Xiang (Eds.), Transitioning towards the future of tourism destinations: Resilient, smart, and green development (pp. 45–79). Thomson Reuters.

22. Fuchs & Sigala (2022). Strategic use of information technologies in tourism: A review and critique. In Xiang, P. Z., Fuchs, Gretzel, & Höpken (Eds.), Handbook of e-Tourism (pp. 1109–1145). Springer Nature.

23. Garlin, F. V., & Owen (2006). Setting the tone with the tune: A meta-analytic review of the effects of background music in retail settings. Journal of Business Research, 59(6), 755–764.

24. Gélinas-Chebat & Chebat, J. C. (1996). Voice and advertising: Effects of intonation and intensity of voice on source credibility, attitudes toward the advertised service and the intent to buy. Perceptual and Motor Skills, 83(1), 243–262.

25. Gretzel, Fuchs, Baggio, Hoepken, Law, Neidhardt, Pesonen, Zanker, & Xiang (2020). E-Tourism beyond COVID-19: A call for transformative research. Information Technology & Tourism, 22, 187–203.

26. Grewal, Herhausen, Ludwig, & Villarroel Ordenes (2022). The future of digital communication research: Considering dynamics and multimodality. Journal of Retailing, 98(2), 224–240.

27. Guyer, J. J., Fabrigar, L. R., & Vaughan-Johnston, T. I. (2019). Speech rate, intonation, and pitch: Investigating the bias and cue effects of vocal confidence on persuasion. Personality and Social Psychology Bulletin, 45(3), 389–405.

28. Hagtvedt & Brasel, S. A. (2016). Cross-modal communication: Sound frequency influences consumer responses to color lightness. Journal of Marketing Research, 53(4), 551–562.

29. Hair, Anderson, Tatham, & Black (1998). Multivariate data analysis (5th ed.). Prentice Hall.

30. Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2013). Multivariate data analysis: Advanced diagnostics for multiple regression [Online supplement]. Pearson. http://www.mvstats.com/Downloads/Supplements/Advanced_Regression_Diagnostics.pdf

31. Handayani & Korstanje, M. E. (2018). Virtual dark tourism: The role of sound branding and augmented reality for death sites. International Journal of Computational Methods in Heritage Science, 2(2), 1–17. https://doi.org/10.4018/IJCMHS.2018070101

32. Harrington, R. J., Ottenbacher, M. C., & Treuter (2015). The musicscape model: Direct, mediating, and moderating effects in the casual restaurant experience. International Journal of Hospitality & Tourism Administration, 16(2), 99–121.

33. Hawkins, D. M., & Weisberg (2017). Combining the Box-Cox power and generalised log transformations to accommodate nonpositive responses in linear and mixed-effects linear models. South African Statistical Journal, 51, 317–328.

34. He, Zhang, Ren, & Sun (2016, June 27–30). Deep residual learning for image recognition [Conference session]. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 770–778). IEEE.

35. Herhausen, Emrich, Grewal, Kipfelsberger, & Schoegel (2020). Face forward: How employees’ digital presence on service websites affects customer perceptions of website and employee service quality. Journal of Marketing Research, 57(5), 917–936.

36. Hevner (1935). The affective character of the major and minor modes in music. The American Journal of Psychology, 47(1), 103–118.

37. Higgins-Desbiolles (2020). Socialising tourism for social and ecological justice after COVID-19. Tourism Geographies, 22(3), 610–623.

38. Hildebrand, Efthymiou, Busquet, Hampton, W. H., Hoffman, D. L., & Novak, T. P. (2020). Voice analytics in business research: Conceptual foundations, acoustic feature extraction, and applications. Journal of Business Research, 121, 364–374.

39. Huang & Labroo, A. A. (2020). Cueing morality: The effect of high-pitched music on healthy choice. Journal of Marketing, 84(6), 130–143.

40. Jamrozy (2007). Marketing of tourism: A paradigm shift toward sustainability. International Journal of Culture Tourism and Hospitality Research, 1(2), 117–130.

41. Jiang, Paulmann, Robin, & Pell, M. D. (2015). More than accuracy: Nonverbal dialects modulate the time course of vocal emotion recognition across cultures. Journal of Experimental Psychology Human Perception & Performance, 41, 597–612.

42. Jiang & Pell, M. D. (2017). The sound of confidence and doubt. Speech Communication, 88, 106–126.

43. Johnstone & Scherer, K. R. (2000). Vocal communication of emotion. In Lewis & Haviland (Eds.), Handbook of emotion (2nd ed., pp. 220–235). New York.

44. Juslin, P. N., & Laukka (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129(5), 770–814.

45. Kamiloğlu, R. G., Fischer, A. H., & Sauter, D. A. (2020). Good vibrations: A review of vocal expressions of positive emotions. Psychonomic Bulletin & Review, 27(2), 237–265.

46.

Kellaris

J. J.

Kent

R. J.

(1992). The influence of music on consumers’ temporal perceptions: Does time fly when you’re having fun? Journal of Consumer Psychology, 1, 365–376.

47.

Kim

H. J.

Wang

Ding

(2021). Brand voiceprint. Customer Needs and Solutions, 8, 123–136.

48.

Knoeferle

K. M.

Paus

V. C.

Vossen

(2017). An upbeat crowd: Fast in-store music alleviates negative effects of high social density on customers’ spending. Journal of Retailing, 93, 541–549.

49.

Latinus

Taylor

M. J.

(2012). Discriminating male and female voices: Differentiating pitch and gender. Brain Topography, 25(2), 194–204.

50. Leongómez, J. D., Pisanski, Reby, Sauter, Lavan, Perlman, Varella Valentova (2021). Voice modulation: From origin and mechanism to social impact. Philosophical Transactions of the Royal Society of London, 376, 20200386. https://doi.org/10.1098/rstb.2020.0386

51. Tang, Wang (2018). Big data in tourism research: A literature review. Tourism Management, 68, 301–323.

52. Liu, Shi, S. W., Teixeira, Wedel (2018). Video content marketing: The making of clips. Journal of Marketing, 82(4), 86–101.

53. Shi, Wang (2019). Video mining: Measuring visual information using automatic methods. International Journal of Research in Marketing, 36(2), 216–231.

54. Lundin, R. W. (1985). An objective psychology of music. Ronald Press.

55. Magnini, V. P., Parker, E. E. (2009). The psychological effects of music: Implications for hotel firms. Journal of Vacation Marketing, 15(1), 53–62.

56. Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences, 2, 49–55.

57. Martín-Santana, J. D., Muela-Molina, Reinares-Lara, Rodríguez-Guerra (2015). Effectiveness of radio spokesperson's gender, vocal pitch and accent and the use of music in radio advertising. Business Research Quarterly, 18(3), 143–160.

58. Martín-Santana, J. D., Reinares-Lara, Muela-Molina (2015). Music in radio advertising: Effects on radio spokesperson credibility and advertising effectiveness. Psychology of Music, 43(6), 763–778.

59. Maryn, Roy, De Bodt, Van Cauwenberge, Corthals (2009). Acoustic measurement of overall voice quality: A meta-analysis. The Journal of the Acoustical Society of America, 126(5), 2619–2634.

60. Mehrabian, Russell, J. A. (1974). An approach to environmental psychology. MIT Press.

61. Milliman, R. E. (1986). The influence of background music on the behavior of restaurant patrons. Journal of Consumer Research, 13(2), 286–289.

62. Moriyama, Ozawa (2001). Measurement of human vocal emotion using fuzzy control. Systems and Computers in Japan, 32, 59–68.

63. Ngai, E. W. T. (2022). Machine learning in marketing: A literature review, conceptual framework, and research agenda. Journal of Business Research, 145, 35–48.

64. Nguyen, Naffziger, E. E., Berridge, K. C. (2021). Positive affect: Nature and brain bases of liking and wanting. Current Opinion in Behavioral Sciences, 39, 72–78.

65. North, A. C., Hargreaves, D. J., MacKenzie, L. C., Law, R. M. (2004). The effects of musical and voice ‘fit’ on responses to advertisements. Journal of Applied Social Psychology, 34(8), 1675–1708.

66. North, A. C., Shilcock, Hargreaves, D. J. (2003). The effect of musical style on restaurant customers’ spending. Environment and Behavior, 35(5), 712–718.

67. Novak, C. C., La Lopa, Novak, R. E. (2010). Effects of sound pressure levels and sensitivity to noise on mood and behavioral intent in a controlled fine dining restaurant environment. Journal of Culinary Science & Technology, 8(4), 191–218.

68. Oakes (2000). The influence of the musicscape within service environments. Journal of Services Marketing, 14(7), 539–556.

69. Oakes (2003). Demographic and sponsorship considerations for jazz and classical music festivals. Service Industries Journal, 23(3), 165–178.

70. Oakes (2007). Evaluating empirical research into music in advertising: A congruity perspective. Journal of Advertising Research, 47, 38–50.

71. Oakes, North, A. C. (2006). The impact of background musical tempo and timbre congruity upon ad content recall and affective response. Applied Cognitive Psychology, 20(4), 505–520.

72. Oakes, North, A. C. (2008). Using music to influence cognitive and affective responses in queues of low and high crowd density. Journal of Marketing Management, 24(5-6), 589–602.

73. Assaf, A. G., Baloglu (2016). Motivations and goals of slow tourism. Journal of Travel Research, 55(2), 205–219.

74. Oleszkiewicz, Pisanski, Lachowicz-Tabaczek, Sorokowska (2017). Voice-based assessments of trustworthiness, competence, and warmth in blind and sighted adults. Psychonomic Bulletin & Review, 24(3), 856–862.

75. Pantoja, Borges (2021). Background music tempo effects on food evaluations and purchase intentions. Journal of Retailing and Consumer Services, 63, 102730. https://doi.org/10.1016/j.jretconser.2021.102730

76. Pech-Pacheco, J. L., Cristóbal, Chamorro-Martínez, Fernández-Valdivia (2000, September 3–7). Diatom autofocusing in brightfield microscopy: A comparative study [Conference session]. Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, Vol. 3 (pp. 314–317). IEEE Computer Society.

77. Peng, C. Y. J., Harwell, Liou, S. M., Ehman, L. H. (2006). Advances in missing data methods and implications for educational research. In Sawilowsky (Ed.), Real Data Analysis (pp. 31–78). Information Age.

78. Petty, R. E., Cacioppo, J. T. (1986). Communication and persuasion: Central and peripheral routes to attitude change. Springer-Verlag.

79. Petty, R. E., Cacioppo, J. T., Schumann (1983). Central and peripheral routes to advertising effectiveness: The moderating role of involvement. Journal of Consumer Research, 10, 135–148.

80. Rodero, Potter, R. F., Prieto (2017). Pitch range variations improve cognitive processing of audio messages. Human Communication Research, 43(3), 397–413.

81. Roschk, Loureiro, S. M. C., Breitsohl (2017). Calibrating 30 years of experimental research: A meta-analysis of the atmospheric effects of music, scent, and color. Journal of Retailing, 93(2), 228–240.

82. Scherer (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40(1-2), 227–256.

83. Scherer, K. R., Giles (1979). Social markers in speech. Cambridge University Press.

84. Schwenzow, Hartmann, Schikowsky, Heitmann (2021). Understanding videos at scale: How to extract insights for business research. Journal of Business Research, 123, 367–379.

85. Spangenberg, E. R., Grohmann, Sprott, D. E. (2005). It’s beginning to smell (and sound) a lot like Christmas: The interactive effects of ambient scent and music in a retail setting. Journal of Business Research, 58(11), 1583–1589.

86. Spence, Reinoso-Carvalho, Velasco, Wang, Q. J. (2019). Extrinsic auditory contributions to food perception and consumer behaviour: An interdisciplinary review. Multisensory Research, 32(4-5), 275–318.

87. Sueur (2018). Sound analysis and synthesis with R. Springer.

88. Tabachnick, B. G., Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). Harper Collins.

89. Takahashi, Mitsufuji (2017, October 15–18). Multi-scale multi-band densenets for audio source separation [Conference session]. Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE.

90. Tellis, G. J., MacInnis, D. J., Tirunillai, Zhang (2019). What drives virality (sharing) of online digital content? The critical role of information, emotion, and brand prominence. Journal of Marketing, 83(4), 1–20.

91. Tigue, C. C., Borak, D. J., O'Connor, J. J. M., Schandl, Feinberg, D. R. (2012). Voice pitch influences voting behavior. Evolution and Human Behavior, 33(3), 210–216.

92. Tracy, D. K., O’Daly, Michalopoulou, Lloyd, L. C., Dimond, Matsumoto, Shergill, S. S. (2011). It’s not what you say but the way that you say it: An fMRI study of differential lexical and non-lexical prosodic pitch processing. BMC Neuroscience, 12(1), 1. https://doi.org/10.1186/1471-2202-12-128

93. Trompeta, M.-A., Karantinou, Koritos, Bijmolt, T. H. A. (2022). A meta-analysis of the effects of music in tourism and hospitality settings. Journal of Business Research, 138, 130–145.

94. Truong, V. D., Hall, C. M. (2017). Corporate social marketing in tourism: To sleep or not to sleep with the enemy? Journal of Sustainable Tourism, 25(7), 884–902.

95. Turley, L. W., Milliman, R. E. (2000). Atmospheric effects on shopping behavior: A review of the experimental evidence. Journal of Business Research, 49, 193–211.

96. Tusing, K. J., Dillard (2000). The sounds of dominance: Vocal precursors of perceived dominance during interpersonal influence. Human Communication Research, 26, 148–171.

97. van Bezooijen (1984). Characteristics and recognizability of vocal expressions of emotion. Foris.

98. Van Zant, A. B., Berger (2020). How the voice persuades. Journal of Personality and Social Psychology, 118(4), 661–682.

99. Villarroel Ordenes, Grewal, Ludwig, Ruyter, K. D., Mahr, Wetzels (2019). Cutting through content clutter: How speech and image acts drive consumer sharing of social media brand messages. Journal of Consumer Research, 45(5), 988–1012.

100. Waaramaa, Laukkanen, A. M., Airas, Alku (2010). Perception of emotional valences and activity levels from vowel segments of continuous speech. Journal of Voice, 24, 30–38.

101. Wang, X. I., Khamitov, Bendle (2021). Audio mining: The role of vocal tone in persuasion. Journal of Consumer Research, 48(2), 189–211.

102. Weng, Huang, Bao (2021). A model of tourism advertising effects. Tourism Management, 85, 104278. https://doi.org/10.1016/j.tourman.2020.104278

103. Whiteside, S. P. (1999a). Acoustic characteristics of vocal emotions simulated by actors. Perceptual and Motor Skills, 89, 1195–1208.

104. Whiteside, S. P. (1999b). Note on voice and perturbation measures in simulated vocal emotions. Perceptual and Motor Skills, 88, 1219–1222.

105. Wilson (2003). The effect of music on perceived atmosphere and purchase intentions in a restaurant. Psychology of Music, 31(1), 93–112.

106. Zander, M. F. (2006). Musical influences in advertising: How music modifies first impressions of product endorsers and brands. Psychology of Music, 34(4), 465–480.

107. Zenith. (2021). Business Intelligence – Travel. Zenith.

108. Zentner, Grandjean, Scherer, K. R. (2008). Emotions evoked by the sound of music: Characterization, classification, and measurement. Emotion, 8(4), 494–521.

109. Zhang, Qiao (2016). Joint face detection and alignment using multi-task cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499–1503.

110. Zhu, Meyers-Levy (2005). Distinguishing between the meanings of music: When background music affects product perceptions. Journal of Marketing Research, 42(3), 333–345.

111. Zoghaib (2017). The contribution of a brand spokesperson’s voice to consumer-based brand equity. Journal of Product & Brand Management, 26(5), 492–502.

112. Zoghaib (2019). Persuasion of voices: The effects of a speaker’s voice characteristics and gender on consumers’ responses. Recherche et Applications en Marketing, 34(3), 83–110.

Smooth Talking and Fast Music: Understanding the Importance of Voice and Music in Travel and Tourism Ads via Acoustic Analytics
