Abstract
Interpersonal interaction through vocal music language has become an important channel for cross-cultural social life. How to use vocal music language skillfully to stimulate the audience’s emotional arousal and trigger an immersive experience has become a research focus in the field of music. Current academic research on vocal music language is diverse and has explored how vocal music arises and how it develops under specific circumstances. However, the process by which immersive experience is aroused from the perspectives of “music” language and “lyrics” language remains largely unexplored. To cover various styles of vocal music, we selected music in three singing styles (folk-style singing, bel canto, and popular singing) as experimental materials to study how the audience’s immersive experience of vocal music language is aroused while listening to music. The results indicate that both “music” and “lyrics” language perceptions exert a significantly positive impact on the audience’s emotional arousal and immersive experience, and that arousal plays a significant mediating role between vocal music language and immersive experience. By clarifying the internal logic of the audience’s immersive experience from the perspectives of “music” and “lyrics” in vocal music language, this study provides new theoretical insights into the applicable boundary of telepresence in the vocal music field, helps operators design on-the-scene vocal music art environments that convey the emotional connotation of vocal music, and further promotes the development of the vocal music field.
Plain language summary
This paper uses PLS-SEM analysis to study how the audience’s immersive experience of vocal music language is aroused while listening to music. The findings indicate that both “music” and “lyrics” language perceptions exert a significantly positive impact on the audience’s emotional arousal and immersive experience, and that arousal plays a significant mediating role between vocal music language and immersive experience. We believe that our study contributes to the current literature because it provides new theoretical insights into the applicable boundary of telepresence in the vocal music field. By analyzing the path mechanism, this study aims to clarify the internal logic of the audience’s immersive experience from the perspectives of “music” and “lyrics” in vocal music language, to explore the regularities of the audience’s emotional arousal and flow experience of vocal music, and to confirm the existence and applicability of arousal and immersive experience in vocal music language.
Keywords
Introduction
Vocal music language, which is composed of musical tones and lyrics, is now highly integrated into human society. Accompanying individuals’ learning, work, entertainment, and social interaction (C. Liu et al., 2021; Sihvonen et al., 2021; Sterne & Razlogova, 2019; Weiss et al., 2021), it has become an indispensable part of human life. The development of vocal music language has, on the one hand, enriched the spiritual world of humanity and promoted the spread and development of art (Jarvis, 2019); on the other hand, it has built a brand-new space for cultural and emotional communication (Cowen & Keltner, 2021). Interpersonal interaction through music has become an important channel for cross-cultural social life, and social connections are promoted on a larger scale (Keller et al., 2017; Savage et al., 2020). Vocal music involves a variety of affective and cognitive functions, which not only stimulate memory and emotion but also directly affect the audience’s attention allocation and understanding during the processing of tones and other elements (Fitzroy & Sanders, 2015). The nonverbal signals of sound and musical instruments are important media of emotional communication, which can express emotional information and coordinate the sound interaction between individuals (Filippi, 2016). Moreover, as an integral component of vocal music and a vital means of conveying information and emotional interaction, language builds a bridge between singers and audience (Taylor-Neu, 2018). Therefore, vocal music language can create an on-the-scene immersive experience for the audience through the emotionally centered expression of vocal tones and the linguistic signs of lyrics that convey the connotation.
Current academic research on vocal music language is diverse. For example, Correia et al. (2022) demonstrated that vocal music training can enhance a variety of non-musical abilities, including speech perception and emotional recognition. In addition, Merrill and Larrouy-Maestri (2017) better explained vocal music expression in songs and languages by comparing the similarities and differences between languages and songs. Existing studies have thus explored how vocal music arises and how it develops under specific circumstances. However, the complete path by which its internal mechanism forms remains largely unexplored, and the process by which immersive experience is aroused from the perspectives of “music” language and “lyrics” language urgently needs to be explored. This study therefore departs from previous approaches and systematically constructs the path mechanism of immersive experience arousal in vocal music language from the dual perspective of “music” and “lyrics.”
Therefore, this study took vocal music language as an external stimulus that prompts the audience to produce vocal music telepresence (i.e., “music” language perception and “lyrics” language perception). First, following the stimulus-organism-response research paradigm in psychology, this study introduced arousal and immersive experience as two further variables and measured all four constructs with established scales developed in previous studies. Second, 63 valid questionnaires were collected before the formal experiments, and the items were tested and corrected to ensure the validity of the follow-up experiments. Finally, the formal experiments were conducted in the context of three different vocal music styles. A total of 371 questionnaires were distributed, of which 343 were valid after those not filled in seriously were excluded. The three experimental groups comprised 113, 111, and 119 respondents, respectively.
This study aims to clarify the internal logic of the audience’s immersive experience from the perspectives of “music” and “lyrics” in vocal music language. By analyzing the path mechanism, it explores the regularities of the audience’s emotional arousal and flow experience of vocal music and seeks to confirm the existence and applicability of arousal and immersive experience in vocal music language, thereby providing new theoretical insights into the applicable boundary of telepresence in the vocal music field. Furthermore, singers should be able to rely on their own performance and vocal competence to let the audience experience the spirit and emotion in the music, resonate with it, and to convey the same emotion as the composer during singing so that the audience has an immersive experience; this is of practical significance for creating an immersive vocal music art environment.
In order to discuss the language characteristics of “music” and “lyrics” in vocal music, this study selected the music of three different vocal styles including folk style singing, bel canto singing, and popular singing to carry out the empirical analysis. The specific procedures were as follows. First, it applied the one-way ANOVA to verify that there exist differences in the telepresence of the three vocal music styles to ensure the effectiveness of the experiment. Second, during the formal experiments, the respondents were instructed to listen to the vocal music and click a link to browse the music lyrics, and then they were asked to fill in the online questionnaire according to their real feelings in the process of listening and browsing, and the integrity of their browsing the music information was ensured. In addition, in line with the empirical analysis, this study verified the mediating effect of audience’s emotional arousal between vocal music language (“music” language and “lyrics” language) and immersive experience to explore the effects of the arousal process in the realization of an immersive experience induced by vocal music language.
Literature Review and Research Hypotheses
Vocal Music Language and Its Telepresence
Telepresence is used to describe the convincing sense of reality in the virtual environment (Cowan & Ketron, 2019; Han et al., 2020; Held, 1992). In this study, it refers to the audience’s sense of reality in the artistic conception of vocal music. The generation of arousal emotion is usually affected by telepresence, that is, when the audience perceives high “on-the-scene” telepresence, they will unconsciously generate arousal emotion and feel excited and happy (Lai et al., 2009). Therefore, in this study, telepresence is taken as a vital antecedent factor of arousal, and arousal is measured as an observable variable.
Academia has reached a consensus regarding the relationship between vocal music telepresence and emotional arousal. First, rehabilitation physiotherapy research demonstrates that vocal music can be used as a tool for the emotional regulation of patients and can provide listeners with a sense of spatial telepresence (Buche et al., 2021). Second, research on the effects of the stereo field on psychological response indicates that improving vocal music features (such as the sense of reality of the rhythm) enhances the audience’s emotional changes and feelings toward the music (Ooishi et al., 2021). Third, research on driving suggests that slow-paced music optimizes emotional valence and arousal during urban driving and reduces psychological demands (Karageorghis et al., 2021). Finally, research on the role of lyrics in the recognition of musical emotion demonstrates that lyrics achieve better performance in arousal and valence; that is, lyrics can generate psychological and physiological changes that awaken the audience’s psychological state, which is of great significance for arousing emotions (Malheiro et al., 2018). In brief, both “music” and “lyrics” can affect individuals’ emotional changes to reach the psychological state of telepresence, and the perception of music tones and lyrics can convey and evoke emotions (Bonin et al., 2016; Mori & Iwanaga, 2021). Accordingly, this study proposes the following research hypotheses:
Flow Language and Immersive Experience in Vocal Music
Immersive experience (flow experience) was first proposed by Csikszentmihalyi (1975). It is defined as a state in which individuals can focus on an activity for a long time and immerse themselves in it completely. Once this state is formed, individuals feel excited and happy, and are thus willing to make efforts to maintain a relationship with the fascinating object (Csikszentmihalyi, 1975; Tonietto & Barasch, 2021). Subsequently, scholars combined vocal music with immersive experience and defined the complete immersive state driven by vocal music (including temporary loss of self-consciousness or even loss of the sense of time and space) as music transcendence or absorption (Cardona et al., 2022), which prompted scholars to explore the relationship between vocal music language and immersive experience. A survey of online users’ experience demonstrated that music can strengthen the intention of network behavior by enhancing the sense of immersion (Cuny et al., 2015). In addition, research on the effects of music on the sense of immersion in narrative confirmed that adding music elements has both direct and indirect effects on the immersion experience (Budzynski-Seymour et al., 2021). As academic work on vocal music language has deepened, research on the immersive experience of vocal music has gradually expanded to other fields. For example, in medical research, music is used as a tool to create a sense of immersion that stabilizes the emotional state of patients (Buche et al., 2021). In tourism research, birdsong accompanied by the sounds of insects and running water can enhance participants’ sense of participation and immersion, which is one way to relieve tourists’ psychological pressure (Y. Liu et al., 2019). Therefore, a basic consensus has been reached that the flow language in vocal music has a positive effect on immersive experience.
This study expands such flow language into “music” language and “lyrics” language to further study the effects of their perceptions on immersive experience. Based on this, it proposes the following research hypotheses:
Immersive Experience of Language Arousal
Arousal is a kind of subconscious emotional state, a concept that originated in the PAD affective model proposed by Mehrabian and Russell (1974). It is defined as the emotional state in which an individual changes from a relatively relaxed, bored, or sleepy state to a stimulated, excited, and awakened state when facing a situational stimulus (Wang et al., 2020). The immersive experience is the peak performance of the psychological state. To obtain the best experience, an individual needs to go through the arousal stage from calm to excitement and from boredom to stimulation, and to be in a completely focused mental state that maximizes internal motivation and enjoyment (Shoshani & Yaari, 2022). Prior literature has examined three manifestations of the immersive experience process, namely high concentration, reduced self-consciousness, and a blurred sense of time, and holds that emotional arousal plays a core role in all three (Philippe et al., 2022). Other work has incorporated arousal as a factor into the immersive design framework (Matovu et al., 2023) and emphasized that emotional changes, namely psychological arousal, can have a positive effect on the audience’s immersive experience. Therefore, arousal is generally considered a key attribute or important driver of immersive experience (Pelet et al., 2017).
Academia has also integrated arousal into studies on vocal music language and immersive experience. From the perspective of singers, previous studies have shown that musicians can deliver a strong emotional experience to the audience; that is, they stimulate the audience’s emotional arousal through multiple auditory channels such as music, the human voice, or vocal rhythm (Nolden et al., 2017), and integrate the syntactic structure, compositional characteristics, and related emotional expression of vocal music works (Marin & Bhattacharya, 2013) to trigger positive emotions and high concentration, which further promotes the immersive experience of the musicians and the audience (de Manzano et al., 2010). From the perspective of the audience, when the psychological state of the audience is at a specific level, the music tones and voices can affect the surrounding environment and atmosphere and mobilize the emotions of the audience. The lyrics language enables the audience to understand the emotion and connotation of the music story through the narrative process, which arouses and enhances the shared experience between the audience and the singers and brings the audience to the spiritual state of immersive experience (Chang et al., 2021; Dolan et al., 2018). Moreover, immersive experience, as a subjective experience, is usually affected by telepresence in the media environment (Cadet & Chainay, 2020), and telepresence is an important antecedent factor of emotional arousal. Therefore, this study regards arousal as the mediating factor between telepresence and immersive experience; that is, when the music tone or lyrics language causes psychological changes in the audience, the audience’s emotional state is aroused, and they become the “persons in the music.” Based on this, this study proposes the following research hypotheses:
Based on the above hypotheses, a theoretical model diagram of the immersive experience arousal process of vocal music language is constructed, as shown in Figure 1.

Figure 1. Theoretical model diagram.
Research Design
Research Measurement Criteria
In order to explore the immersive experience and arousal process of vocal music language, from the perspective of telepresence and in line with the research paradigm of stimulus, organism, and response in psychology, the current study takes vocal music language as an external stimulus to stimulate the audience to produce the telepresence of vocal music (i.e., “music” language perception and “lyrics” language perception). The telepresence of vocal music may contribute to realizing immersive experience in an indirect manner through the organism variable, namely, arousal. Therefore, this study introduces arousal as a mediating variable of vocal music telepresence affecting immersive experience in order to better explain the relationship between vocal music telepresence and immersive experience.
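The mediation structure described above can be written compactly as a pair of structural equations. This is a notational sketch only: the symbols $X_m$ (“music” language perception), $X_l$ (“lyrics” language perception), $A$ (arousal), $I$ (immersive experience), and the coefficient names $a_j$, $b$, and $c'_j$ are introduced here for illustration and do not appear in the original model description.

```latex
\begin{aligned}
A &= a_{m} X_{m} + a_{l} X_{l} + \varepsilon_{A},\\
I &= c'_{m} X_{m} + c'_{l} X_{l} + b\,A + \varepsilon_{I},\\
\text{indirect effect of } X_{j} &= a_{j}\, b, \qquad
\text{total effect of } X_{j} = c'_{j} + a_{j}\, b, \quad j \in \{m, l\}.
\end{aligned}
```

Under this notation, arousal mediates the effect of a perception variable when the product $a_j b$ differs significantly from zero, while $c'_j$ captures the direct effect that remains once arousal is included.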
To test the effectiveness of the independent variable manipulation and the rationality of the measurement item design, this study conducted a pre-test before the formal experiments. Seventy-two respondents were selected and randomly divided into three groups. They were asked to score each measurement item after browsing the experimental materials. The measurement items were drawn from established international scales and modified as appropriate for the research context. In particular, four items on vocal music telepresence were selected with reference to Verhagen et al. (2014) and adapted to “music” language and “lyrics” language; four items on arousal were selected with reference to Novak et al. (2000); and five items on immersive experience were selected with reference to Richard and Chebat (2016). All items were measured on a 5-point Likert scale (1 = disagree strongly, 5 = agree strongly).
Research Procedure
After years of practice and continuous innovation, the young singers’ competition has become an important platform for carrying forward national art, popularizing music knowledge, discovering and launching vocal talents, and leading and promoting the development and prosperity of China’s vocal music industry. It has supplied many excellent musical talents to the Chinese music scene (Chen, 2021). Since the second contest in 1986, the Young Singers Grand Prix has been divided into three categories according to singing style: “Folk Style Singing,” “Bel canto Singing,” and “Popular Singing” (Chen, 2021). These three singing methods have distinct characteristics: folk-style music has strong national characteristics (Li & Maneewattana, 2022), bel canto music has a gorgeous and smooth timbre (Huang, 2020), and popular music is simple, convenient, and easy to sing (Zeng, 2020). Therefore, this study adopted these three singing methods and played music of the different vocal styles to the respondents in three vocal music experiments.
The first experiment used a folk-style piece, the studio version of “Like a Dream,” sung by Cai Qin. The lyrics come from the classical works of Li Qingzhao, a Chinese poet and essayist of the Song Dynasty and a representative of the graceful and restrained poetic genre. The lyrics are implicit and soft, and the language is mellow and beautiful. The singer takes abdominal breathing as the core, with even and stable breath support, and shows perfect control of tone quality and of the fluidity of sustained notes, which gives the voice smoothness and extends the emotion. Chest resonance makes the voice calmer and more solid, and the head resonance applied in the higher register gives the sound brightness and depth. In terms of melody, the notes are closely connected, the rhythm is smooth and steady, and the range is relatively narrow, mostly focusing on the middle and lower registers.
The second experiment used a bel canto piece, the studio version of “You Raise Me Up,” sung by Martin Hurkens and performed mainly in the bel canto style. The tone can be soft or strong, with great penetration and fullness. The singer performs the high notes of the song with tension, relying on solid abdominal breath control and head resonance, while precise control of the airflow gives the sound a dramatic and ritualistic feeling. The melody builds to a climax as the emotion progresses, with a wide range and significant spans, demonstrating the singer’s mastery of complex melodies and reflecting the spiritual core of the song.
The third experiment used a popular song, the studio version of “Sunny Day,” sung by Jay Chou, in which the narrative expression of the lyrics changes with the strong rhythm of the music. Adopting the iconic and simple melodic design of pop songs, it features close and smooth note articulation. The singer’s voice relies mainly on nasal and oral resonance, with a soft and transparent timbre, natural and spontaneous breathing, and a high affinity for emotional expression. The entire track has a moderate range centered on the middle and high registers, complementing the youthful mood of the narrative.
To test the effectiveness of the independent variable manipulation and the rationality of the measurement item design, 63 valid questionnaires were collected in the pre-test. A one-way ANOVA showed that the telepresence of the three vocal music styles did differ, indicating that the independent variable manipulation was successful. The study also conducted exploratory factor analysis on the pre-test data of each variable using SPSS 26.0. The results showed that the factor loadings of all measurement indicators were greater than 0.7, the Cronbach’s α value of each variable was above .8, and the KMO coefficient was above 0.7, indicating that the measured variables had high reliability and validity and were suitable for the formal experiments.
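The manipulation check described above rests on a one-way ANOVA across the three style groups. The sketch below shows how the F statistic is formed from between-group and within-group variability. It is a minimal illustration in plain Python, not the SPSS procedure used in the study; the function name `one_way_anova_f` and all scores are hypothetical and are not the study's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical mean telepresence scores on a 1-5 scale (NOT the study's data).
folk = [3.0, 3.5, 3.25, 3.75, 3.5]
bel_canto = [4.0, 4.25, 4.5, 3.75, 4.0]
popular = [2.5, 3.0, 2.75, 3.25, 3.0]

f_stat, df_b, df_w = one_way_anova_f([folk, bel_canto, popular])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F distribution with the stated degrees of freedom indicates that the group means differ. Obtaining the p-value needs the F distribution, which a statistics package provides (for example, `scipy.stats.f_oneway` computes both F and p).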
The formal experiments were carried out in music laboratories. To ensure experimental quality, all the selected laboratories could accommodate 100 to 120 persons, and the experimental materials were randomly distributed in each laboratory. The experimental process was divided into three parts. First, after written informed consent was obtained from all participants and/or their legal guardians, the experimental requirements and instructions were explained to the respondents, who were asked to cooperate seriously. Second, the respondents were asked to listen to the vocal music, open a link to browse its lyrics, and then fill in the electronic questionnaire on their mobile phones according to their real feelings during listening and browsing. To ensure that the respondents browsed the complete lyrics of the vocal music, the questionnaire included one question about the vocal music information in the experimental materials, which helped screen out invalid questionnaires. A total of 371 questionnaires were distributed in the formal experiments, of which 343 were valid after those not filled in seriously were excluded. The three experiments comprised 113, 111, and 119 respondents, respectively. In addition, the experiments were conducted in compliance with the Declaration of Helsinki and with institutional approval, and all methods were performed in accordance with the relevant guidelines and regulations.
Sub-Study 1: Immersive Experience Arousal Process of Folk-Style Singing
Descriptive Statistics of Samples
In this sub-study, “Like a Dream” was played in front of the respondents, and 113 effective samples were obtained, including 25 males and 88 females, accounting for 22.1% and 77.9% of the total respectively, with a gender ratio of nearly 1:3. In terms of age, the respondents were mainly between 21 and 40 years old, accounting for 46.0% of the total. From the perspective of educational background, the respondents with junior college education and above were the majority, accounting for 75.2% of the total. In terms of occupation, there were 70 students, accounting for 61.9% of the total.
Findings
The analysis results of the measurement model are as follows. First, the PLS-SEM analysis indicates that the factor distribution of each dimension conforms to the expected structure of the scale. To ensure the reliability of the data, the required indicators for the components were all above 0.7. The output suggests that the data structure was reasonable and that the four required indicators performed satisfactorily. Second, the standardized results were used for analysis, with Cronbach’s α as the indicator for the reliability test. As shown in Table 1, the Cronbach’s α coefficients of “music” language perception, “lyrics” language perception, arousal, and immersive experience were all greater than .8. The validity test covered convergent validity and discriminant validity. The convergent validity test took composite reliability (CR) and average variance extracted (AVE) as the evaluation indicators. The CR of each variable was greater than 0.8 and the AVE was greater than 0.5, indicating that the convergent validity of each variable is satisfactory. The discriminant validity test indicated that the correlation coefficient between each variable and the other variables was less than the square root of the AVE of each variable, suggesting that the discriminant validity of each variable is satisfactory. Third, in the model testing process, the R² values of arousal and immersive experience were each above .6, and the NFI of the model was .760, indicating that the goodness of fit of the model in Sub-study 1 was acceptable.
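The three reliability and convergent validity indicators named above follow standard formulas, sketched here in plain Python. The function names, the item scores, and the loadings are hypothetical and chosen only for illustration; the study's own values come from its PLS-SEM output, which is not reproduced here.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all from the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def composite_reliability(loadings):
    """CR from standardized loadings; error variance of an item is 1 - loading^2."""
    squared_sum = sum(loadings) ** 2
    error_sum = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_sum)


def average_variance_extracted(loadings):
    """AVE is the mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical 5-point Likert responses: 4 items x 6 respondents.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 3, 4],
    [4, 5, 3, 4, 2, 4],
]
# Hypothetical standardized loadings for one construct.
loadings = [0.8, 0.8, 0.8, 0.8]

alpha = cronbach_alpha(items)
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"alpha = {alpha:.3f}, CR = {cr:.3f}, AVE = {ave:.3f}")
```

With the thresholds used in this study, a construct passes when α and CR exceed 0.8 and AVE exceeds 0.5.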
Table 1. Sub-study 1: Reliability and Validity Tests of Constructs.
To further test the path mechanism by which vocal music telepresence (i.e., “music” language perception and “lyrics” language perception) affects immersive experience, and to verify all the hypotheses, this study conducted structural equation model analysis, as shown in Table 2. First, both “music” language perception and “lyrics” language perception have a significantly positive effect on arousal (“music” language: β = 0.263, p = .015; “lyrics” language: β = 0.610, p < .001), which is consistent with H1 and H2. Second, both “music” language perception and “lyrics” language perception have a significantly positive effect on immersive experience (“music” language: β = 0.199, p = .027; “lyrics” language: β = 0.268, p = .010), thus H3 and H4 are verified. Third, arousal has a significantly positive effect on immersive experience (β = 0.481, p < .001), consistent with H5. Fourth, arousal plays a significant mediating role between “music” language perception and immersive experience (p = .035), thus H6 is verified. Fifth, arousal plays a significant mediating role between “lyrics” language perception and immersive experience (p = .002), thus H7 is verified.
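To make the mediation results easier to read, the sketch below applies the common product-of-coefficients decomposition to the Sub-study 1 path coefficients quoted above. The indirect and total effects it prints are illustrative arithmetic derived from those coefficients, not values reported by the authors, who give only p-values for the indirect paths. The function name `decompose` is hypothetical.

```python
# Path coefficients quoted in the text for Sub-study 1.
paths = {
    "music": {"a": 0.263, "direct": 0.199},   # perception -> arousal, perception -> immersion
    "lyrics": {"a": 0.610, "direct": 0.268},
}
b = 0.481  # arousal -> immersive experience


def decompose(a, b, direct):
    """Return (indirect, total, mediated share) under the product-of-coefficients rule."""
    indirect = a * b
    total = direct + indirect
    return indirect, total, indirect / total


for name, p in paths.items():
    indirect, total, share = decompose(p["a"], b, p["direct"])
    print(f"{name}: indirect = {indirect:.3f}, total = {total:.3f}, "
          f"share mediated = {share:.1%}")
```

The arithmetic shows a larger indirect path for “lyrics” than for “music” in this sub-study. Whether each indirect effect is significant cannot be read from the products alone; in PLS-SEM that is usually assessed by bootstrapping.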
Table 2. Sub-study 1: Path Coefficient and Specific Indirect Effects Test.
Discussion
Sub-study 1 explored the mechanism by which the audience’s vocal music telepresence affects immersive experience in the context of folk-style music, which provided preliminary support for the theoretical model in this study. The findings suggest that when listening to folk-style vocal music, both the audience’s “music” language perception and “lyrics” language perception enhanced their arousal emotions, thus triggering immersive experience. However, folk-style singing pays attention to the naturalness of the singing voice, emphasizes the organic combination of tune and articulation, holds that the mellow and full “tune” can be realized only when the “lyrics” are correctly pronounced, and pursues the traditional practice of “clear articulation” and “correct rhyme.” As a result, the “lyrics” language perception plays a stronger role in arousal and immersive experience than the “music” language perception.
Sub-Study 2: Immersive Experience Arousal Process of Bel Canto Singing
Descriptive Statistics of Samples
In this sub-study, the song “You Raise Me Up” was played to the respondents, and 111 valid questionnaires were obtained, of which 27.0% were from males and 73.0% from females. In terms of age, respondents under 40 years old were the majority, accounting for 79.2%. In terms of educational background, most of the respondents had received junior college education or a bachelor’s degree (25 and 39 persons, respectively). In terms of occupation, there were 61 students, accounting for 55.0% of the total.
Findings
The analysis results of the measurement model are as follows. First, the PLS-SEM analysis suggested that the factor distribution of each dimension conforms to the expected structure of the scale. To ensure the reliability of the data, the required indicators for the components were all above 0.7. The output indicated that the data structure was reasonable and that the four required indicators performed satisfactorily. Second, the standardized results were applied for analysis, as shown in Table 3. The Cronbach’s α coefficients of “music” language perception, “lyrics” language perception, arousal, and immersive experience were all greater than .8, the composite reliability (CR) was greater than 0.8, and the average variance extracted (AVE) was greater than 0.5, indicating that the convergent validity of each variable was satisfactory. In addition, the correlation coefficient between each variable and the other variables was less than the square root of the AVE of each variable, indicating that the discriminant validity of the variables was satisfactory. Third, in the model testing process, the R² values of arousal and immersive experience were above .5, and the NFI of the model was .759, indicating that the goodness of fit of the model in Sub-study 2 was acceptable.
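The discriminant validity rule applied above, that each inter-construct correlation must be smaller than the square root of the AVE of the constructs involved, is commonly known as the Fornell-Larcker criterion. A minimal sketch of the check follows. The function name `fornell_larcker_violations` and all AVE and correlation values are hypothetical placeholders, not the figures from Table 3.

```python
from math import sqrt


def fornell_larcker_violations(ave, corr):
    """Return construct pairs whose correlation is not below both sqrt(AVE) values.

    ave:  dict mapping construct name -> AVE
    corr: dict mapping (construct, construct) -> correlation
    """
    violations = []
    for (c1, c2), r in corr.items():
        threshold = min(sqrt(ave[c1]), sqrt(ave[c2]))
        if abs(r) >= threshold:
            violations.append((c1, c2))
    return violations


# Hypothetical values for illustration only.
ave = {"music": 0.66, "lyrics": 0.70, "arousal": 0.68, "immersion": 0.64}
corr = {
    ("music", "lyrics"): 0.62,
    ("music", "arousal"): 0.58,
    ("music", "immersion"): 0.55,
    ("lyrics", "arousal"): 0.71,
    ("lyrics", "immersion"): 0.66,
    ("arousal", "immersion"): 0.74,
}

violations = fornell_larcker_violations(ave, corr)
print("discriminant validity holds" if not violations else violations)
```

An empty list means every construct shares more variance with its own indicators than with any other construct, which is the condition the text reports as satisfied.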
Table 3. Sub-study 2: Reliability and Validity Test of Constructs.
The results of the structural model analysis are shown in Table 4. First, both “music” language perception and “lyrics” language perception have a significantly positive effect on arousal (“music” language: β = 0.420, p < .001; “lyrics” language: β = 0.403, p = .001), which is consistent with H1 and H2. Second, both “music” language perception and “lyrics” language perception have a significantly positive effect on immersive experience (“music” language: β = 0.191, p = .041; “lyrics” language: β = 0.222, p = .009), thus H3 and H4 are verified. Third, arousal has a significantly positive effect on immersive experience (β = 0.579, p < .001), which is consistent with H5. Fourth, arousal plays a significant mediating role between “music” language perception and immersive experience (p = .001), thus H6 is verified. Fifth, arousal plays a significant mediating role between “lyrics” language perception and immersive experience (p = .002), thus H7 is verified.
Table 4. Sub-study 2: Path Coefficients and Specific Indirect Effects Test.
Discussion
In Sub-study 2, bel canto vocal music replaced the folk-style vocal music in Sub-study 1 to explore the mechanism of the audience’s vocal music telepresence on immersive experience, which provided further support for the theoretical model of this study. The results demonstrated that when bel canto was played, the audience’s “music” language perception and “lyrics” language perception enhanced their arousal emotions, and thus triggered immersive experience. Moreover, because bel canto is famous for its mellow and full tone and gorgeous and smooth timbre, which is a relatively complete and systematic style, the “music” language perception and the “lyrics” language perception have similar effects on arousal and immersive experience.
Sub-Study 3: Immersive Experience Arousal Process of Popular Singing
Descriptive Statistics of Samples
In this sub-study, the song “Sunny Day” was played for the respondents, and 119 valid questionnaires were obtained. In terms of gender, the numbers of men and women were nearly equal: 50.4% were male and 49.6% were female. In terms of age, respondents aged 21 to 40 were the majority, accounting for 89.1%. In terms of educational background, respondents with a junior college education or a bachelor’s degree formed the largest group, with a total of 90 people, accounting for 75.7%. In terms of occupation, there were 67 students, accounting for 56.3%.
Findings
The results for the measurement model are as follows. First, the PLS-SEM analysis suggested that the factor distribution of each dimension conformed to the expected structure of the scale. To ensure the reliability of the data, the required indicators for the components were checked, and all were above .7. The output showed that the data structure was reasonable and that the four required indicators met the criteria. Second, the standardized results were used for analysis, as shown in Table 5. The Cronbach’s alpha coefficients of “music” language perception, “lyrics” language perception, arousal, and immersive experience were all greater than .8, the composite reliability (CR) was greater than .8, and the average variance extracted (AVE) was greater than .5, indicating that the convergent validity of each variable was satisfactory. In addition, the correlation coefficient between each variable and the other variables was less than the square root of that variable’s AVE, indicating that discriminant validity was satisfactory. Third, in the model testing process, the R² values of arousal and immersive experience were above .6, and the NFI of the model was .777, indicating that the goodness of fit of the model in Sub-study 3 was acceptable.
Sub-study 3: Reliability and Validity Test of Constructs.
The results of the structural model analysis are shown in Table 6. First, both “music” language perception and “lyrics” language perception have a significantly positive effect on arousal (“music” language: β = .570, p < .001; “lyrics” language: β = .298, p = .005), which is consistent with H1 and H2. Second, both “music” language perception and “lyrics” language perception have a significantly positive effect on immersive experience (“music” language: β = .296, p = .002; “lyrics” language: β = .174, p = .043), so H3 and H4 are supported. Third, arousal has a significantly positive effect on immersive experience (β = .482, p < .001), which is consistent with H5. Fourth, arousal plays a significant mediating role between “music” language perception and immersive experience (p = .002), so H6 is supported. Fifth, arousal plays a significant mediating role between “lyrics” language perception and immersive experience (p = .019), so H7 is supported.
Sub-study 3: Path Coefficients and Specific Indirect Effects Test.
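The text reports p-values for the mediating role of arousal without describing how they were obtained. In PLS-SEM practice, such tests are commonly based on bootstrapping the indirect effect. The sketch below illustrates that general idea on synthetic data, under clearly stated assumptions: ordinary least squares on observed scores stands in for the PLS estimation, the data are simulated rather than taken from this study, the generating coefficients are arbitrary illustrative values, and all function names are hypothetical.

```python
import random


def _s(u, v):
    """Sum of cross-products of u and v around their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def ols_slope(x, y):
    """Slope of the simple regression of y on x."""
    return _s(x, y) / _s(x, x)


def partial_slope(x, m, y):
    """Slope of y on m while controlling for x (two-predictor OLS)."""
    sxx, smm, sxm = _s(x, x), _s(m, m), _s(x, m)
    return (sxx * _s(m, y) - sxm * _s(x, y)) / (sxx * smm - sxm ** 2)


def indirect_effect(x, m, y):
    """Product of the x -> m path and the m -> y path (given x)."""
    return ols_slope(x, m) * partial_slope(x, m, y)


def bootstrap_ci(x, m, y, n_boot=1000, seed=0):
    """Percentile 95% interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample respondents
        estimates.append(
            indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]
            )
        )
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Synthetic data (NOT the study's data): x = perception, m = arousal,
# y = immersive experience; n mirrors the Sub-study 3 sample size only.
rng = random.Random(42)
n = 119
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.57 * xi + rng.gauss(0, 0.8) for xi in x]
y = [0.30 * xi + 0.48 * mi + rng.gauss(0, 0.5) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero is conventionally read as evidence of a significant mediating effect, which corresponds to the kind of conclusion drawn for H6 and H7.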
Discussion
Sub-study 3 explored how the audience’s vocal music telepresence affects immersive experience in the context of popular singing, which provided strong support for the theoretical model of this study. The results demonstrated that when listening to popular music, the audience’s “music” language perception and “lyrics” language perception heightened their emotional arousal and thus triggered immersive experience. However, because popular singing pays more attention to feeling, emphasizes musical sense and imitation in singing, and pursues individuality and uniqueness of voice as well as an “oral” style of singing, “music” language perception plays a stronger role in arousal and immersive experience than “lyrics” language perception.
Sub-Study 4: Re-verification of the Immersive Experience Arousal Process for the Three Singing Styles
To verify the results of the above three sub-studies again, this paper selected different experimental materials and then followed the same steps to conduct experiments on folk-style singing, bel canto singing, and popular singing scenes respectively. The experiments were as follows.
First, this paper selected the live version of “Qinyuan Spring Snow” by Liao Changyong, vice chairman of the Chinese Musicians Association, to create a folk-style singing scene, and conducted the experiment on 68 participants (47.1% male, 52.9% female). The results of the structural model analysis are shown in Table 7. The output suggested that both “music” language perception and “lyrics” language perception have significantly positive effects on arousal (“music” language: β = .344, p = .001; “lyrics” language: β = .608, p < .001), with “lyrics” language perception playing a stronger role in arousal than “music” language perception. Both “music” language perception and “lyrics” language perception have significantly positive effects on immersive experience (“music” language: β = .278, p = .008; “lyrics” language: β = .280, p = .044). Arousal has a significantly positive effect on immersive experience (β = .444, p < .001). Arousal also plays a significant mediating role between each type of language perception and immersive experience (“music” language: p = .013; “lyrics” language: p = .001). This is consistent with the conclusions of Sub-study 1.
Sub-study 4: Path Coefficients and Specific Indirect Effects Test in Folk-style Singing Scene.
Second, this study selected the live version of “’O sole mio” by Pavarotti, one of the world-famous Three Tenors, as the experimental material to create a bel canto singing scene (sung in Italian, with the lyrics accompanied by a Chinese translation), and conducted the experiment on 61 participants (45.9% male, 54.1% female). The results of the structural model analysis are shown in Table 8. The findings suggested that both “music” language perception and “lyrics” language perception have significantly positive effects on arousal (“music” language: β = .457, p = .001; “lyrics” language: β = .516, p < .001), and that the two have similar effects on arousal. Both “music” language perception and “lyrics” language perception have significantly positive effects on immersive experience (“music” language: β = .260, p = .040; “lyrics” language: β = .263, p = .040), and the two have similar effects on immersive experience. Arousal has a significantly positive effect on immersive experience (β = .470, p = .004). Arousal also plays a significant mediating role between each type of language perception and immersive experience (“music” language: p = .044; “lyrics” language: p = .012). This is consistent with the conclusions of Sub-study 2.
Sub-study 4: Path Coefficients and Specific Indirect Effects Test in Bel Canto Singing Scene.
Finally, this study selected the live version of “Love Story” by the American pop singer Taylor Swift as the experimental material to create a popular singing scene (sung in English, with the lyrics accompanied by a Chinese translation), and conducted the experiment on 72 participants (52.8% male, 47.2% female). The results of the structural model analysis are shown in Table 9. The output showed that both “music” language perception and “lyrics” language perception have significantly positive effects on arousal (“music” language: β = .648, p < .001; “lyrics” language: β = .297, p = .003), with “music” language perception playing a stronger role in arousal than “lyrics” language perception. Both “music” language perception and “lyrics” language perception have significantly positive effects on immersive experience (“music” language: β = .307, p = .047; “lyrics” language: β = .222, p = .016). Arousal has a significantly positive effect on immersive experience (β = .461, p < .001). Arousal also plays a significant mediating role between each type of language perception and immersive experience (“music” language: p = .001; “lyrics” language: p = .027). This is consistent with the conclusions of Sub-study 3.
Sub-study 4: Path Coefficients and Specific Indirect Effects Test in Popular Singing Scene.
Research Conclusions
Summary
Based on the two perspectives of “music” and “lyrics,” this study explored the relationship between the telepresence of vocal music language and the immersive experience of the audience. Through the experiments, the differences in the effects that different vocal music styles exert on telepresence were tested, and the arousal path from vocal music telepresence to immersive experience was further revealed. The specific conclusions are as follows. First, the presentation modes of language perception in different vocal music styles have different effects on arousal, which can be explained by the theory of mental imagery. A mental image is a form of knowledge representation of objects that are not physically present, and its information is stored in the mind in the form of images. That is, when individuals are in a music scene, they form a mental image of that scene with the help of cues in the vocal music, such as “music” language and “lyrics” language, and then form a virtual on-the-spot experience. Generally speaking, the more specific and vivid the cue information, the more easily it evokes the individual’s mental image. Second, both “music” language perception and “lyrics” language perception are important drivers of audience arousal. The vivid form of vocal music can create the feeling that “the scene depicted by vocal music seems to be right in front of us,” giving the audience the telepresence of being personally on the scene. Such telepresence narrows the psychological distance and the gap in value perception between the audience and the singers, and thus more easily stimulates excited and joyful arousal emotions in the audience.
Third, the direct effects of “music” language perception and “lyrics” language perception on immersive experience are significant, and arousal plays a significant mediating role. Emotions can be transmitted to the audience through the human voice, phonological rhythm, and other auditory channels. In addition, the syntactic structure and compositional features of vocal music can also bolster immersive experience. Vocal music thus produces an on-the-scene experience and arouses emotions through excitement and stimulation, which can be further transformed into the optimal experience of enjoyment, pleasure, and immersion.
Theoretical Significance
Theoretically, the findings of this study are significant in the following respects. First, the study has verified the existence and applicability of immersive experience in vocal music language, and confirmed that vocal music tone and lyric language can “describe” the emotion and connotation in music stories and mobilize the audience’s emotions (Vidas et al., 2020), while the audience generates immersive experience in response to stimulation from the external environment (Shin, 2018). Second, it has expanded the research on the influence of vocal music telepresence on arousal. Previous studies have mainly focused on the impact of voice perception (Kuang & Liberman, 2018; Yoo & Bidelman, 2019) and emotional recognition (Correia et al., 2022; Morningstar et al., 2018) on the audience. This study, by contrast, conducted experiments with three vocal styles (folk singing, bel canto, and popular singing) and verified the relationship between vocal music telepresence and the audience’s emotional arousal, which enriches the existing research to a certain extent. Third, this study took arousal as a mediating factor and, based on the dual perspectives of implicit “music” language and explicit “lyrics” language, demonstrated the influence of vocal music telepresence on the audience’s immersive experience. This provides a new perspective for understanding the relationship between vocal music language and the audience’s immersive experience, and extends the applicable boundary of telepresence in the field of vocal music.
Finally, this study further tested the applicability of arousal and immersive experience in the vocal music environment, and the results were consistent with previous studies. According to previous studies, arousal emotion can have a significantly positive effect on the audience’s immersive experience (Ding et al., 2018; Hu et al., 2021), and this study verified that, in the vocal music environment, arousal emotion likewise has a significant effect on the audience’s immersive experience.
Practical Significance
From a practical point of view, the findings of this study are significant in the following respects. First, vocal music is a comprehensive art that blends “music” language and “lyrics” language. Singers must have superb performance skills and vocal competence to fully display the artistic features of vocal music performance. In addition, singers must constantly strengthen their stage performance awareness in daily training so that their performances can be recognized and affirmed by the audience. Second, the presentation of vocal music language should be strengthened to enhance its vividness and visibility, because the more vivid the vocal music language, the more pronounced its effect in inducing emotional arousal. It is therefore necessary for singers to pay full attention to the content they sing, attach importance to guiding the audience, and lead the audience to appreciate the information in the vocal music language, so that the audience can experience the spirit and emotion in the music, resonate with it, and become emotionally aroused. Finally, the role of arousal should be emphasized in the formation of immersive experience from vocal music language perception. Both “music” language perception and “lyrics” language perception can influence immersive experience through arousal, which indicates that immersive experience is controllable to a certain extent and that the audience’s “immersive experience index” can be improved through arousal. Therefore, in the future, vocal music performances should be promoted with more perceptual rendering to arouse the optimal experience of potential audiences.
Limitations and Prospects
Based on the two perspectives of “music” and “lyrics,” this study has explored the arousal process through which vocal music language leads to the audience’s immersive experience. However, for various subjective and objective reasons, this study has several limitations. First, this study analyzed only the mediating role of arousal between the telepresence of vocal music and immersive experience. This path is not the only one: the quality of singing, differences between singers, and the audience’s own circumstances and cultural identity will also have an impact on the path. Therefore, subsequent studies can attempt to explore this mechanism from different theoretical perspectives. Second, this study focused only on the flow experience brought about by telepresence from the angle of individual psychological perception. In fact, telepresence can also affect visual, auditory, and other sensory experiences. Therefore, the experiment should be further improved in the future to test whether telepresence in the context of vocal music affects the audience’s sensory experience, and whether this in turn affects the audience’s behavior or behavioral intention.
Footnotes
Acknowledgements
We appreciate the time and effort that reviewers put into reviewing our manuscript and providing us with valuable comments and suggestions.
Ethical Considerations
All procedures performed in studies involving human participants were in accordance with the ethical standards of the Institutional and/or National Research Committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Consent to Participate
Informed consent was obtained from all participants included in the study.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
All data relevant to the study are included in the article. In addition, the data that support the findings of this study are available from the corresponding author on reasonable request, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.
