Abstract
This paper combines Long Short-Term Memory (LSTM) networks with a Self-Attention mechanism to predict melody direction in vocal performances, and explores its application to folk music to enhance expressiveness and creative diversity. Using the MAESTRO dataset, which contains paired audio and MIDI data, the continuous note sequence is divided into fixed-length segments (20 notes each) via a sliding window. The LSTM captures temporal dependencies in the sequence, while Self-Attention assigns varying weights to inputs across time steps to better capture global context. To overcome the limitations of traditional pitch prediction methods, model performance is evaluated with 10-fold cross-validation. Experimental results show that with a window size of 20, the model achieves a Mean Squared Error (MSE) of 0.023 with a training time of 86 minutes, the most balanced result across all configurations. Compared with Bi-LSTM and MusicTransformer, the proposed model excels in pitch prediction accuracy, achieving an average Mean Absolute Error (MAE) of 0.022, an R² of 0.895, and a Pearson Correlation Coefficient (PCC) of 0.925. The model is also tested on the creation of six types of folk music; while it reduces creation time, its harmonic consistency slightly lags behind manually composed melodies. The model shows significant potential in pitch prediction and folk music creation, offering a practical tool for music composition.
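The abstract does not give implementation details, but the sliding-window segmentation it describes can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the step size of one note, and the use of the note immediately following each window as the prediction target are all assumptions, not details from the paper.

```python
def sliding_windows(notes, window_size=20, step=1):
    """Split a continuous note sequence into fixed-length segments.

    Each window of `window_size` pitches would serve as a model input;
    here the note immediately after the window is taken as the target
    (an assumed supervision scheme for next-pitch prediction).
    """
    pairs = []
    for start in range(0, len(notes) - window_size, step):
        window = notes[start:start + window_size]
        target = notes[start + window_size]
        pairs.append((window, target))
    return pairs

# Example: a toy sequence of 25 MIDI pitch numbers
pitches = list(range(60, 85))
samples = sliding_windows(pitches, window_size=20)
# 25 notes with a 20-note window yield 5 (input, target) pairs
```

With this framing, each 20-note input segment is fed to the LSTM + Self-Attention model, and the regression metrics reported above (MSE, MAE) are computed against the target pitch.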
