Abstract
Existing music sequence generation methods often struggle with long-range dependencies, suffering from vanishing or exploding gradients that compromise their ability to capture intricate musical structures. This study adopts the Transformer model, leveraging its self-attention mechanism and position encoding to incorporate global information at each time step. This approach enables flexible modeling of long-range dependencies, resulting in more natural and harmonically coherent music sequences. First, serialized encoding is employed to transform musical events into token sequences. A multi-scale attention mechanism is then introduced, applying different attention strategies at each layer to capture distinct musical elements across various time scales. Additionally, periodic position encoding is incorporated to enhance the recognition of recurrent musical patterns. Experimental results demonstrate that, compared to state-of-the-art models such as MuseNet, MelodyRNN, and MusicTransformer, the proposed method reduces average perplexity by 5.4%, 11.7%, and 9.5% on one dataset and by 5.9%, 12.1%, and 10.3% on another. These findings highlight the model's superior capability in generating high-quality music with enhanced melodic consistency and diversity. By outperforming existing mainstream approaches across multiple evaluation metrics, this research advances the field of music sequence generation, offering new insights for AI-driven music composition and arrangement.
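The abstract does not detail how the periodic position encoding is constructed. As a minimal illustrative sketch, one plausible reading is a standard sinusoidal encoding over absolute position augmented with a phase term that repeats over a fixed musical period (e.g., the number of steps per bar); the function name, the `period` parameter, and the additive combination below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def periodic_position_encoding(seq_len, d_model, period=16):
    """Sketch of a periodic position encoding (assumed formulation).

    Assumption: in addition to standard sinusoidal terms over absolute
    position, a phase that wraps modulo `period` steps (e.g., 16
    sixteenth-note steps per bar) is added, so events at the same
    position within each bar receive similar encodings.
    """
    positions = np.arange(seq_len)[:, None]     # (seq_len, 1)
    dims = np.arange(d_model)[None, :]          # (1, d_model)
    # Standard Transformer-style frequencies over absolute position.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    absolute = positions * angle_rates          # (seq_len, d_model)
    # Hypothetical periodic component: phase repeats every `period` steps.
    periodic = 2.0 * np.pi * (positions % period) / period
    angles = absolute + periodic
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Example: encoding for a 64-step sequence with model width 128.
pe = periodic_position_encoding(64, 128)
print(pe.shape)  # (64, 128)
```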
