Abstract
Existing music sequence generation methods often struggle with long-range dependencies, suffering from vanishing or exploding gradients that compromise their ability to capture intricate musical structures. This study adopts the Transformer model, leveraging its self-attention mechanism and position encoding to incorporate global information at each time step. This approach enables flexible modeling of long-range dependencies, resulting in more natural and harmonically coherent music sequences. First, serialized encoding is employed to transform musical events into token sequences. A multi-scale attention mechanism is then introduced, applying different attention strategies at each layer to capture distinct musical elements across various time scales. Additionally, periodic position encoding is incorporated to enhance the recognition of recurrent musical patterns. Experimental results demonstrate that, compared to state-of-the-art models such as MuseNet, MelodyRNN, and MusicTransformer, the proposed method reduces average perplexity by 5.4%, 11.7%, and 9.5% on one dataset and by 5.9%, 12.1%, and 10.3% on another. These findings highlight the model's superior capability in generating high-quality music with enhanced melodic consistency and diversity. By outperforming existing mainstream approaches across multiple evaluation metrics, this research advances the field of music sequence generation, offering new insights for AI-driven music composition and arrangement.
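The abstract does not detail how the periodic position encoding is constructed. As a minimal illustrative sketch, one plausible reading is a standard sinusoidal encoding over absolute position augmented with a phase term that repeats over a fixed musical period (e.g., the number of steps per bar); the function name, the `period` parameter, and the additive combination below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def periodic_position_encoding(seq_len, d_model, period=16):
    """Sketch of a periodic position encoding (assumed formulation).

    Assumption: in addition to standard sinusoidal terms over absolute
    position, a phase that wraps modulo `period` steps (e.g., 16
    sixteenth-note steps per bar) is added, so events at the same
    position within each bar receive similar encodings.
    """
    positions = np.arange(seq_len)[:, None]     # (seq_len, 1)
    dims = np.arange(d_model)[None, :]          # (1, d_model)
    # Standard Transformer-style frequencies over absolute position.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    absolute = positions * angle_rates          # (seq_len, d_model)
    # Hypothetical periodic component: phase repeats every `period` steps.
    periodic = 2.0 * np.pi * (positions % period) / period
    angles = absolute + periodic
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Example: encoding for a 64-step sequence with model width 128.
pe = periodic_position_encoding(64, 128)
print(pe.shape)  # (64, 128)
```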
