Abstract
Instrument recognition from audio signals has broad potential across diverse fields such as music education, intelligent technology, and even medical diagnosis. In this study, we address the challenges posed by the OpenMIC-2018 dataset (an open dataset for multiple-instrument recognition). Leveraging the powerful nonlinear modeling capability of convolutional neural networks (CNNs), the research targets common interference factors in instrument recognition, such as background noise, pitch variation, and volume fluctuation during performance; mitigating these factors improves the accuracy and precision of instrument recognition, benefiting both academic research and practical applications. First, the fast Fourier transform (FFT) is used to extract Mel-frequency cepstral coefficient (MFCC), chroma, time-domain, and spectral-energy features from each audio frame. Initial CNN models with 10 different hyperparameter configurations are trained, and their predictions are merged with the original features. One-dimensional convolutional layers then perform convolution over the spectral features, pooling layers downsample the resulting feature maps, and the rectified linear unit (ReLU) supplies the nonlinear activation. Dropout layers between the convolutional layers randomly set a portion of neurons to zero to reduce overfitting. Next, the learning rate is tuned by cross-validation. Because the task is multi-label classification, a binary cross-entropy loss is used and the parameters are updated through the backpropagation algorithm. Finally, a secondary CNN model takes the merged features as input and produces the final predictions. The results show that the proposed method achieves an average accuracy of 0.92 on OpenMIC-2018, reaching 0.97 for accordion, flute, and organ, realizing relatively precise instrument recognition.
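As a rough illustration of the pipeline the abstract outlines, the sketch below assumes librosa for frame-level feature extraction and PyTorch for the one-dimensional CNN with a sigmoid/binary cross-entropy head for multi-label output; the layer widths, kernel sizes, dropout rate, and learning rate are illustrative placeholders, not the settings reported in the paper.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

def extract_frame_features(y, sr):
    """Per-frame features named in the abstract: MFCC, chroma, and spectral energy."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)    # (20, frames)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # (12, frames)
    rms = librosa.feature.rms(y=y)                        # (1, frames) energy proxy
    return np.concatenate([mfcc, chroma, rms], axis=0)    # (33, frames)

class InstrumentCNN(nn.Module):
    """1-D conv -> ReLU -> dropout -> pooling stack, multi-label logits per instrument."""
    def __init__(self, n_features=33, n_labels=20):       # OpenMIC-2018 has 20 instruments
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
            nn.ReLU(),                     # nonlinear activation
            nn.Dropout(0.3),               # randomly zeroes a fraction of activations
            nn.MaxPool1d(2),               # downsampling along the time axis
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.AdaptiveAvgPool1d(1),       # collapse the remaining time axis
        )
        self.head = nn.Linear(128, n_labels)

    def forward(self, x):                  # x: (batch, n_features, frames)
        return self.head(self.net(x).squeeze(-1))   # raw logits, one per instrument

model = InstrumentCNN()
loss_fn = nn.BCEWithLogitsLoss()           # binary cross-entropy for multi-label targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr would be tuned by cross-validation

# One hedged training step: x is (batch, 33, frames); target is a multi-hot vector of 20 labels.
x = torch.randn(8, 33, 431)
target = torch.randint(0, 2, (8, 20)).float()
loss = loss_fn(model(x), target)
loss.backward()                            # backpropagation
optimizer.step()
```

The secondary-model stage the abstract describes would, under the same assumptions, concatenate the first-stage predictions with the original features and train a second network of this form on the merged input.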
