Abstract
Instrument recognition from audio signals has broad potential across diverse fields such as music education, intelligent technology, and even medical diagnosis. In this study, we address the challenges posed by the OpenMIC-2018 dataset (an open dataset for multiple-instrument recognition). Leveraging the powerful nonlinear modeling capability of convolutional neural networks (CNNs), the research targets common interference factors in instrument recognition, such as background noise, pitch variation, and volume fluctuation during performance; mitigating these factors improves the accuracy and precision of instrument recognition, benefiting both academic research and practical applications. First, the fast Fourier transform (FFT) is used to extract Mel-frequency cepstral coefficient (MFCC), chroma, time-domain, and spectral-energy features from each audio frame. Initial CNN models with 10 different hyperparameter configurations are trained, and their predictions are merged with the original features. One-dimensional convolutional layers then perform convolution over the spectral features, pooling layers downsample the resulting feature maps, and the rectified linear unit (ReLU) supplies the nonlinear activation. Dropout layers between the convolutional layers randomly set a portion of neurons to zero to reduce overfitting. Next, the learning rate is tuned by cross-validation. Because the task is multi-label classification, a binary cross-entropy loss is used and the parameters are updated through the backpropagation algorithm. Finally, a secondary CNN model takes the merged features as input and produces the final predictions. The results show that the proposed method achieves an average accuracy of 0.92 on OpenMIC-2018, reaching 0.97 for accordion, flute, and organ, realizing relatively precise instrument recognition.
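As a rough illustration of the pipeline the abstract outlines, the sketch below assumes librosa for frame-level feature extraction and PyTorch for the one-dimensional CNN with a sigmoid/binary cross-entropy head for multi-label output; the layer widths, kernel sizes, dropout rate, and learning rate are illustrative placeholders, not the settings reported in the paper.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

def extract_frame_features(y, sr):
    """Per-frame features named in the abstract: MFCC, chroma, and spectral energy."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)    # (20, frames)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # (12, frames)
    rms = librosa.feature.rms(y=y)                        # (1, frames) energy proxy
    return np.concatenate([mfcc, chroma, rms], axis=0)    # (33, frames)

class InstrumentCNN(nn.Module):
    """1-D conv -> ReLU -> dropout -> pooling stack, multi-label logits per instrument."""
    def __init__(self, n_features=33, n_labels=20):       # OpenMIC-2018 has 20 instruments
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
            nn.ReLU(),                     # nonlinear activation
            nn.Dropout(0.3),               # randomly zeroes a fraction of activations
            nn.MaxPool1d(2),               # downsampling along the time axis
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.AdaptiveAvgPool1d(1),       # collapse the remaining time axis
        )
        self.head = nn.Linear(128, n_labels)

    def forward(self, x):                  # x: (batch, n_features, frames)
        return self.head(self.net(x).squeeze(-1))   # raw logits, one per instrument

model = InstrumentCNN()
loss_fn = nn.BCEWithLogitsLoss()           # binary cross-entropy for multi-label targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr would be tuned by cross-validation

# One hedged training step: x is (batch, 33, frames); target is a multi-hot vector of 20 labels.
x = torch.randn(8, 33, 431)
target = torch.randint(0, 2, (8, 20)).float()
loss = loss_fn(model(x), target)
loss.backward()                            # backpropagation
optimizer.step()
```

The secondary-model stage the abstract describes would, under the same assumptions, concatenate the first-stage predictions with the original features and train a second network of this form on the merged input.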
