Abstract: When existing spectrogram separation methods are applied to acoustic scene classification, their classification accuracy is not high. To address this problem, an acoustic scene classification method based on Mel-spectrogram separation and a long-distance self-calibrated convolutional neural network (LSCNet) was proposed. Firstly, the working principle of harmonic/percussive source separation (HPSS) of spectrograms was presented, and a Mel-spectrogram separation algorithm was proposed that separates the Mel-spectrogram into harmonic, percussive, and residual components. Then, LSCNet was designed by combining a self-calibrated convolutional network with a residual enhancement mechanism. The model adopts a frequency-domain self-calibration algorithm and a long-distance enhancement mechanism to retain the original information of the feature maps, strengthens the correlation between deep and shallow features through residual enhancement and channel attention enhancement mechanisms, and combines a multi-scale feature fusion module to further extract effective information from the output layers during training. Finally, acoustic scene classification experiments were conducted on the UrbanSound8K and ESC-50 datasets. Experimental results show that the Mel-spectrogram residual component (MSRC) effectively reduces the influence of background noise, leading to better classification performance, and that LSCNet realizes attention to the frequency-domain information in the feature maps, with best classification accuracies of 90.1% and 88% respectively, which verifies the effectiveness of the proposed method.
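As a concrete illustration of the separation step, the sketch below splits a Mel-spectrogram into harmonic, percussive, and residual parts using median-filtering HPSS as implemented in librosa. The abstract does not give the paper's exact algorithm or parameters; the `margin` value here is an illustrative assumption, and the residual is taken as the energy left over when the separation is made non-exhaustive (`margin > 1`).

```python
import librosa
import numpy as np

def separate_mel(y, sr, n_mels=128, margin=2.0):
    """Hedged sketch of Mel-spectrogram separation into harmonic,
    percussive, and residual components via median-filtering HPSS.
    The paper's exact separation algorithm and margin are not given
    in the abstract; margin=2.0 is an illustrative choice."""
    # Mel-scale power spectrogram of the input signal
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    # With margin > 1 the soft masks are non-exhaustive, so
    # H + P <= S elementwise and leftover energy remains.
    H, P = librosa.decompose.hpss(S, margin=margin)
    R = S - (H + P)  # Mel-spectrogram residual component (MSRC)
    return H, P, R

# Usage: H, P, R = separate_mel(*librosa.load(librosa.ex('trumpet')))
```

With `margin = 1.0` the harmonic and percussive masks sum to one and the residual vanishes; increasing the margin makes both masks more conservative, which is what leaves a nonempty residual component to use as a classification feature.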
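The abstract names a self-calibrated convolutional network as the backbone of LSCNet. For orientation, below is a minimal PyTorch sketch of the standard self-calibrated convolution (SCConv) module from Liu et al. (CVPR 2020); the frequency-domain self-calibration and long-distance enhancement variants described in the abstract are not specified here, so this shows only the baseline formulation that LSCNet presumably builds on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfCalibratedConv(nn.Module):
    """Baseline self-calibrated convolution (SCConv), not the paper's
    frequency-domain variant: one channel half passes through a plain
    conv; the other is modulated by calibration weights computed from
    a downsampled view of itself."""
    def __init__(self, channels, pooling_r=4):
        super().__init__()
        c = channels // 2  # assumes an even channel count
        self.k1 = nn.Conv2d(c, c, 3, padding=1)  # plain branch
        self.k2 = nn.Conv2d(c, c, 3, padding=1)  # low-resolution branch
        self.k3 = nn.Conv2d(c, c, 3, padding=1)  # full-resolution branch
        self.k4 = nn.Conv2d(c, c, 3, padding=1)  # output conv
        self.pooling_r = pooling_r

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)
        out1 = self.k1(x1)
        # Calibration weights from a pooled, convolved, upsampled view
        low = F.avg_pool2d(x2, self.pooling_r)
        up = F.interpolate(self.k2(low), size=x2.shape[2:],
                           mode='bilinear', align_corners=False)
        calib = torch.sigmoid(x2 + up)
        out2 = self.k4(self.k3(x2) * calib)
        return torch.cat([out1, out2], dim=1)

# Usage: SelfCalibratedConv(64)(torch.randn(1, 64, 128, 431))
```

Because the calibration weights are formed from a pooled view of the feature map, each output position aggregates context from a larger receptive field, which is consistent with the long-distance emphasis described in the abstract.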