Abstract: When existing spectrogram separation methods are applied to acoustic scene classification, their classification accuracy is not high. To address this problem, an acoustic scene classification method based on Mel-spectrogram separation and a long-distance self-calibrated convolutional neural network (LSCNet) was proposed. Firstly, the working principle of harmonic/percussive source separation (HPSS) of spectrograms was presented, and a Mel-spectrogram separation algorithm was proposed that separates the Mel-spectrogram into harmonic, percussive, and residual components. Then, LSCNet was designed by combining a self-calibrated convolutional network with a residual enhancement mechanism. The model adopts a frequency-domain self-calibration algorithm and a long-distance enhancement mechanism to retain the original information of the feature maps, strengthens the correlation between deep and shallow features through residual enhancement and channel attention enhancement mechanisms, and combines a multi-scale feature fusion module to further extract effective information from the output layers during training. Finally, acoustic scene classification experiments were conducted on the UrbanSound8K and ESC-50 datasets. Experimental results show that the Mel-spectrogram residual component (MSRC) effectively reduces the influence of background noise, leading to better classification performance, and that LSCNet realizes attention to the frequency-domain information in the feature maps, with best classification accuracies of 90.1% and 88% respectively, which verifies the effectiveness of the proposed method.
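As a concrete illustration of the separation step, the sketch below splits a Mel-spectrogram into harmonic, percussive, and residual parts using median-filtering HPSS as implemented in librosa. The abstract does not give the paper's exact algorithm or parameters; the `margin` value here is an illustrative assumption, and the residual is taken as the energy left over when the separation is made non-exhaustive (`margin > 1`).

```python
import librosa
import numpy as np

def separate_mel(y, sr, n_mels=128, margin=2.0):
    """Hedged sketch of Mel-spectrogram separation into harmonic,
    percussive, and residual components via median-filtering HPSS.
    The paper's exact separation algorithm and margin are not given
    in the abstract; margin=2.0 is an illustrative choice."""
    # Mel-scale power spectrogram of the input signal
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    # With margin > 1 the soft masks are non-exhaustive, so
    # H + P <= S elementwise and leftover energy remains.
    H, P = librosa.decompose.hpss(S, margin=margin)
    R = S - (H + P)  # Mel-spectrogram residual component (MSRC)
    return H, P, R

# Usage: H, P, R = separate_mel(*librosa.load(librosa.ex('trumpet')))
```

With `margin = 1.0` the harmonic and percussive masks sum to one and the residual vanishes; increasing the margin makes both masks more conservative, which is what leaves a nonempty residual component to use as a classification feature.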
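The abstract names a self-calibrated convolutional network as the backbone of LSCNet. For orientation, below is a minimal PyTorch sketch of the standard self-calibrated convolution (SCConv) module from Liu et al. (CVPR 2020); the frequency-domain self-calibration and long-distance enhancement variants described in the abstract are not specified here, so this shows only the baseline formulation that LSCNet presumably builds on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfCalibratedConv(nn.Module):
    """Baseline self-calibrated convolution (SCConv), not the paper's
    frequency-domain variant: one channel half passes through a plain
    conv; the other is modulated by calibration weights computed from
    a downsampled view of itself."""
    def __init__(self, channels, pooling_r=4):
        super().__init__()
        c = channels // 2  # assumes an even channel count
        self.k1 = nn.Conv2d(c, c, 3, padding=1)  # plain branch
        self.k2 = nn.Conv2d(c, c, 3, padding=1)  # low-resolution branch
        self.k3 = nn.Conv2d(c, c, 3, padding=1)  # full-resolution branch
        self.k4 = nn.Conv2d(c, c, 3, padding=1)  # output conv
        self.pooling_r = pooling_r

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)
        out1 = self.k1(x1)
        # Calibration weights from a pooled, convolved, upsampled view
        low = F.avg_pool2d(x2, self.pooling_r)
        up = F.interpolate(self.k2(low), size=x2.shape[2:],
                           mode='bilinear', align_corners=False)
        calib = torch.sigmoid(x2 + up)
        out2 = self.k4(self.k3(x2) * calib)
        return torch.cat([out1, out2], dim=1)

# Usage: SelfCalibratedConv(64)(torch.randn(1, 64, 128, 431))
```

Because the calibration weights are formed from a pooled view of the feature map, each output position aggregates context from a larger receptive field, which is consistent with the long-distance emphasis described in the abstract.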