
Supervised by the Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by Harbin Institute of Technology. Editor-in-chief: Yu Zhou. ISSN 1005-9113, CN 23-1378/T.

Related citation: Liyan Zhang, Jiaxin Du, Shuang Chen, Jiayan Li. Improved MFCC Features and TWM Model for Speech Emotion Recognition[J]. Journal of Harbin Institute of Technology (New Series), 2025, 32(6): 38-46. DOI: 10.11916/j.issn.1005-9113.24051.
Improved MFCC Features and TWM Model for Speech Emotion Recognition
Authors and Affiliations:
Liyan Zhang: School of Computer and Communication Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China
Jiaxin Du: School of Computer and Communication Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China; School of Information Engineering, Zhengzhou University of Industrial Technology, Zhengzhou 451100, Henan, China
Shuang Chen: School of Computer and Communication Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China
Jiayan Li: School of Computer and Communication Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China
Abstract:
To address the problem that traditional Mel Frequency Cepstral Coefficient (MFCC) features cannot fully represent the dynamic characteristics of speech, this paper introduces first-order and second-order differences on top of static MFCC features to extract dynamic MFCC features, and builds a hybrid model, TWM, by combining a multi-head attention mechanism and an improved WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty) on the basis of the TIM-NET (Temporal-aware bI-directional Multi-scale Network) network. The multi-head attention mechanism not only effectively prevents gradient vanishing but also allows deeper networks to be built that capture long-range dependencies and learn from information at different time steps, improving model accuracy; WGAN-GP alleviates the shortage of training samples by improving the quality of generated speech samples. Experimental results show that this method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets.
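The dynamic-feature extraction described in the abstract appends first- and second-order differences (deltas) to static MFCCs. A minimal NumPy sketch of the standard regression-based delta computation follows; the window size N, the 10×13 toy matrix, and the use of edge padding are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def delta(feat, N=2):
    """Regression-based delta over a +/-N frame window (frames x coeffs)."""
    T = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")  # replicate edge frames
    out = np.zeros_like(feat, dtype=float)
    for t in range(T):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return out

# Toy static MFCC matrix: 10 frames x 13 coefficients (hypothetical values)
mfcc = np.random.randn(10, 13)
d1 = delta(mfcc)   # first-order difference (velocity)
d2 = delta(d1)     # second-order difference (acceleration)
dynamic = np.concatenate([mfcc, d1, d2], axis=1)  # 10 x 39 dynamic feature matrix
```

Concatenating the static coefficients with both delta orders triples the per-frame dimensionality (here 13 → 39), which is the usual way dynamic MFCC features are fed to a downstream model.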
Key words: dynamic features; speech emotion recognition; multi-head attention mechanism; generative adversarial networks
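The multi-head attention mechanism mentioned in the abstract runs scaled dot-product attention in several parallel heads and concatenates their outputs. A minimal NumPy sketch follows; the dimensions, weight matrices, and two-head configuration are illustrative assumptions, not the TWM model's actual settings:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product attention over X (T x d), split across n_heads heads."""
    T, d = X.shape
    dh = d // n_heads                      # per-head dimension
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # project inputs to queries/keys/values
    outs = []
    for h in range(n_heads):
        sl = slice(h * dh, (h + 1) * dh)
        scores = Q[:, sl] @ K[:, sl].T / np.sqrt(dh)  # (T, T) attention scores
        outs.append(softmax(scores) @ V[:, sl])       # (T, dh) weighted values
    return np.concatenate(outs, axis=1) @ Wo          # merge heads, project out

rng = np.random.default_rng(0)
d, T, H = 8, 5, 2                          # toy sizes: 8-dim features, 5 steps, 2 heads
X = rng.standard_normal((T, d))
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
Y = multi_head_attention(X, Wq, Wk, Wv, Wo, H)       # (5, 8) output
```

Because each head attends over the full sequence, every output step can draw on information from any other time step, which is the long-range-dependency property the abstract attributes to the mechanism.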
DOI:10.11916/j.issn.1005-9113.24051
CLC Number: TP183
Chinese-language description (translated):

Speech Emotion Recognition with Improved MFCC Features and the TWM Model

Liyan Zhang(1), Jiaxin Du(1,2), Shuang Chen(1), Jiayan Li(1)
(1. School of Rail Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, China;
2. School of Information Engineering, Zhengzhou University of Industrial Technology, Zhengzhou 451100, China)

Abstract: To address the problem that traditional MFCC features cannot fully represent the dynamic characteristics of speech, this paper introduces first-order and second-order differences on top of static MFCC features to extract dynamic MFCC features, and builds, on the basis of the TIM-NET network, a hybrid model (TWM) that fuses a multi-head attention mechanism with an improved Wasserstein generative adversarial network (WGAN-GP). The multi-head attention mechanism not only effectively prevents gradient vanishing but also allows deeper networks to be built that capture long-range dependencies and learn from information at different time steps, improving model accuracy; WGAN-GP alleviates the shortage of samples by improving the quality of generated speech samples. Experimental results show that the method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets.

Keywords: dynamic features; speech emotion recognition; multi-head attention mechanism; generative adversarial networks
