
Competent authority: Ministry of Industry and Information Technology of the People's Republic of China
Sponsor: Harbin Institute of Technology   Editor-in-chief: LI Longqiu   ISSN 0367-6234   CN 23-1235/T

Cite this article: CHENG Yanfen, CHEN Yaoxin, CHEN Yiling, YANG Yi. Speech emotion recognition with embedded attention mechanism and hierarchical context[J]. Journal of Harbin Institute of Technology, 2019, 51(11): 100. DOI: 10.11918/j.issn.0367-6234.201905193
Speech emotion recognition with embedded attention mechanism and hierarchical context
CHENG Yanfen1, CHEN Yaoxin2, CHEN Yiling1, YANG Yi1
(1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; 2. School of Computer, Hubei University of Technology, Wuhan 430068, China)
Abstract:
Speech emotion recognition remains challenging because of issues with emotional corpora, the association between emotion and acoustic features, and the modeling of speech emotion itself. Conventional context-based speech emotion recognition systems are confined to the feature layer, so they lose the contextual details of the label layer and ignore the differences between the two levels. To address these shortcomings, this paper proposes a Bidirectional Long Short-Term Memory (BLSTM) network model that embeds an attention mechanism and combines hierarchical context learning. The model completes the recognition task in three phases. In the first phase, the full set of emotional speech features is extracted, the SVM-RFE feature-ranking algorithm reduces its dimensionality to obtain an optimal feature subset, and attention weights are assigned to that subset. In the second phase, the weighted feature subset is fed into a BLSTM network that learns the feature-layer context and produces an initial emotion prediction. In the third phase, emotion label values are used to train an independent BLSTM network that learns the label-layer context, and the final prediction is completed on the basis of the second phase's output. Embedding the attention mechanism lets the model automatically learn to adjust its focus on the input feature subset, and introducing the label-layer context fuses it with the feature-layer context, improving robustness and the model's ability to model emotional speech. Experimental results on the SEMAINE and RECOLA datasets show that both RMSE and CCC clearly improve over the baseline model.
Key words: speech emotion recognition; attention mechanism; context; BLSTM
DOI: 10.11918/j.issn.0367-6234.201905193
CLC number: TN912.34
Document code: A
Fund project: National Natural Science Foundation of China (51179146)
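The attention weighting applied to the selected feature subset in the first phase can be sketched as follows. This is a minimal numpy illustration under assumed details, not the paper's exact formulation: the scoring function, the parameter vector `w`, and the toy feature values are all hypothetical stand-ins for quantities the network would learn.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_weight(features, w):
    """Reweight a feature subset by an attention distribution.

    `w` stands in for learnable attention parameters; here it is
    random, purely for illustration of the mechanism's shape.
    """
    scores = features * w            # per-feature relevance scores
    alpha = softmax(scores)          # normalize scores into attention weights
    return alpha * features          # attention-weighted feature subset

rng = np.random.default_rng(0)
feature_subset = rng.normal(size=8)  # toy stand-in for the SVM-RFE-selected subset
w = rng.normal(size=8)               # hypothetical attention parameters
weighted = attention_weight(feature_subset, w)
```

In the full model, the weighted subset would then be fed to the feature-layer BLSTM; the softmax keeps the attention weights positive and summing to one, so the mechanism rescales rather than discards features.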
