Supervised by: Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by: Harbin Institute of Technology   Editor-in-Chief: LI Longqiu   ISSN 0367-6234   CN 23-1235/T

Cite this article: CHEN Yiling, CHENG Yanfen, CHEN Xianqiao, WANG Hongxia, LI Chao. Speech emotion estimation in PAD 3D emotion space[J]. Journal of Harbin Institute of Technology, 2018, 50(11): 160. DOI: 10.11918/j.issn.0367-6234.201806131
DOI: 10.11918/j.issn.0367-6234.201806131
CLC number: TN912.34
Document code: A
Foundation item: National Natural Science Foundation of China (51179146)
Speech emotion estimation in PAD 3D emotion space
CHEN Yiling, CHENG Yanfen, CHEN Xianqiao, WANG Hongxia, LI Chao
(School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China)
Abstract:
The discrete emotional description model labels human emotions with discrete adjectives and can therefore represent only a limited set of single, explicit emotion types, whereas the dimensional emotional model quantifies the implied state of complex emotions along multiple dimensions. In addition, the conventional speech emotion feature, the Mel-frequency cepstral coefficient (MFCC), neglects the correlation between adjacent frame spectra because of frame division, and thus loses much useful information. To solve this problem, this paper proposes an improved method that extracts the time firing series feature and the firing position information feature from the spectrogram to supplement the MFCC, and applies each of the three features to speech emotion estimation separately. Based on the predicted values, the method performs a correlation analysis on the three dimensions of the PAD model, P (Pleasure-displeasure), A (Arousal-nonarousal), and D (Dominance-submissiveness), to obtain per-feature weights, computes the final PAD values of the emotional speech by weighted fusion, and maps them into the PAD 3D emotion space. Experiments show that the two added features not only detect the speaker's emotional state but also capture the cross-correlation between adjacent spectra, complementing the MFCC features. Beyond improving the discrete recognition of basic emotion types, the method represents the results as coordinate points in the PAD 3D emotion space, quantitatively reveals the positions of and relations among emotions in that space, and exposes the mixed emotional content of emotional speech, laying a foundation for subsequent research on the classification of complex speech emotions.
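The correlation-weighted fusion described above can be sketched as follows. This is a minimal illustration only, assuming per-feature PAD predictions and reference PAD annotations are already available as arrays; the function name `fuse_pad`, the use of Pearson correlation via NumPy, and the clipping of negative correlations to zero are our assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

def fuse_pad(preds, truth):
    """Correlation-weighted fusion of per-feature PAD predictions.

    preds: dict mapping feature name -> (n_samples, 3) predicted PAD values
           (e.g. from MFCC, time firing series, firing position features)
    truth: (n_samples, 3) reference PAD annotations
    Returns (fused, weights): fused (n_samples, 3) PAD coordinates and the
    (n_features, 3) per-dimension weight matrix.
    """
    feats = list(preds)
    # Pearson correlation of each feature's predictions with the
    # reference annotations, computed separately for P, A, and D.
    corr = np.zeros((len(feats), 3))
    for i, f in enumerate(feats):
        for d in range(3):
            corr[i, d] = np.corrcoef(preds[f][:, d], truth[:, d])[0, 1]
    corr = np.clip(corr, 0.0, None)       # assumption: drop negative correlations
    weights = corr / corr.sum(axis=0)     # normalize weights per dimension
    fused = np.zeros_like(truth, dtype=float)
    for i, f in enumerate(feats):
        fused += weights[i] * preds[f]
    return fused, weights
```

A feature whose predictions track the annotations closely on a given dimension receives a large weight there, so the fused point in PAD space leans on whichever feature is most reliable per dimension.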
Key words: PAD 3D emotion model; speech emotion estimation; Mel-frequency cepstral coefficient; time firing series; firing position information; correlation analysis
