Supervised by: Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by: Harbin Institute of Technology   Editor-in-Chief: LI Longqiu   ISSN 0367-6234   CN 23-1235/T

Cite this article: CHEN Yiling, CHENG Yanfen, CHEN Xianqiao, WANG Hongxia, LI Chao. Speech emotion estimation in PAD 3D emotion space[J]. Journal of Harbin Institute of Technology, 2018, 50(11): 160. DOI: 10.11918/j.issn.0367-6234.201806131
DOI: 10.11918/j.issn.0367-6234.201806131
CLC number: TN912.34
Document code: A
Foundation item: National Natural Science Foundation of China (51179146)
Speech emotion estimation in PAD 3D emotion space
CHEN Yiling, CHENG Yanfen, CHEN Xianqiao, WANG Hongxia, LI Chao
(School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China)
Abstract:
The discrete emotional description model labels human emotions with discrete adjectives and can therefore represent only a limited set of single, explicit emotion types, whereas the dimensional emotional model quantifies the implied state of complex emotions along multiple dimensions. In addition, the conventional speech emotion feature, the Mel-frequency cepstral coefficient (MFCC), neglects the correlation between adjacent frame spectra because of frame division, and thus loses much useful information. To solve this problem, this paper proposes an improved method that extracts the time firing series feature and the firing position information feature from the spectrogram to supplement the MFCC, and applies each of the three features to speech emotion estimation separately. Based on the predicted values, the method performs a correlation analysis on the three dimensions of the PAD model, P (Pleasure-displeasure), A (Arousal-nonarousal), and D (Dominance-submissiveness), to obtain per-feature weights, computes the final PAD values of the emotional speech by weighted fusion, and maps them into the PAD 3D emotion space. Experiments show that the two added features not only detect the speaker's emotional state but also capture the cross-correlation between adjacent spectra, complementing the MFCC features. Beyond improving the discrete recognition of basic emotion types, the method represents the results as coordinate points in the PAD 3D emotion space, quantitatively reveals the positions of and relations among emotions in that space, and exposes the mixed emotional content of emotional speech, laying a foundation for subsequent research on the classification of complex speech emotions.
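The correlation-weighted fusion described above can be sketched as follows. This is a minimal illustration only, assuming per-feature PAD predictions and reference PAD annotations are already available as arrays; the function name `fuse_pad`, the use of Pearson correlation via NumPy, and the clipping of negative correlations to zero are our assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

def fuse_pad(preds, truth):
    """Correlation-weighted fusion of per-feature PAD predictions.

    preds: dict mapping feature name -> (n_samples, 3) predicted PAD values
           (e.g. from MFCC, time firing series, firing position features)
    truth: (n_samples, 3) reference PAD annotations
    Returns (fused, weights): fused (n_samples, 3) PAD coordinates and the
    (n_features, 3) per-dimension weight matrix.
    """
    feats = list(preds)
    # Pearson correlation of each feature's predictions with the
    # reference annotations, computed separately for P, A, and D.
    corr = np.zeros((len(feats), 3))
    for i, f in enumerate(feats):
        for d in range(3):
            corr[i, d] = np.corrcoef(preds[f][:, d], truth[:, d])[0, 1]
    corr = np.clip(corr, 0.0, None)       # assumption: drop negative correlations
    weights = corr / corr.sum(axis=0)     # normalize weights per dimension
    fused = np.zeros_like(truth, dtype=float)
    for i, f in enumerate(feats):
        fused += weights[i] * preds[f]
    return fused, weights
```

A feature whose predictions track the annotations closely on a given dimension receives a large weight there, so the fused point in PAD space leans on whichever feature is most reliable per dimension.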
Key words: PAD 3D emotion model; speech emotion estimation; Mel-frequency cepstral coefficient; time firing series; firing position information; correlation analysis
