Cite this article: CHEN Yiling, CHENG Yanfen, CHEN Xianqiao, WANG Hongxia, LI Chao. Speech emotion estimation in PAD 3D emotion space[J]. Journal of Harbin Institute of Technology, 2018, 50(11): 160. DOI: 10.11918/j.issn.0367-6234.201806131

DOI: 10.11918/j.issn.0367-6234.201806131
CLC number: TN912.34
Document code: A
Fund program: National Natural Science Foundation of China (51179146)

Speech emotion estimation in PAD 3D emotion space
CHEN Yiling, CHENG Yanfen, CHEN Xianqiao, WANG Hongxia, LI Chao

(School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China)

Abstract:
The discrete emotional description model labels human emotions with discrete adjective tags, so it can represent only a limited set of single, explicit emotion types, whereas the dimensional emotion model quantifies the implicit state of complex emotions along multiple dimensions. In addition, the conventional speech emotion feature, the Mel-Frequency Cepstral Coefficient (MFCC), ignores the correlation between the spectral features of adjacent frames because of frame-by-frame processing, and thus loses much useful information. To address this problem, this paper proposes an improved method that extracts the time firing series feature and the firing position information feature from the spectrogram to supplement the MFCC, and applies each of the three features to speech emotion estimation separately. Based on the predicted values, the method performs a correlation analysis on the three dimensions of the PAD model, P (Pleasure-displeasure), A (Arousal-nonarousal), and D (Dominance-submissiveness), to obtain the weight coefficient of each feature, computes the final PAD values of the emotional speech by weighted fusion, and maps them into the PAD 3D emotion space. Experiments show that the two added features not only detect the emotional state of the speaker but also capture the cross-correlation between adjacent spectra, complementing the MFCC features. Beyond improving the discrete recognition of basic emotion types, the method represents the estimation results as coordinate points in the PAD 3D emotion space, quantitatively reveals the position of and relations among emotions in that space, and exposes the mixed emotional content of emotional speech, laying a foundation for subsequent research on the classification of complex speech emotions.

Key words: PAD 3D emotion model; speech emotion estimation; Mel-Frequency Cepstral Coefficient; time firing series; firing position information; correlation analysis
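The correlation-weighted fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function names, the array layout (one PAD prediction per feature per utterance), and the use of absolute Pearson correlation against reference PAD annotations as the weighting criterion are all assumptions for the sake of the example.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    a = np.array(a, dtype=float)  # copy so the caller's data is not modified
    b = np.array(b, dtype=float)
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fuse_pad(predictions, references):
    """Fuse per-feature PAD predictions into final PAD coordinates.

    predictions: dict mapping a feature name (e.g. "mfcc", "time_firing",
                 "firing_position") to an (n_utterances, 3) array of
                 predicted (P, A, D) values from that feature alone.
    references:  (n_utterances, 3) reference PAD annotations used to derive
                 the per-dimension weight of each feature.
    Returns an (n_utterances, 3) array of fused PAD values.
    """
    refs = np.asarray(references, dtype=float)
    names = sorted(predictions)
    preds = np.stack([np.asarray(predictions[k], float) for k in names])  # (F, N, 3)

    # Weight each feature on each dimension by the absolute correlation of
    # its predictions with the references, then normalise so the weights on
    # every dimension sum to 1.
    w = np.array([[abs(pearson(preds[f][:, d], refs[:, d]))
                   for d in range(3)]
                  for f in range(len(names))])          # (F, 3)
    w /= w.sum(axis=0, keepdims=True)

    # Weighted sum over features, dimension by dimension.
    return np.einsum('fd,fnd->nd', w, preds)
```

Each fused row is then a coordinate point (P, A, D) that can be plotted directly in the three-dimensional emotion space, so mixed emotional content appears as positions between the prototypical basic-emotion regions.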