Abstract: The discrete emotion description model labels human emotions with discrete adjectives, and can therefore represent only a limited number of single, explicit emotions. The dimensional emotion model, by contrast, quantifies the implicit states of complex emotions along multiple continuous dimensions. In addition, the conventional speech emotion feature, the Mel Frequency Cepstral Coefficient (MFCC), neglects the correlation between the spectral features of adjacent frames because of frame-by-frame processing, and is therefore prone to losing useful information. To address this problem, this paper proposes an improved method that extracts the time firing series feature and the firing position information feature from the spectrogram to supplement the MFCC, and applies each of them to speech emotion estimation. From the predicted values, the proposed method calculates the correlation coefficients of each feature along the three dimensions P (pleasure-displeasure), A (arousal-nonarousal), and D (dominance-submissiveness), uses them as feature weights, obtains the final PAD values of the emotional speech through weighted fusion, and finally maps these values into the PAD 3D emotion space. Experiments showed that the two added features not only detect the speaker's emotional state but also capture the correlation between the spectral features of adjacent frames, complementing the MFCC. While improving discrete estimation of the basic emotion types, the method represents the estimation results as coordinate points in the PAD 3D emotion space, quantitatively reveals the positions of and relations among the various emotions in that space, and indicates the mixed emotional content of the emotional speech. This study lays a foundation for subsequent research on the classification and estimation of complex speech emotions.
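
To make the fusion step concrete, the sketch below combines per-feature PAD predictions using each feature's correlation coefficients as weights, following the description above. This is a minimal illustration only: the function name, the example values, and the per-dimension weight normalisation are assumptions, not the authors' exact procedure.

    # Minimal sketch of correlation-weighted PAD fusion (illustrative, not the
    # authors' exact implementation).
    import numpy as np

    def fuse_pad(predictions, weights):
        """Fuse per-feature PAD predictions into one (P, A, D) vector.

        predictions: dict feature name -> array of shape (3,), the predicted
                     (P, A, D) values for one utterance from that feature.
        weights:     dict feature name -> array of shape (3,), that feature's
                     correlation coefficients with the P, A and D annotations
                     (assumed non-negative here).
        """
        names = list(predictions)
        preds = np.stack([predictions[n] for n in names])  # (n_features, 3)
        w = np.stack([weights[n] for n in names])          # (n_features, 3)
        w = w / w.sum(axis=0, keepdims=True)                # normalise per dimension (assumption)
        return (w * preds).sum(axis=0)                      # weighted sum per dimension

    # Hypothetical example for one utterance: MFCC, time firing series and
    # firing position features each give a PAD prediction.
    pad = fuse_pad(
        predictions={"mfcc":        np.array([0.42, 0.30, 0.10]),
                     "time_firing": np.array([0.35, 0.25, 0.05]),
                     "firing_pos":  np.array([0.50, 0.20, 0.15])},
        weights={"mfcc":        np.array([0.62, 0.55, 0.48]),
                 "time_firing": np.array([0.58, 0.60, 0.40]),
                 "firing_pos":  np.array([0.54, 0.50, 0.52])},
    )
    print(pad)  # fused (P, A, D) coordinate in the 3D emotion space

The fused vector can then be plotted as a point in the PAD space, so that the distance between points gives a quantitative view of how the estimated emotion relates to the basic emotion categories.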