Abstract
In speech emotion recognition, emotion features are commonly extracted over segments of the same length regardless of the emotion category. This ignores the fact that the prosodic segment length to which the human ear is most sensitive varies with the emotion. In this work, recognition experiments first determine the best segment length for each emotion, which is taken as the ear's sensitive prosodic segment length. A multi-network Elman model based on prosodic segment level features is then constructed, so that each emotion is recognized at its own sensitive segment length and the results of the sub-network classifiers are fused, simulating how human listeners discriminate emotions. Tests show that the system using sensitive prosodic segment level features reaches a recognition rate of 67.9%, much higher than the rate obtained with fixed-length segment level features.
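The idea described in the abstract (one Elman sub-network per emotion, each fed features computed at that emotion's own segment length, followed by fusion) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the emotion labels, the segment lengths in `SEGMENT_LENGTHS`, the use of per-segment means as segment level features, the network sizes, the untrained random weights, and the argmax fusion rule.

```python
import math
import random

def segment_features(frames, seg_len):
    """Average frame-level feature vectors over consecutive, non-overlapping
    segments of seg_len frames (an assumed stand-in for segment level features)."""
    segments = []
    for start in range(0, len(frames) - seg_len + 1, seg_len):
        chunk = frames[start:start + seg_len]
        dim = len(chunk[0])
        segments.append([sum(f[d] for f in chunk) / seg_len for d in range(dim)])
    return segments

class ElmanNet:
    """Minimal Elman (simple recurrent) network: the previous hidden state is
    fed back as a context input at every step. Weights are random, not trained."""

    def __init__(self, n_in, n_hidden, rng):
        u = lambda: rng.uniform(-0.5, 0.5)
        self.w_in = [[u() for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_ctx = [[u() for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.w_out = [u() for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def score(self, sequence):
        """Run the sequence through the network and return a score in (0, 1)."""
        context = [0.0] * self.n_hidden
        for x in sequence:
            # The new hidden state depends on the input and the old context.
            context = [
                math.tanh(sum(w * xi for w, xi in zip(self.w_in[j], x))
                          + sum(w * c for w, c in zip(self.w_ctx[j], context)))
                for j in range(self.n_hidden)
            ]
        z = sum(w * h for w, h in zip(self.w_out, context))
        return 1.0 / (1.0 + math.exp(-z))

# Hypothetical emotions and segment lengths (in frames); the paper obtains
# the actual lengths experimentally and they are not given in the abstract.
SEGMENT_LENGTHS = {"anger": 20, "joy": 30, "sadness": 50, "neutral": 40}

def classify(frames, nets):
    """Score the utterance with each emotion's sub-network at that emotion's
    segment length, then fuse by taking the highest score (assumed rule)."""
    scores = {
        emo: nets[emo].score(segment_features(frames, SEGMENT_LENGTHS[emo]))
        for emo in nets
    }
    return max(scores, key=scores.get), scores

rng = random.Random(0)
nets = {emo: ElmanNet(n_in=3, n_hidden=4, rng=rng) for emo in SEGMENT_LENGTHS}
frames = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]  # synthetic input
label, scores = classify(frames, nets)
print(label, scores)
```

Because the weights are random, the printed label carries no meaning; the sketch only shows how emotion-specific segmentation and sub-network fusion fit together.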
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2009, Issue S1, pp. 1363-1368 (6 pages)
Journal of Tsinghua University(Science and Technology)
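Funding
Specialized Research Fund for the Doctoral Program of Higher Education (20050213032)
National Natural Science Foundation of China (60772076)
National High Technology Research and Development Program of China ("863" Program) (2006AA01Z197)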
Keywords
emotion features
sensitive prosodic segment length
segment level features
Elman network
speech emotion recognition