Abstract
Speech emotion recognition currently suffers from low recognition rates because speech samples are scarce and the extracted feature sets are large and contain many features irrelevant to emotion. To address the shortage of samples, a time-frequency domain data enhancement method is proposed in the preprocessing stage to expand the original database. Because traditional algorithms extract large volumes of feature data, much of it unrelated to emotion, 1582-dimensional emotion features and 10 groups of low-level description features were extracted. Comparative experiments were then run on three machine learning algorithms: support vector machine, random forest, and K-nearest neighbor. The experiments showed that the support vector machine achieved a superior average recognition rate. Among the 10 feature groups, LogMelFreqBand reached accuracies of 74.63%, 64.93%, and 66.42% on the three algorithms, respectively, while pcm_fftMag_mfcc reached 84.33%, 73.13%, and 58.21%, respectively.
Authors
李茜茜
沈晓燕
任福继
康鑫
LI Qianqian; SHEN Xiaoyan; REN Fuji; KANG Xin (Institute of Information Science and Technology, Nantong University, Nantong 226019, China; Department of Intelligent Information Engineering, Tokushima University, Tokushima 7708501, Japan)
Source
《智能系统学报》
CSCD
Peking University Core Journals (北大核心)
2021, No. 1, pp. 170-177 (8 pages)
CAAI Transactions on Intelligent Systems
Funding
National Natural Science Foundation of China (61534003, 81371663)
Tokushima University Research Cluster Program (2003002).
Keywords
speech emotion recognition
data enhancement
emotion feature
support vector machine
random forest
K-nearest neighbor
low-level description features
machine learning