Abstract
To build an efficient speech emotion recognition model, a decision fusion method is proposed that combines the strengths of shallow learning and deep learning. The shallow learning branch follows the traditional speech emotion recognition approach: hand-crafted statistical features are extracted and then classified. The deep learning branch uses a PCANet network for feature extraction, taking spectrograms that carry emotional information as the network input. The shallow and deep features are fed into separate SVM models for classification, and a differential voting mechanism performs the fusion at the decision level. Experimental results show that the method clearly improves the recognition rate on both a self-recorded corpus and the Berlin database, and that it holds a clear advantage over representative methods.
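The decision-level fusion step can be illustrated with a small sketch. The abstract does not define the differential voting rule, so the code below substitutes a plain weighted soft vote over per-class scores from the two SVM branches; the function name `fuse_decisions`, the `shallow_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fuse_decisions(
    shallow_scores: Sequence[float],
    deep_scores: Sequence[float],
    shallow_weight: float = 0.5,
) -> int:
    """Return the index of the class chosen by a weighted soft vote.

    Assumption: each classifier reports one score per emotion class,
    in the same class order. This is a stand-in rule, not the paper's
    differential voting mechanism.
    """
    if len(shallow_scores) != len(deep_scores):
        raise ValueError("both classifiers must score the same classes")
    if not shallow_scores:
        raise ValueError("at least one class is required")
    if not 0.0 <= shallow_weight <= 1.0:
        raise ValueError("shallow_weight must lie in [0, 1]")

    deep_weight = 1.0 - shallow_weight
    fused = [
        shallow_weight * s + deep_weight * d
        for s, d in zip(shallow_scores, deep_scores)
    ]
    # Ties resolve to the lowest class index, since max() keeps the first maximum.
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three classes: angry, happy, neutral.
labels = ["angry", "happy", "neutral"]
shallow = [0.6, 0.3, 0.1]  # e.g. SVM on hand-crafted statistical features
deep = [0.2, 0.7, 0.1]     # e.g. SVM on PCANet spectrogram features
print(labels[fuse_decisions(shallow, deep)])
```

With equal weights the two branches disagree and the fused scores favor the second class; raising `shallow_weight` shifts the decision toward the shallow branch. A rule that weights each branch by its per-class reliability would be a natural refinement, but whether the paper does this is not stated in the abstract.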
Authors
赵小蕾
许喜斌
Zhao Xiaolei; Xu Xibin (School of Information Science, Xinhua College of Sun Yat-sen University, Guangzhou 510520, Guangdong, China; Guangdong Engineering Polytechnic, Guangzhou 510520, Guangdong, China)
Source
《计算机应用与软件》
Peking University Core Journal
2020, No. 12, pp. 108-112, 176 (6 pages)
Computer Applications and Software
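Funding
National Natural Science Foundation of China (61672546, 61573385)
Guangdong Province New Engineering Research and Practice Project (2017CXQX001)
Guangzhou Science and Technology Program Project (201804010265, 201707010127)
General Project of the Teacher Research Fund, Xinhua College of Sun Yat-sen University (2018YB013)
Comprehensive Reform Pilot Project for the Software Engineering Major, Xinhua College of Sun Yat-sen University (2018Z001).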
Keywords
Speech emotion recognition
Decision fusion
Spectrogram
Shallow learning
Deep learning