Abstract
Domain adaptation algorithms are widely used in cross-corpus speech emotion recognition. However, while reducing the domain discrepancy, many of them lose the discriminability of target-domain samples, so that these samples cluster at high density near the model's decision boundary and degrade its performance. To address this problem, a cross-corpus speech emotion recognition method based on Decision Boundary Optimized Domain Adaptation (DBODA) is proposed. First, features are processed by a convolutional neural network. They are then fed into the Maximum Nuclear-norm and Mean Discrepancy (MNMD) module, which maximizes the nuclear norm of the target-domain emotion prediction probability matrix while reducing the inter-domain discrepancy, thereby enhancing the discriminability of target-domain samples and optimizing the decision boundary. In six cross-corpus experiments built on the Berlin, eNTERFACE and CASIA speech databases, the average recognition accuracy of the proposed method exceeds that of the other algorithms by 1.68 to 11.01 percentage points, indicating that the proposed model effectively reduces the sample density around the decision boundary and improves prediction accuracy.
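The objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact loss, so the sketch assumes that the mean discrepancy is the squared distance between source and target feature means, that the nuclear norm is averaged over the batch, and that the two terms are combined with a hypothetical weight `lam`. All function names are illustrative.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax with the usual max-shift for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def nuclear_norm(probs):
    # Nuclear norm = sum of singular values of the (batch x classes) matrix.
    return float(np.linalg.svd(probs, compute_uv=False).sum())

def mean_discrepancy(src_feats, tgt_feats):
    # Assumption: squared Euclidean distance between the domain feature means.
    diff = src_feats.mean(axis=0) - tgt_feats.mean(axis=0)
    return float(diff @ diff)

def mnmd_style_loss(src_feats, tgt_feats, tgt_logits, lam=1.0):
    # Minimizing this value reduces the inter-domain mean discrepancy while
    # increasing the nuclear norm of the target prediction matrix.
    # `lam` is a hypothetical trade-off weight, not taken from the paper.
    probs = softmax(tgt_logits)
    batch_size = probs.shape[0]
    return mean_discrepancy(src_feats, tgt_feats) - lam * nuclear_norm(probs) / batch_size

# Confident, class-diverse predictions versus uniform (boundary-like) predictions.
confident = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
uniform = np.full((4, 2), 0.5)
```

In this toy case the confident, class-diverse matrix has a larger nuclear norm than the uniform one, which is the intuition behind using the nuclear norm to push target samples away from the decision boundary.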
Authors
WANG Yang
FU Hongliang
TAO Huawei
YANG Jing
XIE Yue
ZHAO Li
(Key Laboratory of Grain Information Processing and Control, Ministry of Education (Henan University of Technology), Zhengzhou, Henan 450001, China; School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing, Jiangsu 211167, China; School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu 210096, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2023, Issue 2, pp. 374-379 (6 pages)
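Funding
National Natural Science Foundation of China (62001215)
Natural Science Project of the Henan Provincial Department of Education (21A120003, 22A520004, 22A510001)
High-Level Talent Start-up Project of Henan University of Technology (2018BS037)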
Keywords
cross-corpus speech emotion recognition
convolutional neural network
decision boundary optimization
domain adaptation
feature distribution discrepancy