Abstract
Under short-utterance conditions, identity vectors (i-vectors) extracted from the total variability space are poorly estimated, which degrades speaker recognition performance. To address this, a method based on kernel canonical correlation analysis (KCCA) is proposed that fuses the speaker embedding vectors of the total variability space and a time delay neural network (TDNN). First, the total variability space model and the TDNN model are trained separately. Then, in the enrollment and test phases, a speaker's embedding vectors are extracted from both models. A Gaussian kernel function maps these vectors into a high-dimensional space, where their nonlinear correlation is analyzed to obtain affine vectors, which are finally combined into the final speaker embedding vector. Experiments show that, for short utterances of less than 10 seconds, the speaker vectors extracted by this method reduce the equal error rate (EER) by an average of 16.29%, 20.38%, and 2.78%, and the minimum detection cost (minDCF) by an average of 8.03%, 7.17%, and 0.26%, respectively, relative to the other speaker embedding vectors compared. Finally, in a comparison with other algorithms, the method also improves EER. These experiments show that the proposed method effectively improves speaker recognition performance in short-utterance conditions.
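The fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact KCCA formulation, so the sketch assumes one common regularised variant (solved with an SVD), a single Gaussian kernel width for both views, and concatenation as the way of "combining" the projected vectors. The names `KccaFusion`, `sigma`, `reg`, and `n_components` are hypothetical, and random synthetic matrices stand in for real i-vectors and TDNN embeddings.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def center_kernel(k_new, k_train):
    """Center k_new (m x n) in feature space using the training kernel (n x n)."""
    return (k_new - k_train.mean(axis=0)[None, :]
            - k_new.mean(axis=1, keepdims=True) + k_train.mean())


class KccaFusion:
    """Hypothetical helper: regularised KCCA over two embedding views."""

    def __init__(self, n_components=2, sigma=1.0, reg=0.1):
        self.n_components = n_components
        self.sigma = sigma  # assumed shared kernel width for both views
        self.reg = reg      # assumed ridge term that keeps the solves stable

    def fit(self, x, y):
        self.x_, self.y_ = x, y
        self.kx_ = gaussian_kernel(x, x, self.sigma)
        self.ky_ = gaussian_kernel(y, y, self.sigma)
        kx = center_kernel(self.kx_, self.kx_)
        ky = center_kernel(self.ky_, self.ky_)
        ridge = self.reg * np.eye(x.shape[0])
        # M = (Kx + rI)^-1 Kx Ky (Ky + rI)^-1; its singular values are the
        # regularised canonical correlations, its singular vectors give the
        # dual coefficients after one more solve.
        m = np.linalg.solve(kx + ridge, kx) @ np.linalg.solve(ky + ridge, ky)
        u, s, vt = np.linalg.svd(m)
        d = self.n_components
        self.alpha_ = np.linalg.solve(kx + ridge, u[:, :d])
        self.beta_ = np.linalg.solve(ky + ridge, vt[:d].T)
        self.corr_ = s[:d]
        return self

    def transform(self, x_new, y_new):
        kx = center_kernel(gaussian_kernel(x_new, self.x_, self.sigma), self.kx_)
        ky = center_kernel(gaussian_kernel(y_new, self.y_, self.sigma), self.ky_)
        # Assumption: the two projected views are combined by concatenation.
        return np.hstack([kx @ self.alpha_, ky @ self.beta_])


# Synthetic stand-ins for the two embedding types (not real speech data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(80, 3))
ivecs = latent @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(80, 6))
xvecs = np.tanh(latent @ rng.normal(size=(3, 5))) + 0.1 * rng.normal(size=(80, 5))

model = KccaFusion(n_components=2, sigma=3.0, reg=0.1).fit(ivecs[:60], xvecs[:60])
fused = model.transform(ivecs[60:], xvecs[60:])
print(fused.shape)  # (20, 4): two projected components per view, concatenated
```

In a real system the fitted projection would be applied to both enrollment and test embeddings before scoring. The kernel width and the regularisation strength would need tuning on held-out data.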
Authors
LONG Hua (龙华); QU Yu-quan (瞿于荃); DUAN Ying (段荧)
(College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650000, China; National Key Laboratory of Computer Science of Yunnan Province, Kunming University of Science and Technology, Kunming 650000, China)
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, Issue 11, pp. 2269-2275 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (61761025).
Keywords
total variability space
time delay neural network(TDNN)
kernel canonical correlation analysis(KCCA)
embedding vector
short utterance