
Short Utterance Speaker Embedding Vector Algorithm Based on Kernel Canonical Correlation Analysis (cited by: 2)
Abstract: Under short-utterance conditions, identity vectors (i-vectors) extracted from the total variability space are poorly estimated, which degrades speaker recognition performance. To address this, a method based on kernel canonical correlation analysis (KCCA) is proposed that fuses the speaker embedding vectors of the total variability space and a time delay neural network (TDNN). First, the total variability space model and the TDNN model are trained separately. Then, in the enrollment and test phases, the speaker's embedding vectors are extracted from both models. A Gaussian kernel function maps them to a high-dimensional space, where their nonlinear correlation is analyzed to obtain affine vectors, which are finally combined into the final speaker embedding vector. Experiments show that for short utterances of less than 10 seconds, the speaker vectors extracted by this method reduce the equal error rate (EER) by 16.29%, 20.38%, and 2.78% on average, and the minimum detection cost (minDCF) by 8.03%, 7.17%, and 0.26% on average, compared with the other speaker embedding vectors. Finally, in a comparison with other algorithms, the method also improves on EER. These experiments show that the proposed method effectively improves speaker recognition performance in short-utterance environments.
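The fusion step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regularized KCCA formulation (solved here through an SVD), the single kernel width `sigma`, the regularizer `reg`, the class name `KccaFusion`, and the choice to combine the two projections by concatenation are all assumptions, and the synthetic arrays `ivec` and `xvec` merely stand in for embeddings from the total variability space and the TDNN.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a (m, d) and b (n, d)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def center_kernel(k_new, k_train):
    """Center k_new (m, n) in feature space using the training kernel (n, n)."""
    return (k_new
            - k_new.mean(axis=1, keepdims=True)
            - k_train.mean(axis=0, keepdims=True)
            + k_train.mean())


class KccaFusion:
    """Hypothetical helper: regularized KCCA over two embedding views."""

    def __init__(self, n_components=2, sigma=3.0, reg=0.1):
        self.n_components = n_components
        self.sigma = sigma   # assumed kernel width, shared by both views
        self.reg = reg       # assumed ridge term keeping the solves stable

    def fit(self, x, y):
        n = x.shape[0]
        self.x_train, self.y_train = x, y
        self.kx_raw = gaussian_kernel(x, x, self.sigma)
        self.ky_raw = gaussian_kernel(y, y, self.sigma)
        kx = center_kernel(self.kx_raw, self.kx_raw)
        ky = center_kernel(self.ky_raw, self.ky_raw)
        rx = kx + self.reg * np.eye(n)
        ry = ky + self.reg * np.eye(n)
        # M = Rx^{-1} Kx Ky Ry^{-1}; its singular values are the
        # regularized canonical correlations.
        m = np.linalg.solve(rx, kx @ ky)
        m = np.linalg.solve(ry, m.T).T  # valid because ry is symmetric
        u, s, vt = np.linalg.svd(m)
        k = self.n_components
        self.alpha = np.linalg.solve(rx, u[:, :k])   # dual weights, view 1
        self.beta = np.linalg.solve(ry, vt[:k].T)    # dual weights, view 2
        self.corr = s[:k]
        return self

    def transform(self, x, y):
        kx = center_kernel(gaussian_kernel(x, self.x_train, self.sigma), self.kx_raw)
        ky = center_kernel(gaussian_kernel(y, self.y_train, self.sigma), self.ky_raw)
        # Assumption: the final embedding concatenates both projections.
        return np.hstack([kx @ self.alpha, ky @ self.beta])


# Synthetic stand-ins for the two embedding types (not real speaker data).
rng = np.random.default_rng(0)
z = rng.normal(size=(60, 2))  # shared latent "speaker" factor
ivec = np.tanh(z @ rng.normal(size=(2, 8))) + 0.1 * rng.normal(size=(60, 8))
xvec = np.sin(z @ rng.normal(size=(2, 6))) + 0.1 * rng.normal(size=(60, 6))

model = KccaFusion().fit(ivec[:50], xvec[:50])
fused = model.transform(ivec[50:], xvec[50:])
print(fused.shape)  # (10, 4)
```

In practice the two views would have far higher dimension than in this toy setup, and the kernel width and regularizer would need tuning on held-out data; the fused vectors would then go to whatever scoring back end the system uses.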
Authors: LONG Hua (龙华); QU Yu-quan (瞿于荃); DUAN Ying (段荧) (College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650000, China; National Key Laboratory of Computer Science of Yunnan Province, Kunming University of Science and Technology, Kunming 650000, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 11, pp. 2269-2275 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (61761025).
Keywords: total variability space; time delay neural network (TDNN); kernel canonical correlation analysis (KCCA); embedding vector; short utterance
