
Speaker Recognition with Kernel Based IVEC-SVM

Cited by: 9
Abstract: In text-independent speaker recognition, modeling based on the identity vector (IVEC) extracts speaker information effectively and is currently among the state-of-the-art methods. This paper studies a speaker recognition system in which the identity vector is followed by a support vector machine (IVEC-SVM), compares its recognition performance under ten different kernel functions, and compares it with the identity vector followed by cosine distance scoring (IVEC-CDS) system from the literature. Experiments on the telephone-telephone core condition of the 2010 speaker recognition evaluation organized by the U.S. National Institute of Standards and Technology (NIST) show that the kernel-based IVEC-SVM system clearly outperforms the IVEC-CDS system. Among the kernels tested, the spline kernel gives the best recognition performance: relative to the IVEC-CDS system, its equal error rate (EER) falls by 10% before score normalization and by 3% after it.
Source: Acta Automatica Sinica (《自动化学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2014, No. 4, pp. 780-784 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61005019, 61273268, 90920302, 61370034)
Keywords: identity vector followed by cosine distance scoring (IVEC-CDS), identity vector followed by support vector machine (IVEC-SVM), spline kernel, speaker recognition
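
The three quantities named in the abstract (cosine distance scoring, a spline kernel, and the equal error rate) can be illustrated with a minimal sketch. It is not the authors' implementation. The record does not give the paper's kernel definition, so `spline_kernel` uses one common textbook form as an assumption. The function names are hypothetical, and `equal_error_rate` is a simple threshold sweep rather than an interpolated estimate.

```python
import math


def cosine_score(w_target, w_test):
    """Cosine similarity between two i-vectors (higher = more similar)."""
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    return dot / (norm_target * norm_test)


def spline_kernel(x, y):
    """One common spline kernel, taken as a product over dimensions.

    Assumption: this textbook form may differ from the kernel used in the paper.
    """
    k = 1.0
    for a, b in zip(x, y):
        m = min(a, b)
        k *= 1.0 + a * b + a * b * m - 0.5 * (a + b) * m ** 2 + m ** 3 / 3.0
    return k


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    Returns the mean of the false-rejection and false-acceptance rates at the
    threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(list(target_scores) + list(nontarget_scores)):
        # Target trials scoring below the threshold are falsely rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Non-target trials scoring at or above the threshold are falsely accepted.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, eer = gap, (frr + far) / 2.0
    return eer
```

In an IVEC-CDS style system, `cosine_score` would be applied directly to a pair of i-vectors. In an IVEC-SVM style system, a kernel such as `spline_kernel` would instead be passed to an SVM trainer, and the EER would be computed from the resulting trial scores.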
