Short utterance speaker recognition algorithm based on multi-featured i-vector

Cited by: 6
Abstract: When the test speech is long enough, a single feature carries sufficient information and discriminative power to complete the speaker recognition task. When the test speech is very short, however, the signal lacks sufficient speaker information and recognition performance drops sharply. To address the shortage of speaker information under short-utterance conditions, a short utterance speaker recognition algorithm based on multi-featured i-vector is proposed. First, different acoustic feature vectors are extracted and combined into a single high-dimensional feature vector. Principal Component Analysis (PCA) is then used to remove the correlation within the high-dimensional feature vector, so that the features are orthogonalized. Finally, Linear Discriminant Analysis (LDA) selects the most discriminative features and reduces the spatial dimension to some extent, which yields better speaker recognition performance. In experiments on the TIMIT corpus with short utterances of the same length (2 s), the Equal Error Rate (EER) of the proposed algorithm is lower by a relative 72.16%, 69.47% and 73.62% than that of i-vector-based single-feature systems using Mel-Frequency Cepstrum Coefficient (MFCC), Linear Prediction Cepstrum Coefficient (LPCC) and Perceptual Log Area Ratio (PLAR) features, respectively. For short utterances of different lengths, the proposed algorithm reduces both EER and the Detection Cost Function (DCF) by roughly 50% compared with the i-vector-based single-feature systems. The results of these two experiments indicate that the proposed algorithm can fully extract speaker-specific information in short utterance speaker recognition and improves recognition performance.
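The three front-end steps named in the abstract (feature concatenation, PCA decorrelation, LDA selection) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions (13, 12, 12), the number of PCA components, the synthetic data standing in for MFCC, LPCC and PLAR frames, and the helper names `pca_fit`, `lda_fit` and `fake_feature` are all assumptions. The i-vector extraction itself is not shown, since the abstract does not say how it is wired to these steps.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order]

def lda_fit(X, y, n_components):
    """Fisher LDA: directions maximizing between-class over within-class scatter."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of Sw^-1 Sb; a small ridge keeps Sw well conditioned.
    M = np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Synthetic stand-ins for frame-level features (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
n_speakers, frames_per_speaker = 4, 200
y = np.repeat(np.arange(n_speakers), frames_per_speaker)

def fake_feature(dim):
    """Speaker-dependent centers plus unit Gaussian noise."""
    centers = rng.normal(scale=2.0, size=(n_speakers, dim))
    return centers[y] + rng.normal(size=(len(y), dim))

mfcc, lpcc, plar = fake_feature(13), fake_feature(12), fake_feature(12)

# Step 1: combine the feature streams into one high-dimensional vector per frame.
combined = np.hstack([mfcc, lpcc, plar])

# Step 2: PCA removes the correlation between the combined dimensions.
mean, W_pca = pca_fit(combined, n_components=20)
decorrelated = (combined - mean) @ W_pca

# Step 3: LDA keeps the most speaker-discriminative directions.
W_lda = lda_fit(decorrelated, y, n_components=n_speakers - 1)
projected = decorrelated @ W_lda
```

With C classes, LDA yields at most C - 1 useful directions, which is why the sketch keeps `n_speakers - 1` of them.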
Authors: SUN Nian (孙念); ZHANG Yi (张毅); LIN Haibo (林海波); HUANG Chao (黄超) (School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; School of Automation, University of Posts and Telecommunications, Chongqing 400065, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the PKU Core Journals list, 2018, No. 10, pp. 2839-2843 (5 pages)
Funding: Key Project of the Chongqing Basic Science and Frontier Technology Research Program (cstc2015jcyjBX0066)
Keywords: speaker recognition; i-vector; short utterance; multi-feature; Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA)
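The abstract reports its gains as relative decreases in EER. As a hedged illustration of what that metric means, the sketch below estimates an EER from trial scores by a simple threshold sweep and computes a relative reduction between two systems. The function names and all numbers are hypothetical and are not taken from the paper; the paper's own scoring procedure is not described in the abstract.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: target trials scored below the threshold.
    fnr = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: non-target trials scored at or above the threshold.
    fpr = np.array([(nontarget >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(fnr - fpr))  # point where the two rates are closest
    return (fnr[idx] + fpr[idx]) / 2.0

def relative_reduction(baseline, proposed):
    """Relative decrease of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical scores: higher means "more likely the claimed speaker".
eer_separable = equal_error_rate([2.0, 3.0], [0.0, 1.0])
eer_overlapping = equal_error_rate([1.0, 3.0], [0.0, 2.0])

# Hypothetical EERs (in percent) for a baseline and an improved system.
reduction = relative_reduction(10.0, 2.5)
```

A relative reduction compares the two error rates as a ratio, so a drop from 10% to 2.5% EER counts as a 75% relative decrease even though the absolute change is 7.5 percentage points.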