Gaussian mixture model compensation method using shift factor for speaker recognition
(基于高斯混合模型移动因子补偿的说话人识别方法) Cited by: 2
Abstract: The performance of GMM-based text-independent speaker recognition systems degrades rapidly as the amount of training data for the target speaker decreases. A model compensation method is proposed to address this problem. Since there is a shift between each adapted target speaker model and the UBM (universal background model), a low-dimensional affine space, called the shift space, is defined, and the shift of each model trained with sufficient data is represented by a shift factor in this space. When the training data of the target speaker is insufficient, the shift factor is first learned from the GMM mixtures that are insensitive to the amount of training data, and is then used to compensate the parameters of the remaining mixtures. On the same training and evaluation sets, the proposed method achieves a relative reduction of about 7% in EER (equal error rate) compared with the baseline system.
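To make the compensation idea in the abstract concrete, the sketch below shows one way such a scheme could be organized. It is a minimal illustration under stated assumptions, not the authors' implementation: the shift-space matrix V, the reliability mask, and all function names are hypothetical; V is assumed to be trained offline from speakers with sufficient data, and the mask is assumed to flag mixtures whose adaptation is insensitive to the amount of training data.

```python
import numpy as np

# Hypothetical sketch of shift-factor compensation for a MAP-adapted GMM.
# Assumptions: the UBM has C mixtures with D-dimensional means; the stacked
# mean shift (adapted means minus UBM means) is modelled as V @ y, where
# V (C*D x R) spans a low-dimensional "shift space" trained offline and
# y is the R-dimensional shift factor.

def estimate_shift_factor(ubm_means, adapted_means, V, reliable):
    """Least-squares estimate of the shift factor y, using only the mixtures
    marked reliable (a boolean mask of length C)."""
    C, D = ubm_means.shape
    shift = (adapted_means - ubm_means).reshape(C * D)   # observed mean shifts
    rows = np.repeat(reliable, D)                        # rows of reliable mixtures
    y, *_ = np.linalg.lstsq(V[rows], shift[rows], rcond=None)
    return y

def compensate_means(ubm_means, adapted_means, V, reliable, y):
    """Keep the means of reliable mixtures; replace the others by the UBM
    means plus the shift predicted from the shift factor."""
    C, D = ubm_means.shape
    predicted = ubm_means + (V @ y).reshape(C, D)
    return np.where(reliable[:, None], adapted_means, predicted)
```

In this sketch the shift factor is estimated by least squares from the well-adapted mixtures only, and the resulting low-dimensional factor predicts the mean shifts of the mixtures that received too little training data.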
Source: Acta Acustica (《声学学报》), indexed in EI and CSCD, Peking University Core Journal, 2011, No. 6, pp. 658-664 (7 pages)
Funding: Supported by the National Basic Research Program of China (973 Program, 2007CB311100) and a key project of the National High-Tech R&D Program of China (863 Program, 2006AA010103)