
Robust Feature Extraction for Speaker Recognition Based on Constrained Nonnegative Tensor Factorization

Abstract: Extracting robust features is an important research topic in the machine learning community. In this paper, we investigate robust feature extraction for speech signals based on tensor structure and develop a new method called constrained Nonnegative Tensor Factorization (cNTF). A novel feature extraction framework based on the cortical representation in the primary auditory cortex (A1) is proposed for robust speaker recognition. Motivated by the neural firing rate model in A1, the speech signal is first represented as a general higher-order tensor; cNTF is then used to learn the basis functions from multiple interrelated feature subspaces and to find a robust sparse representation of the speech signal. Computer simulations are given to evaluate the performance of our method, and comparisons with existing speaker recognition methods are also provided. The experimental results demonstrate that the proposed method achieves higher recognition accuracy in noisy environments.
Source: Journal of Computer Science & Technology (计算机科学技术学报(英文版)), indexed in SCIE, EI, CSCD; 2010, No. 4, pp. 783-792 (10 pages)
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60775007, the National Basic Research 973 Program of China under Grant No. 2005CB724301, and the Science and Technology Commission of Shanghai Municipality under Grant No. 08511501701.
Keywords: pattern recognition, speaker recognition, nonnegative tensor factorization, feature extraction, auditory perception
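
The abstract describes learning nonnegative basis functions from a higher-order tensor and finding a sparse representation. The record does not give the cNTF algorithm itself, so the following is only a generic sketch of a sparsity-penalized nonnegative CP factorization of a 3-way tensor using multiplicative updates, not the paper's method. The function names `sparse_nonneg_cp` and `reconstruct`, the penalty weight `lam`, and the choice to treat the third mode as the sparse coefficient factor are all assumptions made for illustration.

```python
import numpy as np

def reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: X[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def sparse_nonneg_cp(X, rank, lam=0.01, n_iter=200, eps=1e-9, seed=0):
    """Generic nonnegative CP factorization with an L1 penalty on the third factor.

    Illustrative only: this is NOT the cNTF algorithm from the paper.
    A and B play the role of basis factors (unit-norm columns);
    C holds the coefficients that the penalty `lam` pushes toward sparsity.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        # Normalize basis columns and move their scale into C, so the L1
        # penalty on C cannot be dodged by rescaling A or B.
        for F in (A, B):
            norms = np.maximum(np.linalg.norm(F, axis=0), eps)
            F /= norms
            C *= norms
        # The L1 penalty enters the update as a constant added to the denominator.
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + lam + eps)
    return A, B, C

if __name__ == "__main__":
    # Synthetic nonnegative rank-3 tensor standing in for a speech representation.
    rng = np.random.default_rng(1)
    X = reconstruct(rng.random((6, 3)), rng.random((5, 3)), rng.random((4, 3)))
    A, B, C = sparse_nonneg_cp(X, rank=3)
    rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
    print("relative reconstruction error:", rel_err)
```

In a feature-extraction setting, the rows of the coefficient factor would serve as the sparse features passed to a classifier; how the paper constructs its tensor and constraints is not stated in this record.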

