Language identification based on GFCC and energy operator cepstrum (cited by: 2)
Abstract: To improve the accuracy of language identification at low signal-to-noise ratios, a new feature-extraction fusion method is introduced that incorporates voiced-segment detection at the front end. Based on a model of human auditory perception, Gammatone Frequency Cepstrum Coefficients (GFCC) are extracted as feature parameters, compressed and de-noised by principal component analysis, and fused with the Teager energy operator cepstrum parameters of each voiced segment. Language identification experiments with a Gaussian mixture model-universal background model show that, at signal-to-noise ratios from -5 dB to 0 dB, the fused feature set improves the recognition rate for five languages by 23.7% to 34.0% relative to a method based on log-Mel-scale filter-bank energy features, and also yields clear gains at other signal-to-noise levels.
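The Teager energy operator (TEO) that the abstract fuses into the feature set has a standard discrete form, ψ[x(n)] = x(n)² − x(n−1)·x(n+1); the paper builds cepstral parameters on top of it. A minimal sketch of the operator itself (the window sizes, framing, and cepstral post-processing used in the paper are not specified here, so this only illustrates the operator):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).

    Returns an array two samples shorter than the input (no boundary values).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone A*cos(w*n) the operator is exactly constant: A^2 * sin(w)^2,
# which is why it tracks instantaneous amplitude-frequency energy so cheaply.
n = np.arange(1000)
sig = 2.0 * np.cos(0.1 * n)
psi = teager_energy(sig)
```

For the sinusoid above, every element of `psi` equals 4·sin²(0.1), a useful sanity check when wiring the operator into a frame-level feature pipeline.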
Authors: LIU Jing, SHAO Yu-bin, LONG Hua, LI Yi-min (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China)
Source: Journal of Yunnan University (Natural Sciences Edition), 2022, No. 2, pp. 254-261 (8 pages); indexed in CAS, CSCD, and the Peking University Core list.
Funding: National Natural Science Foundation of China (61761025).
Keywords: language identification; Gammatone Frequency Cepstrum Coefficients (GFCC); voiced/unvoiced segment detection; Teager energy operator cepstrum parameters; principal component analysis
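The principal component analysis step listed in the keywords compresses and de-noises the frame-level GFCC features before fusion. A minimal SVD-based sketch of that projection (the feature dimension, frame count, and retained-component count below are illustrative, not the paper's configuration):

```python
import numpy as np

def pca_compress(features, k):
    """Project frame-level features onto the top-k principal components.

    features: (num_frames, dim) array of feature vectors.
    Returns a (num_frames, k) array in the principal-component basis.
    """
    mu = features.mean(axis=0)
    centered = features - mu
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 24))   # e.g. 24-dim GFCC frames (hypothetical)
reduced = pca_compress(feats, 12)    # keep 12 components
```

Because the data are centered before projection, each retained component has zero mean, and component variances are non-increasing, so truncating to `k` columns keeps the highest-variance (and, under the usual PCA assumption, least noisy) directions.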