
Language Identification Method Based on Fusion Feature MGCC
(Original title: 基于融合特征MGCC的语种识别方法. Cited by: 1)
Abstract: To address the low accuracy of language identification in noisy environments, a method is proposed that fuses Mel-frequency cepstral coefficients (MFCC) and Gammatone frequency cepstral coefficients (GFCC). First, the MFCC and GFCC of the speech signal are extracted, and the feature dimensions are screened according to their contribution to language identification. Next, the selected features are mapped into a spatial coordinate system composed of the Mel domain and the Gammatone domain to obtain the Mel-Gammatone cepstral coefficients (MGCC). Finally, the fused feature is fed into a deep neural network for language identification. Experimental results show that the identification accuracy and speed of the proposed method are much higher than those of methods using a single acoustic feature or other language-identification features. The accuracy reaches 99.38% on a clean corpus and remains above 89% at a signal-to-noise ratio of -5 dB, which demonstrates the effectiveness and robustness of the proposed method.
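
The abstract outlines a three-step pipeline: extract MFCC and GFCC, screen dimensions by their contribution to language identification, and map the surviving dimensions into a joint Mel-Gammatone space before classification. The record gives no formulas for the screening or the mapping, so the sketch below is only a minimal illustration under stated assumptions: librosa's MFCC, an ERB-spaced triangular filterbank as a simplified stand-in for a true gammatone bank, variance-based selection in place of the paper's contribution measure, and feature stacking in place of the Mel-Gammatone coordinate mapping. The helper names (gfcc_like, extract_mgcc) are illustrative, not from the paper.

```python
# Minimal sketch of the MFCC + GFCC fusion pipeline from the abstract.
# Assumptions: ERB-spaced triangular filters approximate the gammatone
# bank; variance-based screening and stacking stand in for the paper's
# contribution-based selection and Mel-Gammatone coordinate mapping.
import numpy as np
import librosa
from scipy.fft import dct

N_CEP = 13            # cepstral coefficients per stream
N_FILT = 64           # filters in the gammatone-style bank
N_FFT, HOP = 512, 160

def hz_to_erb(f):     # Glasberg & Moore ERB-rate scale
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def erb_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def gfcc_like(y, sr):
    """Simplified GFCC: ERB-spaced triangular bank on the power
    spectrum, then log compression and a DCT."""
    spec = np.abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP)) ** 2
    freqs = np.linspace(0.0, sr / 2.0, spec.shape[0])
    edges = erb_to_hz(np.linspace(hz_to_erb(50.0), hz_to_erb(sr / 2.0),
                                  N_FILT + 2))
    bank = np.zeros((N_FILT, spec.shape[0]))
    for i in range(N_FILT):   # triangles peaking at each center frequency
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        bank[i] = np.clip(np.minimum((freqs - lo) / (mid - lo),
                                     (hi - freqs) / (hi - mid)), 0.0, None)
    return dct(np.log(bank @ spec + 1e-10), axis=0, norm="ortho")[:N_CEP]

def extract_mgcc(y, sr, keep=10):
    """Illustrative fusion: keep the highest-variance dimensions of each
    stream, then stack them into one (2*keep, frames) feature matrix."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_CEP,
                                n_fft=N_FFT, hop_length=HOP)
    gfcc = gfcc_like(y, sr)
    m_sel = mfcc[np.argsort(mfcc.var(axis=1))[-keep:]]
    g_sel = gfcc[np.argsort(gfcc.var(axis=1))[-keep:]]
    return np.vstack([m_sel, g_sel])

# Any mono 16 kHz signal works; white noise just demonstrates the shapes.
y, sr = np.random.randn(3 * 16000).astype(np.float32), 16000
print(extract_mgcc(y, sr).shape)   # -> (20, n_frames)
```

With keep=10 per stream, the fused feature has 20 dimensions per frame; the dimensionality in the paper depends on its contribution threshold, which this record does not state.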
Authors: WANG Yankai (王延凯); LONG Hua (龙华); SHAO Yubin (邵玉斌); DU Qingzhi (杜庆治); WANG Yao (王瑶) (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source: Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》; indexed in EI, CAS, CSCD, Peking University Core), 2023, Issue 2, pp. 116-121 (6 pages)
Funding: National Natural Science Foundation of China (61761025)
Keywords: language identification; fusion feature; deep neural network; low signal-to-noise ratio; robustness
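
The final step of the abstract feeds the fused feature into a deep neural network, but this record does not describe the architecture. Below is a deliberately small, assumed PyTorch classifier over mean-pooled MGCC frames; the layer sizes, the frame pooling, and the ten-language output are all placeholders, not the paper's network.

```python
# Assumed classifier head for the sketch above: the record does not
# specify the network, so a small MLP over mean-pooled frames stands
# in for the paper's deep neural network.
import torch
import torch.nn as nn

N_LANGS = 10     # assumed number of target languages
FEAT_DIM = 20    # matches the (2*keep) fused dimension in the sketch above

model = nn.Sequential(
    nn.Linear(FEAT_DIM, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),    # narrow bottleneck-style layer
    nn.Linear(64, N_LANGS),           # per-language logits
)

def classify(mgcc):
    """mgcc: (FEAT_DIM, frames) array -> predicted language index."""
    x = torch.as_tensor(mgcc, dtype=torch.float32).mean(dim=1)  # pool frames
    return model(x.unsqueeze(0)).argmax(dim=1).item()
```

Training (e.g., cross-entropy over labeled utterances) is omitted; the untrained model here only demonstrates the data flow from fused feature to language decision.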
