
Language identification based on auditory and vocal characteristics
Abstract: To address the poor performance of existing language identification methods in low signal-to-noise-ratio (SNR) environments, this paper proposes a method that fuses cochlear filter coefficients with the spectral parameters of the vocal tract impulse response, thereby characterizing both the auditory properties of the human cochlea and the properties of human vocalization. First, cochlear filter coefficients that simulate the auditory characteristics of the human ear are extracted; these are then fused with the vocal tract impulse response spectral parameters that characterize human vocalization; finally, a Gaussian mixture model-universal background model (GMM-UBM) is used to evaluate the proposed method on the language identification task. Experimental results show that under four SNR conditions the proposed method outperforms the comparison methods; relative to the deep-learning-based log Mel-scale filter-bank energy feature, identification accuracy improves by 16.1%, a substantial gain over the other methods as well.
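The pipeline described in the abstract — extract cochlear filter coefficients, fuse them with vocal tract impulse response spectral parameters, and classify with a GMM-UBM backend — can be sketched as follows. This is a minimal illustrative sketch only: the two feature extractors are replaced by placeholder per-frame matrices, a single diagonal Gaussian per language stands in for the full GMM-UBM, and all function names and dimensions are assumptions, not the authors' implementation.

```python
import numpy as np

def fuse_features(cfcc, vtir):
    """Frame-wise fusion of two feature streams by concatenation.

    cfcc: (frames, d1) cochlear-filter-coefficient-like features
    vtir: (frames, d2) vocal-tract impulse-response spectral parameters
    (In practice each stream would typically be normalised first.)
    """
    assert cfcc.shape[0] == vtir.shape[0], "streams must be frame-aligned"
    return np.hstack([cfcc, vtir])

def fit_diag_gaussian(frames):
    """Fit a single diagonal-covariance Gaussian to one language's frames
    (a stand-in for training a per-language GMM)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-8

def avg_loglik(frames, mu, var):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mu) ** 2 / var)
    return ll.sum(axis=1).mean()

def identify(frames, models):
    """Return the language whose model scores the fused frames highest."""
    return max(models, key=lambda lang: avg_loglik(frames, *models[lang]))
```

In a real GMM-UBM system the universal background model is trained on pooled multilingual data and each language model is MAP-adapted from it; the single-Gaussian backend above only illustrates the frame-level log-likelihood scoring that such a system performs.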
Authors: HUA Ying-jie; DUO Lin; LIU Jing; SHAO Yu-bin (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China)
Source: Journal of Yunnan University (Natural Sciences Edition), 2023, No. 4, pp. 807-814 (indexed in CAS, CSCD, and the Peking University Core Journals list)
Funding: National Natural Science Foundation of China (61962032); Excellent Youth Project of the Yunnan Provincial Department of Science and Technology (202001AW07000)
Keywords: language identification; cochlear filter coefficients; vocal tract impulse response spectral parameters; Gaussian mixture model-universal background model
