
Language Identification in Real Noisy Environments

Cited by: 4
Abstract: Language identification is heavily affected by real noise environments, which degrades identification performance. To address this problem, an image-processing-based language identification method using logarithmic gray-scale speech spectrograms is proposed. The speech signal captured in real noise is first band-pass filtered according to the different distributions of noise energy and speech energy on the spectrogram; a logarithmic gray-scale spectrogram is then extracted by incorporating human auditory characteristics; finally, principal component features of the spectrogram image are extracted as language features, and a residual neural network model is trained and tested on them. Experimental results show that in the noisy cockpit of a Blackburn Buccaneer fighter, the average identification accuracy of the proposed method is 27.5% higher than that of the linear gray-scale spectrogram method; the average accuracy also improves in other noise environments.
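The logarithmic gray-scale spectrogram step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the FFT size, hop length, and windowing are illustrative assumptions, and the band-pass filtering and auditory weighting stages are omitted.

```python
import numpy as np

def log_gray_spectrogram(signal, n_fft=512, hop=256):
    """Sketch of a logarithmic gray-scale spectrogram (illustrative parameters)."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        # short-time magnitude spectrum of one windowed frame
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    mag = np.array(frames).T            # (freq_bins, time_frames)
    log_mag = np.log10(mag + 1e-10)     # log compression, mimicking auditory perception
    # normalize to 0-255 gray levels for image-based processing
    gray = (log_mag - log_mag.min()) / (log_mag.max() - log_mag.min() + 1e-12) * 255
    return gray.astype(np.uint8)

fs = 8000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t)       # 1 s of a 440 Hz tone as a toy input
img = log_gray_spectrogram(sig)
print(img.shape, img.dtype)
```

In the paper, images produced this way feed a PCA feature-extraction stage and then a residual neural network; here only the spectrogram-to-image conversion is shown.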
Authors: SHAO Yu-bin; LIU Jing; LONG Hua; LI Yi-min (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source: Journal of Beijing University of Posts and Telecommunications (indexed in EI, CAS, CSCD, PKU Core), 2021, No. 6, pp. 134-140
Funding: National Natural Science Foundation of China (61761025)
Keywords: language identification; real noise environment; logarithmic gray-scale speech spectrogram; residual neural network; image processing

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部