
Combination of dynamic features with a new mask to optimize neural network speech enhancement
Abstract: Neural-network speech enhancement algorithms often yield poor speech quality because the selected features cannot fully represent the nonlinear structure of speech. To address this, this paper proposes a method that combines dynamic features with a new mask to optimize neural-network speech enhancement. First, three features of the noisy speech are extracted and concatenated to form static features; their first- and second-order differences are then computed to capture transient information in the speech and are fused into dynamic features. Combining the static and dynamic features lets them complement each other and reduces speech distortion. Second, to optimize the intelligibility and clarity of the enhanced speech at the same time, a new adaptive mask is proposed that adaptively adjusts both the energy ratio of speech to noise and the ratio of the traditional mask to the square-root mask. Gammatone channel weights are used to modify the mask value in each channel, mimicking the human auditory system and further improving intelligibility. Finally, simulation experiments are carried out on multiple utterances under different background noises. The results show that, compared with different algorithms from the existing literature, the proposed algorithm achieves higher signal-to-noise ratio (SNR), subjective speech quality, and short-time objective intelligibility values, which verifies its effectiveness.
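The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three static features, the difference formula, or the adaptive rule for the mask. The sketch assumes a central-difference delta with clamped edges, takes the "traditional mask" to be the speech-to-total energy ratio, and replaces the paper's adaptive weighting with a fixed, hypothetical weight `alpha`. Gammatone channel weighting is omitted. The function names `delta`, `add_dynamic_features`, and `blended_mask` are illustrative only.

```python
import math
from typing import List

Frame = List[float]  # one feature vector per time frame


def delta(frames: List[Frame]) -> List[Frame]:
    """Central difference along time, d[t] = (x[t+1] - x[t-1]) / 2.

    Edge frames are clamped (the first and last frame are repeated).
    This formula is an assumption; the abstract does not specify one.
    """
    n = len(frames)
    out: List[Frame] = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_dynamic_features(static: List[Frame]) -> List[Frame]:
    """Concatenate static features with first- and second-order differences."""
    d1 = delta(static)  # first-order difference
    d2 = delta(d1)      # second-order difference
    return [s + a + b for s, a, b in zip(static, d1, d2)]


def blended_mask(speech_energy: float, noise_energy: float, alpha: float) -> float:
    """Blend an energy-ratio mask with its square root for one time-frequency unit.

    alpha is a hypothetical fixed weight in [0, 1]; the paper adapts the
    proportions, but the abstract does not give the rule.
    """
    total = speech_energy + noise_energy
    if total <= 0.0:
        return 0.0  # silent unit: nothing to keep
    ratio = speech_energy / total  # assumed "traditional" mask
    return alpha * ratio + (1.0 - alpha) * math.sqrt(ratio)


if __name__ == "__main__":
    static = [[0.0], [1.0], [4.0], [9.0]]  # toy one-dimensional static feature
    for frame in add_dynamic_features(static):
        print(frame)
    print(blended_mask(1.0, 3.0, alpha=0.5))
```

Each output frame is three times as long as the static frame (static, first difference, second difference). Setting `alpha` to 1 or 0 recovers the plain ratio mask or the square-root mask, respectively.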
Authors: MEI Shulin (梅淑琳); JIA Hairong (贾海蓉); WANG Xiaogang (王晓刚); WU Yifeng (武奕峰) (College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China; Network Optimization Center, China Unicom Shanxi Branch, Taiyuan 030000, China)
Source: Journal of Xidian University (《西安电子科技大学学报》), 2021, No. 3, pp. 91-98 (8 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
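Funding: National Natural Science Foundation of China (12004275); Shanxi Province Merit-Based Funding for Science and Technology Activities of Returned Overseas Scholars (20200017); Research Project Supported by Shanxi Scholarship Council of China (2020042).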
Keywords: dynamic features; adaptive mask; speech enhancement; neural network
