Speech enhancement method using context-sensitive attention mechanism and recurrent neural network (cited by: 4)
Abstract: To make full use of context information when enhancing speech, a speech enhancement method using a context-sensitive attention mechanism and a recurrent neural network is proposed. In the training phase, a multi-layer perceptron that computes attention scores and a deep recurrent neural network that enhances speech are trained jointly. In the test phase, an attention vector is computed for each speech frame, concatenated with that frame, and the concatenated frame is fed into the deep recurrent network for enhancement. In experiments at different signal-to-noise ratios, the method improves speech quality and intelligibility more than the baseline model. At -6 dB, Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) increase by 0.16 and 0.77, respectively, relative to the noisy speech. Under unseen noise conditions, the method's performance remains optimal or near optimal. The attention mechanism therefore effectively strengthens the model's ability to use context information, which improves its enhancement performance.
Authors: LAN Tian (蓝天); HUI Guoqiang (惠国强); LI Meng (李萌); LÜ Yilan (吕忆蓝); LIU Qiao (刘峤) (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054; CETC Key Laboratory of Aerospace Information Applications, Shijiazhuang 050081)
Source: Acta Acustica (《声学学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2020, No. 6, pp. 897-905 (9 pages)
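Funding: Supported by the National Natural Science Foundation of China (U19B2028, 61772117); the Science and Technology Commission Innovation Special Zone Project (19-H863-01-ZT-003); the Key Project of the National Engineering Laboratory for Big Data Application Technology for Improving Government Governance Capabilities (10-2018039); the Sichuan Science and Technology Service Industry Demonstration Project (2018GFW0150); and the Fundamental Research Funds for the Central Universities (ZYGX2019J077).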
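
The test-phase step described in the abstract (compute an attention vector for each frame, then concatenate it with that frame before the recurrent network) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the scoring function, the context window, or the feature dimensions, so the additive scoring MLP, the causal window of past frames, and the names `attention_context`, `build_rnn_inputs`, `W_q`, `W_k`, `v`, and `window` are all hypothetical. The deep recurrent network itself and the joint training are omitted.

```python
import numpy as np

def attention_context(frames, t, W_q, W_k, v, window=5):
    """Attention vector for frame t over a window of frames (assumed design).

    frames: (T, d) feature frames; W_q, W_k: (d, h); v: (h,).
    The window covers frame t and up to `window` preceding frames (assumption).
    """
    start = max(0, t - window)
    context = frames[start:t + 1]                    # (n, d)
    query = frames[t]                                # (d,)
    # Assumed additive scoring MLP: score_i = v^T tanh(W_q q + W_k k_i)
    hidden = np.tanh(query @ W_q + context @ W_k)    # (n, h)
    scores = hidden @ v                              # (n,)
    # Numerically stable softmax over the window
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Attention vector: weighted sum of the context frames
    return weights @ context, weights

def build_rnn_inputs(frames, W_q, W_k, v, window=5):
    """Concatenate each frame with its attention vector: (T, d) -> (T, 2d)."""
    rows = []
    for t in range(len(frames)):
        attn_vec, _ = attention_context(frames, t, W_q, W_k, v, window)
        rows.append(np.concatenate([frames[t], attn_vec]))
    return np.stack(rows)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, h = 20, 8, 16                      # illustrative sizes only
    frames = rng.standard_normal((T, d))
    W_q = rng.standard_normal((d, h)) * 0.1
    W_k = rng.standard_normal((d, h)) * 0.1
    v = rng.standard_normal(h) * 0.1
    rnn_inputs = build_rnn_inputs(frames, W_q, W_k, v)
    print(rnn_inputs.shape)                  # rows are [frame, attention vector]
```

In a full system the rows of `rnn_inputs` would be passed, in order, to the recurrent enhancement network, and the scoring parameters would be learned jointly with it rather than sampled at random as here.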