
Time frequency neural network based on amplitude mask for speech bandwidth extension
Abstract: To improve the performance of deep-learning-based speech bandwidth extension, a time-frequency neural network model combined with an amplitude mask is proposed. The model exploits the phase information of speech and also refines the amplitude of the predicted speech through the amplitude mask. For the time-domain part of the model, a long short-term memory (LSTM) network incorporating an attention mechanism is designed. This network supports parallel computation. When predicting high-frequency speech, it makes full use of the relationships between nearby preceding and following frames and skips learning the relationships between distant frames, which reduces the model's computational cost. Subjective and objective experiments show that the method outperforms traditional methods and deep neural network (DNN) based speech bandwidth extension methods on metrics such as signal-to-noise ratio (SNR) and intelligibility.
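The abstract names two mechanisms without giving their exact formulation: attention limited to nearby frames, and an amplitude mask applied to the predicted spectrum while phase information is retained. The sketch below illustrates both ideas under stated assumptions and is not the authors' model. It assumes scaled dot-product attention restricted to a fixed window of neighboring frames, where `window` is a hypothetical parameter. It also assumes a sigmoid-bounded magnitude mask that leaves each complex bin's phase unchanged. The function names `local_attention`, `stable_sigmoid`, and `apply_amplitude_mask` are hypothetical.

```python
import math


def local_attention(frames, window):
    """Dot-product self-attention restricted to +/- `window` neighboring frames.

    Hypothetical illustration; the paper's exact attention form is not given.
    frames: list of T equal-length feature vectors (lists of floats).
    Frames farther than `window` positions are never scored, so the cost is
    O(T * (2 * window + 1) * dim) rather than O(T**2 * dim) for full attention.
    Each output frame depends only on its own window, so the outer loop's
    iterations are independent and could be computed in parallel.
    """
    if not frames:
        return []
    num_frames, dim = len(frames), len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for t in range(num_frames):
        lo, hi = max(0, t - window), min(num_frames, t + window + 1)
        neighbors = range(lo, hi)
        # Scaled dot-product scores between frame t and each neighbor only.
        scores = [scale * sum(a * b for a, b in zip(frames[t], frames[s]))
                  for s in neighbors]
        # Numerically stable softmax over the local window.
        peak = max(scores)
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        # Output frame = attention-weighted sum of the neighboring frames.
        out = [0.0] * dim
        for w, s in zip(weights, neighbors):
            for d in range(dim):
                out[d] += (w / total) * frames[s][d]
        outputs.append(out)
    return outputs


def stable_sigmoid(z):
    """Logistic function that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def apply_amplitude_mask(spectrum, mask_logits):
    """Scale each complex spectral bin's magnitude by an assumed mask in (0, 1).

    Multiplying a complex value by a positive real gain changes its
    magnitude but leaves its phase untouched.
    """
    return [stable_sigmoid(z) * x for x, z in zip(spectrum, mask_logits)]
```

With `window=0`, each frame attends only to itself and is returned unchanged. Widening `window` adds temporal context at a cost that grows linearly in the window size rather than quadratically in the sequence length.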
Authors: XU Chundong; TAN Guowu; YING Dongwen (School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024, No. 6, pp. 179-184 (6 pages). Indexed in EI, CAS, and CSCD; listed as a Peking University core journal.
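Funding: National Natural Science Foundation of China (11864016, 11704164); Jiangxi Provincial Department of Science and Technology Key R&D Program, General Project (20202BBEL53006).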
Keywords: speech bandwidth extension; time frequency neural network; long short-term memory network; amplitude mask; attention mechanism