Speech bandwidth extension based on multi-scale feature fusion
Abstract: Existing deep learning models for speech bandwidth extension make insufficient use of data features, take a long time to train, and generate speech of limited quality. To address these problems, a new end-to-end neural network model is proposed. By fusing features from different data dimensions, the model can use a smaller quantity of data features to obtain more solutions to the low-to-high-frequency mapping, which shortens the overall training cycle. To increase the weight given to key features in long time-series data, a residual multi-head self-attention mechanism is designed, with the aim of maximizing the utilization of data features. In addition, a hybrid loss function based on the time-frequency domain and the Mel spectrum is proposed to optimize the model. Experimental results show that the wideband speech reconstructed by this method outperforms traditional methods and several recent neural-network-based speech bandwidth extension methods in both subjective and objective evaluations.
Authors: XU Chundong; ZHU Cheng; YING Dongwen; DONG Guiguan (School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; China Electronic Technology Standardization Institute, Beijing 100007, China)
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2023, No. 9, pp. 132-139 (8 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
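Funding: National Natural Science Foundation of China (11864016); Key Research and Development Program of the Jiangxi Provincial Department of Science and Technology (20202BBEL53006).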
Keywords: speech bandwidth extension; deep learning; self-attention mechanism; time-frequency perceptual loss function
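The abstract mentions a hybrid loss built on the time-frequency domain and the Mel spectrum but does not give its terms or weights. The sketch below is one plausible reading, not the authors' formulation: it sums a time-domain mean squared error, a log-magnitude STFT distance, and a log-Mel distance. All function names, the weights `w_time`, `w_spec`, `w_mel`, and the frame sizes are assumptions made for illustration. A naive DFT in pure Python keeps the example dependency-free; a real training setup would presumably use a differentiable STFT from a deep learning framework.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_mag(x, n_fft=64, hop=32):
    """Magnitude spectrogram via a naive DFT (adequate for a small sketch)."""
    win = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + i] * win[i] for i in range(n_fft)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally shaped 2-D lists."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / sum(len(row) for row in a)


def hybrid_loss(pred, target, sr=16000, n_fft=64, hop=32, n_mels=8,
                w_time=1.0, w_spec=1.0, w_mel=1.0, eps=1e-5):
    """Weighted sum of time-domain, log-STFT and log-Mel distances.

    The three terms and their weights are illustrative assumptions.
    pred and target must have equal length, at least n_fft samples.
    """
    time_term = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

    spec_p = stft_mag(pred, n_fft, hop)
    spec_t = stft_mag(target, n_fft, hop)

    def log_rows(rows):
        return [[math.log(v + eps) for v in row] for row in rows]

    spec_term = mean_abs_diff(log_rows(spec_p), log_rows(spec_t))

    bank = mel_filterbank(n_mels, n_fft, sr)

    def to_mel(rows):
        return [[sum(w * v for w, v in zip(filt, row)) for filt in bank]
                for row in rows]

    mel_term = mean_abs_diff(log_rows(to_mel(spec_p)), log_rows(to_mel(spec_t)))
    return w_time * time_term + w_spec * spec_term + w_mel * mel_term


# Toy signals: a wideband target and a narrowband version lacking the high tone.
sr = 16000
t = [i / sr for i in range(256)]
wide = [math.sin(2 * math.pi * 500 * x) + 0.5 * math.sin(2 * math.pi * 6000 * x)
        for x in t]
narrow = [math.sin(2 * math.pi * 500 * x) for x in t]

print(hybrid_loss(wide, wide))    # identical signals
print(hybrid_loss(narrow, wide))  # the missing high band is penalised
```

Combining the terms this way lets the time-domain error constrain the waveform while the spectral and Mel terms weight errors in the reconstructed high band; how the paper actually balances them is not stated in this record.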