Time-frequency neural network based on amplitude mask for speech bandwidth extension


Abstract: To improve the performance of deep-learning-based speech bandwidth extension, a time-frequency neural network model combined with an amplitude mask is proposed. The model both exploits the phase information of speech and refines the predicted speech amplitude through the amplitude mask. For the time-domain part of the model, a long short-term memory (LSTM) network integrating an attention mechanism is designed. This network supports parallel computation. When predicting high-frequency speech, it makes full use of the relationships between nearby preceding and following speech frames and skips learning relationships between distant frames, which reduces the computational cost of the model. Subjective and objective experiments show that the method outperforms traditional methods and deep-neural-network-based speech bandwidth extension methods on metrics such as signal-to-noise ratio and intelligibility.
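The abstract gives no equations, so the following is only a minimal sketch, under stated assumptions, of the two generic ideas it names: attention limited to nearby frames, and an amplitude mask that refines a predicted magnitude before it is recombined with a phase. It is not the authors' model. The LSTM is omitted, and the function names `local_attention` and `masked_spectrum`, the window size, the scaled dot-product scoring, and the sigmoid mask are all hypothetical choices for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_attention(x, window):
    """Self-attention restricted to frames within +/- `window` of each frame.

    x: (T, D) per-frame features. Only 2*window+1 neighbors are scored per
    frame, so the cost is O(T * window * D) rather than O(T^2 * D) for full
    attention. (Scoring rule and window size are illustrative assumptions.)
    """
    T, D = x.shape
    W = 2 * window + 1
    # Zero-pad the time axis so every frame has W candidate neighbors.
    padded = np.pad(x, ((window, window), (0, 0)))
    # (T, D, W) -> (T, W, D): neigh[t, k] is frame t + k - window.
    neigh = sliding_window_view(padded, W, axis=0).transpose(0, 2, 1)
    # Scaled dot-product scores between each frame and its neighbors.
    scores = np.einsum("td,twd->tw", x, neigh) / np.sqrt(D)
    # Mask padded positions that fall outside [0, T).
    pos = np.arange(T)[:, None] + np.arange(W)[None, :] - window
    scores = np.where((pos >= 0) & (pos < T), scores, -np.inf)
    # Row-wise softmax; the center frame is always valid, so no row is all -inf.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("tw,twd->td", weights, neigh)


def masked_spectrum(pred_mag, mask_logits, phase):
    """Refine a predicted magnitude with a mask and attach a phase.

    All inputs are (T, F) arrays. The sigmoid mask in (0, 1) is a
    hypothetical choice; the output is a complex spectrum whose magnitude
    is mask * pred_mag and whose angle is `phase`.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))  # element-wise amplitude mask
    refined_mag = mask * pred_mag
    return refined_mag * np.exp(1j * phase)
```

With `window=0` each frame attends only to itself and `local_attention` returns its input unchanged; with `window >= T - 1` it reduces to ordinary full self-attention. The banded layout shows why dropping distant frames reduces computation: the score matrix has shape (T, 2*window+1) instead of (T, T).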

Authors: XU Chundong, TAN Guowu, YING Dongwen

Affiliations: School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024, No. 6, pp. 179-184 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

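Funding: National Natural Science Foundation of China (11864016, 11704164); General Project of the Key Research and Development Program of the Jiangxi Provincial Department of Science and Technology (20202BBEL53006).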

Keywords: speech bandwidth extension; time-frequency neural network; long short-term memory network; amplitude mask; attention mechanism

CLC number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
