Abstract
To improve the performance of deep-learning-based speech bandwidth extension, a time-frequency neural network model combined with an amplitude mask is proposed. The model both exploits the phase information of speech and refines the amplitude of the predicted speech through the amplitude mask. The time-domain part of the model uses a long short-term memory (LSTM) network fused with an attention mechanism. This network supports parallel computation. When predicting high-frequency speech, it makes full use of the relationships between nearby preceding and following speech frames and does not learn the relationships between distant frames, which reduces the computational cost of the model. Subjective and objective experiments show that the method outperforms traditional methods and deep-neural-network-based speech bandwidth extension methods on measures such as signal-to-noise ratio and intelligibility.
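The idea of attending only to nearby frames can be illustrated with a minimal sketch. This is not the authors' network: the abstract does not give the architecture, so the function name `local_attention`, the `window` parameter, and the use of plain scaled dot-product self-attention without learned projections are all assumptions made for illustration.

```python
import numpy as np

def local_attention(frames, window=2):
    """Self-attention over frames, restricted to neighbours within `window` steps.

    Hypothetical illustration only; the paper's actual model is not specified here.
    frames: array of shape (num_frames, feature_dim).
    """
    n, d = frames.shape
    # Scaled dot-product similarity between every pair of frames.
    scores = frames @ frames.T / np.sqrt(d)
    # Mask out pairs of frames that are more than `window` steps apart.
    idx = np.arange(n)
    far = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(far, -np.inf, scores)
    # Row-wise softmax; the diagonal is never masked, so each row has a finite max.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each output frame is a weighted average of its nearby frames.
    return weights @ frames, weights

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 4))  # 8 toy frames with 4 features each
context, weights = local_attention(frames, window=2)
```

For clarity this sketch builds the full score matrix and then masks it. A version aimed at actually saving computation would evaluate only the scores inside the band of width `2 * window + 1`.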
Authors
许春冬
谭国武
应冬文
XU Chundong; TAN Guowu; YING Dongwen (School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China)
Source
《华中科技大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals
2024, No. 6, pp. 179-184 (6 pages)
Journal of Huazhong University of Science and Technology(Natural Science Edition)
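Funding
National Natural Science Foundation of China (11864016, 11704164)
General Project of the Key Research and Development Program, Department of Science and Technology of Jiangxi Province (20202BBEL53006)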
Keywords
speech bandwidth extension
time-frequency neural network
long short-term memory network
amplitude mask
attention mechanism