Abstract
In order to make full use of context information for speech enhancement, a method combining a context-sensitive attention mechanism with a recurrent neural network is proposed. In the training phase, a multilayer perceptron that computes attention scores and a deep recurrent neural network that enhances speech are trained jointly; in the test phase, an attention vector is computed for each speech frame, concatenated with that frame, and fed into the deep recurrent network for enhancement. In experiments at different signal-to-noise ratios, the method improves speech quality and intelligibility more than the baseline models; at -6 dB, short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) increase by 0.16 and 0.77, respectively, relative to the noisy speech, and under unseen noise conditions the method remains optimal or near optimal. The attention mechanism therefore effectively strengthens the model's ability to exploit context information and thus improves its enhancement performance.
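The abstract outlines the pipeline only at a high level: an MLP scores each frame's context, the scores form an attention vector that is concatenated with the current frame, and a deep recurrent network produces the enhanced output, with the MLP and the recurrent network trained jointly. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the feature dimension (257), context width (5 frames on each side), layer sizes, the LSTM enhancer, and the MSE training objective are all assumptions.

```python
# Minimal sketch of context-sensitive attention + recurrent enhancement (assumed details).
import torch
import torch.nn as nn


class ContextAttentionEnhancer(nn.Module):
    def __init__(self, feat_dim=257, context=5, hidden=512, layers=2):
        super().__init__()
        self.context = context  # frames on each side of the current frame (assumed)
        # MLP that scores each context frame against the current frame
        self.score_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.Tanh(), nn.Linear(256, 1)
        )
        # Deep recurrent enhancer: input = current frame + attention vector
        self.rnn = nn.LSTM(2 * feat_dim, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, noisy):                     # noisy: (batch, time, feat_dim)
        B, T, F = noisy.shape
        c = self.context
        padded = nn.functional.pad(noisy, (0, 0, c, c))          # pad along time
        # Context windows around each frame: (B, T, 2c+1, F)
        windows = padded.unfold(1, 2 * c + 1, 1).permute(0, 1, 3, 2)
        cur = noisy.unsqueeze(2).expand(-1, -1, 2 * c + 1, -1)
        # Attention scores and weights over each window
        scores = self.score_mlp(torch.cat([cur, windows], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)                   # (B, T, 2c+1)
        attn_vec = (weights.unsqueeze(-1) * windows).sum(dim=2)   # (B, T, F)
        # Concatenate the attention vector with each frame and enhance
        rnn_in = torch.cat([noisy, attn_vec], dim=-1)
        h, _ = self.rnn(rnn_in)
        return self.out(h)                        # enhanced spectral features


# Joint training of the attention MLP and the recurrent enhancer (MSE target assumed)
model = ContextAttentionEnhancer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
noisy, clean = torch.randn(4, 100, 257), torch.randn(4, 100, 257)
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
opt.step()
```

Because the attention MLP and the LSTM are parts of one module, a single optimizer step updates both, matching the joint training described in the abstract; at test time the same forward pass computes the per-frame attention vector and the enhanced output.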
Authors
LAN Tian
HUI Guoqiang
LI Meng
LÜ Yilan
LIU Qiao
LAN Tian; HUI Guoqiang; LI Meng; LÜ Yilan; LIU Qiao (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054; CETC Key Laboratory of Aerospace Information Applications, Shijiazhuang 050081)
Source
《声学学报》
EI
CSCD
Peking University Core Journal (北大核心)
2020, No. 6, pp. 897-905 (9 pages)
Acta Acustica
Funding
National Natural Science Foundation of China (U19B2028, 61772117)
Science and Technology Commission Innovation Special Zone Project (19-H863-01-ZT-003)
Key Project of the National Engineering Laboratory for Big Data Application Technology to Improve Government Governance Capability (10-2018039)
Sichuan Science and Technology Service Industry Demonstration Project (2018GFW0150)
Fundamental Research Funds for the Central Universities (ZYGX2019J077)