Abstract: Based on the idea of adaptive noise cancellation (ANC), a second-order architecture is proposed for speech enhancement. Following information-maximization theory, the corresponding gradient descent algorithm is derived. In simulations with real speech signals, the new algorithm performs well at speech enhancement. The main advantage of the new architecture is that clean speech signals can be recovered with less distortion.
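As background for the ANC idea this abstract builds on, here is a minimal sketch of the classic first-order LMS noise canceller. It is not the second-order, information-maximization algorithm the abstract proposes; the function name `lms_anc`, the tap count, the step size `mu`, and the synthetic noise path are all assumptions chosen for illustration.

```python
import math
import random

def lms_anc(primary, reference, taps=4, mu=0.01):
    """Classic LMS canceller: the error signal is the enhanced-speech estimate."""
    w = [0.0] * taps      # adaptive filter weights
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # estimate of noise in primary
        e = d - y                                    # primary minus noise estimate
        w = [wi + 2 * mu * e * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

random.seed(0)
n = 4000
# Stand-in for speech: a slow sinusoid (assumption, not real speech data).
speech = [0.5 * math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
noise_ref = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical noise path from the reference sensor to the primary microphone.
noise_at_primary = [0.8 * noise_ref[t] + (0.3 * noise_ref[t - 1] if t > 0 else 0.0)
                    for t in range(n)]
primary = [s + v for s, v in zip(speech, noise_at_primary)]

enhanced = lms_anc(primary, noise_ref)

# Compare errors on the second half, after the filter has had time to adapt.
half = n // 2
mse_before = mse(primary[half:], speech[half:])
mse_after = mse(enhanced[half:], speech[half:])
print(mse_before, mse_after)
```

The canceller only needs a reference correlated with the noise and uncorrelated with the speech; the residual error then approximates the speech.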
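Abstract: Independent component analysis with reference (ICA-R) introduces prior knowledge of the source signal into the learning algorithm in the form of a reference signal, so that only the desired source signal is extracted from the mixture. A new speech enhancement method based on ICA-R is proposed. By comparing the characteristics of speech signals with those of several kinds of noise signals, a reference signal carrying the important characteristics of speech is constructed, and ICA-R is then applied to extract the desired enhanced speech signal from several kinds of additive noise. Computer simulations and performance analysis both demonstrate the effectiveness of the method.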
Funding: Supported by the research fund of the Signal Intelligence Research Center, supervised by the Defense Acquisition Program Administration and the Agency for Defense Development of Korea.
Abstract: Nonnegative matrix factorization (NMF) has shown good performance in blind audio source separation (BASS). While NMF analysis is a non-convex optimization problem when both the basis and encoding matrices must be estimated simultaneously, the source separation step of NMF-based BASS with a fixed basis matrix has been considered convex. However, because the basis matrix for BASS is typically constructed by concatenating the basis matrices trained on individual source signals, the subspace spanned by the basis vectors for one source may overlap with that for other sources. In this paper, we show that the resulting encoding vector is not unique when the subspaces spanned by the basis vectors for the sources overlap, which implies that the initialization of the encoding vector in the source separation stage is not trivial. Furthermore, we propose a novel method to initialize the encoding vector for the separation step based on a prior model of the encoding vector. Experimental results show that the proposed method outperforms uniform random initialization by 1.09 and 2.21 dB in source-to-distortion ratio, and by 0.20 and 0.23 in PESQ score, for the supervised and semi-supervised cases, respectively.
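The non-uniqueness claim can be illustrated with a toy example. The sketch below assumes a Euclidean cost with the standard multiplicative update for a fixed basis (the abstract does not state which cost function is used), and a made-up 3x3 basis in which the single basis vector of source B lies inside the span of source A's two basis vectors. It does not implement the prior-based initialization the abstract proposes; `encode`, `W`, and `v` are hypothetical names and values.

```python
def encode(W, v, h, iters=200, eps=1e-12):
    """Estimate a nonnegative encoding h with the basis W held fixed,
    using multiplicative updates for the cost ||v - W h||^2."""
    rows, cols = len(W), len(W[0])
    for _ in range(iters):
        Wh = [sum(W[r][k] * h[k] for k in range(cols)) for r in range(rows)]
        num = [sum(W[r][k] * v[r] for r in range(rows)) for k in range(cols)]   # W^T v
        den = [sum(W[r][k] * Wh[r] for r in range(rows)) for k in range(cols)]  # W^T W h
        h = [h[k] * num[k] / (den[k] + eps) for k in range(cols)]
    return h

def sq_error(W, v, h):
    return sum((v[r] - sum(W[r][k] * h[k] for k in range(len(h)))) ** 2
               for r in range(len(v)))

# Columns 0 and 1 belong to source A, column 2 to source B.
# Column 2 equals column 0 + column 1, so the two subspaces overlap.
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 2.0]]
v = [1.0, 1.0, 2.0]  # observed mixture frame

h_a = encode(W, v, [1.0, 1.0, 0.01])   # initialization favoring source A
h_b = encode(W, v, [0.01, 0.01, 1.0])  # initialization favoring source B
err_a, err_b = sq_error(W, v, h_a), sq_error(W, v, h_b)
print(h_a, err_a)
print(h_b, err_b)
```

Both runs reconstruct `v` essentially exactly, yet one attributes the frame almost entirely to source A and the other to source B, which is why the starting point matters in the separation step.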
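Abstract: Throat microphones, with their excellent noise immunity, have been used in many high-noise scenarios, but the speech they produce still suffers from heavy mid-frequency components and missing high-frequency components, which seriously degrades clarity and intelligibility. To improve the quality of throat microphone speech, this paper proposes a blind enhancement algorithm for throat microphone speech based on long short-term memory recurrent neural networks (LSTM-RNN). Unlike algorithms that estimate low-dimensional spectral envelope features, the algorithm first uses an LSTM-RNN to model the mapping between the high-dimensional log-magnitude spectra of throat microphone speech and air-conducted speech, which effectively captures contextual information to reconstruct the speech magnitude spectrum. It then applies non-negative matrix factorization (NMF) to the estimated magnitude spectrum, which effectively suppresses over-smoothing and further improves speech quality. The LLR, LSD, and PESQ metrics obtained in simulation experiments show that the algorithm effectively improves the quality of throat microphone speech.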
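Of the metrics named in this abstract, log-spectral distance (LSD) is simple enough to sketch. The version below follows one common definition (per-frame RMS difference of the log-magnitude spectra in dB, averaged over frames); the paper's exact formulation is not given in the abstract, so the function name, the `floor` guard, and the toy spectrograms are assumptions.

```python
import math

def log_spectral_distance(ref_mag, est_mag, floor=1e-10):
    """Mean over frames of the RMS difference, in dB, between two
    magnitude spectrograms given as lists of frames of equal shape."""
    total = 0.0
    for ref_frame, est_frame in zip(ref_mag, est_mag):
        sq = [(20.0 * math.log10(max(r, floor) / max(e, floor))) ** 2
              for r, e in zip(ref_frame, est_frame)]
        total += math.sqrt(sum(sq) / len(sq))
    return total / len(ref_mag)

# Toy magnitude spectrograms: 2 frames x 2 frequency bins.
ref = [[1.0, 2.0], [0.5, 4.0]]
scaled = [[10.0 * m for m in frame] for frame in ref]  # every bin 20 dB louder

lsd_same = log_spectral_distance(ref, ref)
lsd_scaled = log_spectral_distance(ref, scaled)
print(lsd_same, lsd_scaled)
```

Lower LSD means the estimated spectrum is closer to the reference; identical spectra give zero, and a uniform tenfold magnitude error gives 20 dB.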