Funding: Supported by the National Key Research and Development Program of China under Grant 2020YFC2004003 and Grant 2020YFC2004002, and the National Natural Science Foundation of China (NSFC) under Grant No. 61571106.
Abstract: Masking-based and spectrum mapping-based methods are the two main approaches to speech enhancement with deep neural networks (DNNs). However, mapping-based methods use only the phase of the noisy speech, which limits the upper bound of speech enhancement performance, while masking-based methods need to estimate the mask accurately, which remains the key problem. Combining the advantages of these two types of methods, this paper proposes the speech enhancement algorithm MM-RDN (masking-mapping residual dense network), based on masking-mapping (MM) and a residual dense network (RDN). Using the logarithmic power spectrogram (LPS) of consecutive frames, MM estimates the ideal ratio mask (IRM) matrix of consecutive frames. RDN can make full use of the feature maps of all layers. Meanwhile, by using global residual learning to combine shallow and deep features, RDN obtains global dense features from the LPS, thereby improving the estimation accuracy of the IRM matrix. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, MM-RDN still outperforms the existing convolutional recurrent network (CRN) method in terms of perceptual evaluation of speech quality (PESQ) and other evaluation indexes. This indicates that the proposed algorithm generalizes better to untrained conditions.
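As a rough illustration of the quantities named in this abstract, the sketch below computes an LPS and an IRM from known speech and noise magnitudes and applies the mask to a noisy magnitude spectrogram. It is not the MM-RDN network: the function names, the exponent `beta = 0.5`, the 257 x 7 spectrogram size, and the assumption that speech and noise powers simply add are illustrative choices, not details taken from the paper.

```python
import numpy as np

def log_power_spectrogram(magnitude, eps=1e-12):
    """LPS of a magnitude spectrogram: log(|X|^2)."""
    return np.log(magnitude ** 2 + eps)

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-12):
    """IRM per time-frequency unit: (S^2 / (S^2 + N^2)) ** beta."""
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return (speech_pow / (speech_pow + noise_pow + eps)) ** beta

# Toy data: 257 frequency bins x 7 consecutive frames (sizes are assumptions).
rng = np.random.default_rng(0)
speech_mag = rng.random((257, 7))
noise_mag = rng.random((257, 7))

# Assumption: speech and noise are uncorrelated, so their powers add.
noisy_mag = np.sqrt(speech_mag ** 2 + noise_mag ** 2)

lps = log_power_spectrogram(noisy_mag)        # what a network would take as input
irm = ideal_ratio_mask(speech_mag, noise_mag)  # what a network would be trained to predict
enhanced_mag = irm * noisy_mag                 # masking the noisy magnitude
```

In a masking-based system a network would predict `irm` from `lps`; here the ideal mask is computed directly from the known speech and noise, so masking the noisy magnitude recovers the speech magnitude under the stated additivity assumption.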
Funding: This work is supported by the National Natural Science Foundation of China [61673108, 41706103]. The initials of the authors who received these grants are LZ and YZ, respectively. It is also supported by the Natural Science Foundation of Jiangsu Province, China [BK20170306]. The initials of the author who received this grant are YZ.
Abstract: Graph filtering, which is founded on the theory of graph signal processing, has proved to be a useful tool for image denoising. Most graph filtering methods focus on learning an ideal low-pass filter to remove noise, where clean images are restored from noisy ones by retaining the image components in the low graph-frequency bands. However, such a low-pass filter has limited ability to separate low-frequency noise from the clean image, which makes the denoising procedure less effective. To address this issue, we propose an adaptive weighted graph filtering (AWGF) method to replace the traditional ideal low-pass filter design. In detail, we reassess the existing low-rank denoising method with adaptive regularizer learning (ARLLR) from the viewpoint of graph filtering. A shrinkage approach is then presented in the graph-frequency domain, where the components of the noisy image are adaptively attenuated in each band according to their component significance. As a result, the proposed graph filtering becomes more explainable and better suited to denoising. Meanwhile, we demonstrate that a graph filter under the constraint of subspace representation is employed in the ARLLR method, so ARLLR can be treated as a special form of graph filtering. This not only enriches the theory of graph filtering but also builds a bridge from low-rank methods to graph filtering methods. In the experiments, we run the AWGF method with a graph filter generated from the classical graph Laplacian matrix. The results show that our method achieves denoising performance comparable with that of several state-of-the-art denoising methods.
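The contrast between an ideal low-pass graph filter and band-wise shrinkage can be sketched on a one-dimensional toy signal. The code below is a minimal illustration, not the AWGF method: the path graph, the cutoff of 8 bands, and the Wiener-like weight rule in `energy_shrinkage` are assumptions standing in for the paper's component-significance weights, which the abstract does not specify.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - W of an unweighted path graph."""
    adjacency = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(adjacency.sum(axis=1)) - adjacency

def graph_filter(signal, laplacian, weights_fn):
    """Filter a signal in the graph-frequency domain.

    The eigenvectors of the Laplacian act as the graph Fourier basis;
    eigh returns eigenvalues (graph frequencies) in ascending order.
    """
    graph_freqs, basis = np.linalg.eigh(laplacian)
    coeffs = basis.T @ signal                  # graph Fourier transform
    weights = weights_fn(graph_freqs, coeffs)  # one weight per band
    return basis @ (weights * coeffs)          # inverse transform

def ideal_lowpass(cutoff):
    """Keep the `cutoff` lowest graph-frequency bands, discard the rest."""
    def weights_fn(graph_freqs, coeffs):
        weights = np.zeros_like(graph_freqs)
        weights[:cutoff] = 1.0
        return weights
    return weights_fn

def energy_shrinkage(noise_var):
    """Hypothetical adaptive rule: shrink each band by its estimated SNR."""
    def weights_fn(graph_freqs, coeffs):
        energy = coeffs ** 2
        return np.maximum(energy - noise_var, 0.0) / (energy + 1e-12)
    return weights_fn

n = 64
laplacian = path_graph_laplacian(n)
clean = np.cos(np.pi * 2 * (np.arange(n) + 0.5) / n)  # a low graph-frequency signal
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(n)

lowpassed = graph_filter(noisy, laplacian, ideal_lowpass(cutoff=8))
shrunk = graph_filter(noisy, laplacian, energy_shrinkage(noise_var=0.01))
```

The ideal low-pass filter passes everything in the retained bands, including the noise that falls there, whereas a shrinkage rule assigns each band its own weight between 0 and 1.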
Funding: This work is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61571106, 61501169, and 41706103, and the Fundamental Research Funds for the Central Universities under Grant No. 2242013K30010.
Abstract: Speaker separation in complex acoustic environments is one of the challenging tasks in speech separation. In practice, speakers are very often stationary or moving slowly during normal communication. In this case, the spatial features of consecutive speech frames become highly correlated, which helps speaker separation by providing additional spatial information. To fully exploit this information, we design a separation system based on a recurrent neural network (RNN) with long short-term memory (LSTM), which effectively learns the temporal dynamics of spatial features. In detail, an LSTM-based speaker separation algorithm is proposed that extracts the spatial features in each time-frequency (TF) unit and forms the corresponding feature vector. We then treat speaker separation as a supervised learning problem, where a modified ideal ratio mask (IRM) is defined as the training target during LSTM learning. Simulations show that the proposed system achieves attractive separation performance in noisy and reverberant environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and reverberation, the proposed LSTM-based algorithm still outperforms the existing DNN-based method in terms of PESQ and STOI. This indicates that our method is more robust in untrained conditions.
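The abstract does not say which spatial features are extracted per TF unit. As a hedged illustration only, the sketch below assumes a two-channel recording and uses two common choices, the inter-channel level difference (ILD) and inter-channel phase difference (IPD); the function name, array sizes, and synthetic signals are hypothetical.

```python
import numpy as np

def spatial_features(left_stft, right_stft, eps=1e-12):
    """Per-TF-unit feature vector [ILD in dB, IPD in radians].

    Inputs are complex STFTs of shape (freq_bins, frames).
    """
    ild = 20.0 * np.log10((np.abs(left_stft) + eps) / (np.abs(right_stft) + eps))
    ipd = np.angle(left_stft * np.conj(right_stft))
    return np.stack([ild, ipd], axis=-1)  # shape (freq_bins, frames, 2)

# Synthetic two-channel STFT: the right channel is an attenuated,
# phase-shifted copy of the left one (a stand-in for a fixed source position).
rng = np.random.default_rng(0)
left = rng.standard_normal((129, 20)) + 1j * rng.standard_normal((129, 20))
right = 0.5 * left * np.exp(-1j * 0.3)

features = spatial_features(left, right)

# Reorder to (frames, freq_bins, 2) so that frames form the time axis
# a recurrent model such as an LSTM would step through.
frame_sequence = features.transpose(1, 0, 2)
```

For a source that does not move, these features stay nearly constant from frame to frame, which is the temporal correlation the abstract proposes to exploit.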
Funding: This work is supported by the Natural Science Foundation of Jiangsu Province, China [BK20170306], and the National Key R&D Program, China [2017YFC0306100]. The initials of the authors who received these grants are YZ and JL, respectively. It is also supported by the Fundamental Research Funds for Central Universities, China [B200202217], and the Changzhou Science and Technology Program, China [CJ20200065]. The initials of the author who received these grants are YT.
Abstract: Graph filtering is an important part of graph signal processing and a useful tool for image denoising. Existing graph filtering methods, such as adaptive weighted graph filtering (AWGF), focus on coefficient shrinkage strategies in the graph-frequency domain. However, they seldom consider image attributes in the graph filtering procedure. Consequently, the denoising performance of graph filtering is barely comparable with that of other state-of-the-art denoising methods. To fully exploit the image attributes, we propose a guided intra-patch smoothing AWGF (AWGF-GPS) method for single-image denoising. Unlike AWGF, which employs graph topology on patches, AWGF-GPS learns the topology of superpixels by introducing the pixel smoothing attribute of a patch. This operation forces the restored pixels to evolve smoothly in local areas, so that both intra- and inter-patch relationships of the image are used during patch restoration. Meanwhile, a guided-patch regularizer is incorporated into AWGF-GPS. The guided patch is obtained in advance using a maximum a posteriori probability estimator. Because the guided patch serves as a sketch of the denoised patch, AWGF-GPS can effectively supervise patch restoration during graph filtering and increase the reliability of the denoised patch. Experiments demonstrate that the AWGF-GPS method reconstructs denoised images well. It outperforms most state-of-the-art single-image denoising methods and is competitive with certain deep-learning methods. In particular, it has an advantage in handling images with significant noise.
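One generic way to combine graph smoothing with a guide signal is a quadratic objective with a closed-form solution. The sketch below is not the AWGF-GPS algorithm: the objective ||x - y||^2 + alpha * x^T L x + beta * ||x - g||^2, the path-graph Laplacian, and the parameters `alpha` and `beta` are assumptions chosen only to show how a guided-patch term can pull a graph-smoothed estimate toward a pre-computed guide.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - W of an unweighted path graph."""
    adjacency = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(adjacency.sum(axis=1)) - adjacency

def guided_graph_smoothing(noisy, guide, laplacian, alpha=1.0, beta=0.5):
    """Minimize ||x - noisy||^2 + alpha * x^T L x + beta * ||x - guide||^2.

    Setting the gradient to zero gives the linear system
    ((1 + beta) I + alpha L) x = noisy + beta * guide.
    """
    n = noisy.shape[0]
    system = (1.0 + beta) * np.eye(n) + alpha * laplacian
    return np.linalg.solve(system, noisy + beta * guide)

# Toy flattened "patch" of 16 pixels; all values are synthetic.
rng = np.random.default_rng(0)
clean = np.linspace(0.0, 1.0, 16)
noisy = clean + 0.2 * rng.standard_normal(16)
guide = clean + 0.05 * rng.standard_normal(16)  # stand-in for a pre-computed guided patch

laplacian = path_graph_laplacian(16)
restored = guided_graph_smoothing(noisy, guide, laplacian)
```

Larger `alpha` enforces smoother pixel values along the graph, while larger `beta` trusts the guide more; with both set to zero the noisy patch is returned unchanged.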