Funding: Supported by the National Key Research and Development Program of China under Grant 2020YFC2004003 and Grant 2020YFC2004002, and by the National Natural Science Foundation of China (NSFC) under Grant No. 61571106.
Abstract: Masking-based and spectrum mapping-based methods are the two main approaches to speech enhancement with deep neural networks (DNNs). However, mapping-based methods only utilize the phase of the noisy speech, which limits the upper bound of speech enhancement performance, while masking-based methods need to estimate the mask accurately, which remains the key problem. Combining the advantages of these two types of methods, this paper proposes the speech enhancement algorithm MM-RDN (masking-mapping residual dense network), based on masking-mapping (MM) and a residual dense network (RDN). Using the logarithmic power spectrogram (LPS) of consecutive frames, MM estimates the ideal ratio masking (IRM) matrix of consecutive frames. RDN can make full use of the feature maps of all layers. Meanwhile, by using global residual learning to combine shallow and deep features, RDN obtains global dense features from the LPS, thereby improving the estimation accuracy of the IRM matrix. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, MM-RDN still outperforms the existing convolutional recurrent network (CRN) method in perceptual evaluation of speech quality (PESQ) and other evaluation metrics. This indicates that the proposed algorithm generalizes better to untrained conditions.
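The LPS input and IRM target mentioned in this abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes the commonly used square-root form of the IRM, synthetic magnitude spectrograms, and uncorrelated speech and noise. All function and variable names are hypothetical.

```python
import numpy as np

def log_power_spectrum(mag, eps=1e-8):
    """Log-power spectrum of a magnitude spectrogram (frames x frequency bins)."""
    return np.log(mag ** 2 + eps)

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Assumed IRM form: sqrt(S^2 / (S^2 + N^2)) per time-frequency bin."""
    clean_pow = clean_mag ** 2
    noise_pow = noise_mag ** 2
    return np.sqrt(clean_pow / (clean_pow + noise_pow + eps))

rng = np.random.default_rng(0)
# Synthetic stand-ins: 5 consecutive frames, 129 frequency bins.
clean_mag = np.abs(rng.normal(size=(5, 129)))
noise_mag = np.abs(rng.normal(size=(5, 129)))
# Assumption: speech and noise are uncorrelated, so their powers add.
noisy_mag = np.sqrt(clean_mag ** 2 + noise_mag ** 2)

lps_input = log_power_spectrum(noisy_mag)             # feature fed to the network
irm_target = ideal_ratio_mask(clean_mag, noise_mag)   # matrix the network would estimate
enhanced_mag = irm_target * noisy_mag                 # oracle mask applied to the noisy magnitude
```

With the oracle mask, `enhanced_mag` recovers the clean magnitude under the additive-power assumption. In practice the mask is estimated by the network, and the noisy phase is typically reused for reconstruction.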
Funding: This work is supported by the National Key Research and Development Program of China under Grant 2020YFC2004003 and Grant 2020YFC2004002, and by the National Natural Science Foundation of China (NSFC) under Grant No. 61571106.
Abstract: Generative adversarial networks (GANs) have received increasing attention for end-to-end speech enhancement in recent years, and various GAN-based enhancement methods have been presented to improve the quality of reconstructed speech. However, the performance of these GAN-based methods is worse than that of masking-based methods. To tackle this problem, we propose a speech enhancement method with a residual dense generative adversarial network (RDGAN) that maps the log-power spectrum (LPS) of degraded speech to that of clean speech. In detail, a residual dense block (RDB) architecture is designed to better estimate the LPS of clean speech; it extracts rich local features of the LPS through densely connected convolution layers. Meanwhile, sequential RDB connections are incorporated on various scales of the LPS, which significantly increases the flexibility and robustness of feature learning in the time-frequency domain. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, RDGAN still outperforms the existing GAN-based methods and the masking-based method in PESQ and other evaluation metrics. This indicates that our method generalizes better to untrained conditions.
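The connectivity pattern of a residual dense block can be sketched as follows. This is a simplified illustration, not the RDGAN architecture: it replaces the spatial convolutions with 1x1 (channel-mixing) convolutions so that it runs with NumPy alone, and the layer count, growth rate, and all names are assumptions.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: mixes channels at every position.
    x has shape (C_in, H, W); weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def residual_dense_block(x, layer_weights, fusion_weight):
    """Dense connections + local feature fusion + local residual."""
    features = [x]
    for w in layer_weights:
        # Each layer sees the block input and all earlier layer outputs.
        dense_input = np.concatenate(features, axis=0)
        features.append(np.maximum(conv1x1(dense_input, w), 0.0))  # ReLU
    # Local feature fusion: squeeze the concatenation back to the input width.
    fused = conv1x1(np.concatenate(features, axis=0), fusion_weight)
    # Local residual learning: add the block input back.
    return x + fused

rng = np.random.default_rng(0)
channels, growth, num_layers = 8, 4, 3   # hypothetical sizes
x = rng.normal(size=(channels, 16, 16))  # e.g. feature maps derived from an LPS patch
layer_weights = [
    rng.normal(scale=0.1, size=(growth, channels + i * growth))
    for i in range(num_layers)
]
fusion_weight = rng.normal(scale=0.1, size=(channels, channels + num_layers * growth))
y = residual_dense_block(x, layer_weights, fusion_weight)
```

Because the output has the same shape as the input, blocks of this kind can be chained sequentially, as the abstract describes for RDBs.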
Abstract: In this study, an underwater image enhancement method based on a multi-scale adversarial network is proposed to solve the problems of detail blur and color distortion in underwater images. First, the local features of each layer are enhanced into global features by the proposed residual dense block, which ensures that the generated images retain more details. Second, a multi-scale structure is adopted to extract multi-scale semantic features of the original images. Finally, the features obtained from the dual channels are fused by an adaptive fusion module to further optimize the features. The discriminator network adopts the structure of the Markov discriminator. In addition, mean square error, structural similarity, and perceived color loss functions are constructed so that the generated image is consistent with the reference image in structure, color, and content. The experimental results show that the proposed algorithm deblurs underwater images well and effectively reduces their color bias. In both subjective and objective evaluation metrics, the results of the proposed algorithm are better than those of the comparison algorithms.
Abstract: Many networks are designed to stack a large number of residual blocks, deepening the network and improving its performance through short residual connections, long residual connections, and dense connections. However, because they do not consider the different contributions of features at different depths to the network, these designs cannot evaluate the importance of those features. To solve this problem, this paper proposes an adaptive densely residual network (ADRNet) for single image super-resolution. ADRNet evaluates the distributions of features at different depths and learns more representative features. An adaptive densely residual block (ADRB) is designed that combines three residual blocks (RBs) with dense connections. It learns an attention score for each dense connection through adaptive dense connections, and the attention score reflects the importance of the features of each RB. To further enhance the performance of the ADRB, a multi-direction attention block (MDAB) is introduced to obtain multidirectional context information. Comparative experiments show that the proposed ADRNet is superior to existing methods, and ablation experiments show that evaluating features at different depths helps to improve network performance.
Funding: This work was supported in part by the Basic Scientific Research Project of Liaoning Provincial Department of Education under Grant Nos. LJKQZ2021152 and LJ2020JCL007, in part by the National Science Foundation of China (NSFC) under Grant No. 61602226, and in part by the PhD Startup Foundation of Liaoning Technical University of China under Grant No. 18-1021.
Abstract: To address the lack of high-frequency information and texture details and the unstable training of super-resolution generative adversarial networks, this paper optimizes the generator and discriminator based on the SRGAN model. First, the residual dense block is used as the basic structural unit of the generator to improve the network’s feature extraction capability. Second, enhanced lightweight coordinate attention is incorporated to help the network concentrate more precisely on high-frequency location information, thereby allowing the generator to produce more realistic image reconstruction results. Then, we propose a symmetric and efficient pyramidal segmentation attention discriminator network in which the attention mechanism is capable of deriving finer-grained multiscale spatial information and creating long-term dependencies between multiscale channel attentions, thus enhancing the discriminative ability of the network. Finally, a Charbonnier loss function and a gradient variance loss function with improved robustness are used to better recover the image’s texture structure and enhance the model’s stability. The experimental results show that, compared with SRGAN on the three test sets, the proposed method improves the average peak signal-to-noise ratio (PSNR) by 1.59 dB and the structural similarity index (SSIM) by 0.045. Compared with state-of-the-art methods, the reconstructed images have a clearer texture structure, richer high-frequency details, and better visual effects.
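The Charbonnier loss named in this abstract is a smooth approximation of the L1 loss. The sketch below uses its standard form, sqrt(d^2 + eps^2) averaged over all elements; the value of `eps` and the toy data are assumptions, not settings taken from the paper.

```python
import numpy as np

def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt((pred - target)^2 + eps^2): behaves like L1 for large errors."""
    diff = pred - target
    return np.mean(np.sqrt(diff ** 2 + eps ** 2))

def mse_loss(pred, target):
    """Mean square error, shown for comparison."""
    return np.mean((pred - target) ** 2)

# Toy data: 100 pixels reconstructed exactly except for one large outlier.
target = np.zeros(100)
pred = np.zeros(100)
pred[0] = 10.0

robust = charbonnier_loss(pred, target)   # grows linearly with the outlier
squared = mse_loss(pred, target)          # grows quadratically with the outlier
```

Because the penalty grows linearly rather than quadratically, a single large error contributes far less to the Charbonnier loss than to the MSE, which is the robustness property the abstract refers to.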
Funding: Supported in part by the Basic Scientific Research Project of Liaoning Provincial Department of Education under Grant No. LJKQZ2021152, in part by the National Science Foundation of China (NSFC) under Grant No. 61602226, and in part by the PhD Startup Foundation of Liaoning Technical University of China under Grant No. 18-1021.
Abstract: Super-resolution (SR) methods based on generative adversarial networks (GANs) cannot adequately capture the diversity of the training data, resulting in misalignment between the input low-resolution (LR) images and the output high-resolution (HR) images, and GAN training has difficulty converging. To address this, an advanced GAN-based image SR reconstruction method is presented. First, a dense connection residual block and an attention mechanism are integrated into the GAN generator to improve high-frequency feature extraction. Meanwhile, an additional discriminator is added to the GAN discriminator network, forming a dual discriminator that keeps the training process stable. Second, the more robust Charbonnier loss is used instead of the mean square error (MSE) loss to measure the similarity between the generated image and the actual image, and the total variation (TV) loss is employed to smooth the training results. Finally, the experimental results indicate that the proposed method reconstructs global structures and texture details of images better than other SOTA methods. The peak signal-to-noise ratio (PSNR) values of the proposed method are improved by an average of 2.24 dB, and the structural similarity index measure (SSIM) values are improved by an average of 0.07.
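As a rough illustration of two quantities named in this abstract, the sketch below computes an anisotropic total variation term and the PSNR for images scaled to [0, 1]. The exact TV variant and normalization used by the authors are not stated, so this form and all names are assumptions.

```python
import numpy as np

def total_variation(img):
    """Anisotropic TV: mean absolute difference between neighbouring pixels."""
    dh = np.abs(img[1:, :] - img[:-1, :])   # vertical neighbours
    dw = np.abs(img[:, 1:] - img[:, :-1])   # horizontal neighbours
    return dh.mean() + dw.mean()

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

rng = np.random.default_rng(0)
smooth = np.full((8, 8), 0.5)                          # perfectly flat image
noisy = smooth + rng.normal(scale=0.05, size=(8, 8))   # same image with added noise

tv_smooth = total_variation(smooth)   # zero for a flat image
tv_noisy = total_variation(noisy)     # larger: TV penalizes pixel-level noise
quality = psnr(smooth, noisy)         # PSNR of the noisy image against the flat one
```

Minimizing a TV term pushes the output toward piecewise-smooth images, which is why it is used to smooth results. PSNR is reported in dB, so an average gain such as the one quoted above corresponds to a multiplicative reduction in MSE.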