Funding: Supported by the National Key Research and Development Program of China under Grants 2020YFC2004003 and 2020YFC2004002, and by the National Natural Science Foundation of China (NSFC) under Grant No. 61571106.
Abstract: Masking-based and spectrum mapping-based methods are the two main approaches to speech enhancement with deep neural networks (DNNs). Mapping-based methods, however, rely solely on the phase of the noisy speech, which limits the upper bound of speech enhancement performance. Masking-based methods need to estimate the mask accurately, which remains the key problem. Combining the advantages of these two types of methods, this paper proposes the speech enhancement algorithm MM-RDN (masking-mapping residual dense network), based on masking-mapping (MM) and a residual dense network (RDN). Using the logarithmic power spectrogram (LPS) of consecutive frames, MM estimates the ideal ratio mask (IRM) matrix of those frames. RDN makes full use of the feature maps of all layers. Meanwhile, by using global residual learning to combine shallow and deep features, RDN obtains global dense features from the LPS, thereby improving the estimation accuracy of the IRM matrix. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, MM-RDN still outperforms the existing convolutional recurrent network (CRN) method in perceptual evaluation of speech quality (PESQ) and other evaluation metrics. This indicates that the proposed algorithm generalizes better to untrained conditions.
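The two quantities named in the abstract, the LPS input feature and the IRM target, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the paper's setup: the square-root form of the IRM is one common definition, and the frame length, hop size, toy signals and function names (`frame_spectrum`, `log_power_spectrum`, `ideal_ratio_mask`) are chosen only for illustration.

```python
import numpy as np

def frame_spectrum(x, frame_len=256, hop=128):
    """Complex spectrogram: Hann-windowed frames -> one-sided FFT, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def log_power_spectrum(spec, eps=1e-10):
    """LPS feature: natural log of the power spectrum (eps avoids log(0))."""
    return np.log(np.abs(spec) ** 2 + eps)

def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-10):
    """Assumed IRM definition: sqrt(|S|^2 / (|S|^2 + |N|^2)), values in [0, 1]."""
    s2 = np.abs(clean_spec) ** 2
    n2 = np.abs(noise_spec) ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

# Toy signals (assumption): a 440 Hz tone as "speech" plus white noise, 8 kHz rate.
rng = np.random.default_rng(0)
t = np.arange(4000) / 8000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.5 * rng.standard_normal(t.shape)
noisy = clean + noise

S, N, Y = (frame_spectrum(v) for v in (clean, noise, noisy))
lps = log_power_spectrum(Y)    # what a network would take as input
irm = ideal_ratio_mask(S, N)   # what a masking-based network would be trained to predict
enhanced = irm * Y             # real-valued mask scales magnitudes, noisy phase is kept
```

Because the mask is real-valued, multiplying it into the complex noisy spectrum changes only the magnitudes. The noisy phase is reused unchanged.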
Funding: This work was supported in part by the Sichuan Science and Technology Program under Grants 2019YJ0356, 21ZDYF2484 and 21GJHZ0061, and by the Scientific Research Foundation of the Education Department of Sichuan Province under Grant 18ZB0117.
Abstract: The main task of automatic brain tumor segmentation in magnetic resonance imaging (MRI) is to automatically segment the brain tumor edema, peritumoral edema, endoscopic core, enhancing tumor core and non-enhancing tumor core from 3D MR images. Because the location, size, shape and intensity of brain tumors vary greatly, it is very difficult to segment these brain tumor regions automatically. In this paper, by combining the advantages of DenseNet and ResNet, we propose a new 3D U-Net with dense blocks in the encoder and residual blocks in the decoder. The number of output feature maps increases with network depth along the contracting path of the encoder, which is consistent with the characteristics of dense blocks. Using dense blocks can decrease the number of network parameters, deepen the network, strengthen feature propagation, alleviate vanishing gradients and enlarge receptive fields. The residual blocks replace the convolution blocks of the original U-Net in the decoder, which improves network performance. Our approach was trained and validated on the BraTS2019 training and validation data sets. We obtained Dice scores of 0.901, 0.815 and 0.766 for whole tumor, tumor core and enhancing tumor core, respectively, on the BraTS2019 validation data set. Our method performs better than the original 3D U-Net. The experimental results demonstrate that, compared with some state-of-the-art methods, our approach is a competitive automatic brain tumor segmentation method.
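The Dice scores reported above measure the overlap between a predicted region and a reference region. A minimal NumPy sketch of the standard definition, 2|A∩B| / (|A| + |B|), is given below. The function name `dice_score`, the toy volumes and the convention of returning 1.0 for two empty masks are assumptions made for illustration, and this is not the official BraTS evaluation code.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks count as a perfect match
    return 2.0 * intersection / denom

# Toy 3D volumes (assumption): the prediction over-segments along one axis.
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 1:3, 1:3] = True   # 8 reference voxels
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True     # 12 predicted voxels, 8 of them shared

dice = dice_score(pred, target)
print(dice)  # 0.8, i.e. 2 * 8 / (12 + 8)
```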
Funding: This work is supported by the National Key Research and Development Program of China under Grants 2020YFC2004003 and 2020YFC2004002, and by the National Natural Science Foundation of China (NSFC) under Grant No. 61571106.
Abstract: Generative adversarial networks (GANs) have attracted growing attention for end-to-end speech enhancement in recent years. Various GAN-based enhancement methods have been presented to improve the quality of reconstructed speech. However, the performance of these GAN-based methods is worse than that of masking-based methods. To tackle this problem, we propose a speech enhancement method with a residual dense generative adversarial network (RDGAN) that maps the log-power spectrum (LPS) of degraded speech to that of clean speech. In detail, a residual dense block (RDB) architecture is designed to better estimate the LPS of clean speech; it extracts rich local features of the LPS through densely connected convolution layers. Meanwhile, sequential RDB connections are incorporated at various scales of the LPS, which significantly increases the flexibility and robustness of feature learning in the time-frequency domain. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, RDGAN still outperforms the existing GAN-based methods and the masking-based method in PESQ and other evaluation metrics. This indicates that our method generalizes better to untrained conditions.
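The idea of a residual dense block, with densely connected convolution layers followed by a fusion step and a skip connection, can be sketched in plain NumPy. This follows the general RDB pattern (each layer sees the concatenation of all earlier feature maps, a 1x1 convolution fuses them, and the block input is added back). It is not the authors' RDGAN configuration: the channel count, growth rate, number of layers, random weights and the helper names `conv2d_same` and `residual_dense_block` are all assumptions for illustration.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 cross-correlation with zero padding.
    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k -> (C_out, H, W)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    # windows has shape (C_in, H, W, k, k): one k x k patch per output position
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", windows, w)

def residual_dense_block(x, dense_weights, fusion_weight):
    """Dense connections + 1x1 local feature fusion + local residual connection."""
    features = [x]
    for w in dense_weights:
        stacked = np.concatenate(features, axis=0)               # all earlier maps
        features.append(np.maximum(conv2d_same(stacked, w), 0))  # conv + ReLU
    fused = conv2d_same(np.concatenate(features, axis=0), fusion_weight)
    return x + fused                                             # residual add

# Illustrative sizes (assumptions): 4 input channels, growth rate 2, 3 dense layers.
rng = np.random.default_rng(0)
channels, growth, n_layers = 4, 2, 3
dense_weights = [0.1 * rng.standard_normal((growth, channels + i * growth, 3, 3))
                 for i in range(n_layers)]
fusion_weight = 0.1 * rng.standard_normal((channels, channels + n_layers * growth, 1, 1))

x = rng.standard_normal((channels, 16, 20))  # e.g. (channels, frames, frequency bins)
y = residual_dense_block(x, dense_weights, fusion_weight)
```

The output keeps the input shape, so blocks of this kind can be chained sequentially. If the fusion weights are zero, the block reduces to the identity, which is the usual motivation for the residual connection.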