Funding: the Double First-Class Innovation Research Project for People's Public Security University of China (No. 2023SYL08).
Abstract: Voice portrait technology explores and establishes the relationship between speakers' voices and their facial features, aiming to generate corresponding facial characteristics from the voice of an unknown speaker. Owing to their powerful advantages in image generation, Generative Adversarial Networks (GANs) are now widely applied across many fields, and existing Voice2Face methods for voice portraits are primarily GANs trained on voice-face paired datasets. However, voice portrait models built solely on GANs face limitations in image generation quality and struggle to maintain facial similarity; moreover, their training is relatively unstable, which degrades the overall generative performance of the model. To overcome these challenges, we propose a novel deep Generative Adversarial Network model for audio-visual synthesis, named AVP-GAN (Attention-enhanced Voice Portrait Model using Generative Adversarial Network), which is based on a convolutional attention mechanism and can generate corresponding facial images from the voice of an unknown speaker. Firstly, to address training instability, we integrate convolutional neural networks with deep GANs and apply spectral normalization to constrain the variation of the discriminator, preventing issues such as mode collapse. Secondly, to enhance the model's ability to extract relevant features across the two modalities, we propose a voice portrait model based on convolutional attention that learns the mapping between voice and facial features in a common space along the channel and spatial dimensions independently. Thirdly, to improve the quality of the generated faces, we incorporate a degradation removal module and use pretrained facial GANs as facial priors to restore and sharpen the generated facial images. Experimental results demonstrate that AVP-GAN achieves a cosine similarity of 0.511, outperforming the comparison model and effectively generating high-quality facial images corresponding to a speaker's voice.
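The abstract does not include implementation details, but two of the techniques it names are standard and easy to illustrate. The sketch below is a minimal PyTorch rendering, not the paper's code: a CBAM-style block that attends over the channel and spatial dimensions independently, and a discriminator whose convolutions are wrapped in spectral normalization to constrain the discriminator's variation. All module names, channel sizes, and kernel sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class ConvAttention(nn.Module):
    """CBAM-style block: channel attention, then spatial attention.

    Hypothetical sketch of the "convolutional attention" the abstract
    describes; the paper's actual layer configuration is not given.
    """

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight per channel.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over avg- and max-pooled channel maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel dimension: shared MLP on average- and max-pooled features.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial dimension: attention map over pooled channel statistics.
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * torch.sigmoid(self.spatial_conv(pooled))


class Discriminator(nn.Module):
    """Conv discriminator with spectral normalization on every conv layer."""

    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(in_ch, base, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(base, base * 2, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(base * 2, 1, 4)),  # real/fake score map
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.net(img)
```

Spectral normalization rescales each weight matrix by its largest singular value, bounding the discriminator's Lipschitz constant; this is the usual mechanism behind the stability claim made in the abstract.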
Funding: supported by the National Natural Science Foundation (No. 52076191) and the Key Research & Development Plan of Shandong Province (No. 2020CXGC011401).
Abstract: Ships and other mobile pollution sources emit massive quantities of ultrafine, low-resistivity particles containing black carbon (BC), which are harmful to human health and difficult to capture with conventional electrostatic precipitators (ESPs). In this study, nanoscale carbon black was adopted as a simulated particle (SP) with physicochemical properties similar to those of the black carbon emitted from ships (SP-BC) to investigate the feasibility of using an ESP with square-grooved collecting plates to remove SP-BC at low backpressures. Increasing the applied voltage significantly improved the total collection of SP-BC but may also have promoted the conversion of larger SP-BC into nano-sized particles below 20 nm: the outlet number concentration of SP-BC at 27 kV and 130 °C was three times that of the inlet. Reducing the flow rate strengthened the capture of SP-BC below 20 nm, and under the combined action of a low flow rate and the maximum applied voltage, the collection efficiency for 20-100 nm SP-BC exceeded 90%. In addition, the escape and capture characteristics of SP-BC under long-term rapping were revealed: the square-grooved collecting plates effectively restrained the re-entrainment of collected SP-BC caused by rapping, and nanoscale SP-BC remained trapped in the grooves after rapping. These results provide insights into the deep removal of massive nanoscale black carbon emissions from mobile sources.
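The abstract reports number-based collection efficiencies without stating the formula; a common definition is eta = 1 - N_out / N_in, computed per particle-size bin, which also explains how a bin can show a negative efficiency when it gains particles. The short Python sketch below walks through that calculation with illustrative, made-up concentrations (not the study's measured data).

```python
# Number-based collection efficiency per particle-size bin:
#     eta = 1 - N_out / N_in
# A negative eta means the bin gained particles, e.g. larger SP-BC
# converting into sub-20 nm particles, as the abstract observes at
# 27 kV and 130 °C. Concentrations below are illustrative placeholders.

size_bins_nm = ["<20", "20-50", "50-100"]
n_in = [1.0e6, 8.0e5, 5.0e5]    # inlet number concentration, #/cm^3
n_out = [3.0e6, 6.0e4, 3.5e4]   # outlet number concentration, #/cm^3

for bin_label, nin, nout in zip(size_bins_nm, n_in, n_out):
    eta = 1.0 - nout / nin
    print(f"{bin_label:>7} nm: eta = {eta:+.1%}")

# Output: the <20 nm bin gives eta = -200.0% (outlet three times the
# inlet, matching the abstract), while the 20-100 nm bins exceed +90%.
```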