Abstract
Existing voice-to-face image reconstruction methods lack supervised constraints from different dimensions and make no use of face prior information, which leads to low similarity between reconstructed and real images. To address this, a voice-to-face image reconstruction method that combines age supervision with face prior information was proposed. A pre-trained age estimation model was used to add age data to the dataset, compensating for the missing age supervision. For a given voice sample, voice-face cross-modal identity matching was applied to retrieve face images close to the real speaker, and the retrieved images were used as face prior information. A joint loss function consisting of the cross-entropy loss and the adversarial loss was defined to improve, in a balanced way, the age consistency, low-frequency content and local (high-frequency) textures of the reconstructed images. The method was tested through face retrieval experiments on the Voxceleb 1 dataset and compared with current mainstream methods. The results showed that the proposed method improves the similarity between generated and ground-truth images, and that the generated images obtain better subjective and objective evaluation results than those of the compared methods.
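The abstract names the two terms of the joint loss but gives no formula, so the following is only a minimal sketch of how a cross-entropy term (age supervision) and an adversarial term might be combined for the generator. The function names, the weights `w_age` and `w_adv`, the use of discrete age groups, and the non-saturating form of the adversarial term are all assumptions made for illustration, not the authors' definition.

```python
import math

EPS = 1e-12  # guards against log(0)


def cross_entropy(probs, target_index):
    """Cross-entropy of a predicted age-group distribution against the true group."""
    return -math.log(max(probs[target_index], EPS))


def generator_adversarial_loss(d_fake):
    """Non-saturating generator loss -log D(G(v)), with d_fake in (0, 1]."""
    return -math.log(max(d_fake, EPS))


def joint_loss(age_probs, age_target, d_fake, w_age=1.0, w_adv=1.0):
    """Weighted sum of the age term and the adversarial term (weights are hypothetical)."""
    return (w_age * cross_entropy(age_probs, age_target)
            + w_adv * generator_adversarial_loss(d_fake))


if __name__ == "__main__":
    # Age classifier gives 0.5 to the true group; discriminator scores the fake at 0.5.
    print(joint_loss([0.25, 0.5, 0.25], age_target=1, d_fake=0.5))
```

In this sketch each term vanishes only when the age prediction is certain and correct and the discriminator is fully fooled, so lowering the sum pushes the generator toward both goals at once.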
Authors
HE Li (何立)
PANG Shan-min (庞善民)
School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Source
Journal of Zhejiang University (Engineering Science)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2022, No. 5, pp. 1006-1016 (11 pages)
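Funding
National Natural Science Foundation of China (61972312)
Key Research and Development Program of Shaanxi Province, General Industrial Project (2020GY-002)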
Keywords
deep learning
image reconstruction
convolutional neural network
generative adversarial network
face prior information