Face reconstruction from voice based on age-supervised learning and face prior information
Abstract: Previous voice-to-face image reconstruction methods lack supervised constraints from different dimensions and do not use face prior information, which leads to low similarity between reconstructed and real images. To address this, a face reconstruction method based on age-supervised learning and face prior information is proposed. A pre-trained age estimation model supplies age labels for the existing dataset, compensating for the missing age supervision. For a given voice sample, voice-face cross-modal identity matching retrieves face images close to the real speaker, and the retrieved images serve as face prior information. A joint loss function combining a cross-entropy loss and an adversarial loss is defined to improve, in a balanced way, the age consistency, low-frequency content, and local (high-frequency) textures of the reconstructed images. The method was tested through face retrieval experiments on the VoxCeleb1 dataset and compared with current mainstream methods. The results show that the proposed method improves the similarity between generated and ground-truth images, and that the generated images achieve better subjective and objective evaluation results than those of the compared methods.
Authors: HE Li (何立); PANG Shan-min (庞善民) (School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China)
Source: Journal of Zhejiang University: Engineering Science (《浙江大学学报(工学版)》), 2022, No. 5, pp. 1006-1016 (11 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61972312); Key Research and Development Program of Shaanxi Province, General Industrial Project (2020GY-002).
Keywords: deep learning; image reconstruction; convolutional neural network; generative adversarial network; face prior information
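
The abstract describes a joint loss that combines a cross-entropy term with an adversarial term. The following is only a minimal sketch of how such a combination might look, not the authors' implementation: the function names, the treatment of age as a classification target, the non-saturating form of the adversarial term, and the weights `lambda_age` and `lambda_adv` are all assumptions made for illustration.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for class logits of shape (N, C) and integer labels (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def generator_adversarial_loss(disc_probs_on_fake, eps=1e-12):
    """Non-saturating generator loss, -mean(log D(G(x))); an assumed choice."""
    return float(-np.log(disc_probs_on_fake + eps).mean())


def joint_loss(age_logits, age_labels, disc_probs_on_fake,
               lambda_age=1.0, lambda_adv=1.0):
    """Weighted sum of the two terms; both weights are hypothetical."""
    age_term = softmax_cross_entropy(age_logits, age_labels)
    adv_term = generator_adversarial_loss(disc_probs_on_fake)
    return lambda_age * age_term + lambda_adv * adv_term
```

In this sketch, `age_logits` would come from an age classifier applied to the reconstructed face and `disc_probs_on_fake` from a discriminator's output on the same image; how the paper actually wires these terms together is not stated in the abstract.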