
Cross-modal learning based on speech and parameterized face representation using densely deep networks

Cited by: 2
Abstract: To improve the performance of cross-modal face representation and synthesis from two modalities, speech and face images, this paper proposes a facial generation method that combines a parameterized face representation with a densely connected deep network. For the input speech modality, a spectral transform converts the one-dimensional time-domain signal into a two-dimensional frequency-domain representation, from which robust frequency-domain features are extracted. For the output image modality, an active appearance model (AAM) is built independently for each facial region to reduce the correlation between regions, yielding compact parameterized face features. To achieve effective cross-modal learning, a densely connected deep convolutional neural network is trained to regress the face parameters from the speech features, and the face is then reconstructed from the predicted parameters. The dense connectivity strengthens feature propagation and feature reuse, which helps the synthesis of facial details. Experiments on two audio-visual datasets demonstrate the effectiveness of the proposed method.
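The three stages in the abstract (spectral front end, dense regression network, per-region parametric reconstruction) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every concrete choice here is an assumption: a Hann-windowed STFT with 25 ms frames and a 10 ms hop at an assumed 16 kHz sampling rate, a dense block built from 1x1 convolutions (matrix products over channels) with growth rate 32 and 4 layers, time-averaged pooling followed by a linear regression head, and a simplified per-region linear appearance model `mean + basis @ b` standing in for a full AAM (no shape warping). The region names, parameter counts, and random, untrained weights are hypothetical and only demonstrate the data flow and tensor shapes.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """1-D waveform -> 2-D (frequency x time) log-magnitude STFT.
    frame_len/hop = 25 ms / 10 ms at an assumed 16 kHz rate."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])   # (time, frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))       # (time, frame_len // 2 + 1)
    return np.log(mag + eps).T                      # (freq, time)

def dense_block(x, weights):
    """Dense connectivity: every layer receives the block input concatenated
    with the outputs of all earlier layers (1x1 convolution + ReLU here)."""
    features = [x]
    for w in weights:
        out = np.maximum(w @ np.concatenate(features, axis=0), 0.0)
        features.append(out)
    return np.concatenate(features, axis=0)          # (in_ch + L * growth, time)

def init_dense_weights(in_ch, growth, n_layers, rng):
    # Layer l sees in_ch + l * growth channels and emits `growth` new ones.
    return [0.01 * rng.standard_normal((growth, in_ch + l * growth))
            for l in range(n_layers)]

def predict_face_params(spec, block_weights, head):
    """Speech features -> face parameter vector (time-pooled linear head)."""
    feats = dense_block(spec, block_weights)
    return head @ feats.mean(axis=1)

def reconstruct_regions(params, region_models):
    """Split the parameter vector per region and apply x_r = mean_r + basis_r @ b_r."""
    out, start = [], 0
    for mean, basis in region_models:
        k = basis.shape[1]
        out.append(mean + basis @ params[start:start + k])
        start += k
    return out

# Usage with synthetic data: a 1 kHz tone, 1 s long, at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
spec = log_spectrogram(np.sin(2 * np.pi * 1000.0 * t))  # (201, 98)
weights = init_dense_weights(spec.shape[0], growth=32, n_layers=4, rng=rng)
# Hypothetical regions: "eyes" with 10 parameters, "mouth" with 20, 500-dim each.
regions = [(rng.standard_normal(500), rng.standard_normal((500, 10))),
           (rng.standard_normal(500), rng.standard_normal((500, 20)))]
head = 0.01 * rng.standard_normal((30, spec.shape[0] + 4 * 32))
params = predict_face_params(spec, weights, head)          # (30,)
faces = reconstruct_regions(params, regions)               # two (500,) vectors
```

Because each layer concatenates rather than replaces its inputs, the first rows of the block output are the unchanged input spectrogram; this is the feature-reuse property the abstract credits for better facial detail. Modelling regions independently keeps each region's parameters in its own slice of the predicted vector.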
Authors: TANG Jun (唐俊), MOU Haiming (牟海明), LENG Jie (冷洁), LI Qingdu (李清都), LIU Na (刘娜) (Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai 200093, P. R. China; Institute of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China)
Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (《重庆邮电大学学报(自然科学版)》), 2020, no. 5, pp. 867-873 (7 pages). Indexed in CSCD; Peking University core journal (北大核心).
Keywords: cross-modal learning; deep learning; convolutional neural network; parameterized representation; speech; image