
Speech-driven realistic facial animation synthesis based on BLSTM-RNN (cited by: 4)

Speech-driven video-realistic talking head synthesis using BLSTM-RNN
Abstract Bidirectional long short-term memory (BLSTM) is a special kind of recurrent neural network (RNN) that can effectively model the long-range context of speech. Long short-term memory (LSTM) is an RNN architecture designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. This paper proposes a deep BLSTM approach for speech-driven, photo-realistic talking head animation. The BLSTM-RNN is trained on a speaker's audio-visual bimodal data. An active appearance model (AAM) is used to model the face images, and the AAM parameters serve as the prediction targets of the network. The paper studies how the network architecture and different acoustic input features affect the quality of the synthesized animation. Experiments on the LIPS2008 audio-visual corpus show that networks with BLSTM layers clearly and consistently outperform networks that have only feed-forward layers. The best network on this dataset is a three-layer structure with a feed-forward layer inserted between two BLSTM layers, with 256 nodes (BFB256). Combining FBank, pitch, and energy features further improves the results and gives the best-performing feature set for the speech-driven talking head animation task.
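The BLSTM, feed-forward, BLSTM stack described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the weights are random and untrained, the LSTM cell is a standard variant without peepholes, and the names `build_bfb` and `predict`, the feature and AAM dimensions, and the final linear output layer are all assumptions made for the example. The abstract does not say whether "256 nodes" is per direction or in total, so the sketch uses a tiny hidden size per direction to stay fast.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_matrix(rows, cols, rng):
    # Small random weights; a real system would learn these from data.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTM:
    """One unidirectional LSTM layer (no peepholes), untrained."""

    GATES = ("i", "f", "o", "g")  # input, forget, output, candidate

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, acting on [input ; previous hidden state].
        self.w = {k: make_matrix(n_hidden, n_in + n_hidden, rng) for k in self.GATES}

    def run(self, frames):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in frames:
            z = x + h  # list concatenation
            pre = {k: matvec(self.w[k], z) for k in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


class BLSTM:
    """Runs one LSTM forward in time and one backward, then concatenates."""

    def __init__(self, n_in, n_hidden, rng):
        self.fwd = LSTM(n_in, n_hidden, rng)
        self.bwd = LSTM(n_in, n_hidden, rng)

    def run(self, frames):
        f_out = self.fwd.run(frames)
        b_out = self.bwd.run(frames[::-1])[::-1]  # re-align to forward time order
        return [a + b for a, b in zip(f_out, b_out)]


class Dense:
    """Frame-wise feed-forward layer."""

    def __init__(self, n_in, n_out, rng, activation=math.tanh):
        self.w = make_matrix(n_out, n_in, rng)
        self.activation = activation

    def run(self, frames):
        return [[self.activation(v) for v in matvec(self.w, x)] for x in frames]


def build_bfb(n_features, n_hidden, n_aam, seed=0):
    """Hypothetical BFB stack: BLSTM -> feed-forward -> BLSTM -> linear output."""
    rng = random.Random(seed)
    return [
        BLSTM(n_features, n_hidden, rng),
        Dense(2 * n_hidden, n_hidden, rng),
        BLSTM(n_hidden, n_hidden, rng),
        Dense(2 * n_hidden, n_aam, rng, activation=lambda v: v),  # assumed regression head
    ]


def predict(layers, frames):
    for layer in layers:
        frames = layer.run(frames)
    return frames


# Toy input: 5 frames, each a concatenation of (assumed) 4 FBank values,
# 1 pitch value, and 1 energy value. Real feature sizes are not given here.
frames = [[0.1 * (t + k) for k in range(6)] for t in range(5)]
layers = build_bfb(n_features=6, n_hidden=4, n_aam=3)
aam_params = predict(layers, frames)
print(len(aam_params), len(aam_params[0]))
```

Because the backward LSTM reads the utterance from the end, each predicted AAM vector depends on both past and future acoustic frames, which is the property the abstract credits for the gain over feed-forward-only networks.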
Authors YANG Shan (阳珊), FAN Bo (樊博), XIE Lei (谢磊), WANG Lijuan (王丽娟), SONG Geping (宋謌平) (Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China; Microsoft Research Asia, Beijing 100080, China)
Source Journal of Tsinghua University (Science and Technology), 2017, No. 3, pp. 250-256 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding National Natural Science Foundation of China (61571363)
Keywords talking avatar; facial animation; bidirectional long short-term memory (BLSTM); recurrent neural network (RNN); active appearance model (AAM)