Speech Driven Facial Animation Synthesis Based on State Asynchronous DBN
Abstract: A speech-driven facial animation synthesis method based on a State Asynchronous Dynamic Bayesian Network (SA-DBN) is proposed. Perceptual Linear Prediction (PLP) features extracted from the audio, together with Active Appearance Model (AAM) features extracted from the face images of an audio-visual speech database, are used to train the model parameters. For a given input speech signal, the optimal AAM feature sequence is inferred under the Maximum Likelihood Estimation (MLE) criterion and then used to reconstruct the facial image sequence and the resulting animation. A subjective evaluation compares the proposed model against a state-synchronous audio-visual DBN. The results show that, by bounding the maximum allowed asynchrony between the audio speech states and the visual speech states, the SA-DBN produces clear, natural facial animations whose mouth movements closely match the input speech.
Source: Computer Engineering (计算机工程), CAS, CSCD, 2014, Issue 2, pp. 180-183, 188 (5 pages in total).
Funding: National Natural Science Foundation of China (No. 61273265); Shaanxi Province International Science and Technology Cooperation Key Program (No. 2011KW-04).
Keywords: facial animation synthesis; State Asynchronous Dynamic Bayesian Network (SA-DBN); asynchrony constraint; Active Appearance Model (AAM); Perceptual Linear Prediction (PLP); Maximum Likelihood Estimation (MLE)
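
Note: the decoding step described in the abstract, maximum likelihood inference of a visual feature sequence under a bounded audio-visual state asynchrony, can be illustrated with a toy sketch. The following Python/NumPy fragment is not the authors' implementation: it collapses the coupled audio-visual DBN into two discrete state chains, uses joint Viterbi decoding as the maximum likelihood search, and enforces the SA-DBN asynchrony constraint by bounding |audio state - visual state|. All names (viterbi_sa, max_async, and so on) are hypothetical.

    import numpy as np

    def viterbi_sa(log_obs_a, log_trans_a, log_trans_v, max_async=1):
        # log_obs_a:   (T, Na) per-frame log-likelihoods of the audio states
        # log_trans_a: (Na, Na) audio state transition log-probabilities
        # log_trans_v: (Nv, Nv) visual state transition log-probabilities
        # Returns a maximum likelihood visual state sequence of length T.
        T, Na = log_obs_a.shape
        Nv = log_trans_v.shape[0]
        NEG = -np.inf
        # delta[a, v]: best log-probability of any path ending in the
        # joint state (audio=a, visual=v) at the current frame
        delta = np.full((Na, Nv), NEG)
        for a in range(Na):
            for v in range(Nv):
                if abs(a - v) <= max_async:      # asynchrony constraint
                    delta[a, v] = log_obs_a[0, a]
        back = np.zeros((T, Na, Nv, 2), dtype=int)
        for t in range(1, T):
            new = np.full((Na, Nv), NEG)
            for a in range(Na):
                for v in range(Nv):
                    if abs(a - v) > max_async:   # prune asynchronous pairs
                        continue
                    # score of every predecessor (a', v') -> (a, v)
                    scores = (delta
                              + log_trans_a[:, a][:, None]
                              + log_trans_v[:, v][None, :])
                    best = np.unravel_index(np.argmax(scores), scores.shape)
                    new[a, v] = scores[best] + log_obs_a[t, a]
                    back[t, a, v] = best
            delta = new
        # trace back only the visual half of the joint path
        a, v = np.unravel_index(np.argmax(delta), delta.shape)
        path = [v]
        for t in range(T - 1, 0, -1):
            a, v = back[t, a, v]
            path.append(v)
        return path[::-1]

    # toy usage: 3 audio states, 3 visual states, 5 random frames
    rng = np.random.default_rng(0)
    log_obs = np.log(rng.dirichlet(np.ones(3), size=5))
    uniform = np.log(np.full((3, 3), 1.0 / 3.0))
    print(viterbi_sa(log_obs, uniform, uniform, max_async=1))

In the paper's pipeline, each decoded visual state would select the corresponding AAM parameters, from which face images are rendered; this sketch simply returns the visual state indices.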

