Abstract
This paper proposes a method for synthesizing photo-realistic facial animation from acoustic speech, based on an audio-visual Dynamic Bayesian Network model with State Asynchrony (SA-DBN). Perceptual Linear Prediction (PLP) features extracted from the audio, together with Active Appearance Model (AAM) features extracted from the face images of an audio-visual speech database, are used to train the model parameters. For a given input speech signal, the optimal AAM feature sequence is estimated under the Maximum Likelihood Estimation (MLE) criterion and is then used to synthesize the facial image sequence and the facial animation. A subjective evaluation compares the proposed SA-DBN with a DBN model whose audio and visual states are synchronous. The results show that, by limiting the maximum degree of asynchrony between the audio speech states and the visual speech states, the SA-DBN produces clear, natural facial animations whose mouth movements closely match the input speech.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
2014, Issue 2, pp. 180-183, 188 (5 pages)
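Funding
National Natural Science Foundation of China (61273265)
Shaanxi Provincial International Science and Technology Cooperation Fund, Key Project (2011KW-04)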
Keywords
facial animation synthesis
Dynamic Bayesian Network model with State Asynchrony (SA-DBN)
asynchrony constraint
Active Appearance Model (AAM)
Perceptual Linear Prediction (PLP)
Maximum Likelihood Estimation (MLE)