Abstract
This paper presents a real-time speech-driven talking avatar. Unlike most talking avatars, in which the speech-synchronized facial animation is generated offline, this avatar can speak with live speech input. Such a life-like talking avatar has great application potential in videophones, virtual conferencing, audio/video chat, and other instant-messaging and entertainment media. Since phonemes are the smallest distinguishable units of pronunciation, a real-time phoneme recognizer was built to recognize phonemes from the live speech input, and the phoneme recognition and output algorithm was improved to tighten the synchronization between the speech and the mouth motion. To account for coarticulation effects, a dynamic viseme generation algorithm converts the recognized phonemes into a sequence of facial animation parameters (FAPs). The FAP sequence then drives a 3-D head model parameterized according to the MPEG-4 facial animation standard, producing the synchronized facial animation. Subjective MOS evaluations show that the real-time speech-driven talking avatar achieves MOS scores of 3.42 for synchronization and 3.50 for realism.
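The phoneme-to-FAP pipeline described above can be sketched in miniature. The mapping tables, the single FAP dimension, and the exponential-smoothing blend below are illustrative assumptions standing in for the paper's actual dynamic viseme generation algorithm; they only show the shape of the computation, not its real data.

```python
# Hypothetical sketch of the phoneme -> viseme -> FAP pipeline.
# The phoneme/viseme tables and blending weight are illustrative
# assumptions, not the paper's actual models or data.

# MPEG-4 defines 14 static visemes; this is a tiny illustrative subset.
PHONEME_TO_VISEME = {
    "p": 1, "b": 1, "m": 1,     # bilabial closure
    "f": 2, "v": 2,             # labiodental
    "a": 10, "o": 12, "i": 13,  # open / rounded / spread vowels
    "sil": 0,                   # neutral (silence)
}

# Illustrative target values for one FAP (e.g. jaw opening) per viseme.
VISEME_FAP = {0: 0.0, 1: 0.0, 2: 0.1, 10: 0.8, 12: 0.5, 13: 0.2}

def dynamic_visemes(phonemes, alpha=0.5):
    """Convert a phoneme sequence into a smoothed FAP sequence.

    Coarticulation is approximated here by exponential smoothing:
    each frame's FAP value is pulled only partway toward the current
    viseme target, so neighboring phonemes influence each other's
    mouth shapes instead of snapping between static poses.
    """
    faps = []
    current = 0.0
    for ph in phonemes:
        target = VISEME_FAP[PHONEME_TO_VISEME.get(ph, 0)]
        current += alpha * (target - current)  # blend toward target
        faps.append(round(current, 3))
    return faps

print(dynamic_visemes(["sil", "a", "m", "a", "sil"]))
```

Note how the bilabial "m" between two open "a" vowels never fully closes the jaw in this sketch: the smoothed value stays above zero, which is the qualitative effect a coarticulation model aims for.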
Published in
Journal of Tsinghua University (Science and Technology)
Indexed in: EI, CAS, CSCD, PKU Core Journals
2011, No. 9, pp. 1180-1186 (7 pages)
Funding
National Natural Science Foundation of China, Young Scientists Fund (60802085)
National Natural Science Foundation of China, General Program (61175018)
Shaanxi Province Science and Technology Plan, Young Science and Technology New Star Program (2011KJXX29)
Shaanxi Province Natural Science Basic Research Plan (2011JM8009)
Keywords
visual speech synthesis
talking avatar
facial animation