Abstract
Visual speech parameter estimation plays an important role in the study of visual speech. In this paper, 24 parameters directly related to articulation are selected from the MPEG-4 facial animation parameters (FAPs) to describe visual speech. Combining a statistical learning method with a rule-based method, the mouth contour and facial feature points are tracked accurately using the facial color probability distribution together with prior knowledge of shape and edges. High-frequency noise in the reference-point tracks is removed with a low-pass filter, and the main head pose is estimated from the four most prominent reference points to eliminate the effect of global head motion. Finally, accurate visual speech parameters are computed from the movement of these facial feature points, and these parameters have already been used in related applications.
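The abstract outlines a pipeline of low-pass filtering the tracked reference points and then removing global head motion before computing speech parameters. As the paper's actual implementation is not given here, the following is a minimal sketch under simplifying assumptions: a moving-average low-pass filter, and global motion modeled as pure 2-D translation averaged over the stable reference points (the paper estimates a full head pose from four points). All function names and the window size are illustrative.

```python
# Hedged sketch of the filtering / global-motion-removal steps.
# Assumptions (not from the paper): moving-average filter, window = 5,
# global motion approximated as a 2-D translation.

def moving_average(coords, window=5):
    """Low-pass filter a 1-D coordinate sequence with a moving average."""
    half = window // 2
    out = []
    for i in range(len(coords)):
        lo, hi = max(0, i - half), min(len(coords), i + half + 1)
        out.append(sum(coords[lo:hi]) / (hi - lo))
    return out

def estimate_translation(ref_now, ref_start):
    """Estimate global head translation as the mean displacement of the
    stable reference points (a simplification of full pose estimation)."""
    n = len(ref_now)
    dx = sum(p[0] - q[0] for p, q in zip(ref_now, ref_start)) / n
    dy = sum(p[1] - q[1] for p, q in zip(ref_now, ref_start)) / n
    return dx, dy

def remove_global_motion(feature_pt, dx, dy):
    """Subtract the estimated global motion so only the local
    articulation movement of the feature point remains."""
    return (feature_pt[0] - dx, feature_pt[1] - dy)
```

For example, if the four reference points all shift by (2, 2) between frames, `estimate_translation` returns that shift, and subtracting it from a mouth feature point isolates the lip movement due to speech alone.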
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal (北大核心)
2005, No. 7, pp. 1185-1190 (6 pages)
Journal of Computer Research and Development
Funding
Specialized Research Fund for the Doctoral Program of Higher Education (20010003049)
University of Science and Technology Beijing Research Fund (20040509190)